Mapping hurricane search data from Google Trends!

This is a quick introduction to getting and visualizing Google search data with both time and geographical components, using the packages gtrendsR, maps and ggplot2. In this example, we will look at search interest for named hurricanes that hit the US mainland, starting with ‘Katrina’. We’ll also explore different ways of using colour palettes in ggplot2.

First, we load the required packages. Note that we use devtools to install the development version of gtrendsR from GitHub.

if (!require('gtrendsR')) 
  devtools::install_github('PMassicotte/gtrendsR')
## Loading required package: gtrendsR

if(!require("pacman")) install.packages("pacman")
## Loading required package: pacman
pacman::p_load(gtrendsR,maps,ggplot2,lettercase,viridis,pals,scico,ggrepel,magrittr) # magrittr supplies the %>% pipe used further down

Let’s first look at how the impact of hurricanes Katrina (August 2005) and Harvey (August 2017) is reflected in how Americans have used these names as Google search terms over time. We’ll also change some of the plot settings by creating our own ggplot theme.

The gprop argument controls whether we want general web, news, image or YouTube searches.
The time argument is set to “all”, which gathers data from 2004 up to the time the code is run.
To focus on searches made in the US, we’ll set the geo argument to “US”.
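
Besides “all”, the time argument also accepts a date window written as two dates separated by a space, which is the form used for the maps later in this post. Here is a minimal sketch of building such a string; trends_window() is a hypothetical helper of my own, not part of gtrendsR, and the end date is simply the one used for the Harvey map below.

```r
# Hypothetical helper (not part of gtrendsR): build a
# "YYYY-MM-DD YYYY-MM-DD" string covering the `days` days up to `end`.
trends_window <- function(end, days = 7) {
  end <- as.Date(end)
  start <- end - days
  paste(format(start, "%Y-%m-%d"), format(end, "%Y-%m-%d"))
}

window <- trends_window("2017-08-25")
print(window)
# The string could then be passed on, e.g.
# gtrends("Harvey", time = window, gprop = "web", geo = "US")
```

Keeping the window in one variable makes it easier to reuse the same period for several search terms.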

NOTE: adding Hurricane Irma as a third search term used to work just fine, but it currently seems to mess up the output and only shows straight lines. Calling plot() on the result returns a ggplot object, which we can then customize. However, I was unable to suppress the default plot.

my_theme <- function() {
    theme_bw() +
    theme(panel.background = element_blank()) +
    theme(plot.background = element_rect(fill = "seashell")) +
    theme(panel.border = element_blank()) +                     # facet border
    theme(strip.background = element_blank()) +                 # facet title background
    theme(plot.margin = unit(c(.5, .5, .5, .5), "cm")) +
    theme(panel.spacing = unit(3, "lines")) +
    theme(panel.grid.major = element_blank()) +
    theme(panel.grid.minor = element_blank()) +
    theme(legend.background = element_blank()) +
    theme(legend.key = element_blank()) +
    theme(legend.title = element_blank())
  }

hurricanes = gtrends(c("katrina","harvey"),time="all", gprop = "web", geo = c("US"))

plot(hurricanes) + 
  my_theme() +
  geom_line(size=1.5) +
  scale_colour_viridis_d(option = "C", begin = .2, end = .7)

To understand what is actually measured on the y-axis, have a look here:
https://support.google.com/trends/answer/4365533?hl=en
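
In short, as I read that page, the numbers are not raw search counts: they are scaled so that the highest point in the chosen period and region is 100. The toy sketch below only illustrates that kind of rescaling; the shares are invented and this is not Google's actual procedure.

```r
# Made-up monthly search shares for one term (NOT real Google data).
share <- c(0.002, 0.004, 0.050, 0.010, 0.003)

# Rescale so the largest value becomes 100, then round to whole numbers.
hits <- round(100 * share / max(share))
print(hits)
```

This is why a spike like Katrina's flattens everything else on the same chart: all other months are expressed relative to that one peak.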

The above plot shows clear spikes around the time when Katrina and Harvey hit the US. We could also plot more cyclical data.

cycles = gtrends(c("spring break","vacation"),time="2008-01-01 2018-01-01", gprop = "web", geo = c("US"))

plot(cycles) +
  my_theme() +
  geom_line(size=1.5) +
  scale_colour_viridis_d(option = "C", begin = .2, end = .7)

As you can see in the output below, the gtrends function actually returns a list of data frames with various kinds of data.

hurricanes = gtrends(c("Katrina","Harvey","Irma"),time="all", gprop = "web", geo = c("US"))

for(df in hurricanes){
  print(head(df))
  }
##         date hits keyword geo gprop category
## 1 2004-01-01   <1 Katrina  US   web        0
## 2 2004-02-01   <1 Katrina  US   web        0
## 3 2004-03-01   <1 Katrina  US   web        0
## 4 2004-04-01    1 Katrina  US   web        0
## 5 2004-05-01   <1 Katrina  US   web        0
## 6 2004-06-01   <1 Katrina  US   web        0
## NULL
##               location hits keyword geo gprop
## 1            Louisiana  100 Katrina  US   web
## 2          Mississippi   73 Katrina  US   web
## 3             Virginia   50 Katrina  US   web
## 4 District of Columbia   37 Katrina  US   web
## 5              Alabama   37 Katrina  US   web
## 6                Texas   31 Katrina  US   web
##                location hits keyword geo gprop
## 1  Roanoke-Lynchburg VA  100 Katrina  US   web
## 2    Biloxi-Gulfport MS   86 Katrina  US   web
## 3        New Orleans LA   82 Katrina  US   web
## 4        Baton Rouge LA   54 Katrina  US   web
## 5            Jackson MS   46 Katrina  US   web
## 6 Hattiesburg-Laurel MS   45 Katrina  US   web
##      location hits keyword geo gprop
## 1   Brookneal  100 Katrina  US   web
## 2 New Orleans   56 Katrina  US   web
## 3    Leesburg   43 Katrina  US   web
## 4 Baton Rouge   37 Katrina  US   web
## 5     Dearing   NA Katrina  US   web
## 6 Santa Clara   32 Katrina  US   web
## NULL
##   subject related_queries               value geo keyword category
## 1     100             top   katrina hurricane  US Katrina        0
## 2      99             top           hurricane  US Katrina        0
## 3      41             top        katrina kaif  US Katrina        0
## 4      16             top new orleans katrina  US Katrina        0
## 5      15             top         new orleans  US Katrina        0
## 6      12             top      katrina bowden  US Katrina        0
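
Note that the hits column in the first data frame contains the text “<1”, so it comes back as character rather than numeric. If you want to do your own calculations on it, it has to be converted first. Below is a small sketch on a toy data frame that copies a few rows from the output above; replacing “<1” with 0.5 is an arbitrary choice of mine, and I am assuming the time series element is the one called interest_over_time (names(hurricanes) lists the actual element names).

```r
# Toy stand-in for the first element of the gtrends() result, with values
# copied from the printed output; a real call needs network access.
iot <- data.frame(
  date = as.Date(c("2004-01-01", "2004-02-01", "2004-03-01", "2004-04-01")),
  hits = c("<1", "<1", "<1", "1"),
  keyword = "Katrina",
  stringsAsFactors = FALSE
)

# "<1" makes the column character; map it to 0.5 (an arbitrary choice)
# before converting to numeric.
iot$hits_num <- as.numeric(ifelse(iot$hits == "<1", "0.5", iot$hits))
str(iot)
```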

Now, let’s compare the amount of interest in Hurricane Harvey for each US state.

harvey = gtrends(c("Harvey"), gprop = "web",time="2017-08-18 2017-08-25", geo = c("US"))
harvey = harvey$interest_by_region
harvey$region = sapply(harvey$location,tolower) #change to lower case for merging with map data
statesMap = map_data("state")
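harveyMerged = merge(statesMap,harvey,by="region")
harveyMerged = harveyMerged[order(harveyMerged$order),] #merge() does not guarantee row order; restore the polygon vertex order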

On the fourth line of the code above, we fetch an empty map for plotting our geographical data. On the fifth line, we merge our Google Trends data with the underlying map data. Both data sets contain a column called “region”, which is used to merge the data frames. Note that the region labels need to be identical. Therefore, on the third line, we change the capitalized state names to lowercase. Now we can plot the data! But first we’ll modify our ggplot theme for plotting maps.

my_theme2 = function() {
  my_theme() +
  theme(axis.title = element_blank()) +
  theme(axis.text = element_blank()) +
  theme(axis.ticks = element_blank())
}

ggplot() +
  geom_polygon(data=harveyMerged,aes(x=long,y=lat,group=group,fill=log(hits)),colour="white") +
  scale_fill_gradientn(colours = rev(scico(15, palette = "tokyo")[2:7])) +
  my_theme2() +
  ggtitle("Google search interest for Hurricane Harvey\nin each state from the week prior to landfall in the US") 
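
One thing to keep in mind about the merge used for this map: merge() keeps only rows whose region label appears in both data frames, so anything that does not match is dropped without a warning. Here is a quick way to check for that, sketched on toy data; the data frames and values are invented stand-ins for the map and trends data.

```r
# Toy stand-ins: `map_regions` mimics the lowercase `region` column of
# map_data("state"); `trends` mimics interest_by_region (values invented).
map_regions <- data.frame(region = c("texas", "louisiana", "district of columbia"))
trends <- data.frame(
  location = c("Texas", "Louisiana", "District of Columbia", "Alaska"),
  hits = c(100, 40, 12, 3)
)
trends$region <- tolower(trends$location)  # tolower() is already vectorised

merged <- merge(map_regions, trends, by = "region")  # inner join by default

# Regions present in the trends data but missing from the map are dropped.
dropped <- setdiff(trends$region, map_regions$region)
print(dropped)
print(nrow(merged))
```

Running the same setdiff() on the real data frames shows which locations never make it onto the map.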

Note that we have plotted the log-transformed hits variable. We then use the same procedure for plotting regional searches for Hurricane Irma, except we add a label for each state, change the colours and add coord_fixed() for a nicer looking map. Placing the state names is a bit tricky. I’ve used a simple solution that puts each name at the midpoint of the state’s longitude and latitude range. However, some of the smaller eastern state names would overlap when using geom_text(). Luckily, geom_text_repel() from the ggrepel package takes care of this issue.

irma = gtrends(c("irma"), gprop = "web", time="2017-09-03 2017-09-10",geo = c("US"))
irma = irma$interest_by_region
statesMap = map_data("state")
irma$region = sapply(irma$location,tolower)
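irmaMerged = merge(statesMap,irma,by="region")
irmaMerged = irmaMerged[order(irmaMerged$order),] #merge() does not guarantee row order; restore the polygon vertex order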

regionLabels <- aggregate(cbind(long, lat) ~ region, data=irmaMerged, 
                          FUN=function(x) mean(range(x)))

irmaMerged %>%
  ggplot() +
  geom_polygon(aes(x=long,y=lat,group=group,fill=log(hits)),colour="white") +
  my_theme2() +
  geom_text_repel(data=regionLabels, aes(long, lat, label = str_title_case(region)), size=3) +
  coord_fixed(1.3) +
  ggtitle("Google search interest for Hurricane Irma\nin each state from the week prior to landfall in the US") +
  scale_fill_gradientn(colours = ocean.tempo(15)[2:10])
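
To see what the aggregate() call for the labels actually computes, here is the same expression on a toy outline. The coordinates are invented and the region names “a” and “b” are placeholders; the second aggregate() is only there for comparison.

```r
# Toy outline vertices for two made-up regions (not real state borders).
outline <- data.frame(
  region = c("a", "a", "a", "b", "b", "b"),
  long   = c(0, 4, 1, 10, 12, 20),
  lat    = c(0, 2, 6, 5, 9, 7)
)

# Bounding-box midpoint: mean(range(x)) is (min + max) / 2 per column.
midpoints <- aggregate(cbind(long, lat) ~ region, data = outline,
                       FUN = function(x) mean(range(x)))

# Plain vertex average, for comparison; it is pulled towards wherever
# the outline happens to have many vertices.
averages <- aggregate(cbind(long, lat) ~ region, data = outline, FUN = mean)

print(midpoints)
print(averages)
```

The midpoint only depends on the extremes of each outline, which is why it is a reasonable quick fix for label placement, even though it is not a true centroid.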

Lastly, we’ll plot searches for Hurricane Katrina by state, but focusing on searches between its formation and dissipation.

katrina = gtrends(c("katrina"), gprop = "web", time="2005-08-23 2005-08-31", geo = c("US"))
katrina = katrina$interest_by_region
statesMap = map_data("state")
katrina$region = sapply(katrina$location,tolower)
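katrinaMerged = merge(statesMap,katrina,by="region")
katrinaMerged = katrinaMerged[order(katrinaMerged$order),] #merge() does not guarantee row order; restore the polygon vertex order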

regionLabels <- aggregate(cbind(long, lat) ~ region, data=katrinaMerged, 
                          FUN=function(x) mean(range(x)))

katrinaMerged %>% ggplot() +
  geom_polygon(aes(x=long,y=lat,group=group,fill=log(hits)),colour="white") +
  scale_fill_continuous(low="ivory",high="midnightblue") +
  guides(fill = "colorbar") +
  geom_text_repel(data=regionLabels, aes(long, lat, label = str_title_case(region)), size=3) +
  my_theme2() +
  coord_fixed(1.3) +
  ggtitle("Google search interest for Hurricane Katrina\nin each state between formation and dissipation")
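
All three maps fill the states by log(hits). One possible pitfall, assuming a state can come back with 0 hits: log(0) is -Inf, which has no place on a finite colour gradient. The sketch below compares log() with log1p() on invented values; log1p() is just one alternative and not what the maps above use.

```r
hits <- c(100, 73, 50, 5, 1, 0)  # invented values on the 0-100 scale

log_hits   <- log(hits)    # natural log; log(0) is -Inf
log1p_hits <- log1p(hits)  # log(1 + x); log1p(0) is 0

print(round(log_hits, 2))
print(round(log1p_hits, 2))
```

Either transform compresses the top of the scale, so differences between low-interest states become easier to see on the map.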

BONUS: In which US states do people search for “guns” the most?

guns = gtrends(c("guns"), gprop = "web", time="all", geo = c("US"))
guns = guns$interest_by_region
statesMap = map_data("state")
guns$region = sapply(guns$location,tolower)
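gunsMerged = merge(statesMap,guns,by="region")
gunsMerged = gunsMerged[order(gunsMerged$order),] #merge() does not guarantee row order; restore the polygon vertex order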

regionLabels <- aggregate(cbind(long, lat) ~ region, data=gunsMerged, 
                          FUN=function(x) mean(range(x)))

gunsMerged %>% ggplot() +
  geom_polygon(aes(x=long,y=lat,group=group,fill=log(hits)),colour="white") +
  geom_text_repel(data=regionLabels, aes(long, lat, label = str_title_case(region)), size=3) +
  my_theme2() +
  coord_fixed(1.3) +
  scale_fill_distiller(palette = "Reds") +
  ggtitle("Google search interest for guns in each state")
