Mapping hurricane search data from Google Trends!

This is a quick introduction to getting and visualizing Google search data with both time and geographical components, using the packages gtrendsR, maps and ggplot2. In this example, we will look at search interest for named hurricanes that hit the US mainland, starting with ‘Katrina’. We’ll also explore different ways of using colour palettes in ggplot2.

First, we load the required packages. Note that we use devtools to install the development version of gtrendsR from GitHub.

if (!require('gtrendsR')) 
devtools::install_github('PMassicotte/gtrendsR')
## Loading required package: gtrendsR

if(!require("pacman")) install.packages("pacman")
## Loading required package: pacman
# magrittr is loaded explicitly because the map code further down uses the %>% pipe
pacman::p_load(gtrendsR,maps,ggplot2,lettercase,viridis,pals,scico,ggrepel,magrittr)

Let’s first look at how the impact of hurricanes Katrina (August 2005) and Harvey (August 2017) is reflected in how often Americans have searched Google for these names over time. We’ll also change some of the plot settings by creating our own ggplot theme.

The gprop argument controls whether we want general web, news, image or YouTube searches.
The time argument is set to “all”, which gathers data from 2004 up to the time the code is run.
To focus on searches made in the US, we set the geo argument to “US”.
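
Besides “all”, the time argument also takes an explicit date range written as two dates separated by a space, which is the form used for the state maps later in this post. As a minimal sketch, the helper below builds such a string for a window ending on a given date. Note that trend_window() is a made-up name for illustration, not a gtrendsR function.

```r
# Hypothetical helper (not part of gtrendsR): build a "start end" date-range
# string in the "YYYY-MM-DD YYYY-MM-DD" form used for the time argument.
trend_window <- function(end, days_before) {
  end   <- as.Date(end)
  start <- end - days_before          # Date arithmetic works in whole days
  paste(format(start, "%Y-%m-%d"), format(end, "%Y-%m-%d"))
}

# The week leading up to 25 August 2017
trend_window("2017-08-25", 7)
## [1] "2017-08-18 2017-08-25"
```

The returned string can be passed directly as time = trend_window(...) in a gtrends() call.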

NOTE: adding a line for Hurricane Irma used to work just fine, but it currently seems to mess up the output and only shows straight lines. Calling plot() on the gtrends result returns a ggplot object, which we can then customize. However, I was unable to suppress the default plot.

my_theme <- function() {
    theme_bw() +
    theme(panel.background  = element_blank(),
          plot.background   = element_rect(fill = "seashell"),
          panel.border      = element_blank(),                # facet border
          strip.background  = element_blank(),                # facet title background
          plot.margin       = unit(c(.5, .5, .5, .5), "cm"),
          panel.spacing     = unit(3, "lines"),
          panel.grid.major  = element_blank(),
          panel.grid.minor  = element_blank(),
          legend.background = element_blank(),
          legend.key        = element_blank(),
          legend.title      = element_blank())
}

hurricanes <- gtrends(c("katrina","harvey"),
                      time  ="all", 
                      gprop = "web", 
                      geo   = c("US"))

plot(hurricanes) + 
  my_theme() +
  geom_line(size=1.5) +
  scale_colour_viridis_d(option = "C", begin = .2, end = .7)

To understand what is actually measured on the y-axis, have a look here:
https://support.google.com/trends/answer/4365533?hl=en
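
As a rough illustration of the idea described on that page, the sketch below turns made-up monthly counts into a 0 to 100 index: each count is first divided by the total number of searches in the same month, and the resulting shares are then rescaled so that the largest one equals 100. The numbers and the function name to_trends_index() are invented for this example, and Google's actual pipeline also involves sampling and rounding details that are ignored here.

```r
# Simplified sketch of a relative search index (assumption: no sampling,
# one region, one keyword). Not Google's actual implementation.
to_trends_index <- function(term_searches, all_searches) {
  share <- term_searches / all_searches   # popularity relative to all searches
  round(100 * share / max(share))         # rescale so the peak is 100
}

term  <- c(20, 50, 400, 80)          # made-up searches for one keyword
total <- c(1000, 1000, 2000, 1000)   # made-up searches for everything

to_trends_index(term, total)
## [1]  10  25 100  40
```

So a value of 50 on the y-axis means the term was half as popular as it was at its peak, not that it was searched 50 times.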

The above plot shows clear spikes around the time when Katrina and Harvey hit the US. We could also plot more cyclical data.

cycles <- gtrends(c("spring break","vacation"),
                  time  = "2008-01-01 2018-01-01",
                  gprop = "web", 
                  geo   = c("US"))

plot(cycles) +
  my_theme() +
  geom_line(size=1.5) +
  scale_colour_viridis_d(option = "C", begin = .2, end = .7)

As you can see in the output below, the gtrends function actually returns a list of data frames with various kinds of data.

hurricanes <- gtrends(c("Katrina","Harvey","Irma"),
                     time  = "all",
                     gprop = "web", 
                     geo   = c("US"))

# Print the first rows of every element in the list (empty elements print as NULL)
for (df in hurricanes) {
  print(head(df))
}
##         date hits keyword geo gprop category
## 1 2004-01-01   <1 Katrina  US   web        0
## 2 2004-02-01   <1 Katrina  US   web        0
## 3 2004-03-01   <1 Katrina  US   web        0
## 4 2004-04-01    1 Katrina  US   web        0
## 5 2004-05-01   <1 Katrina  US   web        0
## 6 2004-06-01   <1 Katrina  US   web        0
## NULL
##               location hits keyword geo gprop
## 1            Louisiana  100 Katrina  US   web
## 2          Mississippi   72 Katrina  US   web
## 3             Virginia   48 Katrina  US   web
## 4              Alabama   35 Katrina  US   web
## 5 District of Columbia   35 Katrina  US   web
## 6                Texas   31 Katrina  US   web
##                location hits keyword geo gprop
## 1  Roanoke-Lynchburg VA  100 Katrina  US   web
## 2    Biloxi-Gulfport MS   92 Katrina  US   web
## 3        New Orleans LA   84 Katrina  US   web
## 4        Baton Rouge LA   55 Katrina  US   web
## 5 Hattiesburg-Laurel MS   48 Katrina  US   web
## 6            Jackson MS   43 Katrina  US   web
##      location hits keyword geo gprop
## 1   Brookneal  100 Katrina  US   web
## 2 New Orleans   53 Katrina  US   web
## 3    Leesburg   40 Katrina  US   web
## 4 Baton Rouge   35 Katrina  US   web
## 5     Dearing   NA Katrina  US   web
## 6 Santa Clara   30 Katrina  US   web
## NULL
##   subject related_queries               value geo keyword category
## 1     100             top   katrina hurricane  US Katrina        0
## 2      99             top           hurricane  US Katrina        0
## 3      41             top        katrina kaif  US Katrina        0
## 4      15             top         new orleans  US Katrina        0
## 5      15             top new orleans katrina  US Katrina        0
## 6      12             top      katrina bowden  US Katrina        0
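
Two things in this output are worth handling before doing anything more with the data: the elements are easier to pick out by name than by position, and the “<1” entries mean that the hits column can arrive as text rather than numbers. The sketch below uses a small mock list shaped like the output above so that it runs without a network connection. The helper hits_to_numeric() and the choice of 0.5 as a stand-in for “<1” are assumptions made for this example.

```r
# Mock object shaped like part of the gtrends() result printed above
mock <- list(
  interest_over_time = data.frame(
    date    = as.Date(c("2004-03-01", "2004-04-01", "2004-05-01")),
    hits    = c("<1", "1", "<1"),
    keyword = "Katrina",
    stringsAsFactors = FALSE
  ),
  interest_by_region = data.frame(
    location = c("Louisiana", "Mississippi"),
    hits     = c(100, 72),
    keyword  = "Katrina",
    stringsAsFactors = FALSE
  )
)

names(mock)   # pick elements by name, e.g. mock$interest_by_region

# Hypothetical helper: turn hits into numbers, mapping "<1" to a small value.
# 0.5 is an arbitrary placeholder, not something Google Trends reports.
hits_to_numeric <- function(hits, below_one = 0.5) {
  hits <- as.character(hits)
  hits[hits %in% "<1"] <- below_one
  as.numeric(hits)
}

hits_to_numeric(mock$interest_over_time$hits)
## [1] 0.5 1.0 0.5
```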

Now, let’s compare the amount of interest in Hurricane Harvey for each US state.

harvey <- gtrends(c("Harvey"), 
                  gprop = "web",
                  time  = "2017-08-18 2017-08-25", 
                  geo   = c("US"))

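harvey        <- harvey$interest_by_region
harvey$region <- tolower(harvey$location) # lower case, to match the region names in the map data
statesMap     <- map_data("state")
harveyMerged  <- merge(statesMap,harvey,by="region")
harveyMerged  <- harveyMerged[order(harveyMerged$order), ] # merge() may reorder rows; restore the polygon drawing order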

In the code above, map_data("state") fetches an empty map for plotting our geographical data, and merge() combines our Google Trends data with that map data. Both data sets contain a column called “region”, which is used to merge the data frames. Note that the region labels need to be identical; therefore, we first change the capitalized state names to lower case. Now we can plot the data! But first we’ll modify our ggplot theme for plotting maps.
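
To see why the labels have to match exactly, here is a small self-contained sketch with toy stand-ins for the map data and the Google Trends data (all coordinates and hit values are invented). It also sorts the merged rows by the map's order column, a common precaution because merge() does not promise to keep the point order that polygons are drawn in.

```r
# Toy stand-ins for map_data("state") and the interest_by_region data frame
toy_map <- data.frame(
  region = c("texas", "texas", "texas", "new york", "new york", "new york"),
  long   = c(-100, -95, -98, -75, -73, -74),
  lat    = c(30, 30, 33, 42, 42, 44),
  order  = 1:6,
  stringsAsFactors = FALSE
)
toy_trends <- data.frame(location = c("Texas", "New York"),
                         hits     = c(100, 12),
                         stringsAsFactors = FALSE)

# Without lower-casing, no region label matches and the merge is empty
unmatched <- merge(toy_map, transform(toy_trends, region = location), by = "region")
nrow(unmatched)
## [1] 0

# With matching labels, every map point picks up its state's hits value
toy_trends$region <- tolower(toy_trends$location)
toy_merged <- merge(toy_map, toy_trends, by = "region")
toy_merged <- toy_merged[order(toy_merged$order), ]  # back to drawing order
nrow(toy_merged)
## [1] 6
```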

my_theme2 <- function() {
  my_theme() +
  theme(axis.title = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank())
}

ggplot() +
  geom_polygon(data=harveyMerged,aes(x=long,y=lat,group=group,fill=log(hits)),colour="white") +
  scale_fill_gradientn(colours = rev(scico(15, palette = "tokyo")[2:7])) +
  my_theme2() +
  ggtitle("Google search interest for Hurricane Harvey\nin each state from the week prior to landfall in the US") 
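
The fill in the map above is log(hits) rather than hits. A quick sketch with made-up values shows what the transform does: a single state at 100 no longer dominates the colour scale, because the gaps between large values shrink while small values stay distinguishable. One caveat is that log(0) is -Inf, so a state with zero hits would need separate handling. log1p() is shown as one possible alternative, which is my suggestion and not something this post uses.

```r
hits <- c(100, 45, 12, 3, 1)   # made-up values on the 0-100 scale

round(log(hits), 2)
## [1] 4.61 3.81 2.48 1.10 0.00

log(0)     # -Inf: cannot be placed on a colour gradient
log1p(0)   # 0: log(1 + x) stays finite at zero
```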

Note that we have plotted the log-transformed hits variable. We then use the same procedure for plotting regional searches for Hurricane Irma, except that we add a label for each state, change the colours and add coord_fixed() for a nicer-looking map. Placing the state names is a bit tricky. I’ve used a simple solution that places each name at the midpoint of the state’s longitude and latitude range. However, some of the smaller eastern state names would overlap when using geom_text(). Luckily, geom_text_repel() from the ggrepel package takes care of this issue.

irma <- gtrends(c("irma"), 
                gprop = "web",
                time  = "2017-09-03 2017-09-10",
                geo   = c("US"))

irma         <- irma$interest_by_region
statesMap    <- map_data("state")
irma$region  <- tolower(irma$location)
irmaMerged   <- merge(statesMap ,irma,by="region")
irmaMerged   <- irmaMerged[order(irmaMerged$order), ] # restore polygon drawing order after merge()

regionLabels <- aggregate(cbind(long, lat) ~ region, data=irmaMerged, 
                          FUN=function(x) mean(range(x)))

irmaMerged %>%
  ggplot() +
  geom_polygon(aes(x=long,y=lat,group=group,fill=log(hits)),colour="white") +
  my_theme2() +
  geom_text_repel(data=regionLabels, aes(long, lat, label = str_title_case(region)), size=3) +
  coord_fixed(1.3) +
  ggtitle("Google search interest for Hurricane Irma\nin each state from the week prior to landfall in the US") +
  scale_fill_gradientn(colours = ocean.tempo(15)[2:10])
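
The label positions computed with aggregate() above can be checked on a toy example. The sketch below uses a made-up L-shaped “state” to show what mean(range(x)) returns (the midpoint of the bounding box) and how it differs from the plain mean of the outline points, which is pulled towards wherever the outline has the most points. The shape and its name are invented for illustration.

```r
# A made-up L-shaped outline
toy_state <- data.frame(
  region = "toyland",
  long   = c(0, 10, 10, 1, 1, 0),
  lat    = c(0,  0,  1, 1, 8, 8),
  stringsAsFactors = FALSE
)

# Midpoint of the longitude and latitude ranges, as used for regionLabels
box_mid <- aggregate(cbind(long, lat) ~ region, data = toy_state,
                     FUN = function(x) mean(range(x)))
box_mid
##    region long lat
## 1 toyland    5   4

# Plain mean of the outline points, for comparison (roughly 3.67 and 3)
pt_mean <- aggregate(cbind(long, lat) ~ region, data = toy_state, FUN = mean)
```

For an irregular outline like this one, the bounding-box midpoint can land outside the shape itself, which is part of why label placement is tricky and why the repelled labels are only approximate.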

Lastly, we’ll plot searches for Hurricane Katrina by state, but focusing on searches between its formation and dissipation.

katrina        <- gtrends(c("katrina"), 
                          gprop = "web", 
                          time="2005-08-23 2005-08-31", 
                          geo = c("US"))

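katrina        <- katrina$interest_by_region
statesMap      <- map_data("state")
katrina$region <- tolower(katrina$location)
katrinaMerged  <- merge(statesMap ,katrina,by="region")
katrinaMerged  <- katrinaMerged[order(katrinaMerged$order), ] # restore polygon drawing order after merge()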

regionLabels   <- aggregate(cbind(long, lat) ~ region, data=katrinaMerged, 
                          FUN=function(x) mean(range(x)))

katrinaMerged %>% ggplot() +
  geom_polygon(aes(x=long,y=lat,group=group,fill=log(hits)),colour="white") +
  scale_fill_continuous(low="ivory",high="midnightblue") +
  guides(fill = "colorbar") +
  geom_text_repel(data=regionLabels, aes(long, lat, label = str_title_case(region)), size=3) +
  my_theme2() +
  coord_fixed(1.3) +
  ggtitle("Google search interest for Hurricane Katrina\nin each state between formation and dissipation")

BONUS: In which US states do people search for “guns” the most?

guns         <- gtrends(c("guns"), gprop = "web", time="all", geo = c("US"))
guns         <- guns$interest_by_region
statesMap    <- map_data("state")
guns$region  <- tolower(guns$location)
gunsMerged   <- merge(statesMap,guns,by="region")
gunsMerged   <- gunsMerged[order(gunsMerged$order), ] # restore polygon drawing order after merge()

regionLabels <- aggregate(cbind(long, lat) ~ region, data=gunsMerged, 
                          FUN=function(x) mean(range(x)))

gunsMerged %>% ggplot() +
  geom_polygon(aes(x=long,y=lat,group=group,fill=log(hits)),colour="white") +
  geom_text_repel(data=regionLabels, aes(long, lat, label = str_title_case(region)), size=3) +
  my_theme2() +
  coord_fixed(1.3) +
  scale_fill_distiller(palette = "Reds") +
  ggtitle("Google search interest for guns in each state")
