GeoCoding, R, and The Rolling Stones – Part 2

Posted: March 20, 2013 in GeoCoding XML processing

Welcome to Part 2 of the GeoCoding, R, and the Rolling Stones blog. Let’s apply some of the things we learned in Part 1 to a practical, real-world example.

Mapping the Stones – A Real Example

The Rolling Stones have toured for many years, and Wikipedia has information on their various tours. Here we focus only on the dates and concerts of the 1975 “Tour of the Americas”. I’ve scraped the information from the Wikipedia page and put it into a data frame. The idea is to geocode each city to obtain its latitude and longitude, and then use those coordinates to create an interactive map of the tour with the Google Chart Tools.

If you want your own copy of this data frame, do the following:

url = "http://steviep42.bitbucket.org/data/stones75.csv"
stones75 = read.csv(url)

Here are the first 10 rows of the data frame. The format is really simple:

head(stones75,10)
           Date        City         State                Venue
1   1 June 1975 Baton Rouge     Louisiana  LSU Assembly Center
2   3 June 1975 San Antonio         Texas    Convention Center
3   4 June 1975 San Antonio         Texas    Convention Center
4   6 June 1975 Kansas City      Missouri    Arrowhead Stadium
5   8 June 1975   Milwaukee     Wisconsin       County Stadium
6   9 June 1975  Saint Paul     Minnesota         Civic Center
7  11 June 1975      Boston Massachusetts        Boston Garden
8  14 June 1975   Cleveland          Ohio    Municipal Stadium
9  15 June 1975     Buffalo      New York  Memorial Auditorium
10 17 June 1975     Toronto       Ontario   Maple Leaf Gardens
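A commenter below found that the CSV link had stopped working. If the download fails for you, a minimal offline stand-in can be typed in from the ten rows printed above. It only covers the first ten stops, so treat it as a sketch for trying out the code rather than a replacement for the full tour data; the name stones75_head is hypothetical.

```r
# Hypothetical offline stand-in: the first ten rows of stones75, copied from
# the head() output above (the full tour has more stops than this).
stones75_head = data.frame(
  Date  = c("1 June 1975", "3 June 1975", "4 June 1975", "6 June 1975",
            "8 June 1975", "9 June 1975", "11 June 1975", "14 June 1975",
            "15 June 1975", "17 June 1975"),
  City  = c("Baton Rouge", "San Antonio", "San Antonio", "Kansas City",
            "Milwaukee", "Saint Paul", "Boston", "Cleveland",
            "Buffalo", "Toronto"),
  State = c("Louisiana", "Texas", "Texas", "Missouri", "Wisconsin",
            "Minnesota", "Massachusetts", "Ohio", "New York", "Ontario"),
  Venue = c("LSU Assembly Center", "Convention Center", "Convention Center",
            "Arrowhead Stadium", "County Stadium", "Civic Center",
            "Boston Garden", "Municipal Stadium", "Memorial Auditorium",
            "Maple Leaf Gardens"),
  stringsAsFactors = FALSE
)

str(stones75_head)   # 10 obs. of 4 character variables
```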

Okay, let’s process the cities. As in Part 1, we’ll use sapply to get back the data and then use cbind to attach the results to the data frame. We might get some warnings about row names when we do this, but don’t worry about it. After all, “you can’t always get what you want”.

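# myGeo() is the geocoding function from Part 1; it returns a latitude and longitude.
# sapply gives one column per city, so t() flips it to one row per city.
hold = data.frame(t(sapply(paste(stones75$City,stones75$State,sep=","),myGeo)))
stones75 = cbind(stones75,hold)    # Attach the new lat and lon columns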
 
head(stones75,10)

           Date        City         State                Venue  lat   lon
1   1 June 1975 Baton Rouge     Louisiana  LSU Assembly Center 30.5 -91.1
2   3 June 1975 San Antonio         Texas    Convention Center 29.4 -98.5
3   4 June 1975 San Antonio         Texas    Convention Center 29.4 -98.5
4   6 June 1975 Kansas City      Missouri    Arrowhead Stadium 39.1 -94.6
5   8 June 1975   Milwaukee     Wisconsin       County Stadium 43.0 -87.9
6   9 June 1975  Saint Paul     Minnesota         Civic Center 45.0 -93.1
7  11 June 1975      Boston Massachusetts        Boston Garden 42.4 -71.1
8  14 June 1975   Cleveland          Ohio    Municipal Stadium 41.5 -81.7
9  15 June 1975     Buffalo      New York  Memorial Auditorium 42.9 -78.9
10 17 June 1975     Toronto       Ontario   Maple Leaf Gardens 43.7 -79.4

Great! Now we have the lat and lon for each city. As you might notice in the data frame, the Stones played several nights in the same city, so we should probably keep track of this.


stones75[9:18,]

           Date          City        State                 Venue
9  15 June 1975       Buffalo     New York   Memorial Auditorium
10 17 June 1975       Toronto      Ontario    Maple Leaf Gardens
11 22 June 1975 New York City     New York Madison Square Garden
12 23 June 1975 New York City     New York Madison Square Garden
13 24 June 1975 New York City     New York Madison Square Garden
14 25 June 1975 New York City     New York Madison Square Garden
15 26 June 1975 New York City     New York Madison Square Garden
16 27 June 1975 New York City     New York Madison Square Garden
17 29 June 1975  Philadelphia Pennsylvania          The Spectrum
18  1 July 1975         Largo     Maryland        Capital Center

As you can see above, they made a six-night stand at the famous Madison Square Garden arena in New York City. Our code should really check for duplicate city names before we bug Google for information we already have, but that is left as an exercise for you.
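Here is one way to tackle that exercise, sketched under an assumption: since the real myGeo() from Part 1 calls Google, the hypothetical fakeGeo() below stands in for it with a small lookup table (coordinates taken from the output above) and counts how often it is called. The idea is to geocode each distinct “City,State” string once with unique() and then copy the results back to every row with match().

```r
# Hypothetical stand-in for myGeo(): looks up a "City,State" string in a small
# table instead of calling Google, and counts how many times it is called.
calls = 0
fakeGeo = function(place) {
  calls <<- calls + 1
  coords = list("Baton Rouge,Louisiana" = c(lat = 30.5, lon = -91.1),
                "San Antonio,Texas"     = c(lat = 29.4, lon = -98.5),
                "Kansas City,Missouri"  = c(lat = 39.1, lon = -94.6))
  coords[[place]]
}

places = c("Baton Rouge,Louisiana", "San Antonio,Texas",
           "San Antonio,Texas", "Kansas City,Missouri")

# Geocode each distinct place only once...
uniq = unique(places)
geo  = t(sapply(uniq, fakeGeo))          # one row per distinct place: lat, lon

# ...then use match() to copy the coordinates back to every original row
idx    = match(places, uniq)
latlon = geo[idx, , drop = FALSE]
rownames(latlon) = NULL                  # avoid duplicate row-name warnings
latlon = as.data.frame(latlon)

latlon
calls   # 3 lookups instead of 4
```

With the real data you would pass the unique values of paste(stones75$City, stones75$State, sep=",") to sapply with myGeo, then index the result the same way before the cbind step.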

Creating a Map of the Tour Using googleVis

Anyway, let’s now build a map of the tour dates. For this example we will use a package called “googleVis”. You might not know that Google has a number of mapping services for which R APIs exist. Look at the table at the end of this post, which lists packages for interfacing programmatically with the various Google mapping and chart services, along with a few related web-graphics packages. You can find these packages on CRAN. In our case we’ll need to install googleVis, after which we can create a map.

install.packages("googleVis",dependencies=TRUE)
library(googleVis)

The cool thing about the googleVis package is that we get back a map in a web browser, complete with scroll bars and zoom tools. Additionally, we can use information from the data frame to annotate the chart we plan to create. For example, for each tour stop we can add metadata such as the name of the venue the band played and the date of the show.

We have to do this in a way that accommodates the requirements of googleVis, which means reading through the googleVis manual pages and playing around with the examples. Hopefully, though, the example here is good enough that you don’t have to immerse yourself in the manual (at least not yet).

The first thing we need to do is create a single column for the latitude and longitude, because gvisMap expects each location as a single “lat:long” string. This is easy to do. Let’s take the existing stones75 data frame and change it:

head(stones75)

         Date        City     State                Venue  lat   lon
1 1 June 1975 Baton Rouge Louisiana  LSU Assembly Center 30.5 -91.1
2 3 June 1975 San Antonio     Texas    Convention Center 29.4 -98.5
3 4 June 1975 San Antonio     Texas    Convention Center 29.4 -98.5
4 6 June 1975 Kansas City  Missouri    Arrowhead Stadium 39.1 -94.6
5 8 June 1975   Milwaukee Wisconsin       County Stadium 43.0 -87.9
6 9 June 1975  Saint Paul Minnesota         Civic Center 45.0 -93.1

stones75$LatLon = paste(round(stones75$lat,1),round(stones75$lon,1),sep=":")
stones75 = stones75[,-5:-6]  # Remove the old lat and lon columns

head(stones75)
         Date        City     State                Venue     LatLon
1 1 June 1975 Baton Rouge Louisiana  LSU Assembly Center 30.5:-91.1
2 3 June 1975 San Antonio     Texas    Convention Center 29.4:-98.5
3 4 June 1975 San Antonio     Texas    Convention Center 29.4:-98.5
4 6 June 1975 Kansas City  Missouri    Arrowhead Stadium 39.1:-94.6
5 8 June 1975   Milwaukee Wisconsin       County Stadium   43:-87.9
6 9 June 1975  Saint Paul Minnesota         Civic Center   45:-93.1
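To see exactly what ends up in the LatLon column, here is the same paste()/round() recipe applied to two of the coordinates above. Note that paste() drops a trailing zero, which is why Milwaukee shows up as 43:-87.9. If you prefer a fixed number of decimals, sprintf() is one alternative; either way the result is still a lat:long pair of numbers, so the difference should be purely cosmetic.

```r
lat = c(30.5, 43.0)
lon = c(-91.1, -87.9)

# The recipe used above: paste() converts 43.0 to "43"
ll_paste = paste(round(lat, 1), round(lon, 1), sep = ":")
ll_paste
# [1] "30.5:-91.1" "43:-87.9"

# Alternative: always print exactly one decimal place
ll_sprintf = sprintf("%.1f:%.1f", lat, lon)
ll_sprintf
# [1] "30.5:-91.1" "43.0:-87.9"
```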

Next, we can create a column in the data frame that contains all the information we want to use to annotate each concert date. This can include HTML tags to better format the output. For example, the statement below creates a new column called “Tip” that holds the stop number on the tour (taken from the row name), the venue where the show was held, and the date of the concert. Once we have a map, we can click on the “pin” for each location and see the annotation.

stones75$Tip = paste(rownames(stones75),stones75$Venue,stones75$Date,"<BR>",sep=" ")

# Now we can create a chart!

# Click on the Atlanta locator and you'll see that it was the 37th stop of the tour. 
# The show took place at The Omni on July 30th, 1975

stones.plot = gvisMap(stones75,"LatLon","Tip")
plot(stones.plot)
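Here is what the Tip strings look like, using the first two stops from the data above. The stop number is simply the row name, so this relies on the rows being in tour order. The Tip2 column is a hypothetical variation that uses a little more HTML formatting; it is not something gvisMap requires.

```r
# Two stops taken from the head() output above
tour = data.frame(Date  = c("1 June 1975", "3 June 1975"),
                  Venue = c("LSU Assembly Center", "Convention Center"),
                  stringsAsFactors = FALSE)

# Same recipe as above: stop number (row name), venue, date, then a line break
tour$Tip = paste(rownames(tour), tour$Venue, tour$Date, "<BR>", sep = " ")
tour$Tip[1]
# [1] "1 LSU Assembly Center 1 June 1975 <BR>"

# Hypothetical variation with bold stop labels
tour$Tip2 = paste0("<b>Stop ", rownames(tour), ":</b> ",
                   tour$Venue, ", ", tour$Date, "<BR>")
tour$Tip2[2]
# [1] "<b>Stop 2:</b> Convention Center, 3 June 1975<BR>"
```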

Refining the Plot Annotations
Pretty cool, huh? We can also zoom in on different parts of the map. The gvisMap function has a number of options that let us draw a line between the cities, select a different type of map, and set a default zoom level. So what else could (or should) we do?
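The options mentioned above are passed to gvisMap() as a list through its options argument. The sketch below assumes the option names used in the googleVis gvisMap examples and the Google Maps chart documentation (showLine, mapType, zoomLevel, enableScrollWheel, useMapTypeControl); check ?gvisMap for your version before relying on them. It uses three tour stops and only builds the chart if googleVis is installed.

```r
# Three stops in gvisMap's "lat:long" format, with Tip strings as built above
tour = data.frame(LatLon = c("30.5:-91.1", "29.4:-98.5", "39.1:-94.6"),
                  Tip    = c("1 LSU Assembly Center 1 June 1975 <BR>",
                             "2 Convention Center 3 June 1975 <BR>",
                             "4 Arrowhead Stadium 6 June 1975 <BR>"),
                  stringsAsFactors = FALSE)

# Assumed option names, passed through to the Google Maps chart
map_opts = list(showLine          = TRUE,       # connect the stops in order
                mapType           = "terrain",  # e.g. "normal", "satellite", "hybrid"
                zoomLevel         = 4,          # initial zoom level
                enableScrollWheel = TRUE,       # zoom with the mouse wheel
                useMapTypeControl = TRUE)       # let the viewer switch map types

if (requireNamespace("googleVis", quietly = TRUE)) {
  tour.plot = googleVis::gvisMap(tour, "LatLon", "Tip", options = map_opts)
  # plot(tour.plot)   # opens the map in a web browser
}
```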

Well, we have a problem here: the Stones played more than one show in several cities, but we don’t take that into account when building the annotation data. What we might want to do is process the data frame so that, for cities with multiple shows (e.g. New York), we capture all the metadata in one go. We saw this earlier with the New York dates.

stones75[9:18,]

           Date          City        State                 Venue
9  15 June 1975       Buffalo     New York   Memorial Auditorium
10 17 June 1975       Toronto      Ontario    Maple Leaf Gardens
11 22 June 1975 New York City     New York Madison Square Garden
12 23 June 1975 New York City     New York Madison Square Garden
13 24 June 1975 New York City     New York Madison Square Garden
14 25 June 1975 New York City     New York Madison Square Garden
15 26 June 1975 New York City     New York Madison Square Garden
16 27 June 1975 New York City     New York Madison Square Garden
17 29 June 1975  Philadelphia Pennsylvania          The Spectrum
18  1 July 1975         Largo     Maryland        Capital Center

Currently our plot shows only the information for the last New York show, but we want the info for all of the NYC shows. Here is one way to approach this problem. There are probably more elegant ways to clean up the data, but this will do the job for now.

test = stones75     # Work on a copy of the stones75 data frame
tmpdf = list()      # Collects one row per tour stop
ii = 1

while (ii <= nrow(test)) {   # Loop through every row of the copy

   # Tips for all shows at the same venue in the same city as row ii
   hold = test[test$City == test$City[ii] & test$Venue == test$Venue[ii], "Tip"]

   # Do we have a multi-night stand?
   if (length(hold) > 1) {
      test$Tip[ii] = paste(hold, collapse="")   # Merge all the tips into one
      tmpdf[[ii]] = test[ii,]

      # We "jump over" the shows we've just merged
      # (this assumes a multi-night stand occupies consecutive rows)
      ii = ii + length(hold)

   # Here we process the "one night stands"
   } else {
      tmpdf[[ii]] = test[ii,]
      ii = ii + 1
   }
}

tmpdf = tmpdf[!sapply(tmpdf,is.null)]    # Remove NULL list elements
stones = do.call(rbind,tmpdf)                # Bind the list back into a data frame

stones.plot = gvisMap(stones,"LatLon","Tip")
plot(stones.plot)


Okay. Depending on your background in R, you might think that was a lot of work (or maybe not). Either way, this is fairly typical of what we have to do to clean up or consolidate data into a format suitable for the package we are using. This kind of effort is not peculiar to googleVis; other packages would require a comparable level of processing. Welcome to the real world of data manipulation.

Anyway, let’s take a look at the new plot. At first glance it looks just like the old one, but click on the New York locator and you will now see that the info for all of the Madison Square Garden shows is present. Shows 11 through 16 took place in NYC.
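As mentioned earlier, there are more compact ways to consolidate the multi-night stands than the loop above. One option, sketched here on a small slice of the tour with Tip strings in the format built earlier, is to key each row by city and venue, merge the tips with tapply(), and keep the first row of each group with duplicated(). Unlike the loop, this does not assume that repeated shows sit in consecutive rows: a later return visit to the same venue would also be merged into the first stop, which may or may not be what you want.

```r
# A slice of the tour: Toronto, the six Madison Square Garden shows, Philadelphia
nyc_dates = paste(22:27, "June 1975")
tour = data.frame(
  City  = c("Toronto", rep("New York City", 6), "Philadelphia"),
  Venue = c("Maple Leaf Gardens", rep("Madison Square Garden", 6), "The Spectrum"),
  Tip   = c("10 Maple Leaf Gardens 17 June 1975 <BR>",
            paste(11:16, "Madison Square Garden", nyc_dates, "<BR>"),
            "17 The Spectrum 29 June 1975 <BR>"),
  stringsAsFactors = FALSE)

# One key per city/venue; levels in order of first appearance keep tour order
keys = paste(tour$City, tour$Venue, sep = " | ")
key  = factor(keys, levels = unique(keys))

# Concatenate the tips within each group (result is ordered like the levels)
merged_tips = tapply(tour$Tip, key, paste, collapse = "")

# Keep the first show of each stand and give it the merged annotation
first_rows = tour[!duplicated(key), ]
first_rows$Tip = as.vector(merged_tips)

first_rows
```

On the real data you would build the key from stones75$City and stones75$Venue; because first_rows keeps every column of the first show, its LatLon value carries through to gvisMap.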

R packages for interfacing with Google

Here is a table listing R packages that interface with various Google services, along with a few related packages for interactive and web-based graphics. Each one of these is worth investigating. Keep in mind that similar libraries exist for other languages, so if you prefer to do your coding in Perl or Python you can work with the Google APIs as well.

PACKAGE          DESCRIPTION
googleVis        Create web pages with interactive charts based on R data frames
plotGoogleMaps   Plot HTML output with Google Maps API and your own data
RgoogleMaps      Overlays on Google map tiles in R
animation        A Gallery of Animations in Statistics and Utilities
gridSVG          Export grid graphics as SVG
SVGAnnotation    Tools for Post-Processing SVG Plots Created in R
RSVGTipsDevice   An R SVG graphics device with dynamic tips and hyperlinks
iWebPlots        Interactive web-based plots
Comments
  1. Thanks! This is great, perfect for what I am doing! I would love to hear more about setting up web pages using R, and this is a great start!

  2. st.yoni says:

    Hi, the CSV file is no longer available. Any way you can restore it?

    • Steve says:

      Hi, I’ve updated the link such that it works now. Thanks

      • st.yoni says:

        Thank you! I actually combined this example with the SF restaurants rating in order to have a working data set. Thank you so much for these posts!

      • Steve says:

        Glad to hear it. FYI – The URL for the Stones touring data has been changed to “http://steviep42.bitbucket.org/data/stones75.csv”. Please update your code accordingly. I’ve changed it in the blog, so if you copy/paste from there you should be okay.
