In my last post I showed how, using the package httr, you can access the RateBeer API to get information about beers made by a brewery. When I left off, I showed a problem: the API only shows 10 beers at a time.
Today, I’m going to show how we can get more beers at once. After that, I’m going to show how we can use the API, rvest, and purrr to get beers from all the brewers around me.
Updating the call to the API
Setting the first: argument
In the last post, I didn’t mention one of the arguments that can be used when making a beersByBrewer query. Besides the brewerId argument, we can also use the first argument to specify how many beers we want to see from a brewer. As you will see below, one of the changes I make to the call to the API is setting the value of the first argument to 999.
Ninety-nine beers on the wall? Let’s make it Nine hundred and ninety-nine.
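To make the effect of the first argument concrete, here is a minimal sketch that only builds the query text, so it needs no API key or network access. The helper name build_beers_query() and the trimmed-down field list are my own illustration, not part of the RateBeer API or the original code.

```r
# Hypothetical helper (for illustration only): pastes a brewer ID and a value
# for `first` into a shortened beersByBrewer query and returns the text.
build_beers_query <- function(brewer_id, first = 999) {
  paste0(
    "query { beersByBrewer(brewerId: ", brewer_id,
    ", first: ", first,
    ") { totalCount items { name abv } } }"
  )
}

# With the default, the query asks for up to 999 beers from brewer 166.
query_text <- build_beers_query(166)
cat(query_text, "\n")
```

Keeping first as a function argument with a default makes it easy to ask for fewer beers while testing.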
Turning the call to the API into a function
As you will see later on, it will be handy to have the call to the API as a function. Below, I declare that function:
library(httr) # provides POST() and add_headers()

get_beers_from_brewer <- function(brewer_id, api_key) {
  URL <- "https://api.ratebeer.com/v1/api/graphql"
  POST(URL,
       body = list(
         # the brewer ID is pasted into the GraphQL query text
         query = paste0(
           "query{
              beersByBrewer(brewerId: ", brewer_id, ", first: 999) {
                totalCount
                items{
                  name
                  abv
                  averageRating
                  ratingCount
                  isRetired
                  style{
                    name
                  }
                  brewer {
                    id
                    name
                    streetAddress
                    city
                    state {
                      name
                    }
                    zip
                  }
                }
              }
            }"),
         variables = "{}",
         operationName = NULL),
       encode = "json", # tells httr to encode the body of the request as json
       add_headers("content-type" = "application/json",
                   "Accept" = "application/json",
                   "x-api-key" = api_key))
}
Finding Breweries
Okay, so now, I have an easy way to get information about the beers that breweries make. All I need to do now is point the API to the brewer ID of each brewery I want information on.
Remember, the brewer id can be found by looking at the url for that brewery and is in the form:
https://www.ratebeer.com/brewers/<BREWERY_NAME_HERE>/<BREWER_ID_HERE>
The thing is, I like data and want A LOT of it. There’s no way I’m going to hand sort through urls to try to find breweries. Can’t this be automated?
YES!
RateBeer maintains lists of breweries by state. For example, the breweries in Pennsylvania can be found here. Using the package rvest, we can pull information about all of the breweries in the state, and then use purrr to iterate over that list with the function get_beers_from_brewer().
rvest is a package that makes harvesting information from the web easy. Below, you can see how, in three steps, I get a list of urls that point to all of the breweries in the state.
library(rvest)

brewery_list_url <- "https://www.ratebeer.com/breweries/pennsylvania/38/213/"

brewery_ids <- read_html(brewery_list_url) %>%
  html_nodes("#brewerTable a:nth-child(1)") %>%
  html_attr('href')
To explain what happened in the previous code chunk:
read_html() loads the url for the list of breweries in PA. This is the same thing that happens if you click this link.
html_nodes() searches the html file we navigated to for every place that has a link to a brewery. The text in the argument for the function is the css selector for where breweries can be found on the page. I found this selector using selectorgadget. This function gives me a list with one entry for each time that selector shows up.
html_attr() searches that list for an attribute of the specified type. In this case I specified href, the attribute that holds a hyperlink's target.
As I said, this outputs a list of urls. As an example, the url for Yards Brewing looks like this:
/brewers/yards-brewing-company/166/
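If you want to try the same three steps without hitting the site, the sketch below runs them on a tiny hand-written page. The HTML is made up to imitate a brewery table (the real page's markup may well differ), and the second brewery and its ID are placeholders.

```r
library(rvest)

# Made-up HTML standing in for the brewery list page (an assumption, not the
# real RateBeer markup). read_html() also accepts literal HTML as a string.
mock_page <- read_html('
  <table id="brewerTable">
    <tr>
      <td><a href="/brewers/yards-brewing-company/166/">Yards Brewing</a></td>
      <td>Philadelphia</td>
    </tr>
    <tr>
      <td><a href="/brewers/example-brewing/99999/">Example Brewing</a></td>
      <td>Pittsburgh</td>
    </tr>
  </table>')

# Same selector as above: every <a> that is the first child of its parent,
# anywhere inside the element with id "brewerTable".
mock_links <- mock_page %>%
  html_nodes("#brewerTable a:nth-child(1)") %>%
  html_attr("href")

mock_links
```

Each entry of mock_links is a relative url in the same shape as the Yards example.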
Now, I can take this list of urls and use the function purrr::map_chr() to get a list of brewer IDs. As a note, I use map_chr() because it flattens the list of IDs into a single character vector. I highly suggest that you check out the rest of the map_* functions.
Below, I take each item in the list of urls, split each url at every "/", and then use map_chr() to select the fourth element, the brewer ID. (The first element is an empty string, because each url starts with "/".)
library(purrr) # provides map_chr()

brewery_ids <- brewery_ids %>%
  strsplit("/") %>%
  map_chr(4)
Now I have a list of brewery IDs that I can feed into get_beers_from_brewer().
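As a quick check of why the fourth element is the right one, here is the split applied to the Yards url from earlier. The variable names are only for this illustration.

```r
library(purrr)

example_url <- "/brewers/yards-brewing-company/166/"

# strsplit() returns a list with one character vector per input string.
# The leading "/" produces an empty first piece; the trailing "/" adds nothing.
pieces <- strsplit(example_url, "/")
pieces[[1]]
# "" "brewers" "yards-brewing-company" "166"

# map_chr(4) pulls the fourth piece out of each vector as a character vector.
example_id <- map_chr(pieces, 4)
example_id
# "166"
```

Note that the ID comes back as a character string, which is fine here because it is only pasted into the query text.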
Getting ALL THE BEERS!
To iterate through the list, we’ll use map() and some functions from jsonlite, a package that can parse JSON. Then we’ll use map() to work through the levels of the response, from its highest level (“data”) down to the actual data frame (held in the named object, “items”).
library(jsonlite) # provides fromJSON()

brewery_beer_df <- brewery_ids %>%
  map(function(brewer_id){
    Sys.sleep(1) # the API restricts to 1 request per second.
    get_beers_from_brewer(brewer_id, API_key)
  }) %>%
  # Here we use httr's content() to pull the body of each response out as
  # json text
  map(content, as = "text", encoding = "UTF-8") %>%
  # and then use jsonlite to turn that into an r object which has dfs in it
  # from each brewery.
  map(fromJSON, flatten = TRUE) %>%
  # working down through the response levels.
  map("data") %>%
  map("beersByBrewer") %>%
  map_dfr("items")
The object brewery_beer_df is the data frame with all the beers from the breweries we requested.
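To see how the chain of map() calls digs down to “items” without calling the API, here is a small sketch that runs the same steps on two made-up JSON strings. The beers are invented and the structure is only my guess at a cut-down response, based on the fields requested in the query.

```r
library(jsonlite)
library(purrr) # map_dfr() also needs dplyr to be installed

# Made-up JSON text standing in for two API responses (not real RateBeer data).
mock_responses <- list(
  '{"data":{"beersByBrewer":{"totalCount":2,"items":[
      {"name":"Beer A","abv":5.0,"style":{"name":"IPA"}},
      {"name":"Beer B","abv":4.5,"style":{"name":"Stout"}}]}}}',
  '{"data":{"beersByBrewer":{"totalCount":1,"items":[
      {"name":"Beer C","abv":6.2,"style":{"name":"Porter"}}]}}}'
)

mock_beer_df <- mock_responses %>%
  # parse each JSON string; flatten = TRUE spreads nested fields into columns
  map(fromJSON, flatten = TRUE) %>%
  # step down one level at a time, as in the real pipeline
  map("data") %>%
  map("beersByBrewer") %>%
  # pull out each "items" data frame and row-bind them into one
  map_dfr("items")

mock_beer_df
```

The result has one row per beer across both mock responses, with the nested style field flattened into its own column.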