Quote of the Day
Ladies and gentlemen, this is your captain speaking. We have a small problem. All four engines have stopped. We are doing our damnedest to get them going again. I trust you are not in too much distress.
— Pilot message on British Airways flight 9 from Singapore to Australia as it flew next to an erupting volcano near Indonesia, and ash shut down all four of the 747’s engines.
Introduction
Figure 1: Label of the Most Bitter Beer in the Dataset. (Link)
I gave a seminar last week on using Python with Pandas in Jupyter notebooks. When I give a seminar, I always use a worked example to illustrate the points that I am trying to make. The audience at this seminar included a number of young men, so I decided to focus on a topic near and dear to their hearts – beer. I was amazed at the amount of beer knowledge shown by some individuals in the audience. Clearly, beer is a major part of their lives. I myself know NOTHING about beer, so I had to educate myself. The data proved interesting to analyze, and I will probably augment my seminar with additional data over time.
My objectives for this seminar were to show how to use Python with Pandas to:
- import beer and brewery data files into DataFrames
- clean up the imported data
- merge (i.e. join) tables
- graph the data
- analyze the data using pivot tables
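The first three objectives can be sketched in a few lines. This is only an illustration, not the notebook from the seminar: the column names (`brewery_id`, `abv`, `ibu`, and so on) and the inline sample rows are assumptions standing in for the real beer and brewery files.

```python
import io
import pandas as pd

# Hypothetical stand-ins for the beer and brewery CSV files.
beers_csv = io.StringIO(
    "name,abv,ibu,brewery_id\n"
    " Pale Ale ,0.055,40,0\n"
    "Stout,0.07,,1\n"
    "Lager,0.048,18,0\n"
)
breweries_csv = io.StringIO(
    "brewery_id,brewery,city,state\n"
    "0,Alpha Brewing,Portland, OR\n"
    "1,Beta Brewing,Bend, OR\n"
)

# Import each file into its own DataFrame.
beers = pd.read_csv(beers_csv)
breweries = pd.read_csv(breweries_csv)

# Clean up: strip stray whitespace from the text columns.
beers["name"] = beers["name"].str.strip()
breweries["state"] = breweries["state"].str.strip()

# Merge (join): attach brewery details to each beer via the shared key.
merged = beers.merge(breweries, on="brewery_id", how="left")
print(merged)
```

A left join keeps every beer even if its brewery is missing from the brewery table, which makes gaps in the data easy to spot.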
The presentation was well received, and I felt the information would be worth posting here.
I should mention that I have reduced my blogging rate because I have been busy preparing to build a retirement home in Northern Minnesota. I have never contracted a custom home before, and the effort has been enormous and all-consuming. I will discuss my construction project in a series of posts once I get everything kicked off – no heavy equipment moves in Northern Minnesota until the ground thaws.
For those of you who like to work along, I include my Jupyter notebook, data files, and an Excel version of this analysis here.
Background
Definitions
- Microbrewery
- A brewery that produces small amounts of beer, typically much smaller than large-scale corporate breweries, and is independently owned. Such breweries are generally characterized by their emphasis on quality, flavor and brewing technique. The definition of a small amount of beer varies by state. For example, Missouri limits a microbrewer to no more than 10,000 barrels a year. (Link)
- Craft Brewer
- A term for the developments succeeding the microbrewing movement of the late 20th century. The definition is not entirely consistent but typically applies to relatively small, independently-owned commercial breweries that employ traditional brewing methods and emphasize flavor and quality. The term is usually reserved for breweries established since the 1970s but may be used for older breweries with a similar focus. For tax purposes, craft breweries produce less than two million barrels of beer a year. (Link)
- International Bittering Units (IBU)
- A metric used to approximately quantify the bitterness of beer. An IBU is measured in parts-per-million (ppm) of isohumulone, the main chemical compound derived from hops that makes beer taste bitter. Isohumulone is created when the alpha acids in hops isomerize, or break down, in the boil. (Link)
- Alcohol By Volume (ABV)
- ABV is defined as the number of millilitres of pure ethanol present in 100 millilitres (3.4 US fl oz) of solution at 20 °C (68 °F). The number of millilitres of pure ethanol is the mass of the ethanol divided by its density at 20 °C, which is 0.78924 g/ml. The ABV standard is used worldwide. (Link)
- Proof
- Proof is a measure of the content of ethanol (alcohol) in an alcoholic beverage. The term was originally used in the United Kingdom and was equal to about 1.75 times the alcohol by volume (ABV). The UK now uses the ABV standard instead of alcohol proof. In the United States, alcohol proof is defined as twice the percentage of ABV. (Link)
- Isohumulone
- An iso-alpha acid generated when the alpha acids in hops isomerize during boiling. (Link)
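The ABV and proof definitions above reduce to simple arithmetic. The sketch below works through them; the function names and the sample quantity (3.9462 g of ethanol in 100 ml of solution) are made up for illustration.

```python
# Density of pure ethanol at 20 °C, taken from the ABV definition above.
ETHANOL_DENSITY_G_PER_ML = 0.78924


def abv_percent(ethanol_mass_g, solution_volume_ml):
    """ABV in percent: millilitres of pure ethanol per 100 ml of solution."""
    ethanol_volume_ml = ethanol_mass_g / ETHANOL_DENSITY_G_PER_ML
    return 100.0 * ethanol_volume_ml / solution_volume_ml


def us_proof(abv):
    """US proof is defined as twice the ABV percentage."""
    return 2.0 * abv


def uk_proof(abv):
    """Historical UK proof, approximately 1.75 times the ABV percentage."""
    return 1.75 * abv


# Hypothetical sample: 3.9462 g of ethanol in 100 ml is about 5% ABV.
abv = abv_percent(3.9462, 100.0)
print(round(abv, 2), round(us_proof(abv), 2), round(uk_proof(abv), 2))
```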
Analysis
Data Source
All the data for this analysis comes from this web site. I must applaud the author of this web site for the excellent illustration he provided on the work involved in scraping data from a complex web page. I do much web scraping myself, and his description of how to gather the data is the best I have seen.
Breweries with the Most Labels
As I looked at the data, I noticed that some breweries had an enormous number of beer names in their portfolio. So I generated Figure 2, which is a table of the top ten breweries by number of beer names (also known as labels).
Figure 2: Craft Breweries with the Most Labels.
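A table like this can be produced with a group-and-count. The sketch below assumes a merged table with one row per label and hypothetical `name` and `brewery` columns; the sample rows are invented.

```python
import pandas as pd

# Hypothetical merged table: one row per beer label.
merged = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F"],
    "brewery": ["Alpha", "Alpha", "Alpha", "Beta", "Beta", "Gamma"],
})

# Count labels per brewery, largest first, and keep the top ten.
labels_per_brewery = (
    merged.groupby("brewery")["name"]
    .count()
    .sort_values(ascending=False)
    .head(10)
)
print(labels_per_brewery)
```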
Beers By Alcohol Content (ABV)
The alcohol content of beer comes up occasionally, and this data set allowed me to look at the distribution of alcohol content by volume (i.e. ABV). First, I looked at the top ten beers by ABV (Figure 3). Note that many beers had ABV values of 9.9%, so the 9.9% beers you see in the table are simply the ones that sorted ahead of the other ties.
Figure 3: Top 10 Craft Beers By ABV.
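The tie issue is easy to see with `DataFrame.nlargest`, whose `keep` argument controls what happens to rows tied at the cutoff. This is a sketch on invented data with a hypothetical `abv` column stored as a fraction, and it asks for a top two rather than a top ten to keep the example small.

```python
import pandas as pd

# Hypothetical beers; three of them tie at 9.9% ABV.
beers = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "abv": [0.128, 0.099, 0.099, 0.099, 0.05],
})

# Default behavior: ties at the cutoff are resolved by row order,
# so only the first 9.9% beer makes the list.
top2 = beers.nlargest(2, "abv")

# keep="all" returns every beer tied for the last place instead.
top2_all = beers.nlargest(2, "abv", keep="all")

print(top2)
print(top2_all)
```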
The beers in Figure 3 are actually outliers. The bulk of craft beer ABV values are in the 5% range (Figure 4).
Figure 4: ABV Histogram.
My preferred way to view empirical distributions is with a violin plot, which I show for ABV values in Figure 5. The violin plot also allows you to show quartiles, which are shown as dashed lines.
Figure 5: ABV Violin Plot.
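The numbers behind both plots can be computed without drawing anything. The sketch below bins invented ABV percentages the way a histogram would and computes the quartiles that the violin plot marks with dashed lines. The bin edges and sample values are assumptions, and the plotting calls themselves are left out.

```python
import pandas as pd

# Hypothetical ABV values in percent.
abv_pct = pd.Series([4.2, 4.8, 5.0, 5.2, 5.5, 6.0, 6.5, 7.0, 9.9], name="abv_pct")

# Histogram counts: bucket the values into bins two percentage points wide.
bins = [2, 4, 6, 8, 10]
counts = pd.cut(abv_pct, bins=bins).value_counts().sort_index()

# Quartiles: 25th percentile, median, and 75th percentile.
quartiles = abv_pct.quantile([0.25, 0.5, 0.75])

print(counts)
print(quartiles)
```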
Cities With The Most Craft Breweries
Figure 6 shows the top ten cities by number of craft breweries. I was not surprised at all to see Portland, OR and Bend, OR on the list.
Figure 6: Cities with Most Craft Breweries.
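A pivot table is one way to build this kind of ranking. The sketch below uses invented breweries and hypothetical `brewery`, `city`, and `state` columns. It indexes on both city and state so that same-named cities in different states are counted separately.

```python
import pandas as pd

# Hypothetical brewery table.
breweries = pd.DataFrame({
    "brewery": ["Alpha", "Beta", "Gamma", "Delta", "Epsilon"],
    "city": ["Portland", "Portland", "Bend", "Portland", "Boulder"],
    "state": ["OR", "OR", "OR", "ME", "CO"],
})

# Count breweries per (city, state) pair, largest first, top ten only.
by_city = (
    breweries.pivot_table(index=["city", "state"], values="brewery", aggfunc="count")
    .sort_values("brewery", ascending=False)
    .head(10)
)
print(by_city)
```

Grouping on city alone would have merged Portland, OR with Portland, ME into a single count of three.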
Top 10 Beers By Bitterness (IBUs)
Figure 7 shows the top ten beers by bitterness. After I presented this table, staff members started talking about how they could obtain a small stock of these beers. I am floored that there are people who seek out bitter beer.
Figure 7: Top Ten Beers By IBU.
Conclusion
The reviews of the seminar were good! My team uses Python as its automation language, but has not been using Pandas or Jupyter notebooks – they still use Excel to present their test data. I am hoping that my presentation motivates them to consider Jupyter notebooks and Pandas so that all their work can be done without switching software.
I added Figure 8 because I found it interesting that some smaller brews are becoming commonly available in many states. For example, Summit products are available everywhere in Minnesota.
Figure 8: Beers Most Likely on a Menu By State. (Link)