Torpedo Engine Technology for a Venus Space Probe?


Quote of the Day

Nothing great in the world has been accomplished without passion.

Georg Wilhelm Friedrich Hegel, philosopher. I know people who have gone their entire lives without finding a passion. I am fortunate that I found mine early on.

Figure 1: Current US Navy Torpedoes.

Figure 1: Current US Navy Torpedoes. (Source)

I just finished reading an interesting article on a NASA proposal for a Venus space probe that uses power generation technology developed for a US Navy torpedo program back in the 1980s. Like many spacecraft, torpedoes need power generation systems that are small, generate massive power for a short period, and must be storable for years with the ability to turn on almost instantly with high reliability.

Back in the 1980s, the US Navy faced the threat of deep diving, high-speed Alfa-class submarines operated by the Soviet Navy. Standard torpedo power systems at that time either used seawater batteries (eg. Royal Navy Sting Ray) or Otto-fuel driven internal combustion engines (e.g Torpedo MK 48). Both of these technologies have their advantages and disadvantages.

  • Seawater Battery
    • Advantages: electrical propulsion is quiet, very reliable, no exhaust to remove, storable long-term
    • Disadvantages: low energy density
  • Otto-Fuel
    • Advantages: high energy density, storable long-term
    • Disadvantages: noisy, exhaust must be removed

The need to remove engine exhaust is a problem for a deep-diving torpedo. As the torpedo goes deeper, the engine must work against the back pressure of the ocean in order to remove the exhaust. This means that an Otto-fueled torpedo goes slower as it goes deeper. This creates an intrinsic limit on the operating depth of these torpedoes. The Alfa-class submarine could dive so deep that it was possible for it to go below the operating depth of many torpedoes.

The US Navy's solution was developed by the University of Pennsylvania in the form of the Stored Chemical Energy Propulsion System (SCEPS). The SCEPS engine uses the chemical reaction between lithium (fuel) and sulfur hexafluoride (oxidizer) to generate heat that can drive a boiler and turbine (Equation 1).

Eq. 1 \displaystyle 8\ \text{Li}\ \text{+}\ \text{S}{{\text{F}}_{6}}\to \text{6}\ \text{LiF}\ \text{+}\ \text{L}{{\text{i}}_{\text{2}}}\text{S}\ \text{+}\ \text{heat}

A key advantage of this chemical reaction is that its reaction products occupy less volume than the input fuel and oxidizer.

Historically, NASA would have used plutonium-238 thermelectric batteries for its space applications. However, the US has lost its plutonium-238 production capacity (article, article) and is now going through multi-year process to rebuild that capacity. While this rebuilding is going on, the SCEPS engine may provide a useful alternative power source.

This is not the first time torpedo technology has been used in spacecraft. For example, a friend of mine managed a thermal battery design group that built the batteries for the Galileo Probe that descended into Jupiter's atmosphere. Very similar batteries have been used in torpedoes for decades because they can be stored for years, generate tremendous power for a short period of time, and are very reliable.

I should mention that the SCEPS engine ended up not being deployed on any active US Navy torpedo because of issues that are not relevant to a space deployment.

For those of you interested in the history of US torpedo development, Figure 2 is excellent (large photo, ~6 MB).

Figure 2: Pictorial History of US Torpedo Development.



Posted in Naval History, Space | Leave a comment

One Drink Per Hour Can Get You Drunk


Quote of the Day

Your time is limited, so don't waste it living someone else's life.

— Steve Jobs

Figure 1: Example BAC Chart. (Source)

Figure 1: Example BAC Chart. (Source)

Back in 2011, I wrote a blog post that goes into the details on how the Blood Alcohol Content (BAC) versus drinks/time/gender are computed (Figure 1). These charts tell people how much they can drink and still stay under the BAC limits for driving under the influence. These charts are for a typical drinker and the actual BAC value will vary by person.

A reader, Megan Sweeny, wrote a very interesting comment to my original post that included a link to a  Google sheet she created that applied the Widmark formula to a person that drinks at a fixed rate to investigate the validity of the following rule of thumb:

Your BAC level will remain within safe limits if you consume only one standard drink per hour.

The basis of most BAC charts is the Widmark formula. Megan used the Widmark formula to test the rule of the thumb mathematically. Her approach was to:

  • Assume the person takes a drink at the top to the hour (1:00, 2:00, 3:00, etc).
  • Compute the BAC one hour after consumption.
  • Draw charts to illustrate how the drinker's BAC varies with time.

Megan's model show that a 150 lb male following the rule of thumb eventually will exceed the legal limit. Since I use this blog as a repository for my Excel work, I have made an Excel version of Megan's work that you can access here.

From my standpoint, no one should ever drive after drinking. However, this was an excellent use of spreadsheet to model a common rule-of-thumb.

Nice work Megan.




Posted in General Mathematics, Health | 3 Comments

Countries That Are Graying Fastest


Quote of the Day

To see what is in front of one’s nose needs a constant struggle.

— George Orwell

Figure 1: UN Projections on Median Country Ages By 2050.

Figure 1: UN Projections on Median Country
Ages By 2050. (Source)

Aging is something that many US firms are needing to address because of the tidal wave of baby boomer retirements (blog post). As a director in a high-tech firm, I have been working hard to address the need for passing knowledge from workers near retirement age to younger workers (blog post). However, aging is not just a problem for companies as the median age of the world is increasing. I have been reading quite a few articles lately on the effect of aging populations on quite a few developed nations: South Korea, China, Germany, and Italy. These nations are facing the problems associated with aging populations: high medical expenses, providing elder care, paying pensions, etc.

Figure 1 shows data from the UN on how they project how the median ages will change by 2050 for some selected countries. This chart made me curious as to how the median age is changing per year in some countries form which I buy components. I grabbed median age data from the CIA World Factbook for the years from 2004 to 2016 (biannual data), did a least‑squared fit on the data to determine the rate of aging, and filtered the data for the countries in which I was interested.

Figure 2 shows my selection of developed nations ordered by their rates of aging. One country that is is important to the optics business is South Korea, which is the fastest aging developed country in the world. The sparklines show the data for which the slopes were calculated.

Figure 2: Countries on My List Ranked by Rates of Aging.

Figure 2: Countries on My List Ranked by Rates of Aging.

Not all countries are aging. Figure 3 shows the top ten countries in terms of growing younger.

Figure 3: Countries with Declining Median Ages.

Figure 3: Countries with Declining Median Ages.

For those who are interested, my workbook and data files are here. No macros here – just Excel and Power Query.







Posted in History Through Spreadsheets | Leave a comment

Old Naval Mines Still Floating Around


Quote of the Day

There is greatness hidden in every talent, career, or skill be it photography, designing, art, music, and many more. But it can only be realized by people who decide on a certain day at a certain time to take their passions further.

Israelmore Ayivor

Figure 1: Old Training Mine Washes Up on North Carolina Beach.

Figure 1: Old Training Mine Washes Up on North Carolina Beach. (Source)

I just read a news article about an old moored training mine washing up on a North Carolina beach (Figure 1). I am amazed at the number of mines that still wash up on beaches around the world. Figure 1 shows the mine that washed up on a North Carolina beach recently. Stories like this seem to happen frequently after severe storms. Fortunately, this training unit (i.e. inert) did not pose a hazard to the people who came upon it, but some explosive-laden mines still wash up on beaches (example).

Figure 2: Moored Mines As Portrayed in WW2 Movies.

Figure 2: Moored Mines As Portrayed in the Movie Destination Tokyo.  (Source)

This particular mine is similar to the type often portrayed in WW2 submarine movies (Figure 2). It looks like a floating metal sphere covered in "horns" that is moored to the ocean bottom. The horns are called Hertz horns (sometimes spelled Herz) and actually are a form of electric battery. When the horns are bent, such as when a ship collides with the mine, a vial containing sulfuric acid breaks, releasing the acid into a dry lead-acid battery (Figure 3). With the battery now filled with electrolyte, it begins generating a voltage which can be used to fire a detonator.

Figure 3: Schematic of a Hertz Horn.

Figure 3: Schematic of a Hertz Horn. (Source)

This type of detonator can remain functional for decades. The Hertz horns were invented in 1868, which give you an idea of how old naval mine technology is. It is estimated that 235,000 sea mines were laid during WW2 and some missed being cleared. While I consider this type of mine obsolete, they are still commonly used. Modern mines are much more capable, in some cases even being able to fire torpedoes (e.g US Navy Captor). They have onboard sonar systems that can track ships as they approach.

The mines washing up on beaches have been floating free in the ocean for various reasons. During WW2, minesweepers used paravanes (i.e. towed underwater gliders) to cut the mooring cables of the mines. With their mooring lines cut, the mines would float to the surface and a sailor would shoot a rifle at the mine. Their plan was for the bullets to either hit the Hertz horn and detonate the mine, or to put so many bullet holes in the mine that it would sink. Some floating mines were missed during this process and simply floated off to become a hazard later on. Other mines were missed and never had their mooring lines cut – eventually the action of wind, waves, and corrosion would break the mine free. Also, these old mines have been laid as part of other conflicts, like the Korean War, and in the Persian Gulf to impede oil shipments. These mines have also ended up floating in the ocean.

Naval weapons are very powerful and even old ones pose a serious safety threat because explosives become unstable with age. One of these mines washed up on a beach and was mistaken for an old buoy– children were playing by it. It makes me shudder just thinking of it (Figure 4). Part of my concern for these kids is driven by a close call of my own involving an experimental torpedo. It is the only time in my life that I hoped my wife would be able to find my dental records.

Figure 4: Unidentified Sea Mine Near Children.

Figure 4: Unidentified Sea Mine Near Children. (Source)

Posted in Military History | Leave a comment

WW2 Tank Production Comparison Between Combatants


Quote of the Day

At least you figured it out. My first two husbands never did.

— Female employee told me this after I mentioned that my wife's nickname for me was "The Paycheck" – this nickname was a joke between my wife and me.


Figure 1: Soviet T34/85 Medium Tank.

Figure 1: Soviet T34/85 Medium Tank.
The capabilities of this tank and its predecessor
(T34/76) shocked the Panzers.  (Source)

I occasionally watch WW2 documentaries on television, and one of these programs provided a bit of history on the Battle of Kursk. This battle may have been the greatest clash of armored forces in history. The coverage highlighted the role of medium tanks, like the Soviet T34 (Figure 1) and the German Panzer Mk V (aka Panther [Figure 2]), and heavy tanks, like the German Panzer Mk VI (aka Tiger I). One of the themes of the documentary was the massive size of Soviet armored forces compared to their German opponents. This discussion piqued my curiosity about the number of tanks produced by the major combatants.

I performed a quick web search and soon found that the Wikipedia has data for the Soviet, German, and American armor production in an odd, but usable form. The articles had all the data I wanted but in separate tables that would need to be cleaned up separately, consolidated, and then turned into pivot tables. I very much wanted to include UK tank data, but what I found had a weird format and decided to skip including it.


Points of Interest

Figure 2: Panzer_Mk_V_Panther

Figure 2: German Panzer Mk V –
Panther Tank. Many analysts consider the Panther
the finest overall tank design of WW2. (Source)

I am interested in three things:

  • Total tank production by all parties separately.
  • Total tank production of Allies versus Germans.
  • Weight mix of tanks produced (light, medium, and heavy).

My interest in the mix of tank weights is driven by the first-person accounts I have read from US veterans that show great concern about encountering heavy tanks, like Tiger Is and Tiger IIs. In the case of the US, the armor philosophy was focused on medium tanks, specifically the M4 Sherman. The lone US heavy tank, the M26 Pershing, was not deployed until late in the war. This decision is a major topic of discussion even today with some first-person accounts referring to the Sherman tank as a deathtrap. Opposing arguments are that the Sherman was not that bad (particularly the Easy 8 and Firefly variants) and that quantity has a quality all its own. There is some empirical support for this position.

  • The Sherman Easy 8 performed well against the Soviet T34/85 in Korea. The M4 Sherman Easy 8 and the T34/85 were the main US and Soviet tanks (respectively) during the latter stages of WW2.
  • Shermans performed well at the Battle of Arracourt against German forces with a large number of Panthers. Historians have argued that this victory shows that well-trained crews in a Sherman could effectively deal with Panthers.
  • The Soviets used a large number of Shermans and all reports I have seen indicate that they praised their performance (reference) – though one soldier said they were more suited to "colonial" campaigns than all-out war (reference) .

I have to mention that making direct comparisons of armor effectiveness is difficult because  training and operational tactics are critical to achieving good field performance.

Tabular Presentation

Using Excel and Power Query, I grabbed data from three Wikipedia pages: US armor production, Soviet armor production, and German armor production. The data includes both tanks and self-propelled guns. I consolidated the data into a single pivot table (Figure 3).

Figure 3: German, Soviet, and US Armor Production in WW2.

Figure 3: German, Soviet, and US Armor Production in WW2.

While the table contains all the data available, it is a difficult format for people to interpret quickly. We need to find a graphical way to present the data.

Graphical Presentation

Figure 4 shows the relative armor production and mix of the German versus the Allies. The Allies produced more than four times more armor than the Germans did. Also note the amount of heavy armor produced by the Allies.

Figure 3: German vs Allied Armor Production Numbers.

Figure 4: German vs Allied Armor Production Numbers.

Figure 5 shows how US and Soviet production compared. The Soviets outproduced the US in total numbers, particularly in heavy armor.

Figure 4: Soviet vs US Armor Production.

Figure 5: Soviet vs US Armor Production.


Here is how I interpret this data:

  • The US and Soviets outproduced the Germans in tanks by more than a factor of 4.
  • The Soviets outproduced the US in armored vehicles by almost 20%.
  • The Soviets produced significantly more heavy tanks than both the US and Germany.
  • While quite a bit has been written about the German Tiger tanks, relatively few of them were produced – less than 2000.

For those who are interested, my workbook is here.


The Allied strategy of using massive numbers of medium quality tanks proved effective against the German army. As Stalin used to say, "Quantity has a quality all its own."

I should mention that much has been written about the shortcomings of the tank guns of the US and UK when engaging Tigers and Panthers – particularly with respect to the M4 Sherman's 75 mm gun. The Allies did have some tank-mounted guns that were effective. For example, The British 17 pdr was superb, especially when used with APDS ammunition. However, this ammunition came rather late and was in short supply during WW2.

Figure 5: Tiger Killed By Sherman Firefly. (Source)

Figure 6: Tiger Killed By Sherman Firefly, which carried a 17 pdr
(76.2 mm) anti-tank gun that fired an APDS projectile. This Tiger
was commanded by Michael Wittmann, a Panzer ace. (Source)

Figure 7 shows a 17 pdr anti-tank round embedded in Tiger I armor.

Figure 7: 17 pdr Anti-Tank Round Embedded in Tiger I Armor.

Figure 7: 17 pdr Anti-Tank Round Embedded in Tiger I Armor. (Source)

Doctrine was an issue for the US Army because they focused on using tanks to support infantry and tank destroyers were to be used against tanks. The M4 Sherman was excellent in an infantry support role, and the Tank Destroyer M36's 90 mm gun was capable of stopping German heavy armor. However, the tank destroyer concept overall was a failure as German tanks had a habit of attacking where you did not expect – many M4 Sherman's ended up battling German heavy tanks that they were not designed to fight. The US did eventually mount a high-velocity 76 mm guns in the Sherman, which improved the situation, but this came to late for many crews.


In Figure 8, I show some data on which US manufacturers produced the M4 Sherman during WW2 (Source). Chrysler and GM were the dominant producers, but the railroad equipment manufacturers also contributed significantly.

Figure 7: M4 Sherman Manufacturers During WW2.

Figure 8: M4 Sherman Manufacturers During WW2.

I should mention that the total number of M4's shown in Figure 7 is different by ~400 units than is shown in my spreadsheet. There are two reasons: (1) Figure 7 includes 188 units produced in Canada, and (2) there seems to be small discrepancies between different WW2 databases for which I can find no reason.

Posted in Excel, History Through Spreadsheets, Military History | Leave a comment

Retired Hurricane Name Statistics


Quote of the Day

The cost of a thing is the amount of what I will call life which is required to be exchanged for it, immediately or in the long run.

— Henry David Thoreau, Walden, Chapter 1: Economy

Figure 1: Retired Atlantic Hurricane Names.

Figure 1: Retired Atlantic Hurricane Names.

I was reading an article that stated that names of Hurricane Harvey and Irma could be the eighth pair of back-to-back hurricane whose names have been retired since the 1954. I have never thought about which hurricane names have been retired, so I started to search around the web. As usual, the Wikipedia provided an excellent data source.

The modern hurricane naming system began in 1954 and the first name retirements occurred in 1955. Figure 1 shows that 83 Atlantic hurricane names have been retired as of the 2016. There are some famous names on this list. These names have a visceral response on some people. For example, my father-in-law used to shudder whenever he heard the name Camille.

You can see the kind of destruction that these storm cause by looking at the 10 retired hurricane names with the greatest loss of life (Figure 2a) and repair costs (Figure 2b).

Figure 2(a): Ten Retired Hurricane Names with the Greatest Loss of Life.
Figure 2(b): Ten Retired Hurricane Names with the Greatest Monetary Losses.

It also is interesting to look at the frequency with which hurricane names are retired. In Figure 3, I show the list down into 5 year periods that display the total number of names retired and total damage caused. You can see that the period from 1995 through 2009 was rough.

Figure 3: Hurricane Names Retired, Total Damage, and Loss of Life in Five-Year Periods.

For those who are interested, my workbook is available here.

Posted in History Through Spreadsheets | Leave a comment

Examples of Lake Clarity Variation Over Time


Quote of the Day

Put down everything that comes into your head and then you’re a writer. But an author is one who can judge his own stuff’s worth, without pity, and destroy most of it.

— Sidonie-Gabrielle Colett


Figure 1: Pattern on Freshwater Secchi Disk.

Figure 1: Pattern on Freshwater Secchi Disk. (Source)

I am having my cabin built on a small lake in northern Minnesota. At the same time that the cabin is being built, a friend is in the process of locating an an existing cabin for purchase on a nearby lake – there are dozens of lakes within a few miles of my building site. He has been asking questions about the clarity of the water in these lakes. Fortunately, the state of Minnesota has an excellent web page with all sorts of technical data on lake water, including clarity measurements. Professional lake monitors are also used. On a regular basis, they gather technical information on the lakes: chemistry, fish populations, presence of invasive species, etc. The lake water clarity data historically has been measured using a Secchi disk (Figure 1) and volunteer lake monitors. In recent years, satellites have been tasked with clarity monitoring as well. So there are now two sources of lake clarity information that can be used to cross-check one another.

Lake clarity is important because it gives you an idea as to the health of the lake, and the lake's health has a major influence on the value of your property. For me, a healthy lake supports a significant fish population, has no algae blooms, and has no significant weed issues, etc. However, low clarity does not necessarily mean an unhealthy lake. For example, a bog is the source of water for my lake. This gives the water a tea-color (referred to as "bog stain"), which reduces its clarity. However, the lake itself is quite healthy. In summary, clarity is just one of a number of useful measures of lake health.

In this post, I will be focusing on the Secchi disk data gathered by the volunteer lake monitors over the years on three lakes in Itasca County. The data acquisition process is straightforward:

  • lower the Secchi disk into the water.
  • record the depth at which you lose sight of the disk.
  • send the data to the Minnesota Department of Natural Resources (MnDNR).
  • For each lake, the MnDNR generates a CSV file containing data from both their volunteer program and their professional lake monitors. The file can be downloaded via the Internet, which is how I have configured Excel.

Lake clarity is just one parameter that volunteers gather. Another important parameter is lake level. It turns out that lake level and clarity are somewhat related. If you have have a substantial amount of rain (i.e. very clean water), the lake clarity will improve and your lake level will also rise. One issue with the volunteer program is that there sometimes are gaps in the data because volunteers are not always reliable data gatherers. Overall, I have to say the data gathering program is a great success.

For those who are interested, my workbook is available here.


My friend is strongly considering the purchase of a cabin on Johnson Lake, and I want to help him gather information for making an intelligent choice. So I grabbed the CSV files for Johnson Lake and two other nearby lakes, Eagle and Ruby, so he would have a basis for comparison.

Here is how I view these lakes.

  • Ruby Lake

    Ruby is sometimes called a "gem" of a lake because of its water quality and the small number of cabins on the lake. When I first saw Ruby Lake 20 years ago, I was very impressed with its clarity – you could clearly see the bottom. It is also has limited boat traffic because there are not many cabins on the lake. One reasons for the small number of cabins is because Ruby Lakes abutts a national forest, which means that part of the lake will never have cabins on it.

  • Johnson Lake

    Johnson is a good general use lake. The lake is near a major county road,which means access is easy in both winter and summer. Of course, this has the downside of being somewhat noisy. It has a good-sized gamefish population for those who like to fish and is large enough for those who like watersports, like waterskiing.

  • Eagle Lake

    Eagle is a small lake that is bog-stained. One side of the lake can be accessed by a major county road (i.e. easy access) and the other side of the lake has relatively poor access. It is not considered a major fishing lake, though kids love to fish on it because it has large numbers of sunfish and crappies. However, this is nice for those of us who do not fish and like our lakes quiet (i.e. me). It is large enough for watersports, but most folks just cruise around on their pontoon boats. There are relatively few cabins on the lake because much of the shoreline is owned by a few families.

Figure 2 shows the Secchi disk data for all three lakes. We can make several observations about this data:

  • Johnson Lake (green rectangles in Figure 2) has had very stable lake clarity for a long period of time. This indicates that the lake's health is stable.
  • Ruby Lake (blue triangles in Figure 2) had very good clarity back in the early '90s. No one gathered Secchi data during the early 2000s, but recent data shows that the clarity has declined down to the level of Johnson Lake. I would like to know more about why this change has occurred.
  • Eagle Lake (red circles in Figure 2) has very limited data. The data we have shows that the clarity can be low but is highly variable. This data seems to correlate with Eagle's Lakes level. Eagle has a variable level because its outflow is regularly blocked by beaver dams.
Figure 2: Excel Chart of Clarity of Three Lakes Over Time.

Figure 2: Excel Chart of Clarity of Three Lakes Over Time.


I am not sure if my friend will buy a cabin on Johnson Lake. The data shows that the lake is stable and not undergoing any stress.

Just for fun, I also plotted the data using ggplot2 (Figure 3).

Figure 3: Same Data as Figure 2 Plotted Using ggplot2 and Default Settings.

Figure 3: Same Data as Figure 2 Plotted Using ggplot2 and Default Settings.

Posted in Excel, General Science | Leave a comment

Hurricanes and Typhoons Ranked By Air Pressure


Quote of the Day

Whenever you are about to find fault with someone, ask yourself the following question: What fault of mine most nearly resembles the one I am about to criticize?"

Marcus Aurelius, Meditations

`Figure 1: Wind Speed and Air Pressure Distribution in a Hurricane or Typhoon. (Source)

Figure 1: Wind Speed and Air Pressure Distribution in a Hurricane or Typhoon. (Source)

I was watching the weather reports on Hurricane Irma and the discussions on how powerful it is. The most cited metrics for hurricanes and typhoons appears to be wind speed and eye barometric pressure (see Figure 1). I decided to look around for hurricane and typhoon strength data and the Wikipedia turned out to have a page containing large number of tables for all the most intense typhoons and hurricanes in different regions of the world. I used Power Query to (1) import the tables, (2) clean them up, (3) combine them, and (4) rank the storms by air pressure.

I have only personally experienced a weather event like Hurricane Irma once in my life. I was traveling in Taiwan when Typhoon Matmo struck. I was very impressed with the people of Taiwan and how organized they were on handling this event. The only unnerving part of the whole thing was that I was about to take a flight back to the US when a plane crashed at a nearby airport. This made me a bit queasy about how safe it was to fly back. It turned out the crash was because of pilot error.

For those who are interested, my workbook is here. Figure 2 shows the most intense hurricanes in the Atlantic basin when ranked by central barometric pressure.

Figure 2: List of the Most Intense North Atlantic Hurricanes Ranked By Barometric Pressure.

Figure 2: List of the Most Intense North Atlantic Hurricanes Ranked By Barometric Pressure.

I was also able to generate a list of the top 25 typhoons and hurricanes by air pressure (Figure 3). Notice how the Pacific has had the most intense storms, at least with respect to barometric pressure. Hurricane Patricia was interesting because it was a very intense hurricane that struck Mexico from its western side (see Figure 4).

Figure 3: Worldwide Ranking of Hurricanes and Typhoons by Air Pressure.

Figure 3: Worldwide Ranking of Hurricanes and Typhoons by Air Pressure.

Figure 3: Path and Intensity of Hurricane Patricia.

Figure 4: Path and Intensity of Hurricane Patricia. (Source)


Posted in Excel, History Through Spreadsheets | Leave a comment

Baby Name Frequency Ranking Using Power Query


Quote of the Day

In defeat unbeatable, in victory unbearable.

Churchill on General Montgomery, a man who was not popular with those he worked with and for.

Figure 1: Most Popular Baby Names of 2016.

Figure 1: Most Popular Baby Names of 2016 based on SSA data. You can verify my 2016 list by looking at the SSA web site.

You may have noticed that many of my recent posts are focused on data processing and analysis. These recent posts reflect the fact that I am preparing for a career change as I near "retirement," and I plan on working in the data analysis arena. This means that I have been in serious Python, R, and statistics training. While I love working with Python and R, I keep finding myself drawn back to Excel and Power Query (aka Get and Transform) for quick, ad hoc analysis work. While this blog will look at the frequency over time of the name Mark, you can use the tool to generate the same worksheet for any name.

In this post, I am going to use Social Security Administration (SSA) data on baby names, which dates back to 1880 – I assume the original data came from the Census Bureau because the SSA did not exist in 1880. The data is in the form of text files, one file for each year, that contain the number of babies given each name during that year. The SSA did not list names that occurred less than five times in a year. As an example of the information that can be extracted, Figure 1 shows the ten most popular names (male and female) in 2016.

I was inspired to look for this data because I am interested in the historical popularity of the name Mark (my name), which I became curious about because virtually every Mark I meet has an age from 50 to 60 years. Given that observation, it seems that I should see Mark being a very popular name during the late 1950s through the 1960s. I will show that the data supports this observation. I will present the results graphically. I also plotted data for other names in my family, but decided not to post that data for privacy reasons.

This is a really good data set to use for a Power Query example because:

  • The data consists of many text files in the same format that need to be combined.
  • Each text file name includes the year of the data, which I must extract and put into a column within the data.
  • There are well over a million lines of data, which exceeds the number of rows allowed in Excel. Power Query and Power Pivot allow these large data sets to be processed within the Excel environment. You can present the data using pivot tables, which allow you to aggregate the information down to a reasonable size.
  • I was able to use of an m-code routine to rank the names. I was inspired to use this approach by Chris Webb's excellent blog post on the topic.
  • I was able to demonstrate how to make the workbook relocatable by using sending Power Query the file address as a parameter.

Figure 2 show that the name Mark was quite popular (6th or 7th ranked) from 1957 to 1968. So my anecdotal evidence is supported by the data.

Figure 2: Popularity of the Name Mark in the US over Time.

Figure 2: Popularity of the Name Mark in the US over Time.

My workbook and the associated data is located here – the zip file is quite large (22.5 MB). You can generate Figure 2 for any name by inputting the name and gender (M/F), then pressing the update chart button. You can use it to check the popularity of any name over time. As an example, Figure 3 shows the popularity of the name Peter over time.

Figure 3: Popularity of the Name Peter Versus Time.

Figure 3: Popularity of the Name Peter Versus Time.

The workbook contains a single line macro that updates the query.

Posted in History Through Spreadsheets, Personal | 1 Comment

Largest Counties in the US


Quote of the Day

I don’t see why in Hell Roosevelt didn’t order Wainwright home and let MacArthur be a martyr … We’d have had a real General and a fighting man if we had Wainwright and not a play actor and bunco man such as we have now. Don’t see how a country can produce men as Robert E. Lee, John J. Pershing, Eisenhower, and Bradley and at the same time produce Custers, Pattons, and MacArthurs.

Harry S. Truman, Diary Entry – June 17, 1945. I have read everything I could find on MacArthur. I will never understand how a man like that rises to power.

Figure 1: Itasca County in Northern Minnesota.

Figure 1: Itasca County in Northern Minnesota.

I am currently building a retirement home in Itasca County, Minnesota. While at the construction site, I spoke with a sheriff's deputy about law enforcement coverage for my area. He told me that there were only two deputies on duty at night covering an area larger than the state of Delaware. He also said that if the deputies are tied up in the southern part of the county it can take a while for them to get to my place, which is in the the northern part of the county.  I had never thought about the size of the counties in northern Minnesota, but there are some large pieces of real estate there.

I became curious about the size of the largest counties (and county-like entities) in the United States and how they compare in size to the various states. I found county size data on the Census Bureau web site and state size data on the Wikipedia. It turns out that while some of the counties in Minnesota are large, they are dwarfed by the size of counties in the western states – Alaska in particular has some monster counties. Itasca County is 189th largest in the US. So while I think of it as big, there are many larger counties. For those who are interested, my analysis is in this workbook.

Figure 2 shows the states closest in area to the 25 largest counties by total area (i.e. land and water).

Figure 2: List of 25 Largest Counties by Total Area and the State Nearest in Area.

Figure 2: List of 25 Largest Counties by Total Area and the State Nearest in Area.

Figure 3 shows how many states are smaller than these counties. For an international comparison,  Yukon-Koyukuk, AK is larger in area than Germany (137,983 mi²). Figure 4 shows the size of Yukon-Koyukuk relative to the rest of Alaska.

Figure 3: Number Of States Smaller Than Specific Counties.

Figure 3: Number Of States Smaller Than Specific Counties.

Figure 4: Yukon-Koyukok Borough in Alaska.

Figure 4: Yukon-Koyukuk Borough in Alaska. (Source)

Posted in Excel, Personal | Leave a comment