Computing the Longest Lived US President with Power Query

Quote of the Day

...in any bureaucratic organization there will be two kinds of people: those who work to further the actual goals of the organization, and those who work for the organization itself. Examples in education would be teachers who work and sacrifice to teach children, vs. union representatives who work to protect any teacher including the most incompetent. The Iron Law states that in all cases, the second type of person will always gain control of the organization, and will always write the rules under which the organization functions.

— Pournelle's Iron Law of Bureaucracy.

Figure 1: Jimmy Carter, 39th President of the United States.

I was listening to the news the other night when I heard a reporter mention that Jimmy Carter had just become the longest-lived US president, surpassing George H.W. Bush, the previous record holder. I thought verifying this fact would be a good Power Query exercise.

My approach was to obtain the birth and death information (if present) by scraping the Wikipedia pages for US presidents. For the dead presidents, I then computed their age at death. For the living presidents, I computed their age as of 25-March-2019.

Doing historical work in Excel is always an adventure because Excel does not handle dates prior to 1900. Power Query does not have this issue. On the downside, Power Query does not provide a function for computing date differences in terms of years, months, and days (Excel has DATEDIF). I found an M-code routine that partially solved the problem, which I modified to solve it fully. My Excel worksheet is available here.

My routine for computing the ages of the presidents in years, months, and days is a modified version of some work by Imke Feldmann, a goddess of Power Query. The routine adds a column with a record in each president's row that can be expanded to show years, months, and days.
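For readers who prefer to see the logic outside of M, here is a minimal Python sketch of the same DATEDIF-style year/month/day calculation. This is an illustrative translation, not Imke's M code:

```python
from datetime import date, timedelta

def age_ymd(start: date, end: date) -> tuple:
    """DATEDIF-style difference: full years, then months, then leftover days.
    Like DATEDIF, the last day is not counted."""
    years = end.year - start.year - ((end.month, end.day) < (start.month, start.day))
    months = (end.month - start.month - (end.day < start.day)) % 12
    if end.day >= start.day:
        days = end.day - start.day
    else:
        # Borrow days from the month preceding the end date.
        prev_month_end = date(end.year, end.month, 1) - timedelta(days=1)
        days = end.day + max(prev_month_end.day - start.day, 0)
    return years, months, days

# Jimmy Carter (born 1-Oct-1924) as of 25-Mar-2019
print(age_ymd(date(1924, 10, 1), date(2019, 3, 25)))  # (94, 5, 24)
```

In the workbook, the same three numbers land in the record column that gets expanded into the Table 1 and Table 2 reports.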

Table 1 shows my table of the longest-lived dead presidents. Note that my algorithm does not count their last day – just like DATEDIF. I compared my result with the output from this website and they agree.

Table 1: Longest Lived Dead Presidents.

Table 2 shows my table for the ages of the living US presidents. As you can see, Jimmy Carter is now the longest-lived US president.

Table 2: Ages of the Living US Presidents.

Good Use for Excel Textjoin Command

Quote of the Day

When Winston is right he is unique. When he is wrong, Oh My God!

Figure 1: Old School Work Time Clock. (Source)

I am always amazed at how Excel gets used for everything in small companies. It is the glue that holds many small businesses together.

One common Excel task is tracking work hours. As a contractor, I encounter all sorts of approaches to recording work hours. One small company wants all of my hours captured in an Excel workbook that contains one worksheet per week. Every two weeks, an administrator goes in and captures the hours into another worksheet.

In addition to the company's desires, I want a couple of reports from this workbook.

• A list of my work hours by week and task.
• A list of my workdays, which I use to compute my mileage.

How to solve this problem in the simplest way possible? I have a few constraints and one personal request:

• I cannot confuse the administrator by adding extra worksheets.
• I cannot use macros.
• I cannot modify the worksheet structure:
  • 3 lines for tasks assigned to each workday
  • Monday is the start of the work week
  • Each tab labeled by week number (Week1, Week2, ...)
• Because many of the customers and accounts repeat from one week to the next, I want to be able to add worksheets by copying the previous week's worksheet and incrementing the start of week date.

I took the following approach:

• I will hide my report worksheet.
• On my report worksheet, I will use an expanding spearing formula (aka 3D formula) and TEXTJOIN to create a table with one column of cells in which all the dates, tasks, and hours are combined into delimited strings. To ensure that my table expands properly, I have added two hidden sheets called Begin and End that bracket my weekly worksheets. I include them in my TEXTJOIN address range, which ensures that my TEXTJOIN formula does not need to be modified as I add sheets.
• I will use Power Query to process the table.
• I then generate two pivot tables with the information that I need.
• I added a pick list for all the Mondays in a year so that I cannot get a start date wrong.
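To see what Power Query has to do with the TEXTJOIN output, here is a Python sketch of the split-and-aggregate step. The `date|task|hours` layout and the "|" delimiter are assumptions for illustration, not the actual workbook format:

```python
from collections import defaultdict

def parse_entry(joined: str, delim: str = "|") -> dict:
    """Split one TEXTJOIN-style string back into a record.
    The date|task|hours field order is illustrative."""
    date_s, task, hours = joined.split(delim)
    return {"date": date_s, "task": task, "hours": float(hours)}

def hours_by_task(rows) -> dict:
    """Aggregate hours per task, like the pivot table report does."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["task"]] += row["hours"]
    return dict(totals)

rows = [parse_entry(s) for s in ["2019-03-25|Acme audit|3.5",
                                 "2019-03-26|Acme audit|4.0",
                                 "2019-03-26|Widget design|2.0"]]
print(hours_by_task(rows))  # {'Acme audit': 7.5, 'Widget design': 2.0}
```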

This approach seems to be working. When I need my reports, all I do is unhide my report page, refresh my pivot tables, and I have the information that I need. When I start a new week, I just copy the previous week and change the start date.

My time card workbook is shown here. I have left everything unhidden so you can see what I did.

BER Test Time Spreadsheet

Quote of the Day

This... is [the isle of] Berk. It snows nine months of the year... and hails the other three.

— Introduction from How to Train Your Dragon. I feel this way about Minnesota every February.

Figure 1: Typical BER Test Configuration.

I test high-speed serial channels every day. The most common test parameter that I need to measure is the Bit Error Rate (BER). Figure 1 shows the most common test configuration used for measuring BER. Because bit errors occur randomly, there is a certain amount of error involved in measuring the parameter. So when you state a BER measurement, you also give a confidence interval to express your level of uncertainty.

I was recently asked how to compute the test time required to verify the Bit Error Rate (BER) of a serial transport at various confidence levels. Test time is important because it drives test cost. I decided to make a spreadsheet that uses a data table and VBA to compute the test time as a function of the number of bit errors observed during testing and the desired confidence level. While I have discussed this topic in another blog post, I did not provide a spreadsheet for computing the values. The spreadsheet included here resolves this shortcoming. You just need to put in your bit rate and push the button.

GPS and Y2K Deja Vu

Quote of the Day

There is something to be said for cold, calculating competence.

Morning Joe TV program gave this review of the movie Vice and its portrayal of Dick Cheney.

Figure 1: GPS-III-A satellite. (Source: US Government)

I have been working since May 2018 as a contractor for various companies on resolving specific issues – I am a troubleshooter. This role has provided me with a number of interesting challenges. One of my recent challenges is helping folks deal with the GPS Week Number Rollover (WNRO) issue. This issue involves a 10-bit week counter that has been counting up from 0 every week since 22-August-1999. This counter counts up to 1023 and then rolls over to 0 on 7-April-2019. Not everyone is prepared for this rollover. This issue shares many similarities with the Y2K problem. See this presentation for a good overview of the rollover issue.

These 1024-week intervals are referred to as epochs. Each epoch lasts 19 years, 7 months, and 15 days. GPS went live on 6-Jan-1980, and the first epoch ended on 21-Aug-1999. We are now in the second GPS epoch, and the third epoch begins on 7-April-2019.

Going forward, GPS implementations will use 13-bit week counters, which roll over every 8192 weeks (about 157 years), greatly reducing the frequency of these events.

We can easily use Excel to show that the next rollover date is 7-April-2019.
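A quick sketch of that Excel check, translated to Python's datetime module:

```python
from datetime import date, timedelta

GPS_EPOCH_START = date(1980, 1, 6)  # GPS time began 6-Jan-1980
WEEKS_PER_EPOCH = 1024              # span of a 10-bit week counter

# End of the second epoch: two full 1024-week counts after the start
rollover = GPS_EPOCH_START + timedelta(weeks=2 * WEEKS_PER_EPOCH)
print(rollover)  # 2019-04-07
```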

I swear that I deal with these counter rollover issues at least once a year (earlier example).


MTBF Predictions Often Misused

Quote of the Day

Nothing in the world is in a bigger hurry than a dead fish.

— I heard a Norwegian fisherman say this during a radio interview while discussing his need for a bridge to speed the truck transport of his fish to market.

Figure 1: P-8 Poseidon, a new patrol plane that is having some teething problems. (Wikipedia)

Performing an MTBF prediction is to HW design as putting a license plate on a car is to driving it. You need the license plate to drive legally, but it adds nothing to your driving experience. Similarly, every company I have worked for demands a predicted MTBF for every HW product, but it adds no value to the design process. I would argue that generating MTBF predictions adds negative value to product deployment because customers often misuse them to estimate spare requirements and field support costs. Since no one has told customers otherwise, they assume the MTBF value accurately reflects the real failure rate of a product. In fact, MTBF predictions provide only a gross estimate of the rate of random part failure at product maturity. Real products, especially at introduction, rarely fail because of random part failures. Instead, their failures are dominated by issues like:

• Software upgrade issues
I often see products returned because of an issue caused by a software problem. To the customer, the product did not work, and they return it.
• Environmental problems
Lightning is the most common environmental issue that I see, but there are others. They often involve a misapplication of the product. For example, the product was deployed in a wet environment, but it is sensitive to moisture.
• Installation-induced failures
Many modern products are complex, and their installation is not simple. For example, I have seen products deployed with improperly stored backup batteries, which sulfated the batteries and rendered them unchargeable. When the backup power was needed, the product failed.
• Products returned with no discernable problems
Over half the products I see returned are for problems that I could not replicate. This is usually because of training issues.

Because the return rates for these failure modes are much higher than the random part failure rate, customers are shocked when they see that their real field support costs are much higher than those predicted using MTBF. You also often see unexpected reductions in system availability because the number of spares was seriously underestimated – another number commonly derived from MTBF.

You rarely see this problem mentioned in the press, but I saw the following paragraph in an article on the troubled introduction of the P-8A Poseidon (Figure 1) that alludes to the issue.

Moreover, the report found, data from the operational testing and evaluation of the P-8A's latest software engineering upgrade as well as metrics from the Navy "show consistently negative trends in fleet-wide aircraft operational availability due to a shortage of spare parts and increased maintenance requirements."

When I have seen this problem, it always traces back to the use of an incorrect failure rate model. For this reason, I always estimate field support manpower and spare numbers based on the historical failure rates of similar products at similar levels of maturity.

Estimating the Lithium Content of a Lithium Battery

Quote of the Day

The ideal subject of totalitarian rule is not the convinced Nazi or the dedicated communist, but people for whom the distinction between fact and fiction, true and false, no longer exist.

Introduction

Figure 1: Lithium on the Periodic Chart. (Source)

Most of the products that I work on are powered by lithium batteries. Lithium batteries are popular today because they have excellent energy density, but there are safety concerns with using them because of battery fires. These fires have caused the shipping industry to impose special labeling and packaging requirements on their transport. I recently needed to consider shipping batteries on airplanes, so I have been looking at the International Air Transport Association (IATA) shipment guidance for lithium-ion batteries. These rules require knowing the total amount of lithium mass present in a lithium-ion battery. This is not a number that is easy to get from the manufacturers, though I do have a number from one vendor.

As usual, I started by googling for a way to estimate the amount of lithium in a battery. I soon came upon a formula used by FedEx (Figure 2).

Figure 2: FedEx Guideline for Calculating Lithium Content in a Lithium Battery. (Source)

I decided that I needed to derive this relationship to understand it, which is the topic of this post. For the rest of this post, I will work with a single cell; a battery is just a bunch of cells in a series and parallel configuration.

Background

A Little Chemistry

There are many different lithium-ion chemistries. A common one uses lithium cobalt oxide and Equation 1 shows the chemical formula for this discharge reaction. Observe that 1 electron is transferred for each atom of lithium reacted.

 Eq. 1 $\displaystyle \text{Li}^{+}+{{e}^{-}}+\text{LiCoO}_{2}\to \text{Li}_{2}\text{O}+\text{CoO}$

Equation 1 tells us that one mole of lithium corresponds to one mole of electrons.

Alternate Forms of This Relationship

I actually found a number of different relationships online. While their units were all different, they are based on the same principles. One assumption I have made is that the nominal lithium cell voltage is 3.3 V, which allows me to convert between Amp-hours (A-hr) and Watt-hours (W-hr).

 Eq. 2 $\displaystyle 8\cdot \text{gm}\approx 100\cdot \text{W}\cdot \text{hr}$
 Eq. 3 $\displaystyle 1\cdot \text{gm}\approx 4000\cdot \text{mA}\cdot \text{hr}$
 Eq. 4 $\displaystyle 0.3\cdot \text{gm}\approx 1\cdot \text{A}\cdot \text{hr}$
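These rules of thumb follow directly from lithium's atomic mass (6.94 g/mol, per Figure 1) and the Faraday constant, with one electron per lithium atom as in Equation 1. A short Python sketch verifying them (the 3.3 V cell voltage is the assumption stated above):

```python
LI_MOLAR_MASS = 6.94          # g/mol, from the periodic table
FARADAY_AH = 96485.0 / 3600   # A-hr of charge per mole of electrons (~26.8)
CELL_VOLTAGE = 3.3            # nominal lithium cell voltage assumed above

def lithium_g_per_ah(capacity_ah: float) -> float:
    """Grams of lithium implied by a cell capacity in A-hr
    (one electron transferred per lithium atom)."""
    return capacity_ah / FARADAY_AH * LI_MOLAR_MASS

def lithium_g_per_wh(energy_wh: float) -> float:
    """Same estimate starting from W-hr, via the nominal cell voltage."""
    return lithium_g_per_ah(energy_wh / CELL_VOLTAGE)

print(round(lithium_g_per_ah(1.0), 2))    # ~0.26 g per A-hr (Eq. 4's 0.3 g)
print(round(lithium_g_per_wh(100.0), 1))  # ~7.8 g per 100 W-hr (Eq. 2's 8 g)
```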

Analysis

Derivation

Figure 2 shows how to derive Equations 2 through 4. I obtain the atomic mass of lithium from the periodic chart symbol shown in Figure 1.

Figure 2: Derivation of Lithium Content Rules of Thumb.

Example with Known Lithium Content

I know the lithium content of one battery. Figure 3 shows a comparison between an estimate computed with the derived equations and the real value. At least for this case, approximation and reality are within 10%.

Figure 3: Comparison of Formula Estimates with Actual Value.

Conclusion

I expect to be using the FedEx approximation in Figure 2 for some battery packaging design in the next few months.

Appendix A: Great Lithium-Ion Safety Infographic

Chemical & Engineering News has done a superb job of summarizing the fire issues associated with lithium-ion batteries (Figure 4). Please check out their website for the reference material used in creating it.

Figure 4: Lithium-Ion Battery Infographic. (Source)


US Cruiser Production During WW2

Quote of the Day

If you want real joy, stop looking at yourself and see how you can help someone else.

Luke Mickelson (2018 CNN Hero). I have found the Benedictine principles of hospitality, stewardship, and service are key to my own happiness and sense of purpose. In particular, the focus of the Benedictines on having an attitude of gratitude is particularly important. This all meshes well with my stoic approach to life.

Introduction

Figure 1: USS Cleveland, which was the lead ship of the most produced US cruiser class of WW2. (Wikipedia)

I have been working through the book Collect, Combine and Transform Data Using Power Query in Excel and Power BI by Gil Raviv – it is an excellent Power Query (PQ) resource. I particularly like the methods discussed in Chapter 10, which focused on how to make your queries robust, that is, insensitive to minor deviations in the input data. Chapter 10 spoke to me, and I immediately began looking for some practice data that suffered from common inconsistencies: headings in different cases, minor spelling errors in the data body, and inconsistent wording (for example, "Co." instead of "Company"). I found that data in the Wikipedia's information on US WW2 cruisers. In this post, I will look at the production of cruisers by the US during WW2. See Figure 1 for a typical example of a WW2 US light cruiser.

For those who are interested, my workbook is here.

Background

Analysis Approach

Gil presents a number of methods that I applied in this exercise:

• Use standard PQ functions to ensure the headings are all in lower, upper, or proper case.
• Create custom functions for processing files, using a template file as the basis for your work.
• Set up conversion tables that map every inconsistency present in your data set to the consistent value you want to use. Merging is used to perform the replacements.
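In Python terms, a conversion table is just a lookup from every variant seen in the data to the canonical value, with unmatched values passed through unchanged, which is what the PQ merge accomplishes. The shipyard-name variants below are invented for illustration:

```python
# Hypothetical variants of a shipyard name as they might appear in scraped data
CONVERSIONS = {
    "wm. cramp & sons": "William Cramp & Sons",
    "william cramp and sons co.": "William Cramp & Sons",
}

def normalize(value: str, table: dict = CONVERSIONS) -> str:
    """Return the canonical value if the (case-folded) variant is known,
    otherwise pass the value through unchanged, like a PQ left merge."""
    return table.get(value.strip().lower(), value.strip())

print(normalize("  Wm. Cramp & Sons "))     # maps to the canonical spelling
print(normalize("Bethlehem Shipbuilding"))  # unknown value passes through
```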

My analysis approach was straightforward:

• Grab the URLs of the Wikipedia pages with the cruiser data I wanted.
• Tidy the data from one of those pages in a way that will work with all other pages (the hard part).
• Make a function of that template.
• Apply the function to all the different URLs.
• Aggregate the data using pivot tables.
• Plot what needs to be plotted.

What is a Cruiser?

I like this definition of a cruiser that I found on the GlobalSecurity.Org website.

By the mid-20th Century, cruisers were medium-sized, general-utility ships. They had a large cruising range and are capable of high speeds (over 30 knots). They served as protective screens against surface and air attacks and also provide gunfire support for land operations. Cruisers were lightly armored, heavily armed, fast ships designed to screen formations and to scout out enemy fleets. Their survivability depended on speed, not armor. This continued to be the meaning until after the Second World War - a fast, long-range, lightly armored ship, although by then more powerful than a destroyer.

Cruisers were further subdivided into three types:

Light Cruiser
For the US Navy under treaty limits, light cruisers mounted 6-inch main guns and had displacements of less than 10,000 tons.
Heavy Cruiser
For the US Navy under treaty limits, heavy cruisers mounted 8-inch main guns and had displacements of ~10,000 tons. Anything larger in terms of guns and displacement would have had them considered capital ships and subject to stringent naval arms control treaty regulations. After WW2, the US increased the displacement for heavy cruisers to 17,000 tons with the commissioning of the Des Moines-class.
Battlecruiser
This class of ship was poorly defined. They were similar in displacement, armament, and cost to battleships, but differed slightly in form and balance of attributes. Battlecruisers typically carried slightly thinner armor and a lighter main gun battery than contemporary battleships, installed on a longer hull with much higher engine power to attain faster speeds. For the US Navy, battlecruisers carried 12-inch main guns. The Alaska-class was comparable to Germany's Scharnhorst-class battleships, which carried 11-inch guns. (Wikipedia)

The cruiser mission changed throughout WW2. Early in the war, cruisers played a crucial surface combat role (for example, see the Battle of Savo Island). As the Imperial Japanese Navy's surface fleet became a shadow of its former self, US cruisers finished the war with carrier air defense and land bombardment as their key roles.

Analysis

Wikipedia List

I am basing my work here on the Wikipedia's list of US WW2-era cruisers. The Wikipedia list includes all cruisers that served in WW2 or were under construction during the conflict (Figure 2), which means that it includes some obsolete classes (example: Omaha-class) and some that were not commissioned until after the war (example: Juneau and Fargo-classes).

Figure 2: Wikipedia List of US WW2 Cruiser Classes.

US Cruisers Commissioned During WW2

Between the attack on Pearl Harbor (7-Dec-41) and VJ Day (2-Sep-45), the US commissioned 47 cruisers in 4 different classes. Figures 3(a) and 3(b) show that most of the cruisers commissioned were light cruisers.

Figure 3(a): Number of US Light, Heavy, and Battle Cruisers Commissioned that Served During WW2.

Figure 3(b): US Cruiser Classes Commissioned During WW2.

The Alaska-class battlecruisers were the bruisers of the bunch but came so late in the war that they had little impact. The Alaska-class was late because it was a new class with a new gun, and it takes time to work through the teething problems. In other posts, I have applauded those US war planners who focused on producing war material that could be delivered to the front in time to make a difference – the Liberty ship and M4 Sherman tank are cases in point. The Alaska-class example should serve as a warning to those who are not careful about managing risk during development. For interesting forum discussions on the Alaska-class ships, look here and here.

US Rate of Cruiser Commissioning During WW2

Figure 4 shows the rate of US cruiser commissioning during WW2. The US had been ramping up warship production since 1940 with the Two-Ocean Navy Act, which authorized adding 257 ships to the US Navy. After the Pearl Harbor attack, warship production accelerated even further. However, warships take a long time to build: Figure 4 shows that the cruiser commissioning rate did not significantly increase until two years after Pearl Harbor.

Figure 4: US Cruiser Commissioning During WW2.

Shipyard Statistics

Figure 5 shows the US shipyards tasked with building the new cruisers. These shipyards faced enormous challenges. While the US Navy had an immediate demand for a massive number of warships, the shipyards struggled to find the materials and skilled workers needed to meet the wartime demand.

Figure 5: US Shipyards Producing Cruisers During WW2.

The William Cramp & Sons shipyard is an example of the difficulty of converting a civilian shipyard to warship production; it had abysmal keel-laid-to-commissioning durations. Its submarine build times were similarly bad.

Conclusion

Analyzing US WW2 cruiser production was a good training exercise for the methods that Gil Raviv shows in Chapter 10 of his PQ book. I have to say that I have learned a lot.

What can we learn about WW2 from the data? This post did not look at the fates of these warships after VJ day, but I could not escape seeing that so many ships were scrapped shortly after the war ended. The US left WW2 with the largest navy in the world, but it was a navy perfectly designed to defeat the Imperial Japanese Navy – an opponent that no longer existed.

The war created technological changes that affected every aspect of the US Navy:

• Aircraft carriers needed to be redesigned to support jet aircraft.
• US diesel-electric submarines were rendered obsolete first by the German Type XXI boats and shortly after that by nuclear boats.
• Battleships were rendered obsolete because their big guns were outranged by aircraft launched from carriers.
• Big gun cruisers were rendered obsolete as carriers and their air wings became the dominant ship killers in surface battles. Gun battles between big gun warships became a feature of an earlier time. Cruisers needed to focus more on air defense for the aircraft carriers.
• The US and Canada manufactured ~300K propeller-driven aircraft during WW2. Nearly all were obsolete at the end of the war.
• Jet aircraft rendered anti-aircraft artillery ineffective – the focus would shift to missile-armed vessels.

Whole new classes of warships would be needed going forward to face the threats in the Cold War.

US Submarine Production During WW2

Quote of the Day

Those that travel the high road of humility in Washington are not bothered by heavy traffic.

— Sen. Alan Simpson

Figure 1: USS Balao, lead ship of the most numerous class of US WW2 Submarines.

I recently purchased the book Collect, Combine and Transform Data Using Power Query in Excel and Power BI by Gil Raviv and am learning a lot – so much so that I am motivated to go hunt some additional data examples for processing by Power Query.

One WW2 topic that continues to intrigue me was how US war planners kept the Imperial Japanese Navy (IJN) at bay long enough to build a large naval force. The key was the use of submarines for commerce raiding to disrupt the war material supply chain and tie down Japanese surface forces with convoy defense duty. This post will use Power Query to scrape the Wikipedia for this data. The Wikipedia is becoming a wonderful source for WW2 information.

Figure 2: US Submarine Classes Commissioned During WW2 (Dec 7, 1941 – Sep 2, 1945).

Early in the Pacific War, the US Navy could only project power into Japanese waters on a sustained basis by using its submarines. The early actions were often conducted by smaller boats (for example, S-boats) that were inadequate for a theater as large as the Pacific. To paraphrase Donald Rumsfeld, you go to war with the Navy you have – not the Navy you might wish you have. Fortunately, the US had begun building its larger and more capable fleet submarines just before WW2. Figure 2 shows the three classes of fleet submarines that were commissioned during WW2: Gato, Balao, and Tench (in order of introduction).

The Wikipedia maintains a list of US WW2 submarines. I will use this list to access data from the Wikipedia page for each individual boat. I will filter the data to focus only on fleet submarines commissioned during the war. Some folks may disagree with individual boats on the Wikipedia list. For example, the Wikipedia does not list the USS Mero as a US WW2 submarine even though it was technically commissioned during WW2 – so late in the war that it never entered combat. However, their approach is defensible and does not have a significant effect on my results.

I will access the information using the same approach as in this earlier post:

• create a list of WW2 submarine web pages
• access each page and extract the relevant information
• generate pivot tables for the required reports

I followed this procedure to generate a list of submarines commissioned during the war, their shipyards, and the time between keel laying and commissioning, which I call the build time. Figure 3 shows when keels were laid and when boats were commissioned. Note how the number of keels laid dropped precipitously after August 1944. By that point, the IJN was on the ropes and US Navy surface units were in Japanese waters. The war planners saw that more submarines were not needed at that point.

Figure 3: Dates of Submarine Keel Laying Versus Commissioning.

Figure 4 shows the median build times by class. When you consider that the US role in WW2 lasted 44 months, build times of 9 to 12 months were significant, especially given that it takes additional months for a commissioned boat to get into combat.
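For readers curious about the build-time arithmetic, here is a sketch of the calculation; the keel/commissioning date pairs below are invented for illustration, not actual boats:

```python
from datetime import date
from statistics import median

def build_months(keel_laid: date, commissioned: date) -> float:
    """Keel-laid-to-commissioning duration in average-length months."""
    return (commissioned - keel_laid).days / 30.44

# Illustrative (not actual) keel/commission date pairs for one class
pairs = [(date(1942, 3, 2), date(1942, 12, 31)),
         (date(1942, 6, 15), date(1943, 5, 1)),
         (date(1943, 1, 4), date(1943, 11, 20))]
print(round(median(build_months(k, c) for k, c in pairs), 1), "months")
```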

Figure 4: Median Build Times By Class.

Submarines are very difficult to build, and shipyards must have specialized skills, such as welding thick steel pressure hulls. Five shipyards commissioned fleet boats during WW2 (Figure 5). The Manitowoc yard is on Lake Michigan and had been building ferries and ore haulers before the war. While Manitowoc had no submarine-building experience, it developed these skills under wartime conditions. I find this an impressive accomplishment. Manitowoc-built subs performed well during the war; one, USS Rasher, had the third-highest sunk-tonnage total.

Figure 5: US Submarine Shipyards During WW2.

If you are interested in how the data was gathered and analyzed, my spreadsheet is here.


Calculating the Pointing Angle for My Television Antenna

Quote of the Day

If I strive for perfection, I can generally achieve good'nuff. If I strive for good'nuff, I generally achieve firewood.

— Woodworkers Credo. I find this quote holds true for so many things. I used to work for an HP engineer, Warren Pratt, who said similar things about designing printed circuit boards. Basically, you must strive for perfection because things will go wrong that you cannot anticipate. So nothing will be perfect, but you might get something worthwhile.

Figure 1: Television Stations Closest to My Cabin. (Source)

I spend quite a bit of time at a cabin I have built in northern Minnesota. Technically, I spend most of my time in the garage on the site and I have decided that I need to be able to watch the local television stations in Duluth. These stations are ~75 miles away and I need to determine the bearing along which to point my antenna. This seemed like a good Excel exercise that I can also use as an example for those I tutor at the Hennepin County Library. There are web calculators available that perform this calculation (example), but it is more fun doing it myself.

Once I decided that I needed an antenna, there were two numbers I needed in order to purchase the correct unit and to point it properly:

• Distance to the Stations
Distance determines the type of antenna that I will need.
• Bearing of the Stations Relative to North
I will be using a compass to point the antenna and I need an angle from my garage to Duluth.

Equation 1 contains the formulas needed for computing the range and bearing between two points on Earth given their latitudes and longitudes. For more details on these formulas, see this discussion on the haversine formula, which can be derived using spherical trigonometry.

 Eq. 1 $\displaystyle \theta =\text{atan2}\left( \sin \left( \Delta \lambda \right)\cdot \cos \left( {{\psi }_{lat}} \right),\ \cos \left( {{\phi }_{lat}} \right)\cdot \sin \left( {{\psi }_{lat}} \right)-\sin \left( {{\phi }_{lat}} \right)\cdot \cos \left( {{\psi }_{lat}} \right)\cdot \cos \left( \Delta \lambda \right) \right)$

 $\displaystyle R=2\cdot {{R}_{E}}\cdot \text{asin}\left( \sqrt{{{\sin }^{2}}\left( \frac{{{\psi }_{lat}}-{{\phi }_{lat}}}{2} \right)+\cos \left( {{\phi }_{lat}} \right)\cdot \cos \left( {{\psi }_{lat}} \right)\cdot {{\sin }^{2}}\left( \frac{\Delta \lambda }{2} \right)} \right)$, with $\Delta \lambda ={{\psi }_{lon}}-{{\phi }_{lon}}$ and ${{R}_{E}}$ the Earth's radius

where

• θ is the bearing to the antenna from the garage
• R is the distance to the television antenna.
• ϕlat is the latitude of the garage
• ϕlon is the longitude of the garage
• ψlat is the latitude of the television station antenna
• ψlon is the longitude of the television station antenna

I implemented the equation in an Excel spreadsheet, which you can download here. Based on my range calculation, I ended up buying a Yagi-Uda television antenna (Figure 2).
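For reference, here is the same range-and-bearing calculation sketched in Python; the coordinates in the example call are placeholders, not my actual garage or a real tower site:

```python
import math

EARTH_RADIUS_MI = 3959.0  # mean Earth radius in miles

def range_bearing(lat1, lon1, lat2, lon2):
    """Great-circle distance (haversine) and initial bearing from
    point 1 (the garage) to point 2 (the station); degrees in, degrees out."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    distance = 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))
    bearing = math.degrees(math.atan2(
        math.sin(dlon) * math.cos(p2),
        math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlon)))
    return distance, bearing % 360

# Placeholder coordinates: a spot in northern Minnesota to the Duluth area
print(range_bearing(47.30, -94.20, 46.79, -92.10))
```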

Figure 2: My new television antenna.

Medal of Honor Statistics Using Power Query

Quote of the Day

Their drills are bloodless battles, and their battles bloody drills.

Publius Flavius Vegetius Renatus, Roman writer on the Roman military. An older version of "train like you fight, fight like you train."

Figure 1: US Air Force Medal of Honor. (Source)

In this post, I will use Power Query (PQ) to gather (aka web scrape) the US Medal of Honor (MOH) recipient names, ranks, services, and conflicts from the website of the Congressional Medal of Honor Society (CMOHS), which states that there have been 3505 MOH recipients. They have a beautiful website with the records stored on 140 pages with 25 names per page, plus one page with 5 names. Normally, I would use Python to scrape a large number of web pages, but I am trying to use PQ more because my customers all have Excel. My approach as a contractor is always to work within the existing infrastructure of my customers. A copy of my workbook is available here.

I should mention that the Wikipedia states there have been 3522 MOH recipients. I do not know the reason for the discrepancy, but I do know that I gathered every MOH recipient from the CMOHS pages.

The web scraping process was straightforward:

• Create a list of all web pages with MOH recipients.
This was easier than you might think. There were 141 pages, each with the base URL and an appended number (1 through 141). You can see the format in the attached workbook.
• Build a parsing template by parsing one webpage
This was a little harder than you might think. PQ could not determine how to identify the information on the web page because it was not table-oriented. However, I took a standard web query and modified it by converting a Web.Page call into a Web.Contents call, which loads the raw HTML. I then parsed the HTML to extract the information. I illustrate the process in Figure 2 (the figure is large; click on it to view). The HTML parsing was routine, and I refer you to the query to see how I did it.

Figure 2: Grabbing HTML.

• Turn the single-page parsing routine into a function
This is a routine process. You add a function header (aka signature) and a closing phrase. I illustrate the process in Figure 3.

Figure 3: Turn a Query Into a Function.

• Load the list of web page addresses into PQ
Make the list of web page addresses into a table and load it into PQ.
• Run the function over every web address, producing a column of tables.
I used the "Invoke Custom Function" command from the menu to create a new column with the parsed data. I illustrate the process in Figure 4.

Figure 4: Calling the Parsing Function.

• Expand the column of tables into its separate fields.
You now have all the MOH recipients. At this point, I can begin filtering and pivoting.

I know it sounds complicated, but it is simple when you see it in the spreadsheet.
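The overall pipeline can be sketched in Python as follows. The URL pattern and the HTML markup below are invented for illustration; the real CMOHS pages differ, which is exactly why the query had to parse raw HTML:

```python
import re

# Hypothetical URL pattern -- the real CMOHS page addresses differ
BASE_URL = "https://example.org/recipients?page={}"
urls = [BASE_URL.format(n) for n in range(1, 142)]  # 141 pages

def parse_page(html: str) -> list:
    """Pull recipient records out of one page of raw HTML.
    The <li class="recipient"> markup and pipe-delimited field order
    are invented for illustration."""
    rows = []
    for hit in re.findall(r'<li class="recipient">([^<]+)</li>', html):
        name, rank, service, conflict = (s.strip() for s in hit.split("|"))
        rows.append({"name": name, "rank": rank,
                     "service": service, "conflict": conflict})
    return rows

sample = ('<ul><li class="recipient">John Doe|Sergeant|Army|WW2</li>'
          '<li class="recipient">Jim Roe|Corporal|Marines|Korea</li></ul>')
print(parse_page(sample))
```

In the workbook, the list of URLs is the loaded table, `parse_page` is the custom function, and the per-page results are the column of tables that gets expanded.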

Now that we have the data, let's see what we can learn. I decided to focus on WW2 and later conflicts. I wanted to know the number of MOH recipients by:

• conflict
• service
• rank

I now could generate the pivot tables that I wanted (Figures 5 through 7).

Figure 5: Number of MOH Recipients By Conflict.

Figure 6: Number of MOH Recipients By Service.

Figure 7: Number of MOH Recipients By Rank (Top 10).

The results make sense:

• WW2 was such a massive conflict that it should have the largest number of recipients.
• The US Army is the largest of the US military services, so it would make sense for it to have the most recipients.
• Enlisted ranks would be more likely to be in the life-and-death situations that call for such acts of bravery.