Web Scraping WW2 Landing Ship Data


Quote of the Day

The destinies of two great empires seemed to be tied up in some God
damned things called LSTs.

— Winston Churchill



Figure 1: LSTs on Omaha Beach Shortly After D-Day. (Source)

I have been working on improving my web scraping abilities by analyzing WW2 data. I have focused on topics related to how the US took the 14th largest military in the world and in roughly 18 months turned it into a military that could fight anywhere in the world. In this post, I want to look in detail at how war materials were delivered to beaches around the world using a vessel called a Landing Ship Tank (LST). I have wanted to write about the LST for a while, but the web data was distributed on about 1200 separate pages – one for each ship. While a Python script would have worked nicely, I wanted to try gathering the data without doing any programming. I found some software that did a good job automating this task, which I will discuss later in this post.

To understand the role of the LST, you need to understand the US war material supply chain during WW2. To supply its forces, the US needed to:

  • manufacture the goods (e.g., tanks)
  • transfer the goods to ports (e.g., trains and barges)
  • ship the goods on ocean-going transports (e.g., Liberty and Victory ships) to staging areas
  • move the goods to a beach using landing ships – the LST is the largest landing ship and it handles the heaviest material (Figure 1).
  • move the goods from the beach to the troops using ground transport (e.g. Red Ball Express)

Figure 2: LST Unloading Rail Cars. (Source)

The Landing Ship Tank (LST) was the landing ship class tasked with placing heavy equipment, like tanks and railcars, onto beaches around the world during WW2 (Figure 2). The US and UK jointly developed the LST design during 1941, with production starting in 1942. Wikipedia provides a comprehensive list of the 1,198 LSTs built by the US, but some of these vessels were built after WW2; 1,046 LSTs were built by the US during the war years (Pearl Harbor to VJ Day). US shipyards averaged 24 LSTs per month throughout the war. Shipyards in Canada and the UK built ~80 LSTs during the war years. The LSTs were so numerous that most were referred to by their hull designations and were not assigned formal names. Because the LSTs were plentiful, some were repurposed as utility vessels. For example, LST-490 was redesignated as USS Agenor, a landing craft repair ship.

While the LSTs were a critical part of the war supply chain, the ships were not popular with their crews. They were not particularly seaworthy and the crews were often seasick. Because their bow design was focused on opening to deploy tanks on a beach and not on low drag, the ships were slow (maximum speed of ~12 knots) and were vulnerable to submarine and E-boat attack. The crews often grimly said that LST stood for "Large Slow Target" – a sinking LST was not a pretty place. In fact, 40 US LSTs were lost to all causes during WW2 (see Appendix A).


Two websites have pages assigned to individual LSTs: navsource.org and uboat.net. Because each LST's data was on an individual web page, I needed to scrape 1,198 pages. The magnitude of this task meant that I wanted to automate the work.
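Although I deliberately avoided programming for this project, the URL-gathering step is straightforward to sketch in Python using only the standard library. The HTML fragment and paths below are invented for illustration, not actual navsource.org markup:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href values from anchor tags whose URL mentions 'lst'."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and "lst" in value.lower():
                    self.links.append(value)

# A made-up fragment standing in for an index page of per-ship links.
sample_html = """
<ul>
  <li><a href="/amphib/lst1.htm">LST-1</a></li>
  <li><a href="/amphib/lst2.htm">LST-2</a></li>
  <li><a href="/about.htm">About</a></li>
</ul>
"""

parser = LinkExtractor()
parser.feed(sample_html)
print(parser.links)  # only the two LST pages, not the About link
```

In practice, the per-ship pages would then be fetched one at a time with a polite delay, which is essentially what the tools below automate.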

There are two software tools that I used to gather the data:

  • Data Miner
    Data Miner is a Chrome app that I used to gather the individual LST URLs from navsource.org and uboat.net.
  • Octoparse
    I fed this tool the URLs from Data Miner and used it to parse the individual LST pages. It has a very useful wizard that guides you through its use.

Octoparse produced an Excel spreadsheet with the LST data I was looking for: LST name, date laid, date launched, date commissioned, and manufacturer. There was some cleanup needed, but Power Query handled that nicely. The rest of the work was standard Excel processing – graphs, computing averages, etc.

I should mention that the navsource.org website did not appreciate being scraped and kept asking me if I was a robot. I had to break my parsing up into small bits over a period of days.

My Excel worksheet is here. There is not much there because most of the work was done by Data Miner and Octoparse.


Build Time

I define build time as the interval from the date the keel was laid to the date the LST was commissioned. Table 1 shows how the mean construction time for an LST varied by year. 1942 production began in traditional shipyards, which resulted in relatively short build times. By 1943, manufacturing was moving to non-traditional manufacturers and the build times lengthened. Build times dropped dramatically in 1944 as the manufacturers gained experience. Times grew again in 1945, a pattern I have seen in other WW2 production data. The war was clearly drawing to a close by the summer of 1945, and the government was beginning to reduce the priority of war production.
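For readers who want to reproduce the calculation programmatically, the build-time statistic reduces to a date subtraction and a group-by. Here is a minimal Python sketch using made-up (laid, commissioned) date pairs rather than the scraped data:

```python
from datetime import date
from statistics import mean
from collections import defaultdict

# Hypothetical (keel laid, commissioned) date pairs standing in for the scraped data.
ships = [
    (date(1942, 6, 10), date(1942, 12, 1)),
    (date(1943, 2, 1),  date(1943, 8, 15)),
    (date(1944, 1, 5),  date(1944, 4, 20)),
    (date(1944, 3, 1),  date(1944, 6, 1)),
]

# Group build times (in days) by the year the keel was laid.
by_year = defaultdict(list)
for laid, commissioned in ships:
    by_year[laid.year].append((commissioned - laid).days)

mean_build_days = {year: mean(days) for year, days in sorted(by_year.items())}
print(mean_build_days)
```

This is the same subtraction-and-average that Excel performs on the date columns; Power Query or a pivot table does the grouping step.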

Table 1: Mean LST Build Time By Year.

Monthly Production

Figure 3 shows the number of LSTs commissioned by US manufacturers per month during WW2. Notice how the monthly commissioning rate peaked just before D-Day. This is no coincidence: the Anzio and Normandy landings created a need for LSTs in Europe. Later production during 1944 and 1945 supported operations in the Pacific theater.

Figure 3: Monthly US LST Production During WW2.

Figure 3: Monthly US LST Production During WW2.

Average LST Production Per Month

Table 2 shows how the mean US monthly LST production varied by year.  Production averaged 24 units per month throughout the war.

Table 2: Monthly Average LST Production.

Shipyards Involved

Table 3 shows the 17 US companies that produced LSTs during WW2. Because the coastal shipyards were busy building warships, the Pentagon awarded the bulk of the LST contracts to companies with no experience building military ships. During peacetime, these companies produced bridges and river craft. They had the welding technology, heavy equipment infrastructure, and management experience needed for switching their manufacturing lines to LSTs and they produced 68% of all the vessels manufactured.

Table 3: US Companies Producing LSTs During WW2.


The LST may not have been pretty, and its lack of seaworthiness may have sickened its crews, but it was a critical component in bringing supplies to Allied forces during WW2. The first LSTs entered service early in 1942, and the ship class stayed in service with the US Navy until the 1980s. As with the M4 Sherman tank and the B-24 Liberator bomber, the LST was not loved by its crews. While many historians look at these systems and dwell on their shortcomings, LSTs were produced quickly and in sufficient quantity to make a major impact on both the Atlantic and Pacific battlefields (Figure 4). The Germans took an alternative approach to war material: they produced some excellent systems, but not in time or in numbers sufficient to make a difference (e.g., the Type XXI submarine and the Me 262).

I am in awe of the fact that, in a time before computers, CAD, and robots, an international partnership could design and produce over 1,000 large ships in 44 months that did the job required.


Figure 4: LST Unloading at Iwo Jima. (Source)

Appendix A: US LSTs Lost During WW2.

Table 4 shows the US LSTs lost for all causes during WW2.

Table 4: List of US LSTs Lost During WW2 For All Causes.
Posted in History Through Spreadsheets, Military History, Naval History

1968 Vietnam War Statistics


Quote of the Day

Captain's log. Using the lightspeed breakaway factor, the Enterprise has moved back through time to the twentieth century. We are now in extended orbit around Earth, using our ship's deflector shields to remain unobserved. Our mission, historical research. We are monitoring Earth communications to find out how our planet survived desperate problems in the year 1968.

— Captain Kirk, opening narration for the episode "Assignment: Earth" from Star Trek TOS. People were very aware at the time that 1968 was an unusual year.


Figure 1: US Military Deaths in Vietnam By Year. All data from the Combat Area Casualties Current File (CACCF) hosted by Duke University.

It has been 50 years since 1968, and I have been seeing quite a few retrospectives on television about that tumultuous year. I was in 6th grade in 1968 and the chaos of that year is still very clear in my memory – I remember spending quite a bit of class time on the Paris peace talks. One lesson was about how the Paris Peace negotiators argued about the shape of the table at which they would sit. Arguing about the shape of a table while people were dying seemed ridiculous to a 12-year-old boy. After hearing all these recent discussions about 1968, I decided to look at the US Vietnam casualty data (Figure 1) to see what insights I could gain on that year. All my work is done in Excel and my workbook is here.

The war was a regular topic at my family's dinner table. My father was a Republican and my mother was a Democrat, which meant that they did not agree on the war at all. To show you how strange the situation was, my father supported Nixon because Nixon was going to turn up the war's intensity. My mother thought Nixon might work out because he was Quaker and they are opposed to war. Dad's view of Nixon turned out to be closer to fact.

My family used to watch Walter Cronkite on the news every weeknight, and the rise in the weekly casualty reports was very worrisome. You could feel a change in people's attitudes when Walter gave a devastating war critique that created real doubt about the future of the war (Figure 2).

Figure 2: Walter Cronkite Vietnam War commentary on 27-Feb-1968. I remember this news broadcast. It left people stunned.

In Figure 3, you can see how the war's intensity ramped up by looking at how the monthly casualty rates varied. I have highlighted in red the two months (February and May) with the highest casualty rates of the war; these months correspond to two major Vietnamese pushes during the Tet Offensive.


Figure 3: US Vietnam War Dead By Month. All data from the Combat Area Casualties Current File (CACCF) hosted by Duke University.

Posted in History Through Spreadsheets

I Forgot – VLOOKUP is Not Case-Sensitive


Quote of the Day

Men acquire a particular quality by constantly acting a particular way … you become just by performing just actions, temperate by performing temperate actions, brave by performing brave actions.

— Aristotle


Figure 1: Picture of 38999 25/37 Connector with Pins Labeled.

I almost sent out a spreadsheet today that contained a VLOOKUP error – I forgot that VLOOKUP is not case sensitive. Fortunately, I caught my error seconds before I hit send. Today's post is about how I chose to perform a case-sensitive lookup in an ancient version of Excel.

My application is simple. I am working with a round connector that assigns lower- and upper-case letters to the pin positions (Figure 1). I have a list of coordinate positions for each pin label. I also have a separate list of pin labels that are assigned to lasers. I want a list of the (x, y) coordinates of every laser. Normally, I would use Power Query and a join to perform the lookup; however, my customer is forcing me to use an ancient version of Excel that does not support Power Query.

There are many ways to solve this problem. A quick Google search led me to this web page, which gave a solution based on an array formula that combines the INDEX, MATCH, and EXACT functions. I liked this solution because it is generally applicable. Figure 2 shows how I used this approach to look up pin X-values using the transmit pin name and a column of all the pin names. The Y-value lookup is similar. My example uses tables and structured references.
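For readers who think in code, the INDEX/MATCH/EXACT pattern has a direct Python analogue: scan the name column for an exact, case-sensitive match and return the value at that position. The pin names and coordinates below are made up for illustration:

```python
def case_sensitive_lookup(key, names, values):
    """Return the value whose name matches key exactly (case-sensitive),
    mirroring the INDEX/MATCH/EXACT array-formula pattern."""
    for name, value in zip(names, values):
        if name == key:  # '==' on strings is case-sensitive, like EXACT()
            return value
    raise KeyError(key)

pin_names = ["a", "A", "b", "B"]
x_coords  = [1.0, 2.0, 3.0, 4.0]

print(case_sensitive_lookup("A", pin_names, x_coords))  # 2.0, not 1.0
```

A plain VLOOKUP on this data would return the "a" row for a lookup of "A", which is exactly the error I nearly shipped.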


Figure 2: My Lookup Function for X-Values Using Transmit Names.

I include a greatly simplified version of my original Excel work here.

Posted in Excel

Evenly Spaced Points on a Logarithmic Graph Using Excel


Quote of the Day

If I had my life to live over, I would do it all again, but this time I would be nastier.

— Jeannette Rankin, the only member of Congress to vote against the declaration of war on Japan after Pearl Harbor.


Figure 1: Example of Noise Injected as Part of a Susceptibility Test.

I am doing some testing at an Electromagnetic Compatibility (EMC) facility this week. Part of the test specification requires that we inject audio-frequency interference on the power supply lines at discrete frequencies ranging from 10 Hz to over 100 kHz, with 30 frequencies selected from each decade of frequencies (e.g., 10 Hz to 100 Hz, 100 Hz to 1 kHz, etc.). Figure 1 shows a specification similar to the one I am testing against. My test facility has chosen discrete frequencies that are evenly spaced on a logarithmic axis. I started to wonder how the frequencies were selected – let's work through it.

The key to determining the points is to observe that consecutive, evenly-spaced points on a logarithmic axis share a common ratio. We can use this fact to derive a solution using Mathcad as shown in Figure 2. Excel users can see the solution here.
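The same result is easy to reproduce in Python: with 30 points per decade, the common ratio is r = 10^(1/30), and each decade's points are f0*r^k for k = 0 to 29:

```python
# 30 frequencies per decade, evenly spaced on a log axis: consecutive
# points share the common ratio r = 10**(1/30).
ratio = 10 ** (1 / 30)

def decade_points(f_start, n=30):
    """Return n log-spaced frequencies covering [f_start, 10*f_start)."""
    return [f_start * ratio ** k for k in range(n)]

points = decade_points(10.0)         # the 10 Hz to 100 Hz decade
print(points[0], points[1])          # 10 Hz, then one ratio step up
print(points[29] * ratio)            # the next step lands on 100 Hz
```

Repeating `decade_points` with f_start = 100, 1000, and so on covers the rest of the specified range.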


Figure 2: Mathcad Calculation of 30 Points Evenly Spaced on a Log Scale.

Posted in Excel, General Mathematics

Test Time vs BER and Confidence Level Using Excel


Quote of the Day

When a thing is done, it's done. Don't look back. Look forward to your next objective.

— George C. Marshall. I have met many people who spend their lives constantly looking back and expressing regret for opportunities lost. Learn what you can from your experience and move forward. As Spock used to say "Regret is a useless emotion, especially in humans."


Figure 1: BER Test Data Table for a Single Unit. There are 18 measurements for each unit.

I am currently working as a project manager on a serial data channel qualification task. During this work, I need to estimate the time required to perform dozens of Bit Error Rate (BER) measurements under different scenarios (see Figure 1). In these measurements, we are working to ensure that our BER is always 1E-10 or less. I have performed this calculation many times and have derived all the associated formulas here. BER is a statistical parameter and its measurement is a function of the confidence level you want in your answer.  In this case, we want the highest confidence level possible consistent with a two-week test effort.

The calculation requires solving Equation 1 for n, the number of bits required, to confirm a maximum BER p at a specific confidence level CL and error number N.

Eq. 1 -n\cdot p=\ln \left( 1-CL \right)-\ln \left( \sum\limits_{k=0}^{N}{\frac{{{\left( np \right)}^{k}}}{k!}} \right)

To convert the number of bits transferred to a test time τ, we need to evaluate Equation 2 with the bit transfer rate, r.

Eq. 2 \displaystyle \tau =\frac{n}{r}

I normally perform this calculation using Mathcad, but in this case, other engineers wanted it written in Excel because they are familiar with that tool. The calculation is a bit messier in Excel because:

  • We need to come up with an iterative solution for Equation 1 because it does not have a closed-form solution. In Excel, I used the Goal Seek function with a macro.
  • Evaluating summations is a bit messy in Excel. In this case, I decided to use an array formula, which allowed the summation to be evaluated in a single cell.
  • You must configure the Excel spreadsheet to use iteration and reduce the maximum change allowed. You adjust these values in Excel by going to File/Options/Formulas and checking the following boxes.


    Figure 2: Excel Iteration Settings.

Figure 3 is a screenshot of the workbook's test time calculation.  The worksheet is easy to use – just input your p, N, CL, and rate values (highlighted in yellow), then press iterate. This calculation tells me how long a single BER measurement will take. Remember that I need to perform dozens of these measurements, so even small reductions in the single-test time make a big difference to the overall test time.
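For those who prefer code to Goal Seek, here is a Python sketch that bisects on x = n*p to solve Equation 1 and then applies Equation 2. It is a sketch of the same iteration, not the workbook's macro, and the 10 Gb/s line rate is an assumed example:

```python
from math import exp, factorial

def bits_required(p, cl, n_errors):
    """Solve -n*p = ln(1-CL) - ln(sum_{k=0}^{N} (n*p)^k / k!) for n
    by bisecting on x = n*p (Equation 1 has no closed-form solution)."""
    def residual(x):
        s = sum(x ** k / factorial(k) for k in range(n_errors + 1))
        return exp(-x) * s - (1 - cl)   # zero at the required x

    lo, hi = 0.0, 1.0
    while residual(hi) > 0:             # expand until the root is bracketed
        hi *= 2
    for _ in range(100):                # plain bisection
        mid = (lo + hi) / 2
        if residual(mid) > 0:
            lo = mid
        else:
            hi = mid
    return hi / p                       # n = x / p

# Example: BER limit 1e-10, 95% confidence, zero allowed errors.
n = bits_required(1e-10, 0.95, 0)
rate = 10e9                             # assumed 10 Gb/s link, for illustration
print(n / rate, "seconds")              # Equation 2: tau = n / r, about 3 s
```

The familiar rule of thumb drops out for N = 0: you need about 3/p bits for 95% confidence.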


Figure 3: Screenshot of My BER Workbook.

For those interested, my Excel workbook is here. It includes a very simple macro (shown below) that calls the Goal Seek function for each confidence level and allowed error number. The macro also changes the iteration setting.

Posted in Electronics, Excel

Switch-Controlled Circuit To Isolate Battery from Load


Quote of the Day

Concentration is my motto. First honesty, then industry, then concentration.

— Andrew Carnegie


Figure 1: Block Diagram Showing Role of Battery Isolator.

I was recently asked if it was possible to design a circuit that isolates a battery from a load until a momentary switch is closed (Figure 1). Once the momentary switch is closed, the battery is connected to the rest of the circuit and cannot be disconnected by further mechanical switch closures. This was just a proof-of-feasibility exercise and NOT a final implementation. However, it was a good example of how to use LTSpice to check a first design concept, and it is worth documenting here.

The circuit is a simple MOSFET/JFET latch combo. It has to meet a few requirements:

  • Very low DC load on the battery when in isolation mode.
  • Capable of latching with a charged battery (~3.5 V)
  • Stay latched down to ~2.0 V.

Figure 2 shows the basic circuit. The switch S2 represents the momentary switch. MOSFETs M1 and M2 form the latch.


Figure 2: LTSpice Schematic of Battery Isolator Circuit.

Figure 3 shows my LTSpice simulation.


Figure 3: Simulation of the Battery Isolator Shown in Figure 2.

The simulation shows that the circuit works for this simple test case. Much more checking is needed. This is enough for now – I have shown that a circuit like this is possible. For those who are interested, my LTSpice schematic is available here.

Many thanks to Kurt Raichle for lending his superb analog expertise to this exercise.

Posted in Electronics

Computing Dates of Father's Day with Excel


Quote of the Day

I think a man only needs one thing in life. He just needs someone to love. If you can't give him that, then give him something to hope for. And if you can't give him that, just give him something to do.

— Great quote from the movie 'Flight of the Phoenix.'



Table 1: Years Where Father's Day and My Anniversary Coincide.

My wedding anniversary (16-June) and Father's Day are on the same weekend this year. Next year, Father's Day and my anniversary are on the same day. I became curious about (a) how to compute the date of Father's Day for each year, and (b) determining the years when Father's Day and my anniversary occur on the same date. Since the calculation is simple and table-oriented, I will use Excel for this project. Table 1 shows the years with Father's Days that coincide with my anniversary. I generated this table by filtering the comprehensive list for all Father's Days on the 16th. Figure 2 shows how I derived my Excel formula. My approach was to look at the latest possible Father's Day and then determine a correction based on the day of the week on which that date falls. Table 2 is the comprehensive list that shows the date of Father's Day for each of the next 100 years.
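My Excel formula works backward from the latest possible Father's Day; an equivalent route in code is to find the first Sunday of June and add two weeks. A Python sketch (not the workbook formula):

```python
from datetime import date, timedelta

def fathers_day(year):
    """US Father's Day: the third Sunday of June."""
    june1 = date(year, 6, 1)
    # days until the first Sunday (weekday(): Monday=0 ... Sunday=6)
    first_sunday = june1 + timedelta(days=(6 - june1.weekday()) % 7)
    return first_sunday + timedelta(days=14)

# Years in the next century where Father's Day falls on 16-June.
matches = [y for y in range(2018, 2118) if fathers_day(y).day == 16]
print(fathers_day(2019), matches[:5])
```

Filtering `matches` is the programmatic version of the Excel filter that produced Table 1.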

For those who are interested, my Excel Workbook can be found here.

Figure 2: Rationale Behind Excel Formula.

Table 2: Father's Day for the Next 100 Years (columns: Year, Father's Day).
Posted in Excel, Math Education

Air Conditioning Load of a Group of People


Quote of the Day

Concentration is my motto. First honesty, then industry, then concentration.

— Andrew Carnegie


Figure 1: HVAC Load for 17 People.

I was reading an article about HVAC (Heating, Ventilation, and Air Conditioning) calculations in the Journal of Light Construction that had a quote I found interesting. It said that

… 17 extra occupants added more than a half ton of cooling load.

Figure 1 shows how to prove this statement assuming that the average person burns 2000 kilocalories a day, which is probably a low number. This problem is all about unit conversion: kilocalories, BTUs, tons of cooling, etc. You might wonder why an electrical engineer would worry about these sorts of things. Unfortunately, I frequently have to specify the cooling capacity required for the electronics that I put into rooms.
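The same arithmetic can be checked in a few lines of Python. The 2000 kcal/day figure is from the text above; the conversion constants are the standard ones:

```python
# Check the claim: 17 occupants add roughly half a ton of cooling load,
# assuming each person dissipates about 2000 kcal/day.
KCAL_TO_BTU = 3.9683           # 1 kcal is about 3.9683 BTU
TON_OF_COOLING = 12_000        # 1 ton of cooling = 12,000 BTU/hr

btu_per_hr_per_person = 2000 * KCAL_TO_BTU / 24   # ~331 BTU/hr per person
load_tons = 17 * btu_per_hr_per_person / TON_OF_COOLING
print(round(load_tons, 2), "tons")
```

At 2000 kcal/day the result is just under half a ton; since that intake figure is probably low, the article's "more than a half ton" claim checks out.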

Posted in Construction, General Mathematics, General Science

Accelerometer Power Calculation Example


Quote of the Day

I planned on having one husband and seven children, but it turned out the other way around.

— Lana Turner


Figure 1: Monolithic Accelerometer from Bosch. (Source)

I am working on a product that uses a Bosch BMA253 accelerometer as a motion sensor. This family of parts has become a de facto standard for inexpensive motion detection. In this post, I will provide a simple power calculation example along with some empirical data that Bosch provided me. Normally, I would not consider a simple power calculation worth writing about, but the datasheet did not provide a worked example. I also provide an Excel workbook that parameterizes the critical variables.

Here are the critical formulas for low power mode 1. For low power mode 2, the formulas are the same with the 1 subscript changed to 2.

Eq. 1 \displaystyle {{t}_{{active}}}=\left| \begin{array}{l}{{t}_{{ut}}}+{{t}_{{w,up1}}}-0.9\text{ ms, }bw\ge 31.25\text{ Hz}\\4\cdot {{t}_{{ut}}}+{{t}_{{w,up1}}}-0.9\text{ ms, otherwise}\end{array} \right.
Eq. 2 \displaystyle {{I}_{{DDlp1}}}=\frac{{{{t}_{{sleep}}}\cdot {{I}_{{DDsum}}}+{{t}_{{active}}}\cdot {{I}_{{DD}}}}}{{{{t}_{{sleep}}}+{{t}_{{active}}}}}
Eq. 3 \displaystyle {{t}_{{ut}}}=\frac{1}{{2\cdot bw}}
Eq. 4 \displaystyle {{f}_{{Output}}}=\frac{1}{{{{t}_{{active}}}+{{t}_{{sleep}}}}}


  • bw is the bandwidth setting of the digital filter.
  • tut is the update time for the filter output data.
  • tw,up1 is the data acquisition stabilization time.
  • tsleep is the time during which the BMA253 is put in suspend mode ("sleep").
  • tactive is the time during which the BMA253 is stabilizing and acquiring data.
  • fOutput is the output data rate from the digital filter.
  • IDD is the current draw of the BMA253 when it is performing data acquisition ("active").
  • IDDsum is the current draw of the BMA253 when it is in suspend mode ("sleeping").

Figure 2 illustrates the data acquisition timing.


Figure 2: Data Acquisition Timing.

I can use Equations 1-4 to generate Table 1, which assumes that

  • bw = 1000 Hz
  • tw,up1 = 1.3 ms
  • IDD = 130 μA
  • IDDsum = 2.1 μA
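A quick way to check Equations 1-4 is to code them up. The sketch below uses the parameter values listed above plus a hypothetical 10 ms sleep time chosen for illustration (Table 1 uses the datasheet's actual sleep settings):

```python
def bma_low_power_current(bw_hz, t_wup_ms, t_sleep_ms, i_active_ua, i_suspend_ua):
    """Duty-cycled average current per Equations 1-4 (low power mode 1)."""
    t_ut = 1000 / (2 * bw_hz)                  # Eq. 3: filter update time, ms
    if bw_hz >= 31.25:                         # Eq. 1: active time, ms
        t_active = t_ut + t_wup_ms - 0.9
    else:
        t_active = 4 * t_ut + t_wup_ms - 0.9
    i_avg = (t_sleep_ms * i_suspend_ua + t_active * i_active_ua) \
            / (t_sleep_ms + t_active)          # Eq. 2: weighted average, uA
    f_out = 1000 / (t_active + t_sleep_ms)     # Eq. 4: output data rate, Hz
    return i_avg, f_out

# bw = 1000 Hz, t_wup1 = 1.3 ms, IDD = 130 uA, IDDsum = 2.1 uA,
# with a hypothetical 10 ms sleep time (not a value from the datasheet table).
i_avg, f_out = bma_low_power_current(1000, 1.3, 10.0, 130, 2.1)
print(round(i_avg, 1), "uA average at", round(f_out, 1), "Hz output rate")
```

With these inputs the active window is only 0.9 ms, so the average draw collapses toward the suspend current as the sleep time grows, which is the whole point of the duty-cycled mode.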

Table 1: Estimated and Measured BMA253 Current Draws.

What I liked about the data I received from Bosch is that it allows me to determine the accuracy of the current draw formula relative to the measured data.

Posted in Electronics

State Casualty Rates During the Vietnam War


Quote of the Day

It is foolish and wrong to mourn the men who died. Rather we should thank God such men lived.

— George S. Patton



Figure 1: Ten States with the Highest Military Death Rates During the Vietnam War. The population basis was 1975.

My first engineering manager was named Marl Godfrey. He was an excellent manager who also had keen insights into the human condition. These insights made quite an impression on my 22-year-old self – I actually kept a notebook of his comments. Some of his most insightful comments were about the US military and the Vietnam War. Marl had grown up in Oklahoma and he had served in Vietnam. He once commented that Oklahoma had very aggressive draft boards, which resulted in Oklahoma having a relatively high death rate during the conflict. I was reminded of this statement when I recently reviewed my quote database. I thought that I should be able to determine how death rates varied by state during the Vietnam War, which is the subject of this post.

Figure 1 shows my results. I determined the rate based on the state populations during 1975, which was the year the war ended. Marl Godfrey was correct – Oklahoma had a very high death rate during the Vietnam War. I should mention that the rate ranking varies depending on the year from which the state population data was taken.

Figure 2 shows a graphic with all the state data plus Washington DC and the territories of Puerto Rico, Guam, and American Samoa. The range of rates is startling – Guam, in particular, paid a heavy price.


Figure 2: US Military Death Rates By State During Vietnam.

The US Army bore the brunt of the US deaths during the Vietnam War  (Figure 3).


Figure 3: US Military Death Percentages by Service.

For those who are interested, my Excel workbook is located here. The raw data came from this website.

Posted in History Through Spreadsheets