## US Naval Losses in WW2

Quote of the Day

— Tim Wu

Figure 1: US Naval Losses in WW2. (Source)

I have been putting together some information on US naval actions during WW2. Specifically, I wanted to look at US naval losses by year during WW2 in order to get a feel for the change in battle tempo over time. Wikipedia has an excellent page on all the US naval losses during WW2, so I simply downloaded this page, cleaned it up, and generated an Excel pivot table (Figure 1). The breakdown by combatant type is my own; everything else is from Wikipedia.

To help illustrate the loss rate by year, I added sparklines on the right-hand side of the table. The sparklines do make it clear that the pace of battle was high during 1942 and 1944. It is difficult to assess the tempo of naval losses in 1945 from this chart because (1) 1945 was a partial year of WW2, and (2) the Imperial Japanese Navy was a shell of its former self during most of 1945.

There is nothing particularly special about this data or my analysis, but I thought I would make the spreadsheet available to others in case they were interested in looking at WW2 data – there are a fair number of discussion groups on this topic. Having the data in a spreadsheet makes it easy to work with.

## Word of the Day: Omphaloskepsis

Quote of the Day

Mastery lives quietly atop a mountain of mistakes.

— Eric Greitens, in Resilience. This statement could not be more true. It is similar in spirit to a quote by the great physicist Niels Bohr, "An expert is a man who has made all the mistakes which can be made in a very narrow field."

Figure 1: Four People Contemplating Their Navels. (Source)

I was sitting in a management meeting today that seemed to be rather unproductive. It ended up in a philosophical discussion that did not go anywhere. I commented that we seemed to be engaging in omphaloskepsis, which is the name for the ancient Greek practice of contemplating one's navel (Figure 1). I first heard this word at Orbital ATK, where it was used to describe some of the meetings there.

No one else in today's meeting had heard the word before, but we all agreed that it bore some relationship to what occurred during our meeting. I thought I would share this word with you – some of you may sit in meetings like this.

Posted in Management | 2 Comments

## Television Analogies to Working for a Startup

Quote of the Day

All governments suffer a recurring problem: Power attracts pathological personalities. It is not that power corrupts, but it is magnetic to the corruptible.

— Frank Herbert, Dune

Figure 1: Star Trek is analogous to a startup in the beginning. (Source)

I was talking to an old friend the other night about the positives and negatives associated with working for a startup company. Overall, we both enjoyed working with startups enormously, and I would seriously consider joining another. However, both of us understand the special challenges that startups face.

My friend compared being part of a startup to various well-known television shows, and I thought I would share his thoughts here.

With a big smile on his face, he said that most startups go through three phases:

• Beginning: Star Trek

We are starting on a multi-year mission to boldly go where no one has gone before. Everyone is excited, you have a plan, and the sky is the limit.

• Middle: Survivor

Steve Blank defines a startup as

a temporary organization searching for a repeatable and scalable business model.

This search for a business model often involves people with strong opinions who want to influence the direction of the company. These folks frequently try to form alliances within the organization – just like Survivor. This is often a stressful time.

• End: ER

At some point, most startups start looking for a buyer/white knight/angel investor because they are having money problems. I have been there – there is nothing quite like having paying customers and no working capital for building product. There are days when you feel like you are in an ER and need someone to apply the paddles.

I had to laugh because his observations struck close to home.

## Graphic Depicting the Need for Succession Planning

Quote of the Day

If men are to be precluded from offering their sentiments ... the freedom of speech may be taken away, and dumb and silent we may be led, like sheep to the slaughter.

— George Washington

## Introduction

Figure 1: NASA Workforce Age. (Source)

An employee in my group recently retired, which prompted me to look at the age distribution within our entire HW organization. After seeing the ages of our engineering staff, I made a proposal to our management team for ensuring that the skills of our senior staff members are transferred over time to our junior staff members. This post shows how I presented the age information to internal management. The presentation was successful, and I thought it would be useful to show here.

In the data presented, the names of the managers have been changed, and the data itself has been altered for privacy reasons. However, the overall message of the data is the same – nearly half of our engineers are 55 years old or older. These are also our most skilled employees. We need to begin working on transferring their skills to more junior staff. We are not the only company faced with the graying of its workforce – NASA has been working on the problem for a number of years (Figure 1).

## Background

### Constraints

At my company, I am not allowed to know the age of individual employees. However, I can request data on the individual ages within Hardware Engineering without names attached. I requested this data and was able to generate a plot and produce some useful tables, which I show below. Note that I have modified the names of managers and altered the exact age data, but kept the overall message the same.

I arbitrarily chose 55 years as the age at which we need to begin considering succession planning for an employee.

### Tools

I analyzed the data using RStudio. The actual tables were generated using Excel, because I like the look of Excel pivot tables. The raw source files are included here.

## Analysis

### Employee Age Distribution

Figure 2 was the chart that generated the most discussion. It shows just how many engineers we have who are 55 or over. Note that some people are the same age, so I used ggplot2's jitter feature to display people of the same age as adjacent dots.

Figure 2: Employee Age Distribution.
I jittered the dots both horizontally and vertically to show employees of the same age.

### Employee Age Percentages

After showing the chart above, I then presented Figure 3, which shows a table of the percentages of our employee ages aggregated by manager and age group (less than 55, and 55 and over).

Figure 3: Pivot Tables Showing Employee Age Relative to 55 years.

## Conclusion

With nearly half of our hardware engineers aged 55 or over, we have quite a bit of work ahead of us.

Posted in Management | 2 Comments

## Optical Fiber Attenuation Specifications

Quote of the Day

Son, this is a Washington, D.C. kind of lie. It's when the other person knows you're lying, and also knows you know he knows.

## Introduction

Figure 1: Plot of Fiber Attenuation Using Different Approaches.

I needed to estimate the loss on a fiber network today – something that I have done hundreds of times before. However, today was a bit different because I decided to look at how sensitive my results were to my assumptions on when the fiber was deployed. I was a bit surprised to see how much fiber has improved with respect to losses due to contamination by hydroxyl (OH) ions, a problem often referred to as the water peak.

This post graphs fiber loss data (Figure 1) based on:

• the Corning SMF-28e fiber specification, a modern G.652-compliant fiber that has been available since the early 2000s
• empirical data from 1990 fiber deployments using G.652-compliant fiber
• empirical data from 2000 fiber deployments using G.652-compliant fiber
• empirical data from 2003 fiber deployments using G.652-compliant fiber
• a common equation-based model

I thought the results were interesting and worth sharing here (Figure 1).

## Background

### Definitions

attenuation (aka loss)
Attenuation in fiber optics, also known as transmission loss, is the reduction in intensity of the light beam (or signal) with respect to distance traveled through a transmission medium.  (Source)
attenuation coefficient
Optical power propagating in a fiber decreases exponentially with distance by the formula $P(z) = P_0 \cdot e^{-\alpha \cdot z}$, where P(z) is the optical power at distance z, P0 is the launch power, and α is the attenuation coefficient. We normally express the attenuation coefficient in terms of dB/km, which allows us to compute system losses using simple addition.
water peak
A peak in attenuation in optical fibers caused by contamination from hydroxyl (OH) ions that are residuals of the manufacturing process. Water peak causes wavelength attenuation and pulse dispersion in the region of 1383 nm.  (Source)

### Loss Modeling

Fiber optic losses are modeled by assuming a fraction of the light power is lost through each component, with each component's loss expressed in dB.

When expressed in dB, the losses can be added to provide a total loss. For more modeling information and an example, see this page.
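
To make the addition concrete, here is a minimal Python sketch of a dB loss budget. Every component value is an illustrative assumption, not a number from any specification.

```python
import math

def to_db(p_in, p_out):
    """Convert a linear power ratio to a dB loss."""
    return 10 * math.log10(p_in / p_out)

# Hypothetical link budget -- all values below are illustrative assumptions:
fiber_loss = 20 * 0.35      # 20 km of fiber at 0.35 dB/km
connector_loss = 2 * 0.5    # two connectors at 0.5 dB each
splice_loss = 3 * 0.1       # three splices at 0.1 dB each

# Because each loss is in dB, the total is a simple sum.
total_loss_db = fiber_loss + connector_loss + splice_loss

# The equivalent linear power ratio: P_out / P_in = 10^(-loss/10)
power_ratio = 10 ** (-total_loss_db / 10)

print(round(total_loss_db, 2))  # 8.3 dB
print(round(power_ratio, 4))    # roughly 15% of the launch power arrives
```

The convenience of dB is exactly this: a chain of multiplicative power losses becomes a sum.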

## Analysis

### Corning Loss Model

I normally use the Corning loss model because virtually all my customers use Corning SMF-28e fiber. To assist customers with estimating fiber loss per km, Corning provides a spreadsheet (it contains a macro) with a simple model that uses the loss at a long wavelength, short wavelength, and at the water peak. For normal work, I use the values shown in Figure 2, which is from the SMF-28e specification sheet.

Figure 2: Typical Loss Values for SMF-28 Fiber.

I should mention that the raw fiber attenuation is slightly lower than the loss after the fiber has been put into a cable. This is because the act of cabling tends to add microstresses (also known as microbends) to the fiber that increase its attenuation. At my company, we assume that the cabling penalty is ~0.05 dB/km.

### ITU-T G.652 Compliant Cable

The ITU has published a standard for optical fiber called ITU-T G.652. It has supplemented this standard with a document that contains measured data for cabled fiber. This data is interesting because it provides information I have never seen elsewhere:

• fiber loss per km plus the standard deviation of the loss (i.e. variation across fiber segments). I often need to estimate the "worst-case" fiber loss and the standard deviation allows me to use the RSS method.
• fiber loss per km for fiber installed during different years (1990, 2000, 2003). The fiber loss per km is different between the three years, particularly at the water peak. This data shows that fiber has greatly improved with respect to the water peak.
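
As a sketch of the RSS (root-sum-square) method mentioned above, the Python fragment below adds segment mean losses directly and combines segment standard deviations by RSS, assuming independent segments. Every number in it is an assumed, illustrative value, not measured data.

```python
import math

# Hypothetical link of three cabled fiber segments.
# Each tuple: (length_km, mean_loss_db_per_km, stdev_db_per_km)
segments = [(10, 0.21, 0.02), (15, 0.21, 0.02), (5, 0.23, 0.03)]

# Mean losses add directly.
mean_loss = sum(length * mean for length, mean, _ in segments)

# Standard deviations combine by root-sum-square (RSS),
# assuming the segment losses vary independently.
sigma = math.sqrt(sum((length * s) ** 2 for length, _, s in segments))

# A "worst-case" estimate at roughly 3 sigma.
worst_case = mean_loss + 3 * sigma
print(round(mean_loss, 2), round(sigma, 3), round(worst_case, 2))
```

The RSS step is why the published standard deviations are useful: they let you bound the link loss statistically instead of stacking pessimistic worst-case numbers.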

I should mention that G.652 sets the minimum standard for fiber. Manufacturers often compete by having a better attenuation coefficient, ability to handle more power before the onset of nonlinearities (e.g. SMF-28e+), zero water peak, or supporting a tighter bend radius (e.g. bend insensitive fiber).

### Equation-Based Loss Model

I occasionally see people model fiber loss (example) using Equation 1, which ignores the water peak. Equation 1 assumes that the attenuation versus wavelength is entirely due to Rayleigh scattering, which is a good approximation for wavelengths far from the water peak.

 Eq. 1 $\displaystyle \alpha(\lambda) = \frac{R_{\lambda_0}}{C_{\lambda_0}} \cdot \left( \left( \frac{1}{9.4 \cdot 10^{-4} \cdot \lambda} \right)^4 + 1.05 \right)$

where

• Rλ0 is the attenuation factor at the reference wavelength λ0.
• Cλ0 is a constant that varies with the reference wavelength λ0.
• λ is the wavelength at which I want to compute the attenuation factor.
• α is the attenuation coefficient at λ.
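
Equation 1 is easy to evaluate in code. The sketch below assumes one plausible reading of the constants – that Cλ0 normalizes the bracketed term so that α(λ0) equals the reference attenuation Rλ0 – and the 0.35 dB/km at 1310 nm reference point is an illustrative value, not a vendor specification.

```python
def bracket(lam_nm):
    # Rayleigh-scattering wavelength dependence from Equation 1 (lambda in nm).
    return (1 / (9.4e-4 * lam_nm)) ** 4 + 1.05

def alpha(lam_nm, r_ref, lam_ref_nm):
    # Assumed interpretation: C normalizes the bracket so that
    # alpha(lam_ref_nm) equals the reference attenuation r_ref.
    c_ref = bracket(lam_ref_nm)
    return (r_ref / c_ref) * bracket(lam_nm)

# Illustrative reference point: 0.35 dB/km at 1310 nm.
a1550 = alpha(1550, r_ref=0.35, lam_ref_nm=1310)
print(round(a1550, 3))  # about 0.30 dB/km
```

Because the model is pure Rayleigh scattering plus a constant, it is smooth through 1383 nm and will underestimate loss near the water peak.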

## Conclusion

Fiber deployments generally avoid wavelengths near the water peak of 1383 nm because of the excess loss there. I thought it was interesting to see how the water peak has changed so much over the last 17 years. Note that Corning's specifications show a water peak larger than what was measured in the empirical data from the 2003 deployments. I assume this is because the measurements included some fiber networks with zero water peak, which would drop the average.

The Equation 1 model works well as long as you are far from the water peak. I will be using Equation 1 for my simple modeling tasks because:

• None of my wavelengths are near the water peak.
• Equation 1 is easy to evaluate in Excel, and all of my customers have Excel.
• It is quite accurate for wavelengths far from the water peak.

## Greek Mythology is Relevant to Engineering Management

Quote of the Day

Curiosity is, in great and generous minds, the first passion and the last.

— Samuel Johnson

Figure 1: Artist's Imagining of Cassandra. (Source)

The older I get, the more I see the relevance of the classics to modern life. As a boy, I read a children's version of Aesop's fables, which I loved and which is still relevant to daily life. Later in school, I read about Greek mythology in Mythology: Timeless Tales of Gods and Heroes by Edith Hamilton. I still have a personal copy of this book that I refer to occasionally. It may seem odd, but the more time I spend in engineering management, the more relevant these myths seem to become. Over the last two weeks, I have mentioned two Greek myths several times – the tales of Cassandra and Sisyphus. They seem particularly appropriate to modern management.

Cassandra (Figure 1) was a princess who was given the power of prophecy by Apollo, but when she spurned his advances, he cursed her so that no one would believe her. I frequently have engineers tell me that they warned someone about some hazard, but their warning went unheeded and the worst occurred. I often refer to these engineers as "Cassandras." All I can tell these frustrated souls is that their obligation is to warn their coworkers, but that ultimately their coworkers own their decisions. The most irritating response I have received after warning someone about a risk that was later realized is that I should have been more vehement in stopping them. I can only do so much …

Figure 2: Artist's Imagining of Sisyphus. (Source)

The other Greek myth that comes up often is that of Sisyphus (Figure 2), a clever king whom Zeus punished for his cleverness by making him endlessly roll a huge boulder up a steep hill. Just as the boulder was about to reach the top of the hill, it would somehow find a way to roll all the way back down, and Sisyphus would be forced to repeat his labor. Sisyphus has become a metaphor for any pointless, interminable activity. Unfortunately, many engineering projects have a phase where they seem interminable.

I can illustrate this point by recalling a large program at HP that had the code name "Touchstone," a metaphor for a product that will set a new standard for the industry. After it had gone on for a couple of years, engineers started to call it "Millstone," a reference to a Bible verse about a man thrown into the water with a millstone around his neck (Luke 17:2). Another year later, they were calling the program "Tombstone," recalling images of death. This is just how some programs go.

Posted in Management | 2 Comments

## An Example of Cleaning Untidy Data with Tidyr

Quote of the Day

I wish to do something great and wonderful, but I must start by doing the little things like they were great and wonderful.

— Albert Einstein

## Introduction

Figure 1: P-61 Black Widow, the Most Expensive Army Air Force Fighter of WW2. (Source)

I recently decided to take some classes in data analysis at DataCamp, an online training site. My first classes were in dplyr and tidyr – two excellent R-based tools for working with files that are not amenable to analysis because of inconsistencies and structure: tidyr provides tools for reshaping messy data into a tidy form, while dplyr provides tools for transforming and summarizing data. After completing the two classes, I decided that I needed to firm up my knowledge of these tools by applying them to a concrete example.

The gnarliest data that I know of is hosted on the Hyperwar web site. The data seems to be scanned and OCRed WW2 records that were translated into HTML. While a fantastic resource for history aficionados, these records are painful to analyze using software because of:

• Numerous OCR errors
The original source is often difficult to figure out.
• Inconsistent expression of zeros
I see blanks, zeros, periods, dashes.
• Mixing data and totals in seemingly random ways
• Arithmetic errors (e.g. incorrect totals)
I often think of some poor secretary/yeoman typing numbers all day and then having to use an adding machine to complete a total. It would have been miserable work.
• Typing errors
For example, a cargo ship listed as 70,006 tons during WW2 was most likely 7,006 tons – cargo ships at that time were typically in the 10,000-ton range. This error actually occurred, and I was able to confirm it because Wikipedia has a remarkable amount of data on WW2 ships, and it listed 7,006 tons for this ship.
• Unbelievably complex headings (literally three and four levels)

This is not a complaint about the Hyperwar site itself. This is just a statement about the format and quality of WW2 records that were written by humans typing numbers at keyboards. For this post, I decided to download  data on the cost of WW2 Army Air Force combat aircraft and clean it up for graphing. All of the heavy lifting is done in RStudio – only the Appendix is done in Excel because I like how it formats tables.

## Background

### Problem Statement

I have read quite a bit about air operations during WW2, and I have often read that the P-47 and P-38 were very expensive fighters compared to the P-51. One reason given for the P-51 replacing the P-47 and P-38 in many applications was its cheaper unit cost. I decided that I would research the cost of these aircraft as a motivating application for practicing with tidyr and dplyr.

### Definitions

tidy data
Tidy data is a term coined by Hadley Wickham (creator of tidyr and dplyr) to describe data that is formatted with variables in columns and observations in rows with no blank rows or columns. Missing data is clearly marked in a consistent manner, as are zeros. If you are a database person, you will see hints of his concepts in Codd's Rules for relational databases.
untidy data
Untidy data is data that is not tidy – I hate definitions like this, but it works here. There are some common problems:

• Values of variables used for column headings (e.g. column headings containing years – years should be a column with specific year values stored for each row).
• Data that should be in multiple tables combined into one table. (e.g. describing a person's pets means multiple table rows for people with more than one pet – the pet details should be in a separate table.)
• Aggregate data (e.g. totals) in the rows.
• Blank rows or columns.
• Single observation scattered across multiple rows.
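
The first problem in the list above – years used as column headings – is exactly the reshaping that tidyr's gather() (now pivot_longer()) performs. As a language-neutral sketch, here is the same "melt" written in plain Python; the unit-cost numbers are rough, order-of-magnitude values included purely for illustration.

```python
# Untidy layout: years appear as column headings (illustrative unit costs).
untidy = {
    "aircraft": ["P-38", "P-47", "P-51"],
    "1943": [105567, 104258, 58824],
    "1944": [97147, 85578, 51572],
}

# Melt the year columns into one observation per (aircraft, year) pair,
# so that 'year' becomes an ordinary variable with its own column.
tidy = [
    {"aircraft": name, "year": int(year), "unit_cost": untidy[year][i]}
    for i, name in enumerate(untidy["aircraft"])
    for year in ("1943", "1944")
]
for row in tidy:
    print(row)
```

After the melt, each row is a single observation, which is what makes grouping, filtering, and plotting by year straightforward.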

## Analysis

All costs are expressed in WW2 dollars. The point of this exercise is to provide a worked Rmarkdown-based example of cleaning an untidy data file. Converting WW2 dollars to modern dollars is fraught with issues, which I leave to others.

### Rmarkdown File

The Rmarkdown file is a bit long, so I include a PDF of it here as a link. For the complete source, see this zip file. The zip file contains the raw Hyperwar tab-separated file, a cleaned-up .csv, Rmarkdown source, and a PDF of the processed Rmarkdown document.

### Graphs

I am not interested in plotting all the data – there are just a few WW2 aircraft that interest me. I plotted the costs of my favorite WW2 fighters and bombers versus time. Note that some of the costs dropped dramatically over time, which is true for almost any product (example).

#### Fighter Plane Costs Versus Time

Figure 2 shows how the unit costs changed with time for my favorite WW2 fighter planes. Note that the P-38 and P-47 were quite expensive relative to the P-51. The most expensive US fighter of WW2 was the P-61 night fighter (Figure 1).

Figure 2: WW2 Army Air Force Fighter Plane Costs Versus Time.

#### Bomber Plane Costs Versus Time

Figure 3 shows how the costs changed with time for my favorite WW2 bombers. Notice how the B-29 was ~2.6x more expensive than the B-17. However, only the B-29 had the range needed to bomb Japan from the Mariana Islands.

Figure 3: WW2 Army Air Force Bomber Costs in WW2.

#### Cleaned-Up CSV File

The Rmarkdown file outputs a cleaned-up version of the data as a .csv file. I include that file in with my source. You can see a tabular version of the data in Appendix A.

## Conclusion

I work in corporate America, where Excel is the standard for processing data. While Excel has some strengths, it is not the most powerful data analysis tool available. RStudio provides a wonderful data analysis and presentation environment. This post provides a fully worked example of how to use RStudio with tidyr and dplyr to tidy-up some very messy data.

## Appendix A: Hyperwar Aircraft Cost Table Reconstruction.

Figure 4 shows how my tidy data version of the cost data can easily be converted back into the original form.

Figure 4: Import of Cleaned Table Into Excel for Table Presentation.

## Quick Look at Large US Dams

Quote of the Day

Amateurs practice until they get it right. Professionals practice until they can’t get it wrong.

— Special Operations Credo. I hear doctors say something similar. It is true in Engineering and Software as well. For a professional, it is about far more than getting things right – that is a given. It is about having personal processes that reduce the possibility of mistakes and ensure that you can handle any contingency that might arise.

## Introduction

Figure 1: Oroville Dam. (Wikipedia)

I was on the phone this morning with a coworker who lives in California, about 150 miles south of the Oroville dam (Figure 1). This dam has recently been in the news because of concerns that spillway erosion could cause a dam failure. At one point, nearly 200K people were evacuated from the potential flood zone.

My coworker was quite familiar with the situation in Oroville, and he mentioned that warnings had been given in years past that this situation could occur. People are now asking how something like this could happen. I will leave that for the politicians to explain.

I was surprised to hear that the Oroville Dam is the tallest dam in the US – I remember hearing as a child that Hoover Dam was the tallest – it is now the second tallest, after Oroville. This got me curious about which dams are the tallest in the US and where they are located. Wikipedia has an excellent list of the 86 tallest dams in the US. I used Power Query to grab the list and pivot tables to examine the data. For those interested, my source is here.

Figure 2 shows the top 10 tallest dams in the US and their locations. Height is expressed in feet.

Figure 2: Ten Tallest Dams in the US. Height in ft.

I also was curious about when most of these 86 dams were built, which I show in Figure 3. It looks like the 1960s was a big decade for dam building.

Figure 3: Decades When Tallest Dams in US were Built.

I also looked at where the dams were built (Figure 4). By region, the Pacific Contiguous (coast) and Mountain West regions had the largest number of dams by far. I used the US regions as defined by the Department of Energy.

Figure 4: Dam Locations By US Region.

Figure 5 shows that California has the largest number of these tall dams.

Figure 5: Dams By State.

Posted in Excel, History of Science and Technology | 3 Comments

## Daily Tree Consumption for Toilet Paper

Quote of the Day

When the satisfaction or the security of another person becomes as significant to one as one's own satisfaction or security, then the state of love exists. Under no other circumstances is a state of love present, regardless of the popular usage of the term.

— Harry Stack Sullivan

## Introduction

Figure 1: Typical Roll of Toilet Paper. (Source)

I was reading an article on National Geographic when I spotted an interesting factoid about the impact of toilet paper (TP) on worldwide tree consumption.

Toilet paper wipes out 27,000 trees a day.

Like many factoids, I doubt there is a way to actually measure this number – it can only be estimated. Thus, it is a prime candidate for a Fermi solution.

I see factoids like this all the time. My favorite factoid in the fiber-optic business is that 99% of transoceanic Internet traffic is carried by submarine cables. If you ask around, no one can tell you how the remaining 1% is carried – 1% of total transoceanic bandwidth is a lot of bandwidth for non-fiber transports (e.g. Iridium, TDRS). I heard a submarine cable expert say that 99.999% is probably closer to the true value, but people hedge their numbers by saying 99%. A better answer is that virtually 100% of transoceanic Internet traffic is carried by submarine cables, and the amount not carried by them is so small that no one knows what it is. An example of a place that requires transoceanic data service but has no transoceanic fiber access is Antarctica.

## Background

### General References

The following links provided me some good background for the analysis that follows.

• National Geographic article mentioning the factoid (Link).
• Blog post on toilet paper rolls per tree (Link).
• Typical toilet paper measurements (Link).
• Wikipedia on tree pulp (Link).
• General toilet paper info (Link).

### Some Tree Statistics

The journal Nature reports that:

• The world is home to more than 3 trillion trees.
• People cut down 15 billion trees per year.
• The number of trees has declined by 46% since the beginning of human civilization.

### Average Mass of Harvested Tree

The mass of the average tree harvested for pulp can be estimated using Equation 1, an empirical formula developed by the US Forest Service. This formula gives the typical mass of a tree based on its diameter. The parameters are species-dependent; for this exercise, I assumed the trees are aspens, which are commonly used for pulp where I live. The specific parameters (β0, β1) are given in Appendix A.

 Eq. 1 $\displaystyle m_{\text{Tree}}(d) = e^{\beta_0 + \beta_1 \cdot \ln(d)}$

where

• mTree is the mass of the tree [kg].
• d is the diameter of the tree measured at breast height [cm]. This parameter is often referred to as "d.b.h."

The US Forest Service has a number of other mathematical models for tree mass versus diameter. I chose this one because it was easy to code.
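
Equation 1 takes only a few lines to code. In the sketch below, the β0 and β1 values are illustrative numbers for aspen-type hardwoods that I am assuming for the example; they are not the values from Appendix A.

```python
import math

# Equation 1: tree mass from diameter at breast height (d.b.h.).
# BETA0 and BETA1 are species-dependent; these are illustrative
# values assumed for aspen-type hardwoods.
BETA0, BETA1 = -2.2094, 2.3867

def tree_mass_kg(dbh_cm):
    """Estimated mass [kg] of a tree with diameter dbh_cm [cm] at breast height."""
    return math.exp(BETA0 + BETA1 * math.log(dbh_cm))

# A 25 cm d.b.h. tree comes out at roughly a couple hundred kilograms.
print(round(tree_mass_kg(25), 1))
```

Note that the log-linear form means a modest change in diameter produces a large change in mass, which is why the assumed harvest diameter drives the final tree-count estimate so strongly.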

## Analysis

Figure 2 shows my analysis. I included many comments in-line, so I will not go through the details in my introductory text. For those who want to view my source, I include it here.

Figure 2: Estimates of Trees Consumed By Toilet Paper Usage.
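
To give a feel for the structure of the estimate (my full analysis is in Figure 2), here is a stripped-down Fermi sketch. Every input below is an assumption chosen for illustration; the point is the shape of the calculation, not the specific values.

```python
# Fermi estimate of trees cut per day for toilet paper.
# All inputs are assumptions for illustration only.
users = 2.5e9             # people using TP (roughly a third of the world)
rolls_per_year = 30       # average rolls per user per year
kg_paper_per_roll = 0.15  # paper mass per roll [kg]
pulp_yield = 0.5          # kg of paper produced per kg of wood
kg_per_tree = 200         # mass of a typical pulpwood tree [kg]

kg_paper = users * rolls_per_year * kg_paper_per_roll  # paper per year
kg_wood = kg_paper / pulp_yield                        # wood per year
trees_per_day = kg_wood / kg_per_tree / 365

print(f"{trees_per_day:,.0f} trees per day")
```

Under these particular assumptions the estimate lands well above 27K trees per day, and since the result scales linearly with each input, halving any one assumption halves the answer – which is exactly the sensitivity discussed in the conclusion.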

## Conclusion

I can see where the 27K number is plausible. The actual number of trees cut for use in toilet paper is probably unknowable and can only be estimated. Unfortunately, the analysis is sensitive to parameters that are highly variable:

• percentage of people that use TP.
• amount of TP used per person.
• diameter of trees harvested for TP.

I suspect that the 27K trees per day number is probably low. Even with the uncertainty involved, it is an interesting number because it shows that the environmental impact of a small item can be substantial if enough people use it.

## Appendix A: Formula for Tree Mass vs Diameter.

Figure 3 shows the formula that I used to estimate the mass of a tree based on its diameter.

Figure 3: US Forest Service Formula for Tree Mass vs Diameter. (Source)

Posted in General Mathematics, General Science | 4 Comments

## Fact Checking: US Murder Rate Over Time

Quote of the Day

I was planning to vote anarchist, but they don't seem to have any candidates.

— Michael Rivero

Figure 1: US Murder Rate Versus Time. The units are murders per 100K population. (Data Source)

I recently heard a politician claim that the US murder rate is the "highest it's been in 47 years." This is an easy fact to check, and it provided me with another Power Query example for my staff.

This statement is not correct. The statistics for 2016 have not been released yet, but I can easily look at the data from 1960 to 2015. The actual 2015 rate is the ninth lowest in the period from 1960 through 2015. In fact, 2013 and 2014 had the lowest murder rates during that time interval. You have to go back to the 1950s to find significantly lower murder rates than we are experiencing now – this data is included in the spreadsheet attached below.

Here is the annual murder rate rank computed in my spreadsheet.

Figure 2: Years of Lowest Murder Rates.

For those who want to follow my work, here is my source.

Posted in Excel, Fact Checking | 2 Comments