The Case of the Mysterious Bit Errors

Introduction

I am the hardware director for a team of optical networking professionals. More often than I care to admit, we need to embark on a detective case in order to resolve a customer problem. This blog describes a situation that occurred recently where a little math shed some light on the problem.

The Problem

We recently had reports from the field of bit errors occurring in the mornings and evenings at a newly deployed site in Mexico. Our customers use bit error rate as a measure of the quality of service they are receiving. In this case, our bit error rate should have been zero. The bit errors were disturbingly like some reports we had received during the previous spring from a deployment in North Dakota. The bit errors in North Dakota vanished before we were able to find their root cause. We have hundreds of sites that use our hardware without any bit errors at all. What did the two sites in Mexico and North Dakota have in common that would cause bit errors in our system?

Clues

As we asked our field engineers questions, some clues emerged:

  • There are two types of fiber deployment: aerial and buried. Each of the problem sites involved aerial fiber deployments. In an aerial deployment, the fiber is hung from poles instead of being buried in the ground. We have large numbers of both types of deployments.
  • In North Dakota, the bit errors had occurred only in the spring, while the bit errors in Mexico were occurring in the summer.
  • In both cases, the errors occurred at sunrise and in late afternoon.
  • The fiber runs were long — 11 km or longer. While our system is certified to 60 km, over 80% of our deployments have ranges of 5 km or less.
  • The bit errors were not randomly placed. They occurred in specific areas at the end of long data sequences.

Formation of Hypotheses

Here is where problem-solving experience is useful. We start with some inferences based on our experience.

  • We know that the maximum temperature changes in an aerial deployment occur in the morning at first exposure to sunlight and late in the afternoon.
  • Our experience is that both North Dakota and Mexico experience fast changes in temperature. While their temperatures are different, both regions have similar rates of temperature change.
  • Our experience has been that bit errors at the end of long data sequences almost always mean that there is a timing problem.

Could we be experiencing a temperature-dependent timing problem? Our system depends on precise timing for reliable data transmission. We adjust for timing changes periodically, but between adjustments the system timing (i.e. arrival time of photons at a receiver) must remain constant within ±19.2 nsec for reliable data transmission.

Could rapid temperature changes be causing the transmission time of our signal on the fiber to vary enough to cause bit errors? We do periodically check the system timing to adjust for slow-moving timing changes. However, what if a change in timing occurs so quickly that our current approach cannot adjust fast enough? That could cause this problem.

Testing Our Hypothesis

We used a two-prong approach to test our hypothesis:

  • Insert a change into our programmable hardware (i.e. Field Programmable Gate Arrays [FPGA]) that would allow our system in Mexico to work with larger delay variation. If increasing our tolerance for delay variation reduced or eliminated the bit errors, we would know that timing was at the heart of the problem.
  • Begin a detailed look at how the signal delay on fiber varies with temperature. This would tell us the amount of timing variation that the system must be designed to handle. During our early development, we had made assumptions about how fast the temperature could vary. Maybe the temperature was varying faster than we assumed?

When we made the change to our system in Mexico, the bit errors went away. This gave us empirical evidence that timing was the culprit. Now we needed to determine if the signal delay on the fiber was varying enough to cause a problem. Ideally, we would have many kilometers of fiber in our lab that we could use to make direct measurements of the signal delay variation with temperature. Unfortunately, we had neither the fiber nor the required temperature chambers to make these measurements. So we decided to try two alternative approaches: web searching for published fiber delay versus temperature data and mathematical modeling.

The web searching provided some data. Specifically, one reference from the power industry [1] reported a measured delay variation of \left( {\sim 80 \pm 20} \right)\frac{{{\text{ps}}}}{{{\text{km}} \cdot ^\circ {\text{C}}}} . However, they were using cables that were constructed very differently from ours. The power industry data was for massive cables that had large steel reinforcing members. My concern was that the fiber transmission speed is a function of the tension on the fiber, and I am quite certain that the tension on our cables is different from theirs.

Maybe some mathematical modeling could provide some insight into the critical variables? We proceeded as shown below.

How Does Signal Speed Vary on a Fiber with Temperature?

We begin by looking at what determines the signal transmission time on a fiber. The transmission time is simply distance divided by the speed of light on the cable.

\tau = \frac{L_{Fiber}}{c_{Fiber} }

Where \tau is the signal transmission time on the fiber, L_{Fiber} is the distance between the light transmitter (a laser) and the light receiver, and c_{Fiber} is the speed of light on the fiber.

The speed of light on the fiber is given by the following expression.

c_{Fiber} = \frac{c}{n}

Where c is the speed of light in a vacuum and n is the fiber's index of refraction. It turns out that n is a complicated term — it is a function of both temperature (T) and mechanical stress (S).

We can combine these expressions as follows.

\tau = \frac{L_{Fiber} \cdot n(T,S)}{c}
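
To give a feel for the magnitudes involved, here is a minimal Python sketch that evaluates the one-way delay as fiber length times index of refraction divided by the vacuum speed of light. It assumes the representative silica index of 1.455 used later in this post and an 11 km run; the function name is only illustrative.

```python
# Minimal sketch: one-way signal delay on a fiber, tau = L * n / c.
# Assumptions: n = 1.455 (representative silica value), L = 11 km.

C_VACUUM_M_PER_S = 299_792_458.0  # speed of light in a vacuum


def fiber_delay_s(length_m: float, n: float = 1.455) -> float:
    """Return the one-way transmission time in seconds."""
    return length_m * n / C_VACUUM_M_PER_S


if __name__ == "__main__":
    tau = fiber_delay_s(11_000.0)
    print(f"One-way delay over 11 km: {tau * 1e6:.2f} us")
```

The absolute delay is roughly 53 microseconds, so the nanosecond-scale timing window discussed earlier corresponds to a change of only a few parts in ten thousand.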

Now we can apply a little calculus to determine how this delay varies with temperature. We can take the derivative of this expression with respect to temperature T and obtain the following result (note that c is a constant).

c \cdot \frac{{d\tau }}{{dT}} = \frac{{d{L_{Fiber}}}}{{dT}} \cdot n + {L_{Fiber}} \cdot \left( {\frac{{\partial n}}{{\partial S}} \cdot \frac{{dS}}{{dT}} + \frac{{\partial n}}{{\partial T}}} \right)
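
For readers who want the intermediate steps, the result follows from the product rule applied to the product of fiber length and index of refraction, and then the chain rule applied to n. This is a sketch that assumes the stress S is itself a function of temperature only:

```latex
\begin{aligned}
\tau &= \frac{L_{Fiber} \cdot n\left(T, S(T)\right)}{c} \\
c \cdot \frac{d\tau}{dT} &= \frac{dL_{Fiber}}{dT} \cdot n + L_{Fiber} \cdot \frac{dn}{dT} \\
\frac{dn}{dT} &= \frac{\partial n}{\partial S} \cdot \frac{dS}{dT} + \frac{\partial n}{\partial T}
\end{aligned}
```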

We can rearrange this expression to make it a bit more useful for the discussions to follow. These expressions are formed to use readily available physical data for fiber.

d\tau = \left( {\frac{{\frac{1}{{L_{Fiber}}} \cdot \frac{{d{L_{Fiber}}}}{{dT}} \cdot n}}{c} + \frac{{\frac{{\partial n}}{{\partial S}} \cdot \frac{{dS}}{{dT}}}}{c} + \frac{{\frac{1}{n} \cdot \frac{{\partial n}}{{\partial T}} \cdot n}}{c}} \right) \cdot {L_{Fiber}} \cdot dT = {k_T} \cdot {L_{Fiber}} \cdot dT

Where k_T is the overall temperature coefficient of the fiber delay. A few definitions will simplify our look at each term of the temperature variation independently.

Components of Temperature-Induced Delay Variation
| Delay Component | Equation |
| --- | --- |
| Fiber Length-Related Change | {k_L} \triangleq \frac{{\frac{1}{{L_{Fiber}}} \cdot \frac{{d{L_{Fiber}}}}{{dT}} \cdot n}}{c} |
| Index of Refraction-Related Change with Temperature | {k_I} \triangleq \frac{{\frac{1}{n} \cdot \frac{{\partial n}}{{\partial T}} \cdot n}}{c} |
| Index of Refraction Changes with Temperature-Induced Stress | {k_S} \triangleq \frac{{\frac{{\partial n}}{{\partial S}} \cdot \frac{{dS}}{{dT}}}}{c} |

Using these definitions, we can now write a simple expression for the variation of signal delay on the fiber.

{k_T} = {k_L} + {k_I} + {k_S}

We can now discuss the overall delay variation in terms of the variation of each of the components.

d\tau = \left( {{k_L} + {k_S} + {k_I}} \right) \cdot {L_{Fiber}} \cdot dT

Temperature Variation of the Components of Signal Delay

Key Fiber Characteristics

Before we go too far, we need to list out some of the key physical characteristics of fiber optic cable. These characteristics vary a bit from manufacturer to manufacturer, but these values are representative.

Table of Important Fiber Physical Characteristics
| Parameter | Description | Value | Units |
| --- | --- | --- | --- |
| n | Silica Index of Refraction | 1.455 | dimensionless |
| \frac{1}{{L_{Fiber}}} \cdot \frac{{\partial {L_{Fiber}}}}{{\partial T}} | Temperature Coefficient of Fiber Length | 5.5E-7 | 1/°C |
| \frac{1}{n} \cdot \frac{{\partial n}}{{\partial T}} | Temperature Coefficient of Index of Refraction | 9.76E-6 | 1/°C |
| p_{11} | First Pockels constant | 0.113 | dimensionless |
| p_{12} | Second Pockels constant | 0.252 | dimensionless |
| ν | Poisson's Ratio | 0.16 | dimensionless |

A Brief Look at the Photoelastic Effect

It turns out the effect of stress on the fiber is difficult to model. While there is a good theory for the fiber by itself, the fiber is bundled in other materials that have their own expansion coefficient. Thus, the overall stress on the fiber is a complex function of the constituent material properties. We can compute the photoelastic variation in the index of refraction for unsheathed fiber using the equation shown below.

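{k_S} \triangleq \frac{{\frac{{\partial n}}{{\partial S}} \cdot \frac{{dS}}{{dT}}}}{c} = \frac{{{n^3}}}{2}\cdot\left[ {\left( {{p_{11}} + {p_{12}}} \right)\cdot\nu  - {p_{12}}} \right]\cdot\frac{\alpha }{c}

Where \alpha is the temperature coefficient of fiber length (5.5E-7 1/°C in the table above), p_{11} and p_{12} are the Pockels constants, and \nu is Poisson's ratio.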

The analysis that follows will model the photoelastic variation of the fiber itself, but it will not attempt to model the total variation in the index of refraction due to stress — we simply do not have enough information. Similarly, while we have data for the temperature coefficient of expansion for silica, we do not have data on the specific cable being used in Mexico or North Dakota. So our modeling will be rough for two of the three terms. This means that we should treat our mathematical analysis as giving us an order of magnitude result that may not be exactly correct. It will tell us what are the most important terms.
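
As a check on the photoelastic expression for unsheathed fiber, the following Python sketch evaluates it with the representative values from the table of fiber characteristics. It assumes that \alpha is the tabulated temperature coefficient of fiber length; the function and parameter names are illustrative only.

```python
# Minimal sketch: photoelastic delay coefficient for bare (unsheathed) fiber,
#   k_S = (n^3 / 2) * [(p11 + p12) * nu - p12] * alpha / c
# Assumption: alpha is the temperature coefficient of fiber length (5.5e-7 1/degC).

C_VACUUM_M_PER_S = 299_792_458.0  # speed of light in a vacuum


def photoelastic_coefficient_ps_per_km_c(
    n: float = 1.455,
    p11: float = 0.113,
    p12: float = 0.252,
    nu: float = 0.16,
    alpha_per_c: float = 5.5e-7,
) -> float:
    """Return k_S in ps/(km*degC)."""
    bracket = (p11 + p12) * nu - p12
    k_s_s_per_m_c = (n**3 / 2.0) * bracket * alpha_per_c / C_VACUUM_M_PER_S
    # Convert s/m to ps/km: 1e12 ps per s times 1e3 m per km.
    return k_s_s_per_m_c * 1e15


if __name__ == "__main__":
    print(f"k_S = {photoelastic_coefficient_ps_per_km_c():.2f} ps/(km*degC)")
```

The result is small and negative, about half a picosecond per kilometer per degree, so for bare fiber this term slightly offsets the other two.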

Summary of Delay Coefficients

Table of Calculated Delay Coefficients
| Symbol | Equation | Value | Unit |
| --- | --- | --- | --- |
| k_L | \frac{{\frac{1}{{L_{Fiber}}} \cdot \frac{{\partial {L_{Fiber}}}}{{\partial T}} \cdot n}}{c} | 2.67 | \frac{{{\text{ps}}}}{{{\text{km}} \cdot {^{\circ}\text{C}}}} |
| k_I | \frac{{\frac{1}{n} \cdot \frac{{\partial n}}{{\partial T}} \cdot n}}{c} | 47.42 | \frac{{{\text{ps}}}}{{{\text{km}} \cdot {^{\circ}\text{C}}}} |
| k_S | \frac{{{n^3}}}{2}\cdot\left[ {\left( {{p_{11}} + {p_{12}}} \right)\cdot\nu  - {p_{12}}} \right]\cdot\frac{\alpha }{c} | -0.55 | \frac{{{\text{ps}}}}{{{\text{km}} \cdot {^{\circ}\text{C}}}} |
| k_T | {k_L} + {k_I} + {k_S} | 49.54 | \frac{{{\text{ps}}}}{{{\text{km}} \cdot {^{\circ}\text{C}}}} |
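
The length and index terms can be reproduced from the fiber characteristics with a few lines of Python. This is a rough sketch, not the original calculation: it uses the exact vacuum speed of light and takes the photoelastic term as the tabulated -0.55, so the results land within about 0.1 ps/(km·°C) of the values above, with the small gap presumably due to rounding of the inputs.

```python
# Minimal sketch: delay temperature coefficients from the table values.
#   k_L = (length tempco) * n / c
#   k_I = (index tempco) * n / c
#   k_T = k_L + k_I + k_S
# Assumption: k_S is taken as the tabulated -0.55 ps/(km*degC), not recomputed.

C_VACUUM_M_PER_S = 299_792_458.0
S_PER_M_TO_PS_PER_KM = 1e15  # 1e12 ps per s times 1e3 m per km

N_SILICA = 1.455
LENGTH_TEMPCO_PER_C = 5.5e-7
INDEX_TEMPCO_PER_C = 9.76e-6
K_S_PS_PER_KM_C = -0.55


def length_coefficient() -> float:
    """Return k_L in ps/(km*degC)."""
    return LENGTH_TEMPCO_PER_C * N_SILICA / C_VACUUM_M_PER_S * S_PER_M_TO_PS_PER_KM


def index_coefficient() -> float:
    """Return k_I in ps/(km*degC)."""
    return INDEX_TEMPCO_PER_C * N_SILICA / C_VACUUM_M_PER_S * S_PER_M_TO_PS_PER_KM


def total_coefficient() -> float:
    """Return k_T in ps/(km*degC)."""
    return length_coefficient() + index_coefficient() + K_S_PS_PER_KM_C


if __name__ == "__main__":
    print(f"k_L = {length_coefficient():.2f} ps/(km*degC)")
    print(f"k_I = {index_coefficient():.2f} ps/(km*degC)")
    print(f"k_T = {total_coefficient():.2f} ps/(km*degC)")
```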

We see that the delay variation is dominated by the temperature coefficient of the index of refraction. So we should expect to see ~50 \frac{{{\text{ps}}}}{{{\text{km}} \cdot {^{\circ}\text{C}}}} for the fiber itself. However, Reference [1] reports larger values for the delay variation with temperature. My assumption is that the difference is due to the change in mechanical stresses with temperature, which I have not modeled. These stresses are considered in [3], but that analysis is well beyond my scope here. So 50 \frac{{{\text{ps}}}}{{{\text{km}} \cdot {^{\circ}\text{C}}}} should be looked upon as a lower bound. For the sake of further analysis, we will assume a delay temperature coefficient of \left( {\sim 75 \pm 25} \right)\frac{{{\text{ps}}}}{{{\text{km}} \cdot ^\circ {\text{C}}}} .

Estimate of Total Delay Variation

If we assume a temperature variation of 30 °C, we can estimate the total delay variation using the following equation.

\Delta \tau  = {k_T} \cdot {L_{Fiber}} \cdot \Delta T

We can substitute the values from our Mexican deployment into this equation to obtain the following result.

\Delta \tau = 75\frac{{{\text{ps}}}}{{{\text{km}} \cdot {^\circ \text{C}}}} \cdot 11{\text{ km}} \cdot 30{\text{ }}^\circ \text{C}  =  \text{24.75 ns}

This shows that the delay variation with temperature is larger than our limit of 19.2 nsec and is sufficient to cause bit errors.
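
The same arithmetic can be run across the assumed coefficient range. The sketch below is illustrative only: it multiplies the coefficient, fiber length, and temperature swing, then compares the result with the 19.2 ns limit quoted earlier in this post. The function names are hypothetical.

```python
# Minimal sketch: total delay variation, delta_tau = k_T * L * delta_T,
# compared against the 19.2 ns timing limit quoted in this post.

TIMING_LIMIT_NS = 19.2


def delay_variation_ns(k_t_ps_per_km_c: float, length_km: float, delta_t_c: float) -> float:
    """Return the delay variation in nanoseconds."""
    return k_t_ps_per_km_c * length_km * delta_t_c / 1000.0  # ps -> ns


def exceeds_limit(variation_ns: float, limit_ns: float = TIMING_LIMIT_NS) -> bool:
    """Return True when the variation is larger than the timing limit."""
    return variation_ns > limit_ns


if __name__ == "__main__":
    # Lower bound, nominal, and upper bound of the assumed coefficient.
    for k_t in (50.0, 75.0, 100.0):
        variation = delay_variation_ns(k_t, length_km=11.0, delta_t_c=30.0)
        status = "exceeds" if exceeds_limit(variation) else "is within"
        print(f"k_T = {k_t:5.1f} ps/(km*degC): {variation:5.2f} ns {status} the limit")
```

For the 11 km run and a 30 °C swing, the nominal 75 ps/(km·°C) gives 24.75 ns. The bare-fiber lower bound of 50 ps/(km·°C) gives 16.5 ns, which would stay inside the limit, so the conclusion rests on the installed cable behaving worse than bare fiber.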

Comparison With Published Results

I was not able to find a large number of references on the variation in signal delay with temperature but there were two that were useful. One reference [1] stated that they had observed a fiber delay variation in the range of 60 to 100 \frac{{{\text{ps}}}}{{{\text{km}} \cdot {^\circ \text{C}}}} . This is consistent with what we were seeing. Another reference [2] states that the fiber delay variation is 43 \frac{{{\text{ps}}}}{{{\text{km}} \cdot {^\circ \text{C}}}} . This is less than my estimate, but still enough to cause a failure given a long fiber, large temperature variation, or both.
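
To see what "a long fiber" means for each published coefficient, the following sketch inverts the delay variation relation to find the fiber length at which a given temperature swing reaches the 19.2 ns limit. The 30 °C swing is the assumption used earlier in this post, and the function name is hypothetical.

```python
# Minimal sketch: fiber length at which delta_tau = k_T * L * delta_T
# reaches the timing limit, i.e. L = limit / (k_T * delta_T).
# Assumption: a 30 degC swing, as used in the estimate earlier in this post.

TIMING_LIMIT_NS = 19.2


def critical_length_km(
    k_t_ps_per_km_c: float,
    delta_t_c: float,
    limit_ns: float = TIMING_LIMIT_NS,
) -> float:
    """Return the fiber length in km at which the limit is reached."""
    return limit_ns * 1000.0 / (k_t_ps_per_km_c * delta_t_c)  # ns -> ps


if __name__ == "__main__":
    published = {
        "[2]": 43.0,
        "[1], low end": 60.0,
        "[1], high end": 100.0,
    }
    for label, k_t in published.items():
        length = critical_length_km(k_t, delta_t_c=30.0)
        print(f"{label}: {k_t:.0f} ps/(km*degC) -> limit reached at {length:.1f} km")
```

At 43 ps/(km·°C) the limit is reached at about 14.9 km for a 30 °C swing, while at 100 ps/(km·°C) it is reached at 6.4 km.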

Conclusion

Here is how everything ended up:

  • We modified our system to tolerate a fiber delay variation of up to 100 \frac{{{\text{ps}}}}{{{\text{km}} \cdot {^\circ \text{C}}}} .
  • Thanks to programmable hardware, we were able to download the change to our customers at no cost.
  • The problem has not reappeared. This is the best indication of all that we nailed the bug!

References

[1] Myoujin, Y. "Transmission delay variations in OPGW and overhead fiber-optic cable links." IEEE Transactions on Power Delivery 12, no. 4 (1997): 1415-1421.
[2] Nooren, G-J. "Influence of temperature and pressure on light propagation in fibres." June 6, 2000. http://www.nikhef.nl/~nooren/fibretest/ (accessed March 21, 2010).
[3] Kiesel, S. et al. "Behaviour of intrinsic polymer optical fibre sensor for large-strain applications." Measurement Science and Technology 18 (2007): 3144-3154.
