What Could Elections Officials Learn From the Delta Airlines Outage

This week Delta Airlines has been partially down for at least three days so far, because of “computer” or “power” problems according to reports, e.g. How A Computer Outage Can Take Down An Entire Airline <read>.

Just after five in the morning on Monday, Delta sent out an alert every traveler dreads. “Delta has experienced a computer outage that has affected flights scheduled for this morning.”

Two hours later, Delta added discouraging details: The outage in Atlanta had crippled its mission control center—the NASA-inspired room that keeps Delta’s global fleet running. Soon, static check-in lanes clogged airports and gate agents started writing boarding passes by hand. Passengers slept on airport floors or sat in parked planes, even as departure boards and smartphone apps wrongly told them everything was running great. The airline canceled more than 650 flights and delayed many more in the US, Japan, Italy, and the UK…

Georgia Power, which supplies electricity to Delta, says it’s working with the airline today to fix a failed switchgear—a heavy duty version of the circuit breaker panel you’ve got in your basement. That would suggest that if an update or test is the problem, it was of hardware (perhaps, ironically, something like a new power supply), rather than of software. Georgia Power says the outage affected nobody else.

This is not the first time:

No one seems to know what went wrong, exactly—Delta’s investigating—but this is hardly the first time a computer glitch has shackled an airline’s global operations to the tarmac. So how does this keep happening?…

If you’re starting to think this kind of thing happens a lot, you’re right. In July, the failure of a single data center router forced Southwest to cancel 2,300 flights across four days, costing the airline well over $10 million. CEO Gary Kelly told The Dallas Morning News the router only partially failed, so it didn’t trigger the backup systems. In May, JetBlue had to check in customers by hand when its computer system went down. American Airlines blamed connectivity issues when it had to suspend flights last September. A year ago, United blamed a glitch for 800 flight delays.

And then there are the cases that defy contingency planning. In 1991 a farmer reportedly took 20 air traffic control centers offline when he inadvertently cut through an underground fiber optic cable while burying a cow. In 2014, an FAA contractor set fire to an air traffic control center in Chicago, disrupting travel for more than two weeks.

There are three lessons we might absorb and election officials might learn from this.  (We have to admit that we are skeptical that these lessons will be learned by the public or officials.)

  • System failures are generally explained away as accidents, usually unique and isolated ones.
  • Human systems are vulnerable to failure, especially those dependent on computer systems, especially when there is no manual backup.
  • If businesses like airlines, banks, and Federal Government agencies cannot protect their systems, how can state, county,  and local systems be expected to be reliable?

System Failures Are Generally Explained Away as Accidents

How can we be sure that a system failure is an accident, not sabotage?  How do we know that an individual, foreign power, or business competitor did not bring down the system?  This could have been a test of a surgical strike that could be used to take down multiple airlines or other critical systems.

You may be thinking “Conspiracy Theorist” here.  That is a good way to deflect concern without delving deeper or actually learning anything.  Yet such an attack has happened, perhaps more than once.  The U.S. Government and Israel reportedly attacked Iranian nuclear facilities by attacking the control system responsible for nuclear centrifuges.  The attack, known as Stuxnet, was designed to go undetected, and it did so for several years.

The point here is not that the Delta outage was necessarily such an attack.  It is that it could have been, and even with diligence that may never be determined.  It could also have been sabotage by a single individual.  In any case, whether computer attack, human attack, or accident, our infrastructure is vulnerable.

Human systems are vulnerable to failure, especially those dependent on computer systems, especially when there is no manual backup.

Without its computer system, Delta was dead in the water (actually dead in the air, stuck on the ground), completely dependent on computer systems and power, and apparently on a single point of failure.

But wait.  What if Delta had a simple manual backup?  Would it be possible to save millions, perhaps billions of dollars, and continue most flights, with most passengers, saving them many problems?

I am not an airline expert, yet my guess is that Delta’s system is largely separate from the Air Traffic Control, TSA, and Immigration Systems.  Here is an outline of a simple backup system:

  • Every couple of hours, spreadsheets of the following are sent to a personal computer at each Delta airport:  Passengers booked for each flight for the next 24 hours.  Equipment, crew, and schedule for each flight in that period.
  • In a similar emergency, all those items are printed on paper and used by personnel to create boarding passes and check in passengers.
  • Flight crews, baggage handlers, and maintenance use that information to continue operations.

Obviously it would not work perfectly, yet it would provide for most service to continue at a considerably slower pace.
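
To make the outline above concrete, here is a minimal sketch of the periodic export step in Python. It is only an illustration under stated assumptions: the function name, the record fields (flight, origin, departs, aircraft, crew, passenger), and the CSV layout are all hypothetical, and nothing here reflects how Delta's real systems are built. It takes a list of flights and bookings, keeps the flights departing in the next 24 hours, and produces one printable spreadsheet per airport.

```python
import csv
import io
from collections import defaultdict
from datetime import timedelta

# Hypothetical column layout for the printable backup sheet.
HEADER = ["flight", "departs", "aircraft", "crew", "passenger"]


def build_airport_snapshots(flights, bookings, now, window_hours=24):
    """Return {airport_code: CSV text} for flights departing in the window.

    flights:  list of dicts with assumed keys
              flight, origin, departs (datetime), aircraft, crew
    bookings: list of dicts with assumed keys flight, passenger
    """
    cutoff = now + timedelta(hours=window_hours)

    # Group booked passenger names by flight number.
    passengers = defaultdict(list)
    for booking in bookings:
        passengers[booking["flight"]].append(booking["passenger"])

    # One row per booked passenger, grouped by departure airport.
    rows_by_airport = defaultdict(list)
    for flight in sorted(flights, key=lambda f: f["departs"]):
        if not (now <= flight["departs"] < cutoff):
            continue  # outside the backup window
        # A flight with no bookings still gets one row, so that its
        # equipment and crew appear on the sheet.
        names = sorted(passengers.get(flight["flight"], [])) or [""]
        for name in names:
            rows_by_airport[flight["origin"]].append([
                flight["flight"],
                flight["departs"].isoformat(),
                flight["aircraft"],
                flight["crew"],
                name,
            ])

    # Render each airport's rows as CSV text, ready to save or print.
    snapshots = {}
    for airport, rows in rows_by_airport.items():
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(HEADER)
        writer.writerows(rows)
        snapshots[airport] = buffer.getvalue()
    return snapshots
```

Run every couple of hours, each airport's text would be saved to the local personal computer. The design choice that matters is that the file is already on site before any outage, so printing it requires neither the central system nor the network.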

If businesses like airlines, banks, and Federal Government agencies cannot protect their systems, how can state, county and local systems be expected to be reliable?

Which brings us to our election system.  To the extent we make it an electronic election system, we are similarly dependent on those systems wherever we have no manual backup or workable pre-planned contingencies.

How about Connecticut?

One area where we are very good is that we have paper ballots.  Even if our scanners fail due to an extended power outage, we can still vote on paper ballots and count them later!

But there are other potential problems.

The current voting system is partially dependent on the availability of the online Central Voter Registration System (CVRS) and the phone system.  CVRS and the phone system are also generally dependent on the availability of the Internet and the power grid.  That availability is required statewide and in each town in the state.

  • The CVRS must be available in the few days before an election so that paper checkin lists can be printed, so that voters can check in at the polls.
  • On election day, Registrars are constantly checking the system to resolve voter registration issues at polling places; perhaps 5% of voters would not be able to vote if that system were unavailable.
  • Also on election day, election day registration is currently 100% dependent on the availability of the CVRS, with no model contingency plan specified by the Secretary of the State’s Office.
  • Also, the whole system is highly dependent on the phone system, which is used by polling place officials to call the Registrars’ Office, and by the Registrars’ Office to call other towns for Election Day Registration.

When we convert to electronic checkin, we must be careful to require paper copies of checkin lists so that polling place voting can mostly continue in the event of power, phone, and computer outages.

Finally, a reminder that it is tough for individual industries to protect themselves, harder for state and local governments, and that Connecticut is not the pick of the litter here:

As was reported in April: Connecticut Makes National Short List – Embarrassing <read>

U.S. federal, state and local government agencies rank in last place in cyber security when compared against 17 major private industries, including transportation, retail and healthcare, according to a new report released Thursday.

The analysis, from venture-backed security risk benchmarking startup SecurityScorecard, measured the relative security health of government and industries across 10 categories, including vulnerability to malware infections, exposure rates of passwords and susceptibility to social engineering, such as an employee using corporate account information on a public social network.

Educations, telecommunications and pharmaceutical industries also ranked low, the report found. Information services, construction, food and technology were among the top performers…

Other low-performing government organizations included the U.S. Department of State and the information technology systems used by Connecticut, Pennsylvania, Washington and Maricopa County, Arizona.

As we said then:

We sadly await the Election Day when the Connecticut voter registration system is down, especially with no contingency plan for Election Day Registration. Don’t say “Who Could Have Imagined”, we did.






