Despite the best of intentions and processes we all screw up sometimes.

I think pretty much every developer has made a screw-up of some kind over their career (and if you haven’t, you are either very lucky, unaware you have done so, or probably not doing anything interesting!).

The following is a list of high profile software errors I have compiled from various sources.

Some of these issues are quite old now & documentation is limited so if I have made an error or reported the issue incorrectly please don’t hesitate to let me know.

Some issues are amusing & entertaining, some are downright terrifying & a few tragically resulted in loss of life.

Read on and hopefully be entertained. Perhaps there is something to learn, or you can simply be glad you were not involved in any of these.

Space/Aerospace

Mariner 1 rocket (1962)

Mariner 1 https://en.wikipedia.org/wiki/Mariner_1 was an $18.5 million spacecraft designed to perform a flyby of Venus. It had to be destroyed shortly after launch when it didn’t respond properly to commands.

The issue was due to a missing overbar (yep, a line over a symbol) that was present in the original formula but left out when the formula was transcribed for the requirements.

The science fiction author Arthur C Clarke described Mariner 1 as "wrecked by the most expensive hyphen in history".

Mars Climate Orbiter (1999)

The Mars orbiter https://en.wikipedia.org/wiki/Mars_Climate_Orbiter was a $125 million project designed to study the Martian surface and climate.

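Communications were lost as the orbiter likely entered the atmosphere on the wrong trajectory. The main cause was found to be NASA & Lockheed Martin using different units of measurement: Lockheed Martin's software reported thruster impulse in imperial units while NASA's software expected metric units.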
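To get a feel for how much a unit mix-up like this matters, here is a minimal sketch. It assumes the widely reported detail that one side produced impulse in pound-force seconds and the other read it as newton-seconds. The function name and the sample value are my own and purely illustrative.

```python
# One pound-force is 4.4482216152605 newtons, so the same factor
# converts pound-force seconds to newton-seconds.
LBF_S_TO_N_S = 4.4482216152605


def impulse_lbf_s_to_n_s(impulse_lbf_s: float) -> float:
    """Convert an impulse from pound-force seconds to newton-seconds."""
    return impulse_lbf_s * LBF_S_TO_N_S


# Hypothetical thruster firing, reported in pound-force seconds.
reported = 100.0

misread = reported                        # consumer wrongly treats the number as N*s
correct = impulse_lbf_s_to_n_s(reported)  # consumer converts before using it

print(f"misread as {misread:.1f} N*s, actually {correct:.1f} N*s")
print(f"every firing is underestimated by a factor of {correct / misread:.2f}")
```

Nothing crashes here, which is what makes this kind of bug nasty: the numbers are perfectly valid, just wrong by a factor of about 4.45.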

Ariane 5 (1996)

The Ariane 5 https://en.wikipedia.org/wiki/Ariane_5 is a European rocket that is used to launch satellites and cost nearly 8 billion to develop. On its first test flight it self-destructed 37 seconds after launch.

Why? A conversion from a 64-bit floating point number to a 16-bit signed integer overflowed. Apparently efficiency was quite important in the Ariane 4, hence the 16-bit numbers, and the code had been reused for the Ariane 5, whose different flight profile produced values too large to fit.
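The sketch below shows what squeezing a large number into 16 bits can do. It is not the Ariane code (which was written in Ada). The function names and sample values are made up for illustration.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_checked(value: float) -> int:
    """Convert to a signed 16-bit integer, raising if the value does not fit."""
    result = int(value)
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a signed 16-bit integer")
    return result


def to_int16_wrapping(value: float) -> int:
    """Convert by keeping only the low 16 bits (two's complement wraparound)."""
    low_bits = int(value) & 0xFFFF
    return low_bits - 0x10000 if low_bits >= 0x8000 else low_bits


print(to_int16_checked(1234.5))    # fits comfortably
print(to_int16_wrapping(40000.0))  # silently becomes a negative number

try:
    to_int16_checked(40000.0)
except OverflowError as exc:
    print("conversion failed:", exc)
```

Either behaviour is bad news if nothing handles it: a silent wrap feeds nonsense to the rest of the system, and an unhandled error can take the whole system down.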

Boeing 787 (2015)

An integer overflow bug could result in electrical systems being shut down if the aircraft had been powered on continuously for more than 248 days!
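The 248 day figure is a bit of a giveaway. A common inference (an assumption on my part, not something from an official report) is that a signed 32-bit counter was ticking 100 times a second. The arithmetic is easy to check:

```python
INT32_MAX = 2**31 - 1            # largest value a signed 32-bit integer can hold
TICKS_PER_SECOND = 100           # assumption: the counter ticks every 10 ms
SECONDS_PER_DAY = 24 * 60 * 60

days_until_overflow = INT32_MAX / TICKS_PER_SECOND / SECONDS_PER_DAY
print(f"{days_until_overflow:.2f} days")  # 248.55 days
```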

Technology

AT&T (1990)

AT&T’s long distance phone network became disabled after a software bug caused switches to continuously reset. It was estimated AT&T lost $60 million in potential charges.

Pentium FDIV Processors (1994)

Intel’s early Pentium processor chips had a flaw in their floating point division (FDIV) unit that could produce incorrect results in certain fairly rare situations. The issue was discovered by mathematics professor Thomas Nicely, who noticed some inconsistencies in his calculations.

Initially Intel agreed to replace chips only where users could prove they were impacted. This caused a widespread backlash & resulted in an estimated $475 million clean-up.

Windows Genuine Advantage (2007)

Microsoft included a piracy checking feature that mistakenly detected legitimate software as pirated.

This resulted in legitimate users seeing anti-piracy messages &, depending on their OS version, having some features disabled over a 19 hour period.

The issue was due to pre-production software being accidentally released.

PayPal (2013)

Chris Reynolds was surprised to find $92 quadrillion in his PayPal account. The exact amount is significant in the world of 64-bit numbers, suggesting a programming error.
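Why is roughly $92 quadrillion suspicious? If we assume the balance was held as a whole number of cents in a signed 64-bit integer (my guess, not a confirmed detail), the largest value it can hold looks very familiar:

```python
INT64_MAX = 2**63 - 1  # 9,223,372,036,854,775,807

# Assumption: the balance is stored as a whole number of cents.
dollars, cents = divmod(INT64_MAX, 100)
print(f"${dollars:,}.{cents:02d}")  # $92,233,720,368,547,758.07
```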

Google malicious site detection (2009)

An error caused Google to flag every site in its search results as malicious.

Apparently this was due to a developer adding a slash to a list of malicious sites, meaning every site would be listed as malicious!
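A rough sketch of how a single slash can match everything, assuming the list was checked with some kind of "does the URL contain this pattern" test. The matching rule, function name and URLs here are hypothetical, not Google's actual implementation.

```python
def is_flagged(url: str, patterns: list[str]) -> bool:
    """Return True if any blocklist pattern occurs anywhere in the URL."""
    return any(pattern in url for pattern in patterns)


blocklist = ["badsite.example/malware"]  # hypothetical entry
broken_blocklist = blocklist + ["/"]     # the stray slash

url = "https://www.example.com/index.html"
print(is_flagged(url, blocklist))         # False
print(is_flagged(url, broken_blocklist))  # True, since every URL contains "/"
```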

Olympic Hammer Throw Score System (2012)

A computer system apparently couldn’t cope with successive hammer throws of the exact same distance: it automatically assumed the second one was a mistake (which is, er, an odd decision). This led to the wrong competitor initially being awarded a medal.

Heathrow Airport T5 baggage system (2008)

This system was tested with over 12,000 pieces of baggage & worked flawlessly, but in real-world use it fell over, leading to long delays.

It’s thought that simple things like passengers picking up and putting back luggage caused these failures.

Nest thermostat (2016)

A bug in an update to Nest's smart thermostat led to batteries draining quickly & users being unable to heat their homes or get hot water. It also occurred during a particularly cold period for the US.

Computer Games

World of Warcraft Corrupted Blood (2005)

The online game World of Warcraft (WOW) introduced a character called Hakkar that could infect player characters with a disease that could then be transferred to other players.

The disease was supposed to be restricted to a specific area of the game and targeted at high level characters, as it would quickly kill beginner characters. Unfortunately the disease was not restricted by area, leading to mass player deaths as it quickly spread throughout the game world.

Apparently this incident was also used by researchers to study how a population might respond to a pandemic.

Eve Online (2007)

A game update accidentally erased the boot.ini file used by Windows as part of the start-up process.

Civilization (1991?)

Civilization is a popular strategy game which contains famous historical leader figures.

Early versions of the game reportedly contained a bug where the normally peaceful Gandhi would become the most aggressive character of all!

This bug was due to civilization characters all having an aggression rating.

Gandhi being, er, Gandhi was given the lowest possible aggression rating of 1. Unfortunately, under another game condition (a player adopts democracy as a government type), all characters' aggression ratings are reduced by 2.

This led to Gandhi having an aggression rating of -1, which, being stored as an unsigned 8-bit number, looped round to 255, making him the most aggressive player in the game!

Due to the popularity of the bug it was kept for future versions.
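The wrap-around in this story is easy to reproduce. This is a small sketch of unsigned 8-bit arithmetic, not the game's actual code, and the function names are mine:

```python
def reduce_aggression_uint8(aggression: int, reduction: int) -> int:
    """Subtract the way an unsigned 8-bit value would: results wrap modulo 256."""
    return (aggression - reduction) % 256


def reduce_aggression_clamped(aggression: int, reduction: int) -> int:
    """Subtract but never go below zero, which is what you would want here."""
    return max(0, aggression - reduction)


print(reduce_aggression_uint8(1, 2))    # 255
print(reduce_aggression_clamped(1, 2))  # 0
```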

Financial

Vancouver Stock Exchange (1982)

The Vancouver Stock Exchange index lost around 500 points over roughly 2 years due to a rounding error: every time the index was recalculated the system truncated the result to 3 decimal places rather than rounding it.
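It is surprising how quickly truncation adds up. The simulation below is only an illustration: the number of updates and the size of each price movement are made up, not the exchange's real data.

```python
import random
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN

STEP = Decimal("0.001")  # keep 3 decimal places
random.seed(42)          # fixed seed so runs are repeatable

exact = truncated = rounded = Decimal("1000")
for _ in range(100_000):
    # Hypothetical tiny index movement with 6 decimal places.
    change = Decimal(random.randint(-5000, 5000)) / Decimal(1_000_000)
    exact += change
    truncated = (truncated + change).quantize(STEP, rounding=ROUND_DOWN)
    rounded = (rounded + change).quantize(STEP, rounding=ROUND_HALF_EVEN)

print(f"exact:     {exact}")
print(f"rounded:   {rounded}")
print(f"truncated: {truncated}")
```

Each truncation throws away about half a thousandth of a point on average, so the truncated index steadily sinks below the exact one while the rounded index stays close to it.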

Knight Capital (2012)

Improperly installed software resulted in a $440 million loss in about 45 minutes of trading.

Military

NORAD (1979)

NORAD (North American Aerospace Defense Command) defence systems detected an incoming attack, leading to aircraft carrying nuclear weapons being scrambled! The issue? A technician loading a “test tape” but failing to set the system to Test mode. This, er, happened 3 times!

Soviet Nuclear War Early warning system (1983)

On 26th Sep 1983 (during the height of the Cold War) a defence system alerted officer Stanislav Petrov that the US had fired a missile at the Soviet Union. Luckily for everyone involved, Petrov reasoned a single missile was an unlikely first strike strategy & dismissed this as a false alarm.

Apparently the system had picked up the sun's rays reflecting off clouds in a rare way.

Patriot Missile Timing Bug (1991)

Patriot missiles are surface-to-air missiles that were used during the Gulf War to intercept Scud missile attacks.

As you can imagine, intercepting a fast moving object is a complex problem and relies on very accurate timing.

Apparently the missile system suffered a time drift issue due to a software bug in the timing code: the longer the system was left running, the further its clock drifted.

This timing issue sadly meant the missile did not prevent an oncoming attack on an army barracks in Saudi Arabia and 28 American soldiers lost their lives.
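The published analyses of this bug describe time being counted in tenths of a second and multiplied by a value of 0.1 that had been chopped to fit a 24-bit fixed point register. Here is a sketch of that arithmetic. The bit count, the 100 hours of uptime and the Scud speed are figures taken from those analyses and should be treated as assumptions rather than details from this post.

```python
from fractions import Fraction

FRACTION_BITS = 23  # assumption: bits kept after the binary point

exact_tenth = Fraction(1, 10)
# Chop (truncate) 0.1 to the available bits, as a fixed point register would.
stored_tenth = Fraction(int(exact_tenth * 2**FRACTION_BITS), 2**FRACTION_BITS)
error_per_tick = exact_tenth - stored_tenth

ticks = 100 * 60 * 60 * 10  # 100 hours of uptime, counted in tenths of a second
drift_seconds = float(error_per_tick * ticks)

SCUD_SPEED_M_S = 1676       # approximate speed quoted in the analyses
print(f"error per tick: {float(error_per_tick):.3e} s")
print(f"drift after 100 hours: {drift_seconds:.4f} s")               # roughly 0.34 s
print(f"tracking error: {drift_seconds * SCUD_SPEED_M_S:.0f} m")     # over half a kilometre
```

A third of a second sounds tiny, but at that speed it is enough to look in completely the wrong place.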

USS Yorktown (1997)

A divide by zero error left the USS Yorktown missile cruiser stationary for 3 hours.

Apparently this issue was due to a zero being entered by an operator &, I presume, some poor programming practice http://motherboard.vice.com/read/why-were-not-allowed-to-divide-by-zero.
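I don't know what the ship's code looked like, but the general defence is simple enough: check operator input before dividing by it. This is a generic sketch with made-up names and values:

```python
from typing import Optional


def safe_divide(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    if denominator == 0:
        return None
    return numerator / denominator


operator_input = 0.0  # hypothetical value typed in by an operator
result = safe_divide(100.0, operator_input)
if result is None:
    print("rejected: divisor must be non-zero")
else:
    print(result)
```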

F-22 Raptor flight (2007)

During practice exercises the F-22 experienced issues in multiple computer systems that seemed to occur when crossing the International Date Line.

These issues led to multiple systems shutting down. Luckily, due to good weather and tanker aircraft being in the vicinity, the F-22s could safely return to base.

Medical

Therac 25 & Cobalt 60 (1985/1986)

The Therac 25 was a machine used in radiation therapy.

Unfortunately the machines suffered a race condition that could result in some patients receiving massive dosages of radiation.

Fritz Hager, a physicist at East Texas Cancer Centre, eventually worked out what caused the issue by bringing in the same technician involved in previous incidents. They found that if the user changed the machine's mode within 8 seconds of selecting a previous mode, the problem could occur.

Misc

Undetected Ozone layer hole (1985)

Apparently the hole in the ozone layer wasn’t noticed until 1985 because data quality algorithms filtered out the extreme measurements as errors!

US Prison System (2015)

A bug in the sentence reduction calculation used by Washington State led to 3200 prisoners being released early!

Apparently this issue had been going on for 13 years!

IBM’s Deep Blue beats Kasparov (1997)

Deep Blue reportedly benefited from a flaw that resulted in it picking a move completely at random.

This confused Kasparov & arguably led to Deep Blue winning the match.

Further reading

  • http://www.bloomberg.com/news/articles/2012-10-17/knight-capital-reports-net-loss-as-software-error-takes-toll-1-
  • http://www5.in.tum.de/~huckle/bugse.html
  • https://www.quora.com/What-are-some-famous-bugs-in-the-computer-science-world
  • http://www.computerworld.com/article/2515483/enterprise-applications/epic-failures--11-infamous-software-bugs.html
  • https://en.wikipedia.org/wiki/List_of_software_bugs
  • http://www.scientificamerican.com/article/pogue-5-most-embarrassing-software-bugs-in-history/
  • http://royal.pingdom.com/2009/03/19/10-historical-software-bugs-with-extreme-consequences/