In my previous article we talked about how we might represent a number in memory and that the obvious way is probably not the most efficient. We then discussed another way - allow me to introduce IEEE-754.

IEEE-754

IEEE-754 is an incredibly important standard in software development & electronics and it's used in a large number of programming languages such as JavaScript, C# and Java. Amongst other things it defines how numbers should be held, how they work together and how to handle errors.

But how does IEEE-754 work?

Well the simple explanation is that IEEE-754 is a bit like scientific notation. Scientific notation is great as it makes some very large & very small numbers more readable and takes up less space.

E.g. you have no doubt written something like:

5.6 x 10^-6 (0.0000056)

I am simplifying things, however. Where IEEE-754 differs from scientific notation is that when it represents a number it actually divides it into 3 different sections.

Below is how these sections work for a 64 bit IEEE-754 double precision number:

  • Sign (1 bit) - positive or negative with 0 for a positive, 1 for a negative number
  • Exponent (11 bits)
  • Mantissa/Fraction (52 bits)
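If you want to see these three sections for yourself, the rough sketch below writes a number into an 8 byte buffer and slices up the resulting bits. Note that `toBits` is just a name I have made up for this illustration (it is not a built-in), and it assumes an environment that supports `DataView`.

```javascript
// Split a JavaScript number into the three IEEE-754 double precision sections.
// `toBits` is a hypothetical helper name, not a built-in function.
function toBits(value) {
  var view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, value); // DataView writes big-endian by default
  var bits = "";
  for (var i = 0; i < 8; i++) {
    // Each byte becomes 8 binary digits, left-padded with zeros
    bits += ("00000000" + view.getUint8(i).toString(2)).slice(-8);
  }
  return {
    sign: bits.slice(0, 1),      // 1 bit
    exponent: bits.slice(1, 12), // 11 bits
    mantissa: bits.slice(12)     // 52 bits
  };
}

var parts = toBits(-5);
console.log(parts.sign);     // "1" as the number is negative
console.log(parts.exponent); // "10000000001" (1025)
console.log(parts.mantissa); // "01" followed by 50 zeros
```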

Ok but what does this look like in practice?

IEEE-754 Example

Let’s take an example using a 32 bit number (32 bit is easier as it involves fewer bits, which makes it fit on a web page more easily!).

32 bit values work in a very similar way to 64 bit floating point values but there are fewer bits in the exponent & fraction to utilize (8 and 23 respectively), and something called the bias value, which we will discuss shortly, is different.

Let’s run through an example.

Say we have the following values in our 3 sections:

  • Sign: 0
  • Exponent: 10000001 (129)
  • Mantissa: 01000000000000000000000 (0.25)

We’ll also need the following formula to work out our end number:

Number stored = (-1) ^ Sign * 2 ^ (Exponent – Bias) * (1 + Mantissa)

Ok we have everything we need let’s work across from left to right.

Sign

As 0 is in the sign portion we know we have a positive number – easy!

Exponent and Bias

Next we need to work out the exponent.

Now there is an important rule here that we also need to deduct something called the bias from the value. With 32 bit numbers this bias value is 127 so we have:

129 (Exponent) – 127 (bias value for 32 bit numbers) = 2

Why do we need a bias value?

Well one reason is that it saves us holding a separate sign for the exponent & apparently it also enables some binary gymnastics that make it quicker to perform comparisons of numbers.

Ok so we have removed our bias - next we need to raise 2 to the power of this value:

2 ^ 2 = 4

Mantissa

Next we need to calculate the mantissa. To do this we always pretend there is a 1 in front of the value stored (0.25), which gives us: 1.25.

This convention saves us a bit of space amongst other benefits.

Finally we take the first value and multiply it by the second which gives us our final value:

4 x 1.25 = 5
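The working above can be checked with a few lines of JavaScript. This is only a sketch of the simplified calculation (it ignores the special cases mentioned later), first doing the maths by hand and then cross-checking by writing the same 32 bits into a buffer and reading them back as a float:

```javascript
// Rebuild the worked example from its three sections (32 bit, so the bias is 127).
var sign = 0;
var exponent = parseInt("10000001", 2);                     // 129
var fractionBits = "01000000000000000000000";               // 23 bits
var mantissa = parseInt(fractionBits, 2) / Math.pow(2, 23); // 0.25

var byHand = Math.pow(-1, sign) * Math.pow(2, exponent - 127) * (1 + mantissa);
console.log(byHand); // 5

// Cross-check: store the same 32 bits and let the engine interpret them as a float.
var view = new DataView(new ArrayBuffer(4));
view.setUint32(0, parseInt("0" + "10000001" + fractionBits, 2));
console.log(view.getFloat32(0)); // 5
```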

So I am simplifying this slightly for easy readability but you get the idea.

IEEE-754 Range

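A 64 bit IEEE-754 value allows us to store numbers as large as roughly ±1.8 x 10^308, although only the whole numbers between -(2^53 – 1) and 2^53 – 1 (-9,007,199,254,740,991 to 9,007,199,254,740,991) can all be held exactly.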

This is a huge range and almost a big enough number to store the number of silly things the Australian Prime Minister Tony Abbott has to say but not quite.
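JavaScript exposes the limits of its 64 bit floating point numbers as constants on `Number`, so you can inspect them directly. A quick sketch (note `Number.MAX_SAFE_INTEGER` needs an ES2015 or later engine):

```javascript
var largest = Number.MAX_VALUE;
var largestSafeInteger = Number.MAX_SAFE_INTEGER;

console.log(largest);            // 1.7976931348623157e+308
console.log(Number.MIN_VALUE);   // 5e-324, the smallest positive value
console.log(largestSafeInteger); // 9007199254740991, which is 2^53 - 1
console.log(largest * 2);        // Infinity, as there is nowhere left to go

// Past 2^53 neighbouring whole numbers can no longer be told apart
console.log(Math.pow(2, 53) === Math.pow(2, 53) + 1); // true
```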

It’s worth mentioning that IEEE-754 also has some special bit patterns reserved for storing special values such as:

  • NaN (not a number)
  • Infinity
  • 0
  • -infinity
  • -0

Yes there are negative as well as positive values of 0 and infinity! These probably aren’t too useful for everyday programming & I had never come across them before researching this. You can see negative 0 for example by dividing 0 by -1 in case you are interested.
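Here is a small sketch of how these special values behave in JavaScript. `Object.is` is an ES2015 addition, so this assumes a reasonably modern engine:

```javascript
var negativeZero = 0 / -1;
console.log(negativeZero === 0);          // true, === treats -0 and 0 as equal
console.log(Object.is(negativeZero, -0)); // true, Object.is can tell them apart
console.log(1 / negativeZero);            // -Infinity
console.log(1 / 0);                       // Infinity

var notANumber = 0 / 0;
console.log(notANumber === notANumber);   // false, NaN is not equal to anything, even itself
console.log(isNaN(notANumber));           // true
```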

If you want to play with IEEE-754 and check your math I would highly recommend going to this site that allows you to turn bits on and off and check your workings.

Unfortunately the IEEE charges for a copy of the full standard but there are a lot of academic resources out there for understanding more about it.

But what has this all got to do with JavaScript and numbers?

Well there are many ways to create/hold numbers in JavaScript – here are a few I can think of off the top of my head:

  • var x = 5;
  • var x = 0.5;
  • var x = Number(5);
  • var x = parseInt(5);
  • var x = parseFloat(5);
  • var x = +"5";

But no matter how you create/define a number all numbers in JavaScript are actually 64 bit IEEE-754 floating point values.

Yep no matter how much your values may look like integers, floats or decimals they are ALL 64 bit floating point numbers – and as all numbers in JS are 64 bit IEEE-754 floating point values this means they all adhere to the rules of IEEE-754.
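A quick way to convince yourself of this is to check what the engine reports for values created in different ways. The variable names below are just for illustration:

```javascript
var fromLiteral = 5;
var fromFraction = 0.5;
var fromString = +"5";
var fromParse = parseInt("5", 10);

console.log(typeof fromLiteral);  // "number"
console.log(typeof fromFraction); // "number"
console.log(typeof fromString);   // "number"
console.log(typeof fromParse);    // "number"

console.log(fromLiteral === 5.0); // true, there is no separate integer type
console.log(7 / 2);               // 3.5, dividing two "integers" gives a fraction
```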


Now IEEE-754 can store some values exactly:

  • 0.5 = 0.5
  • 0.25 = 0.25

But unfortunately there are many, many more that cannot quite be represented using this format, such as the following (shown here as 32 bit values; 64 bit values get closer but are still not exact):

  • 0.1 = 0.10000000149011612
  • 0.2 = 0.20000000298023224
  • 0.52 = 0.5199999809265137
  • 0.9 = 0.8999999761581421

IEEE-754 can store these numbers very close to the intended values but they are not quite the number.
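The figures listed above appear to be the 32 bit (single precision) approximations. You can reproduce them in JavaScript with `Math.fround` (an ES2015 addition), which rounds a number to the nearest 32 bit float. The sketch below also shows that the 64 bit value is closer to 0.1 but still not exact:

```javascript
var half = Math.fround(0.5);
var tenth = Math.fround(0.1);
var fifth = Math.fround(0.2);

console.log(half);  // 0.5, held exactly
console.log(tenth); // 0.10000000149011612
console.log(fifth); // 0.20000000298023224

// A 64 bit double gets much closer but is still not quite 0.1
console.log((0.1).toFixed(20)); // 0.10000000000000000555
```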

It’s a bit like if I ask you to give me 43 cents using Australian currency where the smallest denomination is 5 cents. You can give me 40 or 45 which are pretty close to the amount I require but you just don’t have the coins to make the exact amount.

IEEE-754 is not just for JavaScript

It’s worth noting that C#, Java and many other languages also use IEEE-754 for double values so don’t think you’ll get away from this issue by changing language!

They do generally have other types you can use which avoid these issues. For example C# has a decimal type which should be used for tasks like monetary calculations.

Sometimes languages will make it look like you are storing the exact value when you are not – you are just storing a value that’s very close.

For example in JavaScript:

var x = 0.3;
console.log(x) //0.3

However JavaScript is hiding stuff from you – if you want to see a more accurate representation you can do something like the following:

x.toFixed(20); //0.29999999999999998890

Or in C#:

double x = 0.1;
double y = 0.2;
double result = x + y;
Console.Write(result); //0.3

Hmm same thing (at least on the .NET Framework; .NET Core 3.0 and later print the full value by default) but if you want to see the real result held you will have to tell C# you want to see more with something like:

Console.Write("{0:G17}", result); //0.30000000000000004

Arithmetic

So as you can imagine not being able to represent some numbers exactly causes some issues with arithmetic e.g. our first example of 0.1 + 0.2 != 0.3.
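You can see this for yourself in any JavaScript console. The sketch below also shows how the small errors can build up over repeated operations:

```javascript
var sum = 0.1 + 0.2;
console.log(sum);         // 0.30000000000000004
console.log(sum === 0.3); // false

// Small errors can also accumulate over repeated operations
var total = 0;
for (var i = 0; i < 10; i++) {
  total += 0.1;
}
console.log(total);       // 0.9999999999999999
console.log(total === 1); // false
```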

Now for some applications these small differences won’t be an issue and will be hidden by formatting, but for others they can become a huge issue.

So what can you do about this?

Well you have two main choices:

  • Always work in integers
  • Use a library

Integers are always held exactly (up to 2^53 anyway) so if you ensure you work just in integer values everything will work fine. For example when working with monetary amounts think of amounts in cents rather than dollars and cents – just remember to format/convert once you are finished working with them!
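A minimal sketch of the cents approach is below. The variable names and amounts are made up for illustration, and rounding with `Math.round` when converting a dollar amount to cents is my own assumption about how you would want to handle input:

```javascript
// Hold amounts as whole cents and only convert to dollars for display.
var priceInCents = 10; // $0.10
var taxInCents = 20;   // $0.20
var totalInCents = priceInCents + taxInCents;

console.log(totalInCents === 30);                   // true, no surprises this time
console.log("$" + (totalInCents / 100).toFixed(2)); // "$0.30"

// Converting a dollar amount to cents: round to clean up any floating point error
var inputInCents = Math.round(19.99 * 100);
console.log(inputInCents);                          // 1999
```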

Libraries

Maybe an easier option however is to use a library to handle these issues for you. There are a number of different library options available & probably one of the best known is BigDecimal.js which is a JavaScript version of Java’s BigDecimal.

Unfortunately this library isn’t the easiest to understand (well near impossible) as it is compiled JavaScript from Google’s GWT toolkit so my recommendation is to instead use decimal.js which is smaller (it does less in fairness) and much more readable if you want to know what’s going on behind the scenes (which is fairly complex).

Below shows an example of the syntax of the BigDecimal library – all the libraries are pretty similar in this respect:

var a = new BigDecimal("0.1");

To avoid floating point issues these libraries hold numbers using their own structure (decimal.js uses 3 values: a sign, an exponent and a coefficient). The libraries will then implement their own version of methods such as toFixed to return the results you are expecting.
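To give a feel for the idea, here is a toy sketch of holding a decimal as a sign, a whole number coefficient and a base 10 exponent. This is NOT how decimal.js or any other library is actually implemented, and `parseDecimal`, `addDecimal` and `formatDecimal` are names I have made up for the illustration. It also only copes with simple input such as "0.1" or "-1.25".

```javascript
// Toy illustration only: these helpers are hypothetical and are not the API
// of decimal.js or any other library.
function parseDecimal(text) {
  var negative = text.charAt(0) === "-";
  var digits = negative ? text.slice(1) : text;
  var point = digits.indexOf(".");
  var decimals = point === -1 ? 0 : digits.length - point - 1;
  return {
    sign: negative ? -1 : 1,
    coefficient: parseInt(digits.replace(".", ""), 10), // "0.1" -> 1
    exponent: -decimals                                  // "0.1" -> -1
  };
}

function addDecimal(a, b) {
  // Scale both coefficients to the smaller exponent so the digits line up
  var exponent = Math.min(a.exponent, b.exponent);
  var left = a.sign * a.coefficient * Math.pow(10, a.exponent - exponent);
  var right = b.sign * b.coefficient * Math.pow(10, b.exponent - exponent);
  var sum = left + right; // whole number addition, so no rounding error
  return { sign: sum < 0 ? -1 : 1, coefficient: Math.abs(sum), exponent: exponent };
}

function formatDecimal(d) {
  var decimals = -d.exponent;
  var digits = String(d.coefficient);
  // Left-pad so there is at least one digit before the decimal point
  while (digits.length <= decimals) {
    digits = "0" + digits;
  }
  var whole = digits.slice(0, digits.length - decimals);
  var fraction = digits.slice(digits.length - decimals);
  return (d.sign < 0 ? "-" : "") + whole + (fraction ? "." + fraction : "");
}

var result = addDecimal(parseDecimal("0.1"), parseDecimal("0.2"));
console.log(formatDecimal(result)); // "0.3"
```

Because the addition happens on whole numbers, 0.1 + 0.2 comes out as exactly 0.3. A real library has to deal with much more than this (multiplication, division, rounding modes, very large values), which is why they are fairly complex behind the scenes.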

Library disadvantages

There are two main disadvantages with using a library. The main issue is that a software implementation is many, many times slower than native methods so if you are doing something where performance is vital this probably isn’t for you – so move along!

Secondly (in JavaScript anyway) there is no method of overriding inbuilt mathematical operators. As developers we expect to be able to do things like: x + y which won’t work when working with types such as big decimal and you instead have to use methods such as add or plus depending on the library you are using. This can make code less readable and developers unfamiliar with the libraries may make mistakes.

Comparing Amounts

As amounts are not held exactly you may not be comparing what you think you are. Let’s say for example you create a system that has a discount that kicks in at 30 cents or more - this could cause a few issues!

An easy way round this problem is to define an acceptable margin of error when comparing numbers – often called Epsilon by math guys (Epsilon is a Greek letter and is often used to denote a small difference for some reason):

var epsilon = 0.00001;
return Math.abs(x - y) < epsilon;
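Wrapped up as a function this might look something like the sketch below. `nearlyEqual` is just a name I have picked, and the default margin is an assumption that you should adjust to suit the size of the values you are comparing:

```javascript
// `nearlyEqual` is a hypothetical helper name. The default margin of 0.00001
// is an assumption and will not suit every application.
function nearlyEqual(x, y, epsilon) {
  if (epsilon === undefined) {
    epsilon = 0.00001;
  }
  return Math.abs(x - y) < epsilon;
}

console.log(0.1 + 0.2 === 0.3);           // false
console.log(nearlyEqual(0.1 + 0.2, 0.3)); // true
console.log(nearlyEqual(0.3, 0.31));      // false, the difference is bigger than the margin
```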

JavaScript Engines

It’s worth noting that browser JS engines do perform their own internal optimizations and may hold values as integers but you can guarantee they will ensure numbers meet IEEE-754 behaviour criteria so as far as you are concerned they are IEEE-754 double precision floating point values – you have been warned!

Future

So this all sounds a bit crap – is there a better way?

Well there may be on the horizon.

Introducing a decimal type has been a long running discussion in the ECMAScript committee (I found mail archives mentioning this in ECMAScript v3) so I wouldn’t hold your breath on this being implemented tomorrow.

In fairness introducing a new type is a rather complicated matter as there are a lot of questions about how a new type should work with existing types. For example what should the typeof operator return, what happens when an existing number is compared to a new decimal & what should happen with JSON serialization?

All good questions…

However a current straw man proposal might help solve this – let me introduce value_objects. There is not a lot of information on this area yet so I'm afraid this is a bit vague.

Value objects could potentially enable users to define their own primitive types and define a way of implementing types in the future. Apparently the committee wanted a method that could be reused going forward to introduce new types as no one wants to go through the issues of introducing new types too often.

This is good news as there is a lot of interest in implementing various other types such as int64 and BigNum and structures that could be utilized for very efficient processing of numbers so there is quite a lot of motivation to implement this.

Maybe we will see this in ECMAScript 2016 but who knows?

Conclusion

So in conclusion some numbers cannot be represented exactly in JavaScript due to the way IEEE-754 represents numbers internally. The best current workarounds are to work in integers or use libraries such as decimal.js.

Further reading: