Numbers

Numbers are representations of things, not things themselves. We have to be able to store and process numbers in an efficient manner in order for our programs to run well. However, those numbers also have to be accurate. We can’t lose data in the process of storage, retrieval, or arithmetic operations.

“It seems like it would be simple in practice but…”

At the root, everything is stored in base 2 (otherwise known as binary). Some numbers are difficult, if not impossible, to represent accurately in binary; pi, for instance. Storage also becomes a problem when a number is too big to fit in the space we allocate for it.

Episode Breakdown

14:26 How Numbers Are Stored

It’s all bytes. A byte is a group of 8 bits, each of which holds a 0 or a 1 (false or true). Half a byte is called a nibble. A word is a fixed-size chunk of bytes defined by the processor architecture (the number of bits processed in one go, usually 32 or 64 on home computers).

The number of distinct values an integer type can represent is at most 2 raised to the number of bits in that type; for example, a byte can hold 2^8 = 256 values. Bytes (and by extension words) can be signed or unsigned. Without caution, you can screw up and treat an unsigned value as signed (or the reverse).
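As a rough illustration of the signed versus unsigned mix-up, here is a minimal Python sketch. The helper names `unsigned_range` and `signed_range` are made up for this example, and the signed interpretation assumes two's complement, which is what Python's `struct` module uses for its `b` format.

```python
import struct

def unsigned_range(bits):
    """Smallest and largest value of an unsigned integer with this many bits."""
    return 0, 2**bits - 1

def signed_range(bits):
    """Smallest and largest value of a two's complement signed integer."""
    return -(2**(bits - 1)), 2**(bits - 1) - 1

# The same eight bits (all ones) read two different ways.
raw = bytes([0xFF])
as_unsigned = struct.unpack("B", raw)[0]  # unsigned byte: 255
as_signed = struct.unpack("b", raw)[0]    # signed byte: -1

print(unsigned_range(8), signed_range(8))
print(as_unsigned, as_signed)
```

The bits never change; only the interpretation does, which is why mixing the two up produces wildly wrong values rather than an obvious failure.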

When you try to increment a byte past what it will hold, one of two things can happen. The system can throw an error, or the value can silently overflow (typically wrapping back around to the bottom of the range). Depending on the system, you may want one or the other, but be cautious. Very bad things can happen.
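Python's own integers grow as needed, so the sketch below simulates a fixed-width unsigned byte by hand to show both behaviors. The function names `increment_wrapping` and `increment_checked` are hypothetical, chosen only for this example.

```python
def increment_wrapping(value, bits=8):
    """Add one and keep only the low `bits` bits, so the value wraps around."""
    return (value + 1) & ((1 << bits) - 1)

def increment_checked(value, bits=8):
    """Add one, but raise instead of silently wrapping."""
    result = value + 1
    if result > (1 << bits) - 1:
        raise OverflowError(f"{result} does not fit in {bits} unsigned bits")
    return result

print(increment_wrapping(255))  # wraps back to 0

try:
    increment_checked(255)
except OverflowError as exc:
    print("error:", exc)
```

Which behavior is preferable depends on the system: wrapping is cheap and sometimes intended (counters, hashes), while a checked increment surfaces the problem immediately.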

21:15 What We Do With Numbers

We perform simple arithmetic operations such as addition, subtraction, multiplication, and division. We convert numbers to string representations for display, printing, etc. We also use more complex functions such as the trigonometric functions: sine and cosine have bounded ranges, but tangent can lead to overflows if you aren’t careful.

Bitwise operations apply boolean algebra (AND, OR, XOR, etc.) to the individual bits of a value. This is pretty common for things like bitmasks. Shift operations move the bits either to the left or to the right, either losing the 1’s when they fall off the end (SHL, SHR) or moving them back around to the other end of the byte (ROL, ROR).
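The following is a small Python sketch of these operations on a single 8-bit value. Python has no built-in rotate, so `rol8` and `ror8` are hand-written helpers, and the flag names `READ` and `WRITE` are invented for the bitmask part.

```python
MASK8 = 0xFF  # keep results within one byte

def shl8(value, n):
    """Shift left; bits pushed past the top of the byte are lost."""
    return (value << n) & MASK8

def shr8(value, n):
    """Shift right; bits pushed past the bottom of the byte are lost."""
    return (value & MASK8) >> n

def rol8(value, n):
    """Rotate left; bits leaving the top re-enter at the bottom."""
    n %= 8
    value &= MASK8
    return ((value << n) | (value >> (8 - n))) & MASK8

def ror8(value, n):
    """Rotate right; bits leaving the bottom re-enter at the top."""
    n %= 8
    value &= MASK8
    return ((value >> n) | (value << (8 - n))) & MASK8

# Bitmask example with hypothetical permission flags.
READ, WRITE = 0b0001, 0b0010
flags = READ
flags |= WRITE                   # OR sets a flag
can_write = bool(flags & WRITE)  # AND tests a flag
flags ^= READ                    # XOR toggles a flag

x = 0b10000001
print(format(shl8(x, 1), "08b"), format(rol8(x, 1), "08b"))
print(format(shr8(x, 1), "08b"), format(ror8(x, 1), "08b"))
```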

We use numbers as the underlying representation for other data types. Any inaccuracy in the way you deal with the numbers making up a type will drive inaccuracy in that type.

26:36 Integer Numbers

Integers can be written in several different bases. Binary (base 2) is useful when working with bitmaps but is not used for the most part. Octal (base 8) and hexadecimal (base 16) are useful as shorthand for binary. Decimal (base 10) is the form of integers most commonly used by humans. Be sure to know which base is in use when reading a number from a string.
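To show why the base matters when reading from a string, here is a short Python sketch using the built-in `int`, `bin`, `oct`, and `hex` functions. The sample strings are arbitrary.

```python
# The same text means different numbers depending on the assumed base.
text = "10"
as_binary = int(text, 2)    # 2
as_octal = int(text, 8)     # 8
as_decimal = int(text, 10)  # 10
as_hex = int(text, 16)      # 16

# One value, written in four bases.
value = 255
print(bin(value), oct(value), value, hex(value))

# Each hex digit stands for exactly four bits, which is why hex
# works well as shorthand for binary.
print(int("ff", 16) == int("11111111", 2))
```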

29:15 Non-Integer Rational Numbers (Floating Points)

Start by thinking about scientific or engineering notation.

Example: 1.08 x 10^8

There are two pieces used in the storage. The first is the significand, mantissa, or coefficient: a signed digit string of a particular length in a particular base (which is referred to as the radix). This is the 1.08 in the example. The second piece is a signed integer exponent, which modifies the magnitude of the first number. This is the 8 in the example (the power to which the 10 is raised). These can be stored in different bases. Base 2 is most common. To get the actual value of the number (versus its representation in storage), you multiply the significand by the base raised to the power of the exponent. It’s important to note that the decimal point is not stored.
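A quick way to see the two pieces is to pull them apart in Python. This is only a sketch: `Decimal.as_tuple` shows a base 10 view, and `math.frexp` shows a base 2 view whose significand is scaled to lie between 0.5 and 1, which is a different normalization from the one IEEE 754 uses internally.

```python
import math
from decimal import Decimal

value = 1.08e8

# Base 10 view: digits plus an exponent. Note there is no decimal point
# anywhere; 1.08 x 10^8 is held as the digits 108 with exponent 6.
sign, digits, exponent10 = Decimal("1.08E8").as_tuple()
print(sign, digits, exponent10)

# Base 2 view: value == significand * 2**exponent2.
significand, exponent2 = math.frexp(value)
rebuilt = significand * 2 ** exponent2
print(significand, exponent2, rebuilt == value)
```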

The length of the significand (1.08) tells you how precisely you can represent a number. This also implies that there are some numbers that you simply can’t represent in a given base, but you can approximate them. Some bases can’t be used to represent certain numbers. There is no base that you can use to represent all numbers accurately. For instance, you can’t perfectly represent 1/5 in binary, but you can in decimal (0.2). You can’t represent 1/3 in either base, but in a base 3 system it’s trivial (0.1). The way a computer stores these is implementation dependent and there are different formats. Typically it’s going to be IEEE 754 encoding.
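The 1/5 and 1/3 cases can be checked directly. In this Python sketch, `Fraction` is used to reveal the exact value a binary float actually holds, and `Decimal` (at its default 28-digit precision, an assumption of this example) stands in for a base 10 representation.

```python
from decimal import Decimal
from fractions import Fraction

# 1/5 in binary: the float 0.2 is only the nearest representable value.
one_fifth_binary = Fraction(0.2)  # exact value of the stored float
binary_is_exact = one_fifth_binary == Fraction(1, 5)

# 1/5 in decimal: 0.2 is exact.
decimal_is_exact = Decimal("0.2") * 5 == Decimal(1)

# 1/3 in decimal: the digits are cut off, so tripling does not give 1.
one_third_decimal = Decimal(1) / Decimal(3)
third_round_trips = one_third_decimal * 3 == Decimal(1)

# Stored as a ratio of integers instead, 1/3 is exact.
ratio_round_trips = Fraction(1, 3) * 3 == 1

print(binary_is_exact, decimal_is_exact, third_round_trips, ratio_round_trips)
```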

“Measure twice cut once”

When you do arithmetic operations on approximations of two numbers, you lose accuracy. This can really hurt you if you aren’t using an accurate enough representation of something financial, like dollars and cents. It’s much worse with engineering calculations if you aren’t careful. It also means that you have to use somewhat fuzzy comparisons. You don’t check whether a floating point value is exactly zero if it might have been involved in a number of calculations. Instead, you check that its absolute value is less than some very small value. You also have to be careful when serializing/deserializing, as inaccuracy is transitive.
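Here is a minimal Python sketch of a fuzzy comparison, plus one common way to keep money exact. The tolerance `EPSILON` is an arbitrary value picked for this example; a real tolerance depends on the magnitudes involved.

```python
import math
from decimal import Decimal

EPSILON = 1e-9  # assumed tolerance, chosen only for illustration

# Mathematically zero, but the floats carry rounding error.
residual = 0.1 + 0.2 - 0.3
exactly_zero = residual == 0.0
close_to_zero = abs(residual) < EPSILON

# The standard library offers the same idea; abs_tol is needed near zero.
also_close = math.isclose(residual, 0.0, abs_tol=EPSILON)

# For dollars and cents, a decimal type (or integer cents) avoids the issue.
price = Decimal("0.10") + Decimal("0.20")
price_is_exact = price == Decimal("0.30")

print(residual, exactly_zero, close_to_zero, also_close, price_is_exact)
```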

41:36 Irrational Numbers

“It’s close enough for figuring hand grenade blast radius, maybe…”

When working with irrational numbers, you approximate them by finding a nearby rational. This can be challenging, as each approximation further reduces the accuracy of any calculation that uses it.
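As a sketch of finding a nearby rational, Python's `Fraction.limit_denominator` returns the closest fraction whose denominator stays under a limit. Note that `math.pi` is itself already a floating point approximation of pi, so the errors below are measured against that approximation.

```python
import math
from fractions import Fraction

# Closest rationals to pi under two different denominator limits.
rough = Fraction(math.pi).limit_denominator(10)     # 22/7
better = Fraction(math.pi).limit_denominator(1000)  # 355/113

rough_error = abs(float(rough) - math.pi)
better_error = abs(float(better) - math.pi)

print(rough, rough_error)
print(better, better_error)
```

A larger denominator buys a closer approximation, at the cost of bigger numbers to store and process.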

42:54 Accuracy

The real point here is that you mostly don’t have perfect accuracy with numbers when using a computer. That holds for any value whose representation doesn’t terminate in the base you are using, whether it is irrational or not. Rather, you make a tradeoff between storage space, processing time, and accuracy such that you are accurate within a tolerance.
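The storage side of that tradeoff can be sketched by squeezing the same value into 4 bytes and into 8 bytes. This example uses Python's `struct` module with the standard `<f` (32-bit) and `<d` (64-bit) formats; `round_trip` is a helper written just for this illustration.

```python
import struct

def round_trip(value, fmt):
    """Pack a float into bytes and read it back; return (size, value)."""
    packed = struct.pack(fmt, value)
    return len(packed), struct.unpack(fmt, packed)[0]

size32, as32 = round_trip(0.1, "<f")  # 32-bit float
size64, as64 = round_trip(0.1, "<d")  # 64-bit float (Python's own float)

# Error relative to the 64-bit value Python started with.
error32 = abs(as32 - 0.1)
error64 = abs(as64 - 0.1)

print(size32, as32, error32)
print(size64, as64, error64)
```

Half the storage costs roughly half the significant digits, which may or may not be within tolerance for a given use.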

IoTease: Service

SmartLiving Maker

An end-to-end Internet of Things solution for tech enthusiasts, developers and makers. Connect Arduino, Raspberry Pi, Intel Edison or other smart devices with cloud services, automate with simple when-then rules and visualize data with smooth mobile and web dashboards.

Tricks of the Trade

The fact that you can’t represent something with perfect accuracy doesn’t mean you are stuck. Believing that it does is called a binary mindset. As developers we like to know all the variables and be able to predict the outcomes before doing something. This is good when creating computer programs but not so good in other areas of life.
