Integer

An integer is a number with no fractional part. Written out, an integer consists of an optional sign followed by one or more digits: 0, 42 and -1024 are all examples of integers.

How Integers Are Stored in Memory

Thanks mostly to C, the term "integer" has become associated with a 32-bit value, but this is not necessarily the case across all languages and computer architectures. Most languages stick to this convention; languages that allow arbitrarily large integers definitely do not. Storing a 32-bit value in memory is simple: its bytes are placed consecutively, starting at the specified memory address (usually this happens by assigning a value to a variable, and the programmer never sees the actual address).
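
As a rough illustration (the variable name and value here are made up, and the byte order shown assumes a little-endian machine like x86), you can peek at those consecutive bytes through an unsigned char pointer in C:

<lang C>#include <stdio.h>

int main(void) {
    int value = 0x11223344;                      /* a 32-bit integer */
    unsigned char *p = (unsigned char *)&value;  /* view its bytes in memory */
    for (size_t i = 0; i < sizeof value; i++)
        printf("byte %zu at %p: %02X\n", i, (void *)(p + i), (unsigned)p[i]);
    /* on a little-endian machine this prints 44 33 22 11 at consecutive addresses */
    return 0;
}</lang>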

Unlike floats, an integer's binary representation is taken at face value (except sometimes for the leftmost bit; we'll get to that in a moment). So binary 1 is 1, binary 10 is 2, and so on.
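
Here's a tiny C sketch of that face-value reading (the particular bit pattern is just an example):

<lang C>#include <stdio.h>

int main(void) {
    /* binary 1011 read at face value: 1*8 + 0*4 + 1*2 + 1*1 = 11 */
    int eleven = (1 << 3) | (0 << 2) | (1 << 1) | (1 << 0);
    printf("%d\n", eleven);   /* prints 11 */
    return 0;
}</lang>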

Signed/Unsigned

Integer variables can be declared either signed or unsigned, and this affects how the compiler handles them. CPUs have different ways of comparing values depending on whether a variable is intended to be signed or unsigned. Notice that I said "intended" - the CPU doesn't really know whether your data is meant to be signed or unsigned. This means that the bit pattern 0xFFFFFFFF can represent either negative 1 or 4,294,967,295. But which one is it? Most high-level languages lock you into picking one, but at a hardware level it can be whatever you want it to be at any particular moment. (Kind of like the Ace in Blackjack.)

For most programming languages, integer variables (and numeric variables in general) are treated as signed by default (some don't even give you a choice).

<lang C>int foo;          //this is a signed integer
unsigned int bar; //this is an unsigned integer</lang>
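
To see the "Ace in Blackjack" point in action, here's a rough C sketch (the variable names are made up, and it assumes a 32-bit int): the same bit pattern prints as 4,294,967,295 or -1 depending on the type you view it through.

<lang C>#include <stdio.h>
#include <string.h>

int main(void) {
    unsigned int bits = 0xFFFFFFFFu;         /* the raw 32-bit pattern */
    int as_signed;
    memcpy(&as_signed, &bits, sizeof bits);  /* reinterpret the bits, no conversion */
    printf("as unsigned: %u\n", bits);       /* 4294967295 */
    printf("as signed:   %d\n", as_signed);  /* -1 on two's-complement hardware */
    return 0;
}</lang>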

Two's Complement

This is the method computers use to represent negative numbers. The leftmost bit of a number's binary representation is often called the "sign bit" and determines whether the number is positive or negative. For 32-bit integers, 0x7FFFFFFF (2,147,483,647 in decimal) represents the largest positive signed integer, and 0x80000000 (-2,147,483,648 in decimal) represents the smallest negative signed integer. (If you've seen these numbers in programming bugs before, there's a reason for that - which will be explained later.) With two's complement, you can negate a number (turn a positive into a negative, or vice versa) by flipping the bits (turning every 0 to a 1 and vice versa), then adding 1 to the result. Many CPUs have dedicated instructions just to do this. This system allows integer math to work the way you would expect it to in real life. There's just one small problem...
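
Here's a short C sketch of the flip-then-add-one rule (42 is just an arbitrary value):

<lang C>#include <stdio.h>

int main(void) {
    int x = 42;
    int negated = ~x + 1;            /* flip every bit, then add 1 */
    printf("%d\n", negated);         /* -42 */
    printf("%d\n", negated == -x);   /* 1: same result as unary minus */
    return 0;
}</lang>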

Integer Overflow

And I'm sure you've figured it out by now. Yes, if a signed number gets too big, it suddenly becomes very, very small. This is known as integer overflow and occurs when a number crosses over from 0x7F... to 0x80... (fill in the dots with Fs/0s depending on the size of your numeric data type). Luckily, nearly all CPUs have special hardware just for detecting overflow, and they do so automatically after every calculation. Here's a simple example in x86 Assembly.

<lang asm>MOV EAX,0x7FFFFFFF
ADD EAX,1
JO WeHaveOverflow ;this branch will always be taken, since 0x7FFFFFFF + 1 = 0x80000000
RET
WeHaveOverflow:</lang>

The JO instruction stands for "jump if overflow" and acts as a conditional GOTO in the event that an overflow occurs. Many high-level languages will do something similar, usually in the form of throwing an exception. Keep in mind that raising errors on overflow is purely a software convention - the hardware will detect it any time it happens, but the software decides whether the overflow is actually a problem or not. (Integer overflow, of course, isn't considered a problem for unsigned integers, so no errors are thrown when they "overflow.")
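
C itself doesn't check for overflow, but GCC and Clang expose an overflow-checked addition as a builtin. Here's a rough sketch using __builtin_add_overflow (a compiler extension, not standard C):

<lang C>#include <stdio.h>
#include <limits.h>

int main(void) {
    int a = INT_MAX, result;
    /* returns true if the mathematically correct sum does not fit in 'result' */
    if (__builtin_add_overflow(a, 1, &result))
        printf("overflow detected\n");   /* this branch is taken, as in the asm example */
    else
        printf("sum = %d\n", result);
    return 0;
}</lang>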

Unsigned Overflow

Unsigned integers can experience a different kind of overflow. Adding 1 to 0xFFFFFFFF results in 0. Since a CPU's arithmetic is modulo 2^N (where N is the "bitness" of your computer, e.g. 32 for a 32-bit CPU), your data can "wrap around" if it gets too large. Once again, the CPU detects this automatically, this time using the carry flag rather than the overflow flag. While this seems like a bad thing, it's really not, as it allows programs to perform arithmetic on values much larger than the "bitness" of the CPU (by chaining smaller operations together through the carry flag). It can still be a problem if you didn't intend for it to happen, however, which is why bounds checking can be helpful.
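
Here's a minimal C sketch of that wrap-around (assuming a 32-bit unsigned int); unlike signed overflow, this behavior is actually well-defined in C:

<lang C>#include <stdio.h>
#include <limits.h>

int main(void) {
    unsigned int x = UINT_MAX;   /* 0xFFFFFFFF when unsigned int is 32 bits */
    x = x + 1;                   /* wraps around modulo 2^32 */
    printf("%u\n", x);           /* prints 0 */
    return 0;
}</lang>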