Binary Numbers and Floating Point Representation

Posted on Mar 19, 2025 in Computers

1. Binary Number Systems

Binary Basics:
- Each digit (bit) is either 0 or 1.
- Binary numbers are used to represent all data in computers.
- Example: 1011 in binary is 11 in decimal.
Binary to Decimal Conversion:
- Each bit represents a power of 2.
- Example: 1011 = 1×2³ + 0×2² + 1×2¹ + 1×2⁰ = 11.
Decimal to Binary Conversion:
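- Repeatedly divide by 2 and record the remainders; reading the remainders from last to first gives the binary digits.
- Example: 11 ÷ 2 = 5 r 1, 5 ÷ 2 = 2 r 1, 2 ÷ 2 = 1 r 0, 1 ÷ 2 = 0 r 1, so 11 in decimal is 1011 in binary.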
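
The two conversions above can be sketched in C as follows. This is a minimal illustration, not a library routine: the function names bin_to_uint and uint_to_bin are made up for this example, and the code assumes the input string contains only '0' and '1' and that unsigned is at most 32 bits wide.

```c
#include <stddef.h>

/* Binary to decimal: each step doubles the running value (moving the
   earlier bits up one power of 2) and adds the next bit.
   Assumes `bits` holds only '0'/'1' and the value fits in an unsigned. */
unsigned bin_to_uint(const char *bits) {
    unsigned value = 0;
    for (size_t i = 0; bits[i] != '\0'; i++) {
        value = value * 2 + (unsigned)(bits[i] - '0');
    }
    return value;
}

/* Decimal to binary: divide by 2 repeatedly and collect the remainders.
   Remainders appear least significant bit first, so the buffer is filled
   from the right. Returns a pointer to the first digit inside `out`.
   Assumes unsigned is at most 32 bits wide. */
char *uint_to_bin(unsigned n, char out[33]) {
    char *p = out + 32;
    *p = '\0';
    do {
        *--p = (char)('0' + n % 2);
        n /= 2;
    } while (n != 0);
    return p;
}
```

With these helpers, bin_to_uint("1011") gives 11, and uint_to_bin(11, buf) gives the string "1011".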

Unsigned Integers (B2U):
- Represents non-negative numbers.
- Range: 0 to 2^w – 1 (where w is the number of bits).
- Example: 4 bits can represent 0 to 15.
Signed Integers (B2T – Two’s Complement):
- Represents both positive and negative numbers.
- Range: -2^(w-1) to 2^(w-1) – 1.
- Negation: Invert bits and add 1.
- Example: 1101 in 4-bit two’s complement is -8 + 4 + 0 + 1 = -3.
Overflow:
- Occurs when a result exceeds the representable range.
- Example: 10 + 7 = 17 does not fit in 4 unsigned bits (maximum 15), so the result wraps around to 17 mod 16 = 1.
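
As a rough sketch of the interpretations above, the following C helpers treat the low w bits of a 32-bit word as a w-bit number. The names b2u and b2t follow the labels used above; negate_w and add_unsigned_w are hypothetical helpers added for illustration, and everything assumes 1 <= w <= 31.

```c
#include <stdint.h>

/* B2U: read the low w bits as an unsigned integer. */
uint32_t b2u(uint32_t bits, int w) {
    return bits & ((UINT32_C(1) << w) - 1);
}

/* B2T: same bits, but the top bit carries weight -2^(w-1). */
int32_t b2t(uint32_t bits, int w) {
    uint32_t u = b2u(bits, w);
    uint32_t sign = UINT32_C(1) << (w - 1);
    int32_t value = (int32_t)(u & (sign - 1)); /* the lower w-1 bits */
    if (u & sign) {
        value -= (int32_t)sign;                /* subtract 2^(w-1) */
    }
    return value;
}

/* Two's complement negation: invert the bits, add 1, keep w bits. */
uint32_t negate_w(uint32_t bits, int w) {
    return b2u(~bits + 1, w);
}

/* w-bit unsigned addition: the carry out of bit w-1 is discarded,
   so the result is (a + b) mod 2^w. */
uint32_t add_unsigned_w(uint32_t a, uint32_t b, int w) {
    return b2u(a + b, w);
}
```

For the 4-bit examples above, b2t(0xD, 4) is -3, negate_w(0x3, 4) is 0xD (1101), and add_unsigned_w(10, 7, 4) is 1.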

Fractional Binary Representation:
- Bits to the right of the binary point represent negative powers of 2.
- Example: 101.101 = 1×2² + 0×2¹ + 1×2⁰ + 1×2^-1 + 0×2^-2 + 1×2^-3 = 5.625.
Precision Limitation:
- With a finite number of bits, only numbers of the form x/2^k (x and k integers) can be represented exactly.
- Example: 1/3 = 0.010101... in binary repeats forever, so it cannot be represented exactly.
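
To make the fractional weights concrete, here is a small C sketch. The helper names frac_bin_to_double and frac_bits are invented for this example, and the input is assumed to be a well-formed string of '0', '1', and at most one '.'.

```c
#include <stddef.h>

/* Evaluate a binary string such as "101.101".
   Bits left of the point are accumulated by doubling; bits right of
   the point carry weights 1/2, 1/4, 1/8, ... */
double frac_bin_to_double(const char *s) {
    double value = 0.0;
    size_t i = 0;
    for (; s[i] != '\0' && s[i] != '.'; i++) {
        value = value * 2.0 + (s[i] - '0');
    }
    if (s[i] == '.') {
        double weight = 0.5;
        for (i++; s[i] != '\0'; i++) {
            if (s[i] == '1') {
                value += weight;
            }
            weight /= 2.0;
        }
    }
    return value;
}

/* Write the first k fractional bits of x (0 <= x < 1) into `out`,
   which must hold k + 1 chars. Doubling x moves the next bit in
   front of the binary point. */
void frac_bits(double x, int k, char *out) {
    for (int i = 0; i < k; i++) {
        x *= 2.0;
        if (x >= 1.0) {
            out[i] = '1';
            x -= 1.0;
        } else {
            out[i] = '0';
        }
    }
    out[k] = '\0';
}
```

Here frac_bin_to_double("101.101") returns 5.625, while frac_bits(1.0 / 3.0, 8, buf) yields "01010101", the start of a pattern that never terminates.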

Floating Point Representation:
- Sign bit (s): Determines if the number is negative or positive.
- Significand (M): Fractional value, in the range [1.0, 2.0) for normalized values.
- Exponent (E): Weights the value by a power of 2.
- Formula: v = (-1)^s × M × 2^E.
Normalized Values:
- Exponent field is neither all 0s nor all 1s.
- Example: The single-precision bit pattern 0100 0110 0110 1101 1011 0100 0000 0000 (0x466DB400) represents 15213.0.
Denormalized Values:
- Exponent field is all 0s.
- Used to represent very small numbers close to 0.
Special Values:
- Infinity: Exponent field is all 1s, fraction field is all 0s.
- NaN (Not a Number): Exponent field is all 1s, fraction field is non-zero.
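
The categories above can be checked by pulling the fields out of a value's bit pattern. The sketch below assumes that float is the 32-bit IEEE 754 single-precision format (1 sign bit, 8 exponent bits, 23 fraction bits), which holds on most current platforms but is not guaranteed by C. The type and function names are illustrative only.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    unsigned sign;  /* 1 bit */
    unsigned exp;   /* 8-bit exponent field (biased by 127) */
    uint32_t frac;  /* 23-bit fraction field */
} FloatFields;

typedef enum {
    KIND_NORMALIZED,
    KIND_DENORMALIZED,
    KIND_INFINITY,
    KIND_NAN
} FloatKind;

/* Reinterpret a 32-bit pattern as a float (memcpy avoids aliasing issues). */
float bits_to_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Split a float into its sign, exponent, and fraction fields. */
FloatFields decode_float(float f) {
    uint32_t bits;
    FloatFields out;
    memcpy(&bits, &f, sizeof bits);
    out.sign = (unsigned)(bits >> 31);
    out.exp  = (unsigned)((bits >> 23) & 0xFF);
    out.frac = bits & UINT32_C(0x7FFFFF);
    return out;
}

/* Classify by exponent field; zero falls under the all-0s exponent case. */
FloatKind classify(FloatFields x) {
    if (x.exp == 0xFF) {
        return x.frac == 0 ? KIND_INFINITY : KIND_NAN;
    }
    if (x.exp == 0) {
        return KIND_DENORMALIZED;
    }
    return KIND_NORMALIZED;
}
```

Decoding 15213.0f gives sign 0, exponent field 140 (so E = 140 - 127 = 13), and fraction field 0x6DB400, which matches the normalized example above.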

Addition:
- Align exponents, add significands, and normalize the result.
- Example: 1.5×2³ + 1.25×2¹ = 1.5×2³ + 0.3125×2³ = 1.8125×2³ (that is, 12 + 2.5 = 14.5).
Multiplication:
- Multiply significands, add exponents, and normalize the result.
- Example: (1.5×2³) × (1.25×2¹) = (1.5 × 1.25)×2^(3+1) = 1.875×2⁴ (that is, 12 × 2.5 = 30).
Rounding:
- Round to nearest even to avoid bias.
- Example: 1.1011 rounded to 3 fractional bits is exactly halfway between 1.101 and 1.110; the tie goes to 1.110, whose last bit is even (0).
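
The three operations above can be imitated on a toy representation. This is only a sketch under simplifying assumptions: values are positive, the significand is held in a double rather than a fixed-width bit field, and the names Fp, fp_add, fp_mul, and round_half_even are invented for this example. Real hardware works on fixed-width significands and rounds after every operation.

```c
#include <stdint.h>

/* Toy value m * 2^e with m > 0. */
typedef struct {
    double m;
    int e;
} Fp;

/* Normalize: bring m back into [1.0, 2.0) by adjusting e. */
static Fp fp_normalize(Fp x) {
    while (x.m >= 2.0) { x.m /= 2.0; x.e++; }
    while (x.m < 1.0)  { x.m *= 2.0; x.e--; }
    return x;
}

/* Addition: align to the larger exponent, add significands, normalize. */
Fp fp_add(Fp a, Fp b) {
    Fp sum;
    while (a.e < b.e) { a.m /= 2.0; a.e++; }
    while (b.e < a.e) { b.m /= 2.0; b.e++; }
    sum.m = a.m + b.m;
    sum.e = a.e;
    return fp_normalize(sum);
}

/* Multiplication: multiply significands, add exponents, normalize. */
Fp fp_mul(Fp a, Fp b) {
    Fp prod;
    prod.m = a.m * b.m;
    prod.e = a.e + b.e;
    return fp_normalize(prod);
}

/* Drop the lowest k bits of x, rounding to nearest with ties going to
   the even result. Assumes 1 <= k <= 31. */
uint32_t round_half_even(uint32_t x, int k) {
    uint32_t half = UINT32_C(1) << (k - 1);
    uint32_t rem  = x & ((UINT32_C(1) << k) - 1);
    uint32_t q    = x >> k;
    if (rem > half || (rem == half && (q & 1))) {
        q++;
    }
    return q;
}
```

With a = {1.5, 3} and b = {1.25, 1}, fp_add gives {1.8125, 3} and fp_mul gives {1.875, 4}. Treating 1.1011 as the integer 11011 (27), round_half_even(27, 1) returns 14, which is 1110, that is, 1.110.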

Left Shift (<<):
- Shifts bits to the left, filling with 0s.
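- Equivalent to multiplying by 2^k, as long as no significant bits are shifted out of the word.
- Example: 1010 << 2 = 101000 (10 × 4 = 40), assuming the word is at least 6 bits wide.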
Right Shift (>>):
- Shifts bits to the right.
- Logical Shift: Fills with 0s (for unsigned).
- Arithmetic Shift: Replicates the sign bit (for signed).
- Example: 1010 >> 2 = 0010 (logical), 1110 (arithmetic).
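
A hedged C sketch of the shifts above, working on w-bit values stored in the low bits of a 32-bit word. In C, right-shifting a negative signed value is implementation-defined, so the arithmetic shift here is built explicitly from unsigned operations. The function names are made up for this example, and the code assumes 0 <= k < w <= 31.

```c
#include <stdint.h>

/* Left shift on a full 32-bit word: vacated low bits are filled with 0s. */
uint32_t shift_left(uint32_t x, int k) {
    return x << k;
}

/* Logical right shift of a w-bit value: vacated high bits become 0. */
uint32_t shr_logical(uint32_t bits, int k, int w) {
    uint32_t mask = (UINT32_C(1) << w) - 1;
    return (bits & mask) >> k;
}

/* Arithmetic right shift of a w-bit value: vacated high bits are
   copies of the original sign bit (bit w-1). */
uint32_t shr_arith(uint32_t bits, int k, int w) {
    uint32_t mask = (UINT32_C(1) << w) - 1;
    uint32_t u = bits & mask;
    uint32_t result = u >> k;
    if (u & (UINT32_C(1) << (w - 1))) {
        result |= mask & ~(mask >> k); /* set the top k bits */
    }
    return result;
}
```

For the examples above, shift_left(0xA, 2) is 40 (101000), shr_logical(0xA, 2, 4) is 0x2 (0010), and shr_arith(0xA, 2, 4) is 0xE (1110).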

Big Endian:
- Most significant byte is stored at the lowest address.
- Example: 0x01234567 is stored as 01 23 45 67.
Little Endian:
- Least significant byte is stored at the lowest address.
- Example: 0x01234567 is stored as 67 45 23 01.
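
The two byte orders can be illustrated by writing the bytes of a 32-bit value into a buffer explicitly. This sketch uses invented helper names; host_is_little_endian is one common way to probe the machine's own order and assumes the host is either big or little endian.

```c
#include <stdint.h>
#include <string.h>

/* Big endian: most significant byte goes to the lowest index. */
void store_big_endian(uint32_t x, unsigned char out[4]) {
    out[0] = (unsigned char)((x >> 24) & 0xFF);
    out[1] = (unsigned char)((x >> 16) & 0xFF);
    out[2] = (unsigned char)((x >> 8) & 0xFF);
    out[3] = (unsigned char)(x & 0xFF);
}

/* Little endian: least significant byte goes to the lowest index. */
void store_little_endian(uint32_t x, unsigned char out[4]) {
    out[0] = (unsigned char)(x & 0xFF);
    out[1] = (unsigned char)((x >> 8) & 0xFF);
    out[2] = (unsigned char)((x >> 16) & 0xFF);
    out[3] = (unsigned char)((x >> 24) & 0xFF);
}

/* Probe the host: look at the byte stored at the lowest address of 1. */
int host_is_little_endian(void) {
    uint32_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 1;
}
```

Because the store functions use shifts rather than pointer casts, they produce the same byte sequence regardless of the host's own byte order.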

Explicit Casting:
- Convert between data types.
- Example: int x = (int) 3.14; (x is 3).
Implicit Casting:
- Automatically converts types in expressions.
- Example: int x = 3.14; (truncates to 3).
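
A short C sketch of both conversions, using hypothetical wrapper functions so the results can be inspected. Converting a floating-point value to int discards the fractional part (truncation toward zero); the behavior is undefined if the value does not fit in an int, so the sketch assumes small inputs.

```c
/* Explicit cast: the conversion is spelled out in the source. */
int explicit_to_int(double d) {
    return (int)d;
}

/* Implicit conversion: assigning a double to an int converts it
   automatically, with the same truncation toward zero. */
int implicit_to_int(double d) {
    int x = d;
    return x;
}
```

Both functions return 3 for 3.14 and -3 for -3.99, since the fraction is dropped rather than rounded.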

Commutativity:
- Floating-point addition and multiplication are commutative.
- Example: a + b = b + a, a × b = b × a.
Associativity:
- Floating-point addition and multiplication are not associative, because each intermediate result is rounded.
- Example: (a + b) + c can differ from a + (b + c).
Distributivity:
- Floating-point multiplication does not always distribute over addition, because of rounding and overflow.
- Example: a × (b + c) can differ from a × b + a × c.
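
These properties are easy to observe with double values. The functions below are illustrative wrappers, and the specific inputs are chosen for this example; the results assume IEEE 754 double arithmetic without optimizations that reorder floating-point operations (such as -ffast-math).

```c
/* Two groupings of the same three-term sum. */
double sum_left(double a, double b, double c) {
    return (a + b) + c;
}

double sum_right(double a, double b, double c) {
    return a + (b + c);
}

/* Two forms of the same product. */
double dist_factored(double a, double b, double c) {
    return a * (b + c);
}

double dist_expanded(double a, double b, double c) {
    return a * b + a * c;
}
```

With a = 1e20, b = -1e20, c = 3.14, sum_left returns 3.14 because a + b cancels exactly first, while sum_right returns 0.0 because 3.14 is too small to change -1e20 and is lost to rounding. Similarly, dist_factored(1e200, 1e200, -1e200) is 0.0, whereas in strict IEEE 754 double arithmetic the expanded form overflows to infinity minus infinity, which is NaN.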

Precision vs. Range:
- Floating point offers a trade-off between precision and range.
- Representable values are densely spaced near zero and sparsely spaced at large magnitudes, so absolute precision drops as values grow.
Overflow and Underflow:
- Watch for overflow (wraparound) in integer arithmetic, and for overflow to infinity and underflow toward zero in floating point.
Bitwise Operations:
- Useful for low-level manipulation, but they require careful handling of overflow and sign.
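
The precision trade-off and floating-point underflow noted above can be observed directly. This sketch assumes IEEE 754 double precision with default rounding and denormals enabled; the function names are invented, and the volatile temporaries are there to force each result to be stored as a double before comparing.

```c
/* Returns 1 if adding 1.0 to x leaves x unchanged, meaning the gap
   between neighboring doubles at x is larger than 1. */
int absorbs_one(double x) {
    volatile double sum = x + 1.0;
    return sum == x;
}

/* Returns 1 if halving a non-zero x produces exactly 0.0 (underflow). */
int underflows_to_zero(double x) {
    volatile double half = x / 2.0;
    return x != 0.0 && half == 0.0;
}
```

At 2^53 (9007199254740992.0) neighboring doubles are 2 apart, so absorbs_one returns 1 there, while it returns 0 for 1.0. Halving the smallest positive denormalized double (about 4.9e-324) underflows to zero.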