Binary Numbers and Floating Point Representation

1. Binary Number Systems

  • Binary Basics:

    • Each digit (bit) is either 0 or 1.

    • Binary numbers are used to represent all data in computers.

    • Example: 1011 in binary is 11 in decimal.

  • Binary to Decimal Conversion:

    • Each bit represents a power of 2.

    • Example: 1011 = 1×2^3 + 0×2^2 + 1×2^1 + 1×2^0 = 8 + 0 + 2 + 1 = 11.

  • Decimal to Binary Conversion:

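    • Repeatedly divide by 2 and record the remainders; read from last to first, they give the bits from most to least significant.

    • Example: 11 ÷ 2 = 5 r 1, 5 ÷ 2 = 2 r 1, 2 ÷ 2 = 1 r 0, 1 ÷ 2 = 0 r 1, so 11 in decimal is 1011 in binary.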
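
A minimal C sketch of both conversions, assuming the bit strings contain only '0' and '1' and the values fit in an unsigned long. The names bin_to_dec and dec_to_bin are illustrative, not from any library.

```c
#include <limits.h>
#include <stddef.h>

/* Binary to decimal: each step doubles the running value (moving every
   earlier bit up one power of 2) and adds the next bit. */
unsigned long bin_to_dec(const char *bits) {
    unsigned long value = 0;
    for (const char *p = bits; *p != '\0'; p++) {
        value = value * 2 + (unsigned long)(*p - '0');  /* assumes '0' or '1' */
    }
    return value;
}

/* Decimal to binary: repeatedly divide by 2 and record the remainders.
   `out` must hold at least CHAR_BIT * sizeof(unsigned long) + 1 chars. */
void dec_to_bin(unsigned long n, char *out) {
    char rem[CHAR_BIT * sizeof(unsigned long)];
    size_t len = 0;
    do {
        rem[len++] = (char)('0' + n % 2);  /* remainder is the next bit */
        n /= 2;
    } while (n > 0);
    /* Remainders come out least significant first, so reverse them. */
    for (size_t i = 0; i < len; i++) {
        out[i] = rem[len - 1 - i];
    }
    out[len] = '\0';
}
```

For example, bin_to_dec("1011") returns 11, and dec_to_bin(11, buf) writes "1011" into buf.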


2. Encoding Integers

  • Unsigned Integers (B2U):

    • Represents non-negative numbers.

    • Range: 0 to 2^w - 1 (where w is the number of bits).

    • Example: 4 bits can represent 0 to 15.

  • Signed Integers (B2T – Two’s Complement):

    • Represents both positive and negative numbers.

    • Range: -2^{w-1} to 2^{w-1} - 1.

    • Negation: Invert bits and add 1.

    • Example: 1101 in 4-bit two’s complement is -3.

  • Overflow:

    • Occurs when a result exceeds the representable range.

    • Example: 10 + 7 = 17 in 4-bit unsigned arithmetic wraps around to 17 mod 16 = 1 (overflow).
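
The encodings can be checked with a small C sketch that treats the low w bits of an unsigned int as a w-bit word. The helpers mask_w, b2u, b2t, tneg, and uadd are hypothetical names for illustration, and the sketch assumes 1 <= w <= 30 so every intermediate value fits in an int.

```c
/* Mask selecting the low w bits (assumes 1 <= w <= 30). */
unsigned mask_w(int w) { return (1u << w) - 1u; }

/* B2U: read the w-bit pattern as the sum of bit_i * 2^i. */
unsigned b2u(unsigned bits, int w) { return bits & mask_w(w); }

/* B2T: like B2U, except the top bit weighs -2^(w-1); that is the same as
   subtracting 2^w whenever the top bit is set. */
int b2t(unsigned bits, int w) {
    unsigned u = b2u(bits, w);
    unsigned sign = (u >> (w - 1)) & 1u;
    return (int)u - (int)(sign << w);
}

/* Two's complement negation: invert every bit, add 1, keep w bits. */
unsigned tneg(unsigned bits, int w) { return (~bits + 1u) & mask_w(w); }

/* Unsigned w-bit addition: the carry out of the top bit is discarded,
   so the result is (x + y) mod 2^w. */
unsigned uadd(unsigned x, unsigned y, int w) { return (x + y) & mask_w(w); }
```

With w = 4, b2t(0xD, 4) gives -3, tneg(0x3, 4) gives 0xD (1101), and uadd(10, 7, 4) gives 1.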


3. Fractional Binary Numbers

  • Fractional Binary Representation:

    • Bits to the right of the binary point represent negative powers of 2.

    • Example: 101.101 = 1×2^2 + 0×2^1 + 1×2^0 + 1×2^{-1} + 0×2^{-2} + 1×2^{-3} = 5.625.

  • Precision Limitation:

    • Only numbers of the form x/2^k can be represented exactly with a finite number of bits.

    • Example: 1/3 cannot be exactly represented in binary; its expansion 0.010101... repeats forever.
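
A C sketch of both points, assuming well-formed input: frac_bin_value (a hypothetical helper) sums the positional weights of a string such as "101.101", and frac_bits (also hypothetical) runs binary long division to list the first k fraction bits of num/den. For 1/3 the remainder never reaches 0, so the bits 01 repeat forever.

```c
/* Value of a binary string with an optional binary point, e.g. "101.101". */
double frac_bin_value(const char *s) {
    double value = 0.0;
    double weight = 0.5;          /* weight of the first bit after the point: 2^-1 */
    int after_point = 0;
    for (const char *p = s; *p != '\0'; p++) {
        if (*p == '.') { after_point = 1; continue; }
        int bit = *p - '0';
        if (!after_point) {
            value = value * 2.0 + bit;   /* integer part: weights 2^i */
        } else {
            value += bit * weight;       /* fraction part: weights 2^-j */
            weight /= 2.0;
        }
    }
    return value;
}

/* First k bits after the binary point of num/den, by long division in base 2.
   Assumes 0 <= num < den and that 2 * den fits in an unsigned int.
   `out` must hold k + 1 chars. */
void frac_bits(unsigned num, unsigned den, int k, char *out) {
    for (int i = 0; i < k; i++) {
        num *= 2;                         /* shift the remainder one place left */
        if (num >= den) { out[i] = '1'; num -= den; }
        else            { out[i] = '0'; }
    }
    out[k] = '\0';
}
```

frac_bin_value("101.101") returns exactly 5.625, while frac_bits(1, 3, 8, buf) yields "01010101" and would keep going for any k.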


4. IEEE Floating Point Standard (IEEE 754)

  • Floating Point Representation:

    • Sign bit (s): Determines if the number is negative or positive.

    • Significand (M): Fractional value, normally in the range [1.0, 2.0) (in [0.0, 1.0) for denormalized values).

    • Exponent (E): Weights the value by a power of 2.

    • Formula: v = (-1)^s × M × 2^E.

  • Normalized Values:

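    • Exponent field (exp) is neither all 0s nor all 1s; E = exp - Bias (Bias = 127 for single precision, 1023 for double) and M = 1.frac (implied leading 1).

    • Example: the single-precision pattern 0100 0110 0110 1101 1011 0100 0000 0000 (0x466DB400) represents 15213.0, since 15213 = 1.1101101101101 × 2^13 in binary, giving exp = 13 + 127 = 140 and frac = 1101101101101 padded with 0s.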

  • Denormalized Values:

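    • Exponent field is all 0s; E = 1 - Bias and M = 0.frac (no implied leading 1).

    • Used to represent 0 and very small numbers close to 0, below the smallest normalized value.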

  • Special Values:

    • Infinity: Exponent field is all 1s, fraction field is 0 (the sign bit selects +∞ or -∞).

    • NaN (Not a Number): Exponent field is all 1s, fraction field is non-zero (e.g., the result of ∞ - ∞).
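
To see the fields in practice, the sketch below pulls s, exp, and frac out of a C float and rebuilds v = (-1)^s × M × 2^E case by case. It assumes float is 32-bit IEEE 754 single precision (8 exponent bits, 23 fraction bits, Bias = 127), which holds on mainstream platforms but is not guaranteed by the C standard; float_bits, pow2, and decode_float are illustrative names.

```c
#include <math.h>     /* INFINITY, NAN */
#include <stdint.h>
#include <string.h>   /* memcpy */

/* Copy a float's bit pattern into an integer (memcpy avoids aliasing issues). */
uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Exact 2^e for the small exponents used here (doubling/halving is exact). */
static double pow2(int e) {
    double r = 1.0;
    for (; e > 0; e--) r *= 2.0;
    for (; e < 0; e++) r /= 2.0;
    return r;
}

/* Rebuild the value from the raw fields of a single-precision pattern. */
double decode_float(uint32_t u) {
    uint32_t s    = u >> 31;            /* 1 sign bit */
    uint32_t exp  = (u >> 23) & 0xFFu;  /* 8 exponent bits */
    uint32_t frac = u & 0x7FFFFFu;      /* 23 fraction bits */
    double sign = s ? -1.0 : 1.0;
    double f = (double)frac / 8388608.0;  /* 0.frac, i.e. frac / 2^23 */

    if (exp == 0xFFu)                     /* special: infinity or NaN */
        return frac == 0 ? sign * INFINITY : NAN;
    if (exp == 0)                         /* denormalized: E = 1 - Bias, M = 0.frac */
        return sign * f * pow2(1 - 127);
    /* normalized: E = exp - Bias, M = 1.frac */
    return sign * (1.0 + f) * pow2((int)exp - 127);
}
```

On such a platform, float_bits(15213.0f) is 0x466DB400 and decode_float(0x466DB400) returns 15213.0.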


5. Floating Point Arithmetic

  • Addition:

    • Align exponents, add significands, and normalize the result.

    • Example: 1.5×2^3 + 1.25×2^1 = 1.5×2^3 + 0.3125×2^3 = 1.8125×2^3 (that is, 12 + 2.5 = 14.5).

  • Multiplication:

    • Multiply significands, add exponents, and normalize the result.

    • Example: 1.5×2^3 × 1.25×2^1 = 1.875×2^4.

  • Rounding:

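    • The IEEE default is round to nearest, ties to even: a value exactly halfway between two representable numbers goes to the one whose last bit is 0, which avoids the upward bias of always rounding halves up.

    • Example: 1.1011 rounded to 3 fraction bits becomes 1.110 (it lies exactly halfway between 1.101 and 1.110, so it goes to the even neighbor).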
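
The tie-breaking rule can be sketched on fixed-point bit patterns held in an unsigned int: x carries `from` fraction bits and the result keeps `to` of them. This is a simplified model of the rule, not a description of how any hardware implements it; round_nearest_even is a hypothetical name, and the sketch assumes 0 < from - to < 32.

```c
/* Round fixed-point x with `from` fraction bits down to `to` fraction bits,
   using round-to-nearest, ties-to-even. Assumes 0 < from - to < 32. */
unsigned round_nearest_even(unsigned x, int from, int to) {
    int drop = from - to;                        /* low bits being discarded */
    unsigned kept = x >> drop;                   /* truncated result */
    unsigned rest = x & ((1u << drop) - 1u);     /* discarded bits */
    unsigned half = 1u << (drop - 1);            /* pattern for exactly halfway */
    if (rest > half || (rest == half && (kept & 1u)))
        kept += 1u;                              /* round up; on a tie only if kept is odd */
    return kept;
}
```

round_nearest_even(0x1B, 4, 3) takes 1.1011 to 0xE (1.110), while the tie 1.1001 (0x19) stays at 1.100 (0xC) because 1.100 is already even.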


6. Bit Shift Operations

  • Left Shift (<<):

    • Shifts bits to the left, filling with 0s.

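    • x << k is equivalent to multiplying by 2^k, as long as no 1 bits are shifted off the top.

    • Example: 1010 << 2 = 101000 (in a 4-bit word the high bits are discarded, leaving 1000).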

  • Right Shift (>>):

    • Shifts bits to the right.

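    • Logical Shift: Fills with 0s (C uses this for unsigned types).

    • Arithmetic Shift: Replicates the sign bit (used for signed types on almost all C implementations, although C leaves right-shifting a negative value implementation-defined).

    • Example (4 bits): 1010 >> 2 = 0010 (logical), 1110 (arithmetic).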
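
C's own << and >> operate on whole int-sized words, and >> on a negative signed value is implementation-defined, so this sketch spells out the three shifts on a 4-bit pattern stored in the low bits of an unsigned int. The 4-bit width and the names shl4, lsr4, and asr4 are assumptions for illustration; k is assumed to be between 0 and 4.

```c
#define MASK4 0xFu   /* keep only the low 4 bits */

/* Left shift in a 4-bit word: 0s fill from the right, high bits fall off. */
unsigned shl4(unsigned x, int k) { return (x << k) & MASK4; }

/* Logical right shift: 0s fill from the left. */
unsigned lsr4(unsigned x, int k) { return (x & MASK4) >> k; }

/* Arithmetic right shift: copies of the sign bit (bit 3) fill from the left. */
unsigned asr4(unsigned x, int k) {
    unsigned r = (x & MASK4) >> k;
    if (x & 0x8u)                          /* sign bit set */
        r |= (MASK4 << (4 - k)) & MASK4;   /* set the top k bits */
    return r;
}
```

shl4(0xA, 2) gives 1000, lsr4(0xA, 2) gives 0010, and asr4(0xA, 2) gives 1110, matching the example above.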


7. Byte Ordering

  • Big Endian:

    • Most significant byte is stored at the lowest address.

    • Example: 0x01234567 is stored as 01 23 45 67 (lowest address first).

  • Little Endian:

    • Least significant byte is stored at the lowest address.

    • Example: 0x01234567 is stored as 67 45 23 01 (lowest address first).
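
A C sketch for inspecting byte order, assuming uint32_t occupies exactly four 8-bit bytes. bytes_in_memory reads a value's bytes in address order through an unsigned char pointer (which C permits for any object), and store_be32 builds the big-endian layout with shifts, so it produces the same bytes on any host. All three function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy the bytes of x as they sit in memory, lowest address first. */
void bytes_in_memory(uint32_t x, unsigned char out[4]) {
    const unsigned char *p = (const unsigned char *)&x;
    for (size_t i = 0; i < 4; i++)
        out[i] = p[i];
}

/* Nonzero on a little-endian host: the least significant byte comes first. */
int is_little_endian(void) {
    unsigned char b[4];
    bytes_in_memory(0x01234567u, b);
    return b[0] == 0x67;
}

/* Host-independent big-endian encoding: most significant byte first. */
void store_be32(uint32_t x, unsigned char out[4]) {
    out[0] = (unsigned char)(x >> 24);
    out[1] = (unsigned char)(x >> 16);
    out[2] = (unsigned char)(x >> 8);
    out[3] = (unsigned char)x;
}
```

On a typical x86-64 machine bytes_in_memory(0x01234567, b) yields 67 45 23 01, while store_be32 always yields 01 23 45 67.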


8. Casting in C

  • Explicit Casting:

    • Convert between data types.

    • Example: int x = (int) 3.14; sets x to 3 (the fraction is truncated toward zero).

  • Implicit Casting:

    • The compiler converts types automatically in assignments and mixed-type expressions.

    • Example: int x = 3.14; also sets x to 3.
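
Both forms of double-to-int conversion discard the fractional part, truncating toward zero. The sketch below shows this with two illustrative functions; it assumes the values fit in an int, since converting an out-of-range floating value to int is undefined behavior in C.

```c
/* Explicit cast: the programmer requests the conversion. */
int explicit_cast(double d) {
    return (int)d;          /* fraction discarded, truncates toward zero */
}

/* Implicit conversion: the assignment converts double to int on its own.
   Compilers can warn about this with flags such as -Wconversion. */
int implicit_conversion(double d) {
    int x = d;
    return x;
}
```

Both return 3 for 3.14 and -3 (not -4) for -3.7, because the conversion truncates toward zero rather than rounding down.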


9. Mathematical Properties of Floating Point

  • Commutativity:

    • Addition and multiplication are commutative.

    • Example: a + b = b + a, a × b = b × a.

  • Associativity:

    • Addition and multiplication are not associative, because every intermediate result is rounded (and may overflow).

    • Example: (a + b) + c ≠ a + (b + c) for some values of a, b, c.

  • Distributivity:

    • Multiplication does not always distribute over addition, for the same reasons (rounding and overflow).

    • Example: a × (b + c) ≠ a × b + a × c for some values of a, b, c.
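
Concrete single-precision values make these failures visible. The sketch stores each intermediate result in a float so it is rounded to single precision; the inputs (3.14, 1e10, 1e20) are chosen for illustration, and the expected results assume IEEE 754 float arithmetic with no excess precision and no fused multiply-add contraction.

```c
/* (a + b) + c: a small a can be rounded away when added to a large b. */
float sum_left(float a, float b, float c) {
    float t = a + b;
    return t + c;
}

/* a + (b + c): large b and c can cancel first, so a survives. */
float sum_right(float a, float b, float c) {
    float t = b + c;
    return a + t;
}

/* a * (b + c) */
float dist_left(float a, float b, float c) {
    float t = b + c;
    return a * t;
}

/* a * b + a * c: each product can overflow to infinity on its own. */
float dist_right(float a, float b, float c) {
    float x = a * b;
    float y = a * c;
    return x + y;
}
```

With a = 3.14f, b = 1e10f, c = -1e10f, sum_left returns 0 (float spacing near 1e10 is 1024, so 3.14 is lost) while sum_right returns 3.14f. With a = 1e20f, b = 1e20f, c = -1e20f, dist_left returns 0 but dist_right returns NaN, because 1e20f * 1e20f overflows to infinity and infinity minus infinity is NaN.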


10. Key Takeaways

  • Precision vs. Range:

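    • Floating point trades precision for range: a fixed number of significand bits is scaled by a wide range of exponents.

    • Representable values are densely spaced near 0 and sparse at large magnitudes, so absolute precision is higher for small numbers and lower for large ones (relative precision stays roughly constant).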

  • Overflow and Underflow:

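    • Be cautious of overflow in integer arithmetic (unsigned values wrap around silently; signed overflow is undefined behavior in C), and of overflow to infinity or underflow to denormalized values or 0 in floating point.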

  • Bitwise Operations:

    • Useful for low-level manipulation but requires careful handling of overflow and sign.
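
The precision-versus-range trade-off can be measured directly: for a positive finite float, adding 1 to its bit pattern gives the next larger representable value. The sketch assumes 32-bit IEEE 754 floats and positive, finite inputs below the largest float; float_gap is an illustrative name.

```c
#include <stdint.h>
#include <string.h>

/* Distance from f to the next larger float, for positive finite f < FLT_MAX.
   Positive floats are ordered like their bit patterns, so incrementing the
   pattern steps to the next representable value. */
float float_gap(float f) {
    uint32_t u;
    float next;
    memcpy(&u, &f, sizeof u);
    u += 1;
    memcpy(&next, &u, sizeof next);
    return next - f;
}
```

float_gap(1.0f) is 2^-23 (about 1.2e-7), while float_gap(1e10f) is 1024: the same 24-bit significand spread across a larger exponent.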