Binary Numbers and Floating Point Representation
1. Binary Number Systems
Binary Basics:
Each digit (bit) is either 0 or 1.
Binary numbers are used to represent all data in computers.
Example: 1011 in binary is 11 in decimal.
Binary to Decimal Conversion:
Each bit represents a power of 2.
Example: 1011 = 1×2^3 + 0×2^2 + 1×2^1 + 1×2^0 = 11.
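The positional rule above can be sketched in C as follows. The function name bin_to_uint is a hypothetical helper, and the sketch assumes the input contains only '0' and '1' characters and fits in an unsigned int.

```c
#include <stddef.h>

/* Hypothetical helper: convert a string of '0'/'1' characters to its
   unsigned value. Each step doubles the running total (shifting every
   earlier bit up one power of 2) and adds the next bit. */
unsigned bin_to_uint(const char *bits)
{
    unsigned value = 0;
    for (size_t i = 0; bits[i] != '\0'; i++) {
        value = value * 2 + (unsigned)(bits[i] - '0');
    }
    return value;
}
```

Calling bin_to_uint("1011") returns 11, matching the example.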
Decimal to Binary Conversion:
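Divide by 2 repeatedly and record the remainders; read them from last to first.
Example: 11 in decimal is 1011 in binary.
Steps: 11 ÷ 2 = 5 r 1, 5 ÷ 2 = 2 r 1, 2 ÷ 2 = 1 r 0, 1 ÷ 2 = 0 r 1; the remainders read last to first give 1011.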
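A minimal sketch of the repeated-division method in C. The name uint_to_bin is hypothetical, and the caller is assumed to supply a buffer large enough for every bit plus a terminator.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: write the binary digits of n into out as a
   NUL-terminated string. out needs sizeof(unsigned) * CHAR_BIT + 1 bytes. */
void uint_to_bin(unsigned n, char *out)
{
    char tmp[sizeof(unsigned) * CHAR_BIT];
    size_t len = 0;

    do {
        tmp[len++] = (char)('0' + n % 2);  /* remainder is the next low bit */
        n /= 2;
    } while (n != 0);

    /* Remainders come out least significant first, so reverse them. */
    for (size_t i = 0; i < len; i++) {
        out[i] = tmp[len - 1 - i];
    }
    out[len] = '\0';
}
```

With a 40-byte buffer, uint_to_bin(11, buf) leaves the string 1011 in buf.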
2. Encoding Integers
Unsigned Integers (B2U):
Represents non-negative numbers.
Range: 0 to 2^w – 1 (where w is the number of bits).
Example: 4 bits can represent 0 to 15.
Signed Integers (B2T – Two’s Complement):
Represents both positive and negative numbers.
Range: -2^(w-1) to 2^(w-1) – 1.
Negation: Invert bits and add 1.
Example: 1101 in 4-bit two’s complement is -3.
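The two interpretations can be compared on the same bit pattern with a short C sketch. The names b2u, b2t, and negate_bits are illustrative, and the code assumes a width w between 1 and 31.

```c
#include <stdint.h>

/* Interpret the low w bits of x as an unsigned integer (B2U). */
uint32_t b2u(uint32_t x, int w)
{
    return x & ((UINT32_C(1) << w) - 1);
}

/* Interpret the low w bits of x as two's complement (B2T): the top bit
   carries weight -2^(w-1), every other bit keeps its positive weight. */
int32_t b2t(uint32_t x, int w)
{
    uint32_t sign_bit = UINT32_C(1) << (w - 1);
    uint32_t low = b2u(x, w);
    return (int32_t)(low & (sign_bit - 1)) - (int32_t)(low & sign_bit);
}

/* Negate a w-bit pattern: invert the bits, add 1, keep the low w bits. */
uint32_t negate_bits(uint32_t x, int w)
{
    return b2u(~x + 1, w);
}
```

For the pattern 1101 (0xD) with w = 4, b2u gives 13, b2t gives -3, and negate_bits gives 0011, the pattern for +3.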
Overflow:
Occurs when a result exceeds the representable range.
Example: Adding 10 + 7 in 4-bit unsigned gives 1, because the true sum 17 does not fit and wraps modulo 2^4 = 16 (overflow).
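One way to model w-bit unsigned addition and detect overflow is sketched below. The function uadd_w is a hypothetical name, and the sketch assumes both operands already fit in w bits with w between 1 and 31.

```c
#include <stdbool.h>
#include <stdint.h>

/* Add two w-bit unsigned values, keeping only the low w bits of the sum.
   Sets *overflow when the true sum does not fit in w bits. */
uint32_t uadd_w(uint32_t a, uint32_t b, int w, bool *overflow)
{
    uint32_t mask = (UINT32_C(1) << w) - 1;
    uint32_t sum = a + b;      /* exact, since a and b are below 2^31 */
    *overflow = sum > mask;
    return sum & mask;
}
```

With w = 4, adding 10 and 7 returns 1 and sets the overflow flag, while adding 3 and 4 returns 7 and clears it.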
3. Fractional Binary Numbers
Fractional Binary Representation:
Bits to the right of the binary point represent negative powers of 2.
Example: 101.101 = 1×2^2 + 0×2^1 + 1×2^0 + 1×2^-1 + 0×2^-2 + 1×2^-3 = 5.625.
Precision Limitation:
Only numbers of the form x/2^k can be represented exactly with a finite number of bits.
Example: 1/3 cannot be exactly represented in binary; its expansion 0.010101... repeats forever.
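The weighting of bits on either side of the binary point can be sketched in C. The helper frac_bin_to_double is a hypothetical name, and the input is assumed to contain only '0', '1', and at most one '.'.

```c
/* Hypothetical helper: evaluate a binary string with an optional binary point. */
double frac_bin_to_double(const char *s)
{
    double value = 0.0;
    double weight = 0.5;     /* weight of the first bit after the point: 2^-1 */
    int after_point = 0;

    for (; *s != '\0'; s++) {
        if (*s == '.') {
            after_point = 1;
        } else if (!after_point) {
            value = value * 2.0 + (*s - '0');   /* integer part */
        } else {
            value += (*s - '0') * weight;       /* fractional part */
            weight /= 2.0;
        }
    }
    return value;
}
```

The string 101.101 evaluates to exactly 5.625. A truncated expansion such as 0.01010101 evaluates to 85/256, which is close to 1/3 but still below it; adding more bits narrows the gap without closing it.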
4. IEEE Floating Point Standard (IEEE 754)
Floating Point Representation:
Sign bit (s): Determines if the number is negative or positive.
Significand (M): Fractional value, in the range [1.0, 2.0) for normalized values.
Exponent (E): Weights the value by a power of 2.
Formula: v = (-1)^s × M × 2^E.
Normalized Values:
Exponent is neither all 0s nor all 1s.
Example: 0100 0110 0110 1101 1011 0100 0000 0000 represents 15213.0.
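The example pattern can be taken apart with a short C sketch. It assumes the 32-bit single-precision layout (1 sign bit, 8 exponent bits biased by 127, 23 fraction bits) and that the platform's float uses that format; the names float_fields, bits_to_float, and split_float are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Fields of a 32-bit IEEE 754 single-precision value. */
struct float_fields {
    uint32_t sign;      /* 1 bit */
    uint32_t exponent;  /* 8 bits, biased by 127 */
    uint32_t fraction;  /* 23 bits */
};

/* Reinterpret a 32-bit pattern as a float (assumes float is binary32). */
float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Split a 32-bit pattern into its three fields. */
struct float_fields split_float(uint32_t bits)
{
    struct float_fields out;
    out.sign = bits >> 31;
    out.exponent = (bits >> 23) & 0xFF;
    out.fraction = bits & 0x7FFFFF;
    return out;
}
```

The pattern above is 0x466DB400 in hexadecimal. Its exponent field is 140, so E = 140 - 127 = 13, and 15213 = 1.1101101101101 (binary) × 2^13.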
Denormalized Values:
Exponent is all 0s.
Used to represent very small numbers close to 0.
Special Values:
Infinity: Exponent is all 1s, fraction field is all 0s.
NaN (Not a Number): Exponent is all 1s, fraction field is non-zero.
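The four cases described in this section can be told apart from the exponent and fraction fields alone. The sketch below assumes the 32-bit single-precision layout (8 exponent bits, 23 fraction bits); the enum and function names are hypothetical.

```c
#include <stdint.h>

enum float_class { FC_NORMALIZED, FC_DENORMALIZED, FC_INFINITY, FC_NAN };

/* Classify a single-precision bit pattern by its exponent and fraction fields. */
enum float_class classify_bits(uint32_t bits)
{
    uint32_t exponent = (bits >> 23) & 0xFF;
    uint32_t fraction = bits & 0x7FFFFF;

    if (exponent == 0xFF)
        return fraction == 0 ? FC_INFINITY : FC_NAN;
    if (exponent == 0)
        return FC_DENORMALIZED;   /* includes +0 and -0 */
    return FC_NORMALIZED;
}
```

For instance, 0x7F800000 classifies as infinity, 0x7FC00000 as NaN, 0x00000001 as denormalized, and 0x3F800000 (the value 1.0) as normalized.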
5. Floating Point Arithmetic
Addition:
Align exponents, add significands, and normalize the result.
Example: 1.5×2^3 + 1.25×2^1 = 1.5×2^3 + 0.3125×2^3 = 1.8125×2^3.
Multiplication:
Multiply significands, add exponents, and normalize the result.
Example: (1.5×2^3) × (1.25×2^1) = 1.875×2^4.
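The align/add/normalize and multiply/add-exponents/normalize steps can be sketched on a toy representation. The struct fp below is an assumption for illustration, not the packed IEEE 754 format; it handles only non-negative significands and does no rounding of its own.

```c
/* A toy value m * 2^e with m >= 0; not the packed IEEE 754 format. */
struct fp { double m; int e; };

/* Bring a positive significand back into [1.0, 2.0). */
static struct fp normalize(struct fp x)
{
    while (x.m >= 2.0) { x.m /= 2.0; x.e++; }
    while (x.m != 0.0 && x.m < 1.0) { x.m *= 2.0; x.e--; }
    return x;
}

struct fp fp_add(struct fp a, struct fp b)
{
    /* Align: rewrite the smaller-exponent operand with the larger exponent. */
    if (a.e < b.e) { struct fp t = a; a = b; b = t; }
    while (b.e < a.e) { b.m /= 2.0; b.e++; }

    struct fp sum = { a.m + b.m, a.e };
    return normalize(sum);
}

struct fp fp_mul(struct fp a, struct fp b)
{
    struct fp prod = { a.m * b.m, a.e + b.e };
    return normalize(prod);
}
```

With a = 1.5×2^3 (12) and b = 1.25×2^1 (2.5), fp_add yields 1.8125×2^3 (14.5) and fp_mul yields 1.875×2^4 (30).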
Rounding:
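Round to the nearest value; on an exact tie, round to the value whose last bit is even (0), which avoids bias.
Example: 1.1011 rounded to 3 fractional bits becomes 1.110 (a tie between 1.101 and 1.110, resolved toward the even one).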
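Round-to-nearest-even can be sketched on integer bit patterns, treating the binary point as implied. The function round_nearest_even is a hypothetical helper that discards the low drop bits, with drop assumed to be between 1 and 31.

```c
#include <stdint.h>

/* Drop the low `drop` bits of x, rounding to nearest with ties going to
   the even result. */
uint32_t round_nearest_even(uint32_t x, int drop)
{
    uint32_t half = UINT32_C(1) << (drop - 1);
    uint32_t rem = x & ((UINT32_C(1) << drop) - 1);   /* bits being discarded */
    uint32_t kept = x >> drop;

    if (rem > half || (rem == half && (kept & 1)))
        kept++;            /* round up; on a tie only when kept is odd */
    return kept;
}
```

Writing 1.1011 as the integer pattern 11011 and dropping one bit gives 1110, that is 1.110. The pattern 11001 (1.1001) is also a tie but rounds down to 1100 (1.100), because the kept bits are already even.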
6. Bit Shift Operations
Left Shift (<<):
Shifts bits to the left, filling with 0s.
Equivalent to multiplying by 2^k.
Example: 1010 << 2 = 101000.
Right Shift (>>):
Shifts bits to the right.
Logical Shift: Fills with 0s (for unsigned).
Arithmetic Shift: Replicates the sign bit (for signed).
Example: 1010 >> 2 = 0010 (logical), 1110 (arithmetic).
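The two kinds of right shift can be modeled explicitly on 4-bit patterns. In C, >> on an unsigned operand is a logical shift, while >> on a negative signed operand is implementation-defined (commonly arithmetic), so this sketch builds the arithmetic fill by hand. The function names are hypothetical and k is assumed to be between 0 and 4.

```c
#include <stdint.h>

/* Logical right shift of a 4-bit pattern: vacated bits become 0. */
uint8_t shr_logical4(uint8_t x, int k)
{
    return (uint8_t)((x & 0xF) >> k);
}

/* Arithmetic right shift of a 4-bit pattern: vacated bits copy the sign bit. */
uint8_t shr_arith4(uint8_t x, int k)
{
    uint8_t shifted = (uint8_t)((x & 0xF) >> k);
    if (x & 0x8) {
        /* Sign bit set: fill the top k bits with 1s. */
        shifted |= (uint8_t)((0xF << (4 - k)) & 0xF);
    }
    return shifted;
}
```

For the pattern 1010 (0xA) and k = 2, shr_logical4 returns 0010 and shr_arith4 returns 1110. The left-shift example corresponds to 0xA << 2 == 0x28 on a type wider than 4 bits.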
7. Byte Ordering
Big Endian:
Most significant byte is stored at the lowest address.
Example: 0x01234567 is stored as 01 23 45 67.
Little Endian:
Least significant byte is stored at the lowest address.
Example: 0x01234567 is stored as 67 45 23 01.
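The byte order of the machine running the code can be observed by copying a known value into a byte array. The helper names below are hypothetical, and is_little_endian assumes the host is either big or little endian.

```c
#include <stdint.h>
#include <string.h>

/* Copy the in-memory bytes of value into out[0..3], lowest address first. */
void bytes_in_memory(uint32_t value, unsigned char out[4])
{
    memcpy(out, &value, sizeof value);
}

/* Returns 1 on a little-endian host, 0 otherwise. */
int is_little_endian(void)
{
    unsigned char b[4];
    bytes_in_memory(UINT32_C(0x01234567), b);
    return b[0] == 0x67;
}
```

On a little-endian host the array holds 67 45 23 01; on a big-endian host it holds 01 23 45 67.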
8. Casting in C
Explicit Casting:
Convert between data types.
Example: int x = (int) 3.14;
Implicit Casting:
Automatically converts types in expressions.
Example: int x = 3.14; (truncates to 3).
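Both forms of conversion can be wrapped in small functions to compare their results. The function names are illustrative; the sketch assumes the value fits in an int, since converting an out-of-range floating-point value to int is undefined in C.

```c
/* Explicit cast: the fractional part is discarded (truncation toward zero). */
int truncate_explicit(double d)
{
    return (int)d;
}

/* Implicit conversion: the same truncation, performed silently on assignment. */
int truncate_implicit(double d)
{
    int x = d;
    return x;
}
```

Both functions return 3 for 3.14 and -3 for -3.99, since the conversion truncates toward zero rather than rounding down.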
9. Mathematical Properties of Floating Point
Commutativity:
Addition and multiplication are commutative.
Example: a + b = b + a, a × b = b × a.
Associativity:
Addition and multiplication are not associative due to rounding errors.
Example: (a + b) + c can differ from a + (b + c).
Distributivity:
Multiplication does not distribute over addition due to rounding.
Example: a × (b + c) can differ from a × b + a × c.
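Concrete inputs make the failure of associativity and distributivity visible. The sketch below assumes IEEE 754 double arithmetic without value-changing compiler options such as -ffast-math; the function names are hypothetical, and volatile is used only to force each intermediate result to be rounded to double.

```c
/* Returns 1 if (a + b) + c and a + (b + c) round to different doubles. */
int add_is_order_sensitive(double a, double b, double c)
{
    volatile double left = (a + b) + c;
    volatile double right = a + (b + c);
    return left != right;
}

/* Returns 1 if a * (b + c) equals a * b + a * c after rounding. */
int mul_distributes(double a, double b, double c)
{
    volatile double lhs = a * (b + c);
    volatile double ab = a * b;
    volatile double ac = a * c;
    volatile double rhs = ab + ac;
    return lhs == rhs;
}
```

With a = 1e20, b = -1e20, c = 3.14, the left grouping gives 3.14 while the right grouping gives 0, because 3.14 is lost when added to -1e20. With a = 1e200, b = 1e200, c = -1e200, a × (b + c) is 0, but a × b overflows to infinity and the sum of the two products is NaN.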
10. Key Takeaways
Precision vs. Range:
Floating point offers a trade-off between precision and range.
Absolute precision is higher for small magnitudes and lower for large ones, because representable values are spaced farther apart as magnitude grows.
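The growing gap between representable values can be observed with a small sketch. It assumes float is IEEE 754 single precision (24 bits of significand); absorbs_one is a hypothetical helper.

```c
/* Returns 1 if adding 1.0f to x leaves it unchanged after rounding to float. */
int absorbs_one(float x)
{
    volatile float sum = x + 1.0f;   /* volatile forces rounding to float */
    return sum == x;
}
```

At 1.0f the addition is exact, so the function returns 0. At 16777216.0f (2^24) neighboring floats are 2 apart, so adding 1 rounds back to the same value and the function returns 1.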
Overflow and Underflow:
Be cautious of overflow in integer arithmetic and underflow in floating point.
Bitwise Operations:
Useful for low-level manipulation, but they require careful handling of overflow and sign.