Assembly Language and Computer Architecture Essentials
Chapter 7: IEEE 32-bit Float
IEEE 32-bit float representation includes:
- 1 sign bit
- An 8-bit biased exponent. To calculate it, normalize the binary number to the form 1.xxxx * 2^E, then add the bias 127 (01111111 in binary) to E.
- 23 bits for the fraction (mantissa): the bits after the binary point of the normalized number. The leading 1 is implicit and is not stored.
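The three fields can be pulled out of a float with shifts and masks. The sketch below assumes the platform's float is IEEE 754 binary32 (true on x86-64); FloatFields, decode_float and encode_fields are illustrative names, not a library API.

```c
#include <stdint.h>
#include <string.h>

/* Fields of an IEEE 754 single-precision value. */
typedef struct {
    uint32_t sign;      /* 1 bit */
    uint32_t exponent;  /* 8 bits, biased by 127 */
    uint32_t fraction;  /* 23 bits; the leading 1 of a normalized value is implicit */
} FloatFields;

FloatFields decode_float(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);      /* reinterpret the 4 bytes without UB */
    FloatFields f;
    f.sign = bits >> 31;
    f.exponent = (bits >> 23) & 0xFF;
    f.fraction = bits & 0x7FFFFF;
    return f;
}

/* Reassemble the 32-bit pattern from the three fields. */
uint32_t encode_fields(uint32_t sign, uint32_t exponent, uint32_t fraction) {
    return (sign << 31) | ((exponent & 0xFF) << 23) | (fraction & 0x7FFFFF);
}
```

Worked by hand: 6.5 = 110.1 (binary) = 1.101 * 2^2, so the stored exponent is 2 + 127 = 129 = 10000001, the fraction is 101 followed by 20 zeros (0x500000), and the full pattern is 0x40D00000.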
Chapter 11: Assembly Language
General Purpose Registers
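Registers hold temporary data and addresses:
- 64-bit %rdi, %rsi, %rdx, %rcx, %rbx, %rax; their lower 32 bits are %edi, %esi, %edx, %ecx, %ebx, %eax. In the x86-64 calling convention, %rdi, %rsi, %rdx and %rcx hold the first four integer arguments, and %rax holds the return value.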
- %rsp: Stack Pointer – Points to the top of the stack.
- %rbp: Base Pointer – Points to the base of the current stack frame.
- %rip: Instruction Pointer – Address of the next instruction to be executed.
Flags Register
Flags indicate the status of operations:
- CF (Carry Flag): Set if there’s an unsigned overflow (or underflow in subtraction).
- ZF (Zero Flag): Set if the result of an operation is zero.
- SF (Sign Flag): Set if the result of an operation is negative.
- OF (Overflow Flag): Set if there’s a signed overflow.
Common Instructions
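- cmpq (64-bit): Compares two values. In AT&T syntax, cmpq S, D computes D - S (subtracts the first operand from the second), sets the flags, and discards the result.
- leal: “Load Effective Address Long”. Computes the address expression of its source operand without accessing memory and stores the low 32 bits of the result in the destination register. Does not affect flags, so it is often used for plain arithmetic such as x + 2*y. leaq is the 64-bit form.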
- Jump (jX):
  - jmp: Unconditional jump.
  - je / jz: Jump if equal or zero (ZF set).
  - jne / jnz: Jump if not equal or not zero (ZF not set).
  - js: Jump if sign (SF set).
  - jns: Jump if not sign (SF not set).
  - jg: Jump if greater (signed; SF equals OF and ZF not set).
  - jge: Jump if greater or equal (signed; SF equals OF).
  - jl: Jump if less (signed; SF not equal to OF).
  - jle: Jump if less or equal (signed; ZF set or SF not equal to OF).
  - ja: Jump if above (unsigned; CF and ZF not set).
  - jae: Jump if above or equal (unsigned; CF not set).
  - jb: Jump if below (unsigned; CF set).
  - jbe: Jump if below or equal (unsigned; CF set or ZF set).
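To see why the signed jumps test SF against OF while the unsigned ones test CF, the flags of a 32-bit compare can be emulated in C. This is a sketch, not a description of the hardware internals: Flags, cmp_flags and the *_taken helpers are hypothetical names, and the subtraction is done on unsigned values so the wraparound is well defined.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool cf, zf, sf, of; } Flags;

/* Emulates the flags set by `cmpl a, b` (AT&T order), which computes b - a. */
Flags cmp_flags(int32_t a, int32_t b) {
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t t = ub - ua;                /* wraps modulo 2^32, like the hardware */
    Flags f;
    f.cf = ub < ua;                      /* borrow: b < a when both are viewed as unsigned */
    f.zf = (t == 0);
    f.sf = (t >> 31) & 1;                /* sign bit of the 32-bit result */
    /* Signed overflow: a and b differ in sign, and the result's sign differs from b's. */
    f.of = (((ua ^ ub) & (ub ^ t)) >> 31) & 1;
    return f;
}

/* Some jump conditions from the list above, written in terms of the flags. */
bool jl_taken(Flags f) { return f.sf != f.of; }          /* b < a, signed */
bool jg_taken(Flags f) { return !f.zf && f.sf == f.of; } /* b > a, signed */
bool jb_taken(Flags f) { return f.cf; }                  /* b < a, unsigned */
bool ja_taken(Flags f) { return !f.cf && !f.zf; }        /* b > a, unsigned */
```

For example, comparing -3 with 5 takes jl (-3 < 5 as signed) but not jb, because as an unsigned value -3 is 0xFFFFFFFD, which is above 5. Comparing INT32_MIN with 1 overflows (OF = 1, SF = 0), and jl is still correctly taken because it tests SF != OF rather than SF alone.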
- setX (sete, setl, ...): Sets a single byte to 1 if the condition holds and to 0 otherwise, based on the flags from a preceding cmp or test. For example, after cmpq S, D, setl %al sets %al to 1 if D < S (signed).
- test: Performs a bitwise AND without storing the result. test S2, S1 sets the condition codes according to S1 & S2: ZF if the result is zero, SF if it is negative; CF and OF are cleared. testq %rax, %rax is a common way to check whether %rax is zero or negative.
- mov: Transfers data without affecting flags.
- Bitwise Operations: AND, OR, XOR, NOT.
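- Shift:
  - shl/sal: Shift left; shifting by k multiplies by 2^k.
  - shr: Logical shift right; zeros enter from the left, so shifting by k divides an unsigned value by 2^k, discarding the low bits.
  - sar: Arithmetic shift right; copies of the sign bit enter from the left, so shifting by k divides a signed value by 2^k, rounding toward negative infinity.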
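The difference between the two right shifts shows up with negative numbers. This sketch assumes GCC or Clang on x86-64, where >> on a negative signed integer is an arithmetic shift (the C standard leaves it implementation-defined); the function names are illustrative.

```c
#include <stdint.h>

/* shl/sal: shifting left by k multiplies by 2^k (done on unsigned to avoid signed-overflow UB). */
uint32_t shl32(uint32_t x, unsigned k) { return x << k; }

/* shr: logical right shift; zeros enter from the left. */
uint32_t shr32(uint32_t x, unsigned k) { return x >> k; }

/* sar: arithmetic right shift; copies of the sign bit enter from the left.
   Assumes GCC/Clang, where >> on a negative int32_t is arithmetic. */
int32_t sar32(int32_t x, unsigned k) { return x >> k; }
```

Note that sar32(-7, 1) gives -4 (rounding toward negative infinity), whereas C's -7 / 2 gives -3 (truncation toward zero). This is why compilers add a bias before sar when they turn signed division by a power of 2 into a shift.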
- Conditional Move (cmovXX S, D): Copies S to D only if the condition holds; otherwise D is unchanged.
  - cmove / cmovz (S, D): Move if equal/zero (ZF is set).
  - cmovne / cmovnz (S, D): Move if not equal/nonzero (ZF is not set).
  - cmovs (S, D): Move if negative (SF is set).
  - cmovns (S, D): Move if nonnegative (SF is not set).
  - cmovg / cmovnle (S, D): Move if greater (signed) (SF equals OF, and ZF is not set).
  - cmovge / cmovnl (S, D): Move if greater or equal (signed) (SF equals OF).
  - cmovl / cmovnge (S, D): Move if less (signed) (SF does not equal OF).
  - cmovle / cmovng (S, D): Move if less or equal (signed) (SF does not equal OF or ZF is set).
  - cmova / cmovnbe (S, D): Move if above (unsigned) (CF is not set and ZF is not set).
  - cmovae / cmovnb (S, D): Move if above or equal (unsigned) (CF is not set).
  - cmovb / cmovnae (S, D): Move if below (unsigned) (CF is set).
  - cmovbe / cmovna (S, D): Move if below or equal (unsigned) (CF is set or ZF is set).
Popcount Example
Counts ‘1’ bits in a binary number using loops in C and assembly.
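The original C and assembly for this example are not reproduced here. The following is one common loop formulation, offered as a sketch: it tests the low bit and shifts right until no 1 bits remain. The name popcount_loop is hypothetical.

```c
#include <stdint.h>

/* Count the 1 bits in x by adding the low bit and shifting right. */
int popcount_loop(uint64_t x) {
    int count = 0;
    while (x != 0) {            /* stop once every remaining bit is 0 */
        count += (int)(x & 1);  /* add the least significant bit */
        x >>= 1;                /* logical shift right (shr): zeros enter from the left */
    }
    return count;
}
```

The argument is unsigned so the shift is logical. With a signed type and an arithmetic shift (sar), a negative value would keep refilling the top bit with 1s and never reach 0, so the loop would not terminate.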
Additional Instructions
- add: Adds source to destination. addq %rax, %rsi results in %rsi = %rsi + %rax.
- call: Pushes the return address onto the stack and jumps to the function; the return value comes back in %rax (%eax for a 32-bit value).
- imul: Signed multiply; imulq S, D computes D = D * S.
- movq %rdi, %rbx: Move the value of %rdi into %rbx.
- subq %rsi, %rax: %rax = %rax - %rsi.
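- Example: cmpl %edx, -4(%rdi,%rax,4) uses the addressing form D(Rb,Ri,S) = Mem[Rb + Ri * S + D], here with base register Rb = %rdi, index register Ri = %rax, scale factor S = 4 and displacement D = -4. If %rdi holds the address of int array a and %rax holds n, the memory operand is a[n-1], and the instruction compares a[n-1] with %edx (computing a[n-1] - %edx).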
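The address computation D(Rb,Ri,S) = Rb + Ri * S + D can be checked against a real C array. In this sketch, effective_address is a hypothetical helper; unsigned arithmetic makes the negative displacement wrap correctly, and a 4-byte int is assumed (as on x86-64).

```c
#include <stdint.h>

/* Effective address of the AT&T memory operand D(Rb,Ri,S): Rb + Ri*S + D. */
uintptr_t effective_address(uintptr_t base, uintptr_t index, uintptr_t scale, intptr_t disp) {
    return base + index * scale + (uintptr_t)disp;   /* wraps modulo 2^64, so disp < 0 works */
}
```

With base = a, index = n = 5, scale = 4 and displacement -4, the result is a + 16, which is &a[4], that is, a[n-1]. The same helper with displacement 4 and index i gives &a[i+1], matching the movl 4(%rdx,%rcx,4), %eax pattern in the arrays chapter.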
Chapter 13: Arrays
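Suppose the base address of int array A is in %rdx and index i is in %rcx.
- movl 4(%rdx,%rcx,4), %eax loads A[i+1] into %eax (address A + 4i + 4).
- addl $1, (%rdi,%rax,4) corresponds to z[i]++, assuming int array z is in %rdi and i is in %rax.
Example: Finding M and N for arrays a[M][N] and b[N][M]. C stores arrays in row-major order, so copying element b[j][i] to a[i][j] requires the element offsets j*M + i for b[j][i] and i*N + j for a[i][j] (multiply by the element size to get byte offsets). The multipliers that appear in the assembly reveal M and N.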
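The row-major offsets can be written out explicitly on flat storage. In this sketch the dimensions M = 3 and N = 5 are hypothetical; in the exercise they are the unknowns recovered from the compiled code. offset_a, offset_b and copy_element are illustrative names.

```c
#include <stddef.h>

/* Hypothetical dimensions for illustration. */
enum { M = 3, N = 5 };

/* Row-major element offsets (in elements; multiply by sizeof(int) for bytes). */
size_t offset_a(size_t i, size_t j) { return i * N + j; }  /* a[M][N], element a[i][j] */
size_t offset_b(size_t i, size_t j) { return j * M + i; }  /* b[N][M], element b[j][i] */

/* a[i][j] = b[j][i], written with explicit offsets into flat arrays. */
void copy_element(int *a_flat, const int *b_flat, size_t i, size_t j) {
    a_flat[offset_a(i, j)] = b_flat[offset_b(i, j)];
}
```

In the compiled code, the multiplier applied to i when indexing a is N, and the multiplier applied to j when indexing b is M; each appears either directly in an imul or as a sequence of leaq/shift instructions.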
x86-64 Alignment
- 1 byte (char): No address restrictions.
- 2 bytes (short): The lowest bit of the address must be 0.
- 4 bytes (int, float): The lowest 2 bits of the address must be 00.
- 8 bytes (double, long, char *): The lowest 3 bits of the address must be 000.
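Each rule above says that a K-byte object's address has its low log2(K) bits equal to 0, which is a single mask test. The sketch below uses a hypothetical is_aligned helper and an example struct whose offsets assume the x86-64 System V layout, where the compiler inserts padding so every field satisfies these rules.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if addr is aligned for a K-byte type (K a power of 2): low log2(K) bits are 0. */
bool is_aligned(uintptr_t addr, size_t k) {
    return (addr & (k - 1)) == 0;
}

/* Example struct (x86-64 layout): padding keeps each field aligned. */
struct example {
    char c;     /* offset 0 */
    int i;      /* offset 4: 3 bytes of padding after c */
    double d;   /* offset 8 */
};              /* sizeof == 16 */
```

For instance, 0x1006 is aligned for a short (lowest bit 0) but not for an int (lowest 2 bits are 10).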
Chapters 17-18: Cache Memory
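To find the number of sets in a cache: S = C / (E * B), where C is the cache size, E is the number of lines (ways) per set and B is the block size.
In a 2-way set associative cache, each set holds 2 lines, so 2 blocks that map to the same set can be cached at the same time.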
Example: A system with 20-bit addresses, a total cache size (C) of 2048 bytes, a block size (B) of 16 bytes, and 32 sets (S). The associativity is calculated as 32 * E * 16 = 2048, which simplifies to E = 4. This is a 4-way set associative cache.
- Block Offset: 4 bits (2^4 = 16 bytes).
- Set Index: 5 bits (2^5 = 32 sets).
- Tag: 11 bits (remaining from the 20-bit address).
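The geometry follows mechanically from C = S * E * B and the address width m. In this sketch, Geometry, geometry and log2u are hypothetical names, and all parameters are assumed to be powers of 2.

```c
#include <stdint.h>

/* Integer log2 for powers of two. */
unsigned log2u(uint32_t x) {
    unsigned n = 0;
    while (x > 1) { x >>= 1; n++; }
    return n;
}

typedef struct { uint32_t E; unsigned s, b, t; } Geometry;

/* Derive associativity and bit-field widths from C = S * E * B and an m-bit address. */
Geometry geometry(uint32_t C, uint32_t S, uint32_t B, unsigned m) {
    Geometry g;
    g.E = C / (S * B);      /* lines per set */
    g.b = log2u(B);         /* block-offset bits */
    g.s = log2u(S);         /* set-index bits */
    g.t = m - g.s - g.b;    /* tag bits: whatever remains */
    return g;
}
```

For the example above, geometry(2048, 32, 16, 20) gives E = 4, b = 4, s = 5, t = 11. The same relation covers the direct-mapped 512-byte cache in these notes: with S = 32 and B = 16, it yields E = 1.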
Reading a 4-byte word with the 20-bit address 0x0B1E4:
- Binary address: 00001011000111100100
- Tag: 00001011000 (88 in decimal)
- Set Index: 11110 (30 in decimal)
- Block Offset: 0100 (4 in decimal)
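The same split can be done with shifts and masks. This sketch hard-codes the example cache's widths (b = 4 offset bits, s = 5 set-index bits); the function names are illustrative.

```c
#include <stdint.h>

/* Field widths of the example cache: 4 offset bits, 5 set-index bits, 11 tag bits. */
enum { B_BITS = 4, S_BITS = 5 };

uint32_t block_offset(uint32_t addr) { return addr & ((1u << B_BITS) - 1); }
uint32_t set_index(uint32_t addr)    { return (addr >> B_BITS) & ((1u << S_BITS) - 1); }
uint32_t tag_bits(uint32_t addr)     { return addr >> (B_BITS + S_BITS); }
```

For 0x0B1E4 this gives tag 88 (0x58), set 30 and offset 4, matching the hand calculation.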
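Which addresses can conflict in set number 23 (0x17) depends only on the set index: every address whose set-index bits (bits 4-8) are 10111 maps to set 23, whatever its tag and offset, and those blocks compete for that set's 4 lines.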
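Addresses mapping to a given set can be generated by fixing the set-index bits and varying the tag. This sketch again assumes the example cache's 4 offset bits and 5 set-index bits; maps_to_set and block_start are hypothetical helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Example cache: 4 offset bits, 5 set-index bits (bits 4-8), 11 tag bits. */
bool maps_to_set(uint32_t addr, uint32_t set) {
    return ((addr >> 4) & 0x1F) == set;
}

/* First byte of the block with the given tag in the given set. */
uint32_t block_start(uint32_t tag, uint32_t set) {
    return (tag << 9) | (set << 4);
}
```

Blocks starting at 0x170, 0x370, 0x570, ... all land in set 23. With 4 lines per set, a fifth distinct tag referenced in set 23 forces an eviction.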
Memory Hierarchy
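- Levels: Reading the throughput plot from right to left, each drop marks the next level (L1, L2, etc.); the working-set size at which a drop occurs indicates that level's size.
- Main memory: the final level, after the last drop; no size is given for it.
The more efficient computer is the one that serves most of its data accesses from L1.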
If the cache is 512 bytes with a line size (B) of 16 bytes and is direct-mapped (E = 1), then 512 = S * 1 * 16, so S = 32. For a direct-mapped cache, the number of sets is simply S = C / B.
Chapter 17: Cache Hits and Misses
- Cache Hit: Data can be delivered quickly from the cache.
- Cache Miss: Data is not in the cache; it’s retrieved from memory and then stored in the cache. The placement policy determines where the item goes, and the replacement policy determines which block is evicted.
- Cold Miss: Occurs because the cache starts empty and it’s the first reference to the block.
- Capacity Miss: Occurs when the set of active blocks (working set) is larger than the cache.
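- Conflict Miss: Occurs when the cache at level k is large enough to hold the working set, but multiple active blocks map to the same position at level k. For example, if block i maps to position i mod 4, blocks 0 and 8 both map to position 0 and keep evicting each other.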
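The blocks 0 and 8 case can be replayed in a tiny direct-mapped cache simulator. This is a sketch under stated assumptions: the 4-line cache, block numbers below 64, and the names Stats and simulate are all hypothetical. Misses on blocks seen before are counted as "other"; they are conflict misses in the test below because the working set (2 blocks) fits in the 4 lines.

```c
#include <stdbool.h>
#include <stddef.h>

enum { LINES = 4 };   /* hypothetical 4-line direct-mapped cache */

typedef struct { int hits, cold, other; } Stats;

/* Block k is placed in line k % LINES; assumes 0 <= block < 64. */
Stats simulate(const int *blocks, size_t n) {
    int line[LINES];
    bool valid[LINES] = {false};
    bool seen[64] = {false};
    Stats st = {0, 0, 0};
    for (size_t k = 0; k < n; k++) {
        int blk = blocks[k], idx = blk % LINES;
        if (valid[idx] && line[idx] == blk) {
            st.hits++;
        } else {
            if (!seen[blk]) st.cold++;    /* first reference ever: cold miss */
            else st.other++;              /* seen before but evicted */
            line[idx] = blk;              /* evict whatever was in this line */
            valid[idx] = true;
        }
        seen[blk] = true;
    }
    return st;
}
```

The reference string 0, 8, 0, 8 produces two cold misses followed by two conflict misses and no hits, while 0, 1, 0, 1 (different lines) gives two cold misses and two hits.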
Memory Hierarchy Examples
- Registers: Fastest level; hold 4- or 8-byte words and are managed by the compiler.
- Translation Lookaside Buffer (TLB): Speeds up virtual-to-physical address translation.
- Level 1 (L1) Cache: Very fast, on-chip memory.
- Level 2 (L2) Cache: Larger than L1 but slower, on-chip or close to it.
- Virtual Memory (main memory/RAM): Main memory caches virtual-memory pages from disk; it is slower than the caches but much larger.
- Buffer Cache: Caches parts of files in main memory, between the program and the disk.
- Disk Cache: Caches disk sectors in the disk controller's memory.
- Network Buffer Cache: Caches parts of network files on the local disk.
- Browser Cache and Web Cache: Cache web pages. The browser cache is kept locally by the web browser; a web cache sits on a remote server, such as a proxy or CDN.
CPU, RAM, and GPU
- CPU (Central Processing Unit): The “brain” of the computer, performs most calculations.
- RAM (Random Access Memory): Short-term memory that stores actively used data. Main memory is built from DRAM and is also called physical memory.
- GPU (Graphics Processing Unit): Accelerates graphics rendering, also used for parallel computations.
Chapter 14
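Canary value: a buffer-overflow protection mechanism. The compiler stores a random value in the stack frame between local buffers and the saved return address and checks it before the function returns; if the value has changed, the program aborts instead of returning.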
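The check can be modeled in plain C. This is a toy simulation, not how a compiler implements it: the Frame struct, the buffer and canary sizes, and the CANARY value are all hypothetical. A real stack protector uses a random per-process value stored in the actual stack frame.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy stack frame: a 16-byte buffer followed by an 8-byte canary. */
enum { BUF_SIZE = 16, CANARY_SIZE = 8 };
static const uint64_t CANARY = 0x5A3C9E1F2D4B7A61ull;   /* hypothetical value */

typedef struct { unsigned char bytes[BUF_SIZE + CANARY_SIZE]; } Frame;

void frame_init(Frame *f) {
    memset(f->bytes, 0, BUF_SIZE);
    memcpy(f->bytes + BUF_SIZE, &CANARY, CANARY_SIZE);  /* canary right after the buffer */
}

/* Unchecked copy in the style of gets/strcpy: writes len bytes from the buffer start. */
void unchecked_copy(Frame *f, const unsigned char *src, size_t len) {
    if (len > sizeof f->bytes) len = sizeof f->bytes;   /* stay inside the toy frame */
    memcpy(f->bytes, src, len);
}

/* The check done before "returning": is the canary unchanged? */
bool canary_intact(const Frame *f) {
    uint64_t v;
    memcpy(&v, f->bytes + BUF_SIZE, CANARY_SIZE);
    return v == CANARY;
}
```

Copying 16 bytes leaves the canary intact; copying 20 bytes overwrites part of it, so the check fails, which is the point where a real protected function would abort instead of returning through a possibly corrupted return address.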
Virtual Memory
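- Simplifies memory management: each process sees its own uniform, linear address space.
- Isolates address spaces: each process's pages map to different places in physical memory, so one process cannot access another's memory unless pages are deliberately shared.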
In the CPU, the MMU (Memory Management Unit) translates virtual addresses into physical addresses. For each reference, it looks up the page table entry (PTE) for the virtual page; recently used PTEs are cached in the TLB. If the PTE's valid bit shows that the page is in physical memory (DRAM), the access proceeds; if not, the MMU triggers a page fault.
- Page Hit: Reference to a virtual memory word that is in physical memory (DRAM hit).
- Page Fault: Reference to a virtual memory word that is not in physical memory (DRAM miss). The kernel's page-fault handler loads the page from disk into DRAM, evicting a victim page if necessary, and the faulting instruction is then restarted.
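Translation splits a virtual address into a virtual page number (VPN) and a page offset; only the VPN is translated. This sketch models a single-level page table with 4 KB pages and 16 virtual pages. The sizes and the names PTE, Result and translate are assumptions for illustration, and out-of-range pages are simply reported as faults.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy single-level page table: 4 KB pages (12 offset bits), 16 virtual pages. */
enum { P_BITS = 12, NUM_VPAGES = 16 };

typedef struct { bool valid; uint64_t ppn; } PTE;
typedef enum { PAGE_HIT, PAGE_FAULT } Result;

/* Translate va; on a page hit, store the physical address in *pa. */
Result translate(const PTE table[NUM_VPAGES], uint64_t va, uint64_t *pa) {
    uint64_t vpn = va >> P_BITS;                  /* virtual page number */
    uint64_t vpo = va & ((1ull << P_BITS) - 1);   /* page offset, unchanged by translation */
    if (vpn >= NUM_VPAGES || !table[vpn].valid)
        return PAGE_FAULT;   /* the OS handler would load the page from disk, then retry */
    *pa = (table[vpn].ppn << P_BITS) | vpo;       /* physical page number + same offset */
    return PAGE_HIT;
}
```

With virtual page 3 mapped to physical page 7, the address 0x3ABC translates to 0x7ABC, while a reference to 0x5000 (page 5, not valid) is a page fault.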