Assembly Language and Computer Architecture Essentials

Chapter 7: IEEE 32-bit Float

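The IEEE 754 single-precision (32-bit) float format has three fields:

  • 1 sign bit (0 = positive, 1 = negative).
  • An 8-bit biased exponent. To calculate it, normalize the binary number to the form 1.xxxx × 2^E, then add the bias 127 (01111111 in binary) to E.
  • 23 bits for the fraction (mantissa): the bits after the binary point of the normalized number. The leading 1 is implicit and is not stored.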
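As a minimal sketch, assuming a C compiler whose float is IEEE 754 single precision (as on x86-64), the three fields can be pulled out with shifts and masks. The names float_fields, decode_float, and encode_fields are hypothetical helpers. For -6.5 = -110.1 (binary) = -1.101 × 2^2, the sign is 1, the exponent is 2 + 127 = 129 (10000001), and the fraction is 101 followed by twenty 0s (0x500000), giving the bit pattern 0xC0D00000.

```c
#include <stdint.h>
#include <string.h>

/* Fields of an IEEE 754 single-precision float (hypothetical helper type). */
typedef struct {
    uint32_t sign;      /* 1 bit */
    uint32_t exponent;  /* 8 bits, biased by 127 */
    uint32_t fraction;  /* 23 bits; the leading 1 is not stored */
} float_fields;

/* Copy the float's bit pattern into an integer, then mask out each field. */
float_fields decode_float(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret the bits without aliasing problems */
    float_fields f;
    f.sign = bits >> 31;
    f.exponent = (bits >> 23) & 0xFF;
    f.fraction = bits & 0x7FFFFF;
    return f;
}

/* Pack sign, biased exponent, and fraction back into a 32-bit pattern. */
uint32_t encode_fields(uint32_t sign, uint32_t exponent, uint32_t fraction) {
    return (sign << 31) | ((exponent & 0xFF) << 23) | (fraction & 0x7FFFFF);
}
```

For example, decode_float(1.0f) gives sign 0, exponent 127 (a true exponent of 0), and fraction 0.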

Chapter 11: Assembly Language

General Purpose Registers

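Registers hold temporary values (data and addresses):

  • %rax (%eax), %rbx (%ebx), %rcx (%ecx), %rdx (%edx), %rsi (%esi), %rdi (%edi): 64-bit general-purpose registers, with the name of the lower 32 bits in parentheses. By convention, %rdi, %rsi, %rdx, and %rcx hold the first four function arguments, and %rax holds the return value.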
  • %rsp: Stack Pointer – Points to the top of the stack.
  • %rbp: Base Pointer – Points to the base of the current stack frame.
  • %rip: Instruction Pointer – Address of the next instruction to be executed.

Flags Register

Flags indicate the status of operations:

  • CF (Carry Flag): Set if there’s an unsigned overflow (or underflow in subtraction).
  • ZF (Zero Flag): Set if the result of an operation is zero.
  • SF (Sign Flag): Set if the result of an operation is negative.
  • OF (Overflow Flag): Set if there’s a signed overflow.
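To see how a comparison sets these flags, the following C sketch recomputes them for a 32-bit subtraction t = a - b, which is what cmpl b, a does in AT&T syntax. It also evaluates the signed test SF != OF used by jl and setl, and the unsigned test CF used by jb. The names flags_t, cmp_flags, signed_less, and unsigned_below are hypothetical.

```c
#include <stdint.h>

/* Condition codes as 0/1 values (hypothetical helper type). */
typedef struct { int cf, zf, sf, of; } flags_t;

/* Flags that `cmpl b, a` would set: it computes t = a - b (32-bit) and discards t. */
flags_t cmp_flags(uint32_t a, uint32_t b) {
    uint32_t t = a - b;                          /* wraps modulo 2^32, like the hardware */
    flags_t f;
    f.cf = a < b;                                /* borrow: a < b as unsigned numbers */
    f.zf = (t == 0);
    f.sf = (int)(t >> 31);                       /* most significant bit of the result */
    /* signed overflow: a and b have different signs and t's sign differs from a's */
    f.of = (int)(((a ^ b) & (a ^ t)) >> 31);
    return f;
}

/* Signed "less than" as jl/setl evaluate it: SF != OF. */
int signed_less(flags_t f) { return f.sf != f.of; }

/* Unsigned "below" as jb/setb evaluate it: CF set. */
int unsigned_below(flags_t f) { return f.cf; }
```

For example, comparing INT_MIN (0x80000000) with 1 overflows: SF = 0 but OF = 1, so SF != OF still correctly reports INT_MIN < 1, and CF = 0 reports that 0x80000000 is not below 1 as an unsigned value.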

Common Instructions

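  • cmpq S, D (64-bit): Compares two values by computing D - S (AT&T syntax subtracts the first operand from the second). Sets the flags but does not store the result.
  • leal: “Load Effective Address Long”. Computes the address expression of its source operand with 32-bit arithmetic, without accessing memory, and stores the result in the destination register. Often used for plain arithmetic (e.g., leal (%rdi,%rsi,2), %eax computes x + 2y). Does not affect flags.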
  • Jump (jX):
    • jmp: Unconditional jump.
    • je / jz: Jump if equal or zero (ZF set).
    • jne / jnz: Jump if not equal or not zero (ZF not set).
    • js: Jump if sign (SF set).
    • jns: Jump if not sign (SF not set).
    • jg: Jump if greater (signed; SF equals OF and ZF not set).
    • jge: Jump if greater or equal (signed; SF equals OF).
    • jl: Jump if less (signed; SF not equal to OF).
    • jle: Jump if less or equal (signed; ZF set or SF not equal to OF).
    • ja: Jump if above (unsigned; CF and ZF not set).
    • jae: Jump if above or equal (unsigned; CF not set).
    • jb: Jump if below (unsigned; CF set).
    • jbe: Jump if below or equal (unsigned; CF set or ZF set).
  • set: Sets a single byte to 0 or 1 based on the flags after a cmp or test. For example, after cmpq %rsi, %rdi, setl %al sets %al to 1 if %rdi < %rsi (signed) and to 0 otherwise.
  • test: Performs a bitwise AND without storing the result. test S2, S1 sets the flags according to S1 & S2: ZF if the result is zero, SF if its most significant bit is 1; CF and OF are cleared. The idiom testq %rax, %rax checks whether %rax is zero or negative.
  • mov: Copies data from source to destination without affecting flags.
  • Bitwise Operations: AND, OR, XOR, NOT.
  • Shift:
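    • shl/sal: Shift left; each bit position multiplies by 2 (zeros enter from the right).
    • shr: Logical shift right; zeros enter from the left, so each bit position divides an unsigned value by 2, discarding the LSB.
    • sar: Arithmetic shift right; copies of the sign bit enter from the left, so each bit position divides a signed value by 2, rounding toward negative infinity.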
  • Conditional Move (cmovXX): Moves data based on a condition.
    • cmove / cmovz (S, D): Move if equal/zero (ZF is set).
    • cmovne / cmovnz (S, D): Move if not equal/nonzero (ZF is not set).
    • cmovs (S, D): Move if negative (SF is set).
    • cmovns (S, D): Move if nonnegative (SF is not set).
    • cmovg / cmovnle (S, D): Move if greater (signed) (SF equals OF, and ZF is not set).
    • cmovge / cmovnl (S, D): Move if greater or equal (signed) (SF equals OF).
    • cmovl / cmovnge (S, D): Move if less (signed) (SF does not equal OF).
    • cmovle / cmovng (S, D): Move if less or equal (signed) (SF does not equal OF or ZF is set).
    • cmova / cmovnbe (S, D): Move if above (unsigned) (CF is not set and ZF is not set).
    • cmovae / cmovnb (S, D): Move if above or equal (unsigned) (CF is not set).
    • cmovb / cmovnae (S, D): Move if below (unsigned) (CF is set).
    • cmovbe / cmovna (S, D): Move if below or equal (unsigned) (CF is set or ZF is set).
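The difference between the three shifts can be sketched in C on 32-bit bit patterns. Because >> on a negative signed int is implementation-defined in C, this sketch works on uint32_t and fills in the sign bits by hand for sar. The helpers shl32, shr32, and sar32 are hypothetical and assume shift counts from 0 to 31.

```c
#include <stdint.h>

/* 32-bit shift semantics on uint32_t bit patterns; k is assumed to be 0..31. */

uint32_t shl32(uint32_t x, unsigned k) {   /* shl/sal: zeros enter from the right */
    return x << k;
}

uint32_t shr32(uint32_t x, unsigned k) {   /* shr: logical, zeros enter from the left */
    return x >> k;
}

uint32_t sar32(uint32_t x, unsigned k) {   /* sar: copies of the sign bit enter from the left */
    uint32_t r = x >> k;
    if (x >> 31)                           /* top bit set: negative as a signed value */
        r |= ~(UINT32_MAX >> k);           /* set the top k bits */
    return r;
}
```

sar32(0xFFFFFFF9, 1), that is -7 >> 1, gives 0xFFFFFFFC (-4): sar rounds toward negative infinity, whereas C's -7 / 2 truncates to -3. This is why compilers add a bias of 2^k - 1 to negative values before using sar to divide signed integers by 2^k.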

Popcount Example

Counts ‘1’ bits in a binary number using loops in C and assembly.
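The original C and assembly code is not reproduced in these notes. A minimal C sketch of such a loop, assuming a 64-bit unsigned input (popcount_loop is a hypothetical name), tests the low bit and shifts right until no 1 bits remain:

```c
#include <stdint.h>

/* Loop version of popcount: test the low bit, then shift right until x is 0. */
int popcount_loop(uint64_t x) {
    int count = 0;
    while (x != 0) {
        count += (int)(x & 1);   /* add 1 if the least significant bit is set */
        x >>= 1;                 /* logical shift right (shr), since x is unsigned */
    }
    return count;
}
```

Using an unsigned type matters: x >>= 1 then compiles to a logical shift (shr), so zeros fill in from the left and the loop ends even when the top bit is set.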

Additional Instructions

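  • add: Adds source to destination. addq %rax, %rsi results in %rsi = %rsi + %rax.
  • call: Pushes the return address onto the stack and jumps to the function. By convention, the function's return value is left in %rax (%eax for 32-bit values).
  • imul: Signed multiply; imulq S, D results in D = D * S.
  • movq %rdi, %rbx: Move the value of %rdi into %rbx.
  • subq %rsi, %rax: %rax = %rax - %rsi.
  • Example: cmpl %edx, -4(%rdi,%rax,4) uses the addressing mode Mem[Base Register + (Index Register × Scale Factor) + Displacement], with Base Register = %rdi, Index Register = %rax, Scale Factor = 4, and Displacement = -4. If %rdi holds the address of an int array a and %rax holds n, the memory operand is a[n-1], so the instruction compares a[n-1] with %edx.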
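The addressing-mode arithmetic can be checked in C. In this sketch, effective_address is a hypothetical helper that computes Base + Index × Scale + Displacement; the usage assumes 4-byte ints, as on x86-64.

```c
#include <stdint.h>

/* Effective address of the AT&T operand D(Rb,Ri,S): Rb + Ri*S + D. */
uintptr_t effective_address(uintptr_t base, intptr_t index, int scale, intptr_t disp) {
    /* signed offset converted to uintptr_t wraps correctly when added to base */
    return base + (uintptr_t)(index * scale + disp);
}
```

With base = a, index = n = 5, scale = 4, and displacement = -4, effective_address((uintptr_t)a, 5, 4, -4) equals (uintptr_t)&a[4], the address of a[n-1].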

Chapter 13: Arrays

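Assume the base address of integer array A is in %rdx and the index i is in %rcx.

  • movl 4(%rdx,%rcx,4), %eax corresponds to A[i+1] (address A + 4i + 4).
  • addl $1, (%rdi,%rax,4) corresponds to z[i]++ when the base address of int array z is in %rdi and i is in %rax.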

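Example: Finding M and N for arrays a[M][N] and b[N][M]. C stores arrays in row-major order, so copying element b[j][i] to a[i][j] requires the element offsets j*M+i for b[j][i] and i*N+j for a[i][j] (multiply by the element size for byte offsets). The multipliers that appear in the assembly therefore reveal M and N.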
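A minimal C sketch of the row-major calculation, assuming a is declared a[M][N] and b is declared b[N][M], and both are passed as flat int pointers (row_major_index and copy_element are hypothetical helpers):

```c
#include <stddef.h>

/* Row-major element index of x[row][col] in an array declared x[rows][cols]. */
size_t row_major_index(size_t row, size_t col, size_t cols) {
    return row * cols + col;
}

/* a[i][j] = b[j][i] for a declared a[M][N] and b declared b[N][M],
   using the flat offsets i*N + j and j*M + i (in elements, not bytes). */
void copy_element(int *a, const int *b, size_t M, size_t N, size_t i, size_t j) {
    a[row_major_index(i, j, N)] = b[row_major_index(j, i, M)];
}
```

For M = 3 and N = 4, copying b[1][2] to a[2][1] reads flat element 1*3 + 2 = 5 of b and writes flat element 2*4 + 1 = 9 of a; with 4-byte ints these are byte offsets 20 and 36, and the row strides 4*M and 4*N are the kind of constants that appear in the compiled code.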

x86-64 Alignment

  • 1 byte (char): No address restrictions.
  • 2 bytes (short): The lowest bit of the address must be 0.
  • 4 bytes (int, float): The lowest 2 bits of the address must be 00.
  • 8 bytes (double, long, char *): The lowest 3 bits of the address must be 000.
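Each rule above says the address must be a multiple of the object's size K, which for a power of two K is the same as checking that the lowest log2(K) bits are 0. A minimal C sketch (is_aligned is a hypothetical helper; K is assumed to be a power of two):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if addr is a multiple of k, where k is a power of two (1, 2, 4, 8):
   equivalently, the lowest log2(k) bits of addr are all 0. */
bool is_aligned(uintptr_t addr, size_t k) {
    return (addr & (uintptr_t)(k - 1)) == 0;
}
```

As a consequence, the compiler inserts padding in structs: on x86-64, struct { char c; int i; double d; } places c at offset 0, i at offset 4 (after 3 bytes of padding), and d at offset 8, for a total size of 16 bytes.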

Chapters 17-18: Cache Memory

To find the number of sets in a cache: S = cache size / (number of ways × block size), that is, S = C / (E × B).

In a 2-way set associative cache, each set holds 2 blocks (lines), so 2 blocks that map to the same set can be cached at the same time.

Example: A system with 20-bit addresses, a total cache size (C) of 2048 bytes, a block size (B) of 16 bytes, and 32 sets (S). The associativity is calculated as 32 * E * 16 = 2048, which simplifies to E = 4. This is a 4-way set associative cache.

  • Block Offset: 4 bits (2^4 = 16 bytes).
  • Set Index: 5 bits (2^5 = 32 sets).
  • Tag: 11 bits (the remaining 20 - 4 - 5 bits of the address).

Reading a 4-byte word with the 20-bit address 0x0B1E4:

  • Binary address: 00001011000111100100
  • Tag: 00001011000 (88 in decimal)
  • Set Index: 11110 (30 in decimal)
  • Block Offset: 0100 (4 in decimal)
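The same split can be done with shifts and masks. A minimal C sketch, hard-coding the example's geometry (16-byte blocks, 32 sets, 20-bit addresses); cache_addr, split_address, and associativity are hypothetical names:

```c
#include <stdint.h>

/* Cache geometry assumed from the example: S = 32 sets, B = 16-byte blocks. */
enum { BLOCK_BITS = 4, SET_BITS = 5 };   /* b = log2(16), s = log2(32) */

typedef struct { uint32_t tag, set, offset; } cache_addr;

/* Split a 20-bit address into tag | set index | block offset. */
cache_addr split_address(uint32_t addr) {
    cache_addr a;
    a.offset = addr & ((1u << BLOCK_BITS) - 1);
    a.set = (addr >> BLOCK_BITS) & ((1u << SET_BITS) - 1);
    a.tag = addr >> (BLOCK_BITS + SET_BITS);   /* remaining 11 bits */
    return a;
}

/* Associativity E from C = S * E * B. */
uint32_t associativity(uint32_t cache_bytes, uint32_t sets, uint32_t block_bytes) {
    return cache_bytes / (sets * block_bytes);
}
```

split_address(0x0B1E4) gives tag 88, set 30, and offset 4, matching the hand calculation, and associativity(2048, 32, 16) returns 4.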

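Whether an address maps to set number 23 (0x17, binary 10111) depends only on its set-index bits (bits 4-8); the tag and block offset do not matter. For example, 0x00170 and 0x00370 both map to set 23 (with tags 0 and 1), so they compete for the 4 lines of that set.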

Memory Hierarchy

  • Levels: Reading the plot from right to left, each drop marks the boundary of the next level (L1, L2, etc.), and the position of the drop indicates that level's size.
  • Main memory: The last level, shown without a size.

When comparing two computers, the more efficient one is the one that serves most of its data accesses from L1.

If a 512-byte cache has a line size (B) of 16 bytes and is direct-mapped (E = 1), the number of sets follows from 512 = S × 1 × 16, so S = 32. For a direct-mapped cache, the number of sets is simply S = cache size / block size.

Chapter 17: Cache Hits and Misses

  • Cache Hit: Data can be delivered quickly from the cache.
  • Cache Miss: Data is not in the cache; it’s retrieved from memory and then stored in the cache. The placement policy determines where the item goes, and the replacement policy determines which block is evicted.
  • Cold Miss: Occurs because the cache starts empty and it’s the first reference to the block.
  • Capacity Miss: Occurs when the set of active blocks (working set) is larger than the cache.
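  • Conflict Miss: Occurs when the cache at level k is large enough, but multiple active data objects map to the same block at level k. For example, with the placement rule “block i goes to i mod 4”, blocks 0 and 8 both map to block 0 (0 mod 4 = 8 mod 4 = 0), so alternating references to them miss every time.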
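The placement rule and the difference between cold and conflict misses can be traced with a tiny direct-mapped cache simulation. This C sketch assumes a cache of 4 blocks with placement i mod 4 and block numbers below 64 (simulate and miss_stats are hypothetical names). It counts every repeat miss as a conflict miss, which is only valid while the working set fits in the cache, so it does not detect capacity misses.

```c
#include <stdbool.h>

#define NBLOCKS 4      /* assumed cache with 4 blocks; block i goes to slot i mod 4 */
#define MAX_BLOCK 64   /* assumed: block numbers are in 0..63 */

typedef struct { int hits, cold, conflict; } miss_stats;

/* Replay a sequence of block references through a direct-mapped cache.
   A miss on a block never seen before is cold; any other miss is counted
   as a conflict miss (valid only when the working set fits in the cache). */
miss_stats simulate(const int *refs, int n) {
    int slot[NBLOCKS];
    bool valid[NBLOCKS] = { false };
    bool seen[MAX_BLOCK] = { false };
    miss_stats s = { 0, 0, 0 };
    for (int k = 0; k < n; k++) {
        int b = refs[k];
        int i = b % NBLOCKS;                 /* placement policy */
        if (valid[i] && slot[i] == b) {
            s.hits++;
        } else {
            if (seen[b]) s.conflict++; else s.cold++;
            slot[i] = b;                     /* replacement: evict whatever was there */
            valid[i] = true;
        }
        seen[b] = true;
    }
    return s;
}
```

The reference sequence 0, 8, 0, 8, 0, 8 gives 2 cold misses, 4 conflict misses, and no hits, while 0, 1, 2, 3, 0, 1, 2, 3 gives 4 cold misses followed by 4 hits.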

Memory Hierarchy Examples

  • Register (Gistu): Fastest, stores smallest units of data.
  • Translation Lookaside Buffer (TLB): Speeds up virtual-to-physical address translation.
  • Level 1 (L1) Cache: Very fast, on-chip memory.
  • Level 2 (L2) Cache: Larger than L1 but slower, on-chip or close to it.
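  • Virtual Memory (Sýndarminni): Main memory (RAM) caches pages of the virtual address space that are stored on disk; slower than the caches but much larger.
  • Buffer Cache (Skráarbiðminni): The operating system caches parts of files in main memory, between programs and the disk.
  • Disk Cache (Diskabiðminni): The disk controller caches disk sectors in its own memory.
  • Network Buffer Cache (Netbiðminni): Caches parts of network (remote) files on the local disk.
  • Browser and Web Caches (Veframinni and Vefbiðminni): Cache web pages. Veframinni is a local cache kept by the browser (in RAM or on the local disk), while Vefbiðminni is a remote cache on a proxy server, similar to a CDN.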

CPU, RAM, and GPU

  • CPU (Central Processing Unit): The “brain” of the computer, performs most calculations.
  • RAM (Random Access Memory): Short-term memory that stores actively used data. Main memory is built from DRAM (raunminni, physical memory).
  • GPU (Graphics Processing Unit): Accelerates graphics rendering, also used for parallel computations.

Chapter 14


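Canary value (buffer overflow protection mechanism): The compiler places a random value (the canary) on the stack between a function's local buffers and its saved return address, and checks it before the function returns. If a buffer overflow has overwritten the canary, the program aborts instead of returning to a corrupted address.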
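A toy C simulation of a stack canary, not the compiler's actual frame layout: a 16-byte array stands in for the stack frame, with an 8-byte buffer followed by an 8-byte canary. The fixed canary constant and the names frame_init, unchecked_copy, and canary_intact are hypothetical. GCC's -fstack-protector options generate the real check, which uses a random canary and calls __stack_chk_fail when it detects a change.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy stack frame as a byte array: bytes 0..7 are a local buffer,
   bytes 8..15 hold the canary (a layout assumed for illustration). */
enum { BUF_SIZE = 8, FRAME_SIZE = 16 };

/* Hypothetical fixed canary; real implementations use a random value. */
static const uint64_t CANARY = 0x2F8A13C5D47E9B06ull;

void frame_init(unsigned char frame[FRAME_SIZE]) {
    memset(frame, 0, BUF_SIZE);
    memcpy(frame + BUF_SIZE, &CANARY, sizeof CANARY);
}

/* Unchecked copy into the buffer, like strcpy or gets: n may exceed BUF_SIZE.
   The caller must keep n <= FRAME_SIZE so this toy stays inside the array. */
void unchecked_copy(unsigned char frame[FRAME_SIZE], const unsigned char *src, size_t n) {
    memcpy(frame, src, n);
}

/* The check done before returning: has the canary changed? */
bool canary_intact(const unsigned char frame[FRAME_SIZE]) {
    uint64_t v;
    memcpy(&v, frame + BUF_SIZE, sizeof v);
    return v == CANARY;
}
```

Copying 8 bytes leaves the canary intact; copying 12 bytes overwrites part of it, so canary_intact returns false, which is the point where a protected function would abort.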

Virtual Memory

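  • Simplifies memory management: each process gets its own uniform address space.
  • Isolates address spaces: each process's pages are located in different places in physical memory, so processes cannot access each other's memory unless pages are explicitly shared.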

In the CPU, the MMU (Memory Management Unit) translates virtual addresses into physical addresses. When a program references a virtual memory location, the MMU looks up the page table entry (cached in the TLB) to check whether the page is in physical memory (DRAM). If it is not, the MMU triggers a page fault.

  • Page Hit: Reference to a virtual memory word that is in physical memory (DRAM hit).
  • Page Fault: Reference to a virtual memory word that is not in physical memory (DRAM miss). The operating system's page fault handler copies the page from disk into DRAM (evicting a victim page if necessary) and then restarts the faulting instruction.
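A minimal C sketch of the page hit / page fault decision, assuming 4 KB pages and a toy single-level page table with 16 entries. The type pte_t, the function translate, and these sizes are hypothetical; real x86-64 page tables typically have four levels and are walked by the hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed toy parameters: 4 KB pages (12 offset bits) and a 16-entry page table. */
enum { PAGE_BITS = 12, NUM_PAGES = 16 };

typedef struct {
    bool valid;      /* page is resident in physical memory (DRAM) */
    uint32_t ppn;    /* physical page number, meaningful only if valid */
} pte_t;

/* Translate a virtual address. Returns true on a page hit and writes the
   physical address; returns false on a page fault (page not in DRAM).
   For simplicity, an out-of-range page number is treated like a missing page. */
bool translate(const pte_t table[NUM_PAGES], uint32_t vaddr, uint32_t *paddr) {
    uint32_t vpn = vaddr >> PAGE_BITS;                /* virtual page number */
    uint32_t vpo = vaddr & ((1u << PAGE_BITS) - 1);   /* page offset */
    if (vpn >= NUM_PAGES || !table[vpn].valid)
        return false;                                 /* page fault: the OS must bring the page in */
    *paddr = (table[vpn].ppn << PAGE_BITS) | vpo;     /* offset is unchanged */
    return true;
}
```

With virtual page 2 mapped to physical page 7, translate(table, 0x2ABC, &pa) is a page hit with pa = 0x7ABC (the page offset 0xABC is unchanged), while 0x3000 falls in an unmapped page and reports a page fault.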