Computer Architecture: Concepts and Evolution
Chapters on Computer Architecture
Chapter 1
1. The Computer Revolution
Progress in Technology: Enables novel applications like genomics, web services, and search engines.
Applications: Embedded in automobiles, smartphones, and more.
Pervasiveness: Computers are integral to various aspects of life.
2. Classes of Computers
Personal Computers: General-purpose, cost-performance tradeoffs.
Server Computers: Network-based, high capacity and reliability.
Supercomputers: High-end calculations, small market fraction.
Embedded Computers: Hidden in systems, strict cost and power constraints.
3. The Post-PC Era
Personal Mobile Devices (PMDs): Battery-operated, internet-connected devices (e.g., smartphones, tablets).
Cloud Computing: Uses Warehouse-Scale Computers (WSC).
Software as a Service (SaaS): Splits software between PMDs and the cloud.
4. Key Concepts and Learning Goals
Translate programs into machine language.
Execute programs and optimize performance.
Design hardware for improved performance and parallel processing.
5. Understanding Performance
Algorithm: Determines the number of operations.
Programming Language & Compiler: Affect the number of instructions.
Processor/Memory: Influence instruction execution speed.
I/O System: Determines I/O operation speed.
6. Seven Great Ideas
1. Use abstraction to simplify design.
2. Optimize for common cases.
3. Leverage parallelism.
4. Utilize pipelining.
5. Employ prediction.
6. Build hierarchical memories.
7. Ensure dependability via redundancy.
7. Computer Components
Standard Components: Input/output devices, processors, memory, and storage.
Below Your Program:
Application Software: High-level languages.
System Software: Compilers and operating systems.
Hardware: Executes machine-level instructions.
8. Inside the Processor
Datapath: Executes data operations.
Control: Manages sequencing of tasks.
Cache Memory: Small, fast memory for immediate data access.
9. Performance Metrics
Response Time: Time to complete a task.
Throughput: Tasks completed per unit time.
Relative Performance: Performance = 1 / Execution Time; X is n times faster than Y when Performance_X / Performance_Y = Execution Time_Y / Execution Time_X = n.
10. CPU Performance
CPU Time: CPU Time = Clock Cycles x Clock Cycle Time.
Optimized by reducing cycles or increasing clock rate.
CPI (Cycles Per Instruction): CPI = Clock Cycles / Instruction Count.
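Worked example (illustrative numbers): a program of 10^9 instructions with CPI = 2 on a 2 GHz clock (0.5 ns cycle time) takes
Clock Cycles = 10^9 x 2 = 2 x 10^9
CPU Time = 2 x 10^9 x 0.5 ns = 1 second.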
11. Trends and Benchmarks
Technology Trends: Semiconductor advancements increase capacity and performance.
Power limitations (Power Wall) drive innovation.
Benchmarks: SPEC CPU benchmarks measure performance with standard workloads.
12. Key Pitfalls and Fallacies
Amdahl’s Law: An optimization’s overall impact is limited by the fraction of execution time it does not improve (worked example after this list).
Low Power at Idle: Systems consume substantial power even at low workloads.
MIPS Metric Fallacy: MIPS (millions of instructions per second) is misleading because instruction counts and complexity vary across instruction sets.
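Worked Amdahl’s Law example (assumed numbers): a program runs in 100 s, 80 s of which is multiplication. Making multiplication 4x faster gives T_improved = 80/4 + 20 = 40 s, an overall speedup of 2.5x; the 20 s unaffected portion caps the possible speedup at 5x however fast the multiply becomes.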
13. Concluding Remarks
Advancements: Hierarchical abstraction drives performance and design.
Execution Time: The ultimate metric for performance.
Future: Parallelism and energy efficiency will dominate innovations.
Chapter 2
1. Instruction Sets
Definition: Repertoire of computer instructions.
MIPS Instruction Set: Used for teaching; widely applied in consumer electronics, networking, & storage.
2. Arithmetic Operations
Design Principle 1: Simplicity favors regularity.
Example: add t0, g, h -> t0 = g + h.
C Code Example:
f = (g + h) - (i + j);
Compiled MIPS:
add t0, g, h
add t1, i, j
sub f, t0, t1
3. Registers & Operands
Registers: 32 registers, numbered $0-$31; register $zero ($0) is hard-wired to 0.
Temporary: $t0-$t9, Saved: $s0-$s7.
Design Principle 2: Smaller is faster.
Example: add $t0, $s1, $s2 # t0 = s1 + s2
4. Memory Operands
Byte Addressed: Each address identifies an 8-bit byte.
Load/Store Example: g = h + A[8]; (A’s base address in $s3, h in $s2, g in $s1)
lw $t0, 32($s3) # $t0 = A[8]; byte offset = 8 x 4 = 32
add $s1, $s2, $t0 # g = h + A[8]
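A store is symmetric; a minimal sketch for A[12] = h + A[8] under the same register assignments (an assumed mapping for illustration):
lw $t0, 32($s3) # $t0 = A[8]
add $t0, $s2, $t0 # $t0 = h + A[8]
sw $t0, 48($s3) # A[12] = $t0 (byte offset 12 x 4 = 48)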
5. Immediate Operands
Immediate Example: addi $s3, $s3, 4 # $s3 = $s3 + 4
6. Binary Numbers
Unsigned Range: 0 to 2^n-1.
Signed Range (2’s Complement): -2^(n-1) to 2^(n-1)-1.
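Example for n = 8: unsigned values span 0 to 255; two’s-complement values span -128 to +127.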
7. Logical Operations
Bitwise Operations:
AND: and $t0, $t1, $t2
OR: or $t0, $t1, $t2
NOT: nor $t0, $t1, $zero
sll (Shift Left Logical): Multiplies by 2^i.
srl (Shift Right Logical): Divides unsigned by 2^i.
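Shifts and AND combine for field extraction. A minimal sketch (the field position, bits 10:8 of $t1, is an assumption for illustration):
srl $t0, $t1, 8 # move the field to the low end
andi $t0, $t0, 0x7 # keep only the low 3 bits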
8. Control Flow
Conditional Branches: beq $s1, $s2, Label # branch if (s1 == s2); bne branches when operands differ
Unconditional Jump: j Label
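Branches and jumps together compile loops. A sketch of while (save[i] == k) i += 1; assuming i in $s3, k in $s5, and save’s base address in $s6 (an illustrative register mapping):
Loop: sll $t1, $s3, 2 # $t1 = i x 4 (byte offset)
add $t1, $t1, $s6 # $t1 = address of save[i]
lw $t0, 0($t1) # $t0 = save[i]
bne $t0, $s5, Exit # leave the loop if save[i] != k
addi $s3, $s3, 1 # i += 1
j Loop # test again
Exit: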
9. Procedures
Steps:
1. Place arguments in $a0-$a3.
2. Call using jal.
3. Return with jr $ra.
Stack Frame:
Allocate: addi $sp, $sp, -8 (adjust $sp before saving).
Save: sw $ra, 4($sp).
Restore: lw $ra, 4($sp), then addi $sp, $sp, 8 to pop the frame.
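Putting the steps together, a sketch of a leaf procedure computing f = (g + h) - (i + j), with arguments in $a0-$a3, the result in $v0, and $s0 saved because the callee overwrites it:
leaf_example:
addi $sp, $sp, -4 # make room on the stack
sw $s0, 0($sp) # save $s0 for the caller
add $t0, $a0, $a1 # $t0 = g + h
add $t1, $a2, $a3 # $t1 = i + j
sub $s0, $t0, $t1 # $s0 = (g + h) - (i + j)
add $v0, $s0, $zero # place result in $v0
lw $s0, 0($sp) # restore $s0
addi $sp, $sp, 4 # pop the stack
jr $ra # return to caller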
10. Instruction Representation
R-Format Fields: op (6 bits), rs (5), rt (5), rd (5), shamt (5), funct (6).
Example: add $t0, $s1, $s2 -> op = 0, rs = 17 ($s1), rt = 18 ($s2), rd = 8 ($t0), shamt = 0, funct = 0x20; machine word 0x02324020.
11. Addressing Modes
Register: Operands in registers.
Immediate: Constants in instructions.
PC-Relative: Offset added to PC.
Base: Offset added to base address.
12. Synchronization
Atomic Swap Example (swap $s4 with the word at 0($s1)):
try: add $t0, $zero, $s4 # copy exchange value
ll $t1, 0($s1) # load linked
sc $t0, 0($s1) # store conditional; $t0 = 0 on failure
beq $t0, $zero, try # retry if the store failed
add $s4, $zero, $t1 # put the loaded value in $s4
13. Compiling High-Level Code
If Statement:
if (i == j) f = g + h; else f = g - h; (i in $s3, j in $s4, f in $s0, g in $s1, h in $s2)
bne $s3, $s4, Else # skip then-clause if i != j
add $s0, $s1, $s2 # f = g + h
j Exit # skip else-clause
Else: sub $s0, $s1, $s2 # f = g - h
Exit:
14. Optimization Insights
Compiler Optimizations:
Reduce instruction count & CPI.
Optimize algorithms.
Chapter 3
1. Integer Operations
Addition: Overflow occurs if the result exceeds the signed range.
Positive + Positive -> Overflow if sign bit = 1.
Negative + Negative -> Overflow if sign bit = 0.
Subtraction: Performed as addition of the negated operand.
Overflow occurs if the result exceeds the signed range.
Positive - Negative -> Overflow if sign bit = 1.
Negative - Positive -> Overflow if sign bit = 0.
2. Dealing with Overflow
Languages like C: Ignore overflow using addu, subu.
Languages like Ada, Fortran: Raise an exception using add, addi, sub.
Handling Overflow: Use exception handlers.
Save PC in EPC register, jump to handler.
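Where overflow is ignored by the hardware (addu), it can still be detected in software. A minimal sketch for the signed sum $t1 + $t2 (labels are illustrative):
addu $t0, $t1, $t2 # sum without trapping
xor $t3, $t1, $t2 # sign bit set if operand signs differ
bltz $t3, no_ovf # operands of different sign never overflow
xor $t3, $t0, $t1 # sign bit set if result sign flipped
bltz $t3, overflow # same-sign operands, flipped result: overflow
no_ovf: # continue normally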
3. Arithmetic for Multimedia
SIMD (Single Instruction, Multiple Data): Operates on vectors (e.g., 8×8-bit, 4×16-bit).
Uses partitioned carry chain for vector operations.
Saturating Arithmetic: On overflow, returns max representable value.
4. Multiplication
MIPS Instructions:
mult rs, rt: 64-bit product in HI (MSBs) & LO (LSBs).
mfhi rd, mflo rd: Move results to rd.
mul rd, rs, rt: Outputs least-significant 32 bits.
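A short sketch of retrieving the full 64-bit product (register choices are illustrative):
mult $s0, $s1 # HI:LO = $s0 x $s1
mflo $t0 # $t0 = low 32 bits
mfhi $t1 # $t1 = high 32 bits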
5. Division
Steps:
Check for divisor = 0.
Long division approach: Generate quotient bits one at a time.
Restore the remainder (add the divisor back) if a subtraction step leaves it negative.
Use absolute values for signed division; adjust signs of quotient/remainder.
MIPS Instructions:
div rs, rt: Result stored in HI (remainder) & LO (quotient).
Use mfhi, mflo to access results.
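Sketch of $s0 / $s1 (illustrative registers):
div $s0, $s1 # LO = quotient, HI = remainder
mflo $t0 # $t0 = quotient
mfhi $t1 # $t1 = remainder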
6. Floating-Point Representation
IEEE 754 Standard:
Single Precision (32-bit): 1 sign, 8 exponent, 23 fraction bits.
Double Precision (64-bit): 1 sign, 11 exponent, 52 fraction bits.
Exponent bias: Single = 127, Double = 1023.
Format: (-1)^S * (1.Fraction) * 2^(Exponent - Bias).
Ranges:
Single Precision: ±1.2 x 10^-38 to ±3.4 x 10^38.
Double Precision: ±2.2 x 10^-308 to ±1.8 x 10^308.
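Worked encoding example: -0.75 = -1.1_2 x 2^-1. In single precision: sign = 1; exponent field = -1 + 127 = 126 = 01111110_2; fraction = 1000...0_2.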
7. Parallelism & Subword Operations
Subword Parallelism: SIMD for graphics/audio processing.
Examples: 16×8-bit, 8×16-bit, 4×32-bit operations.
Matrix Multiplication: Optimized using SIMD & pipelining for performance.
8. Floating-Point Accuracy
Rounding: IEEE 754 defines rounding modes (e.g., round to nearest even); hardware uses guard, round, and sticky bits to round results accurately.
Options to fine-tune numerical precision.
Programming Considerations: Balance hardware complexity, performance, & accuracy.
Floating-point addition is not associative, so validate parallel computations whose operation order differs from the sequential order.
Chapter 4
1. CPU Performance Factors
Instruction Count: Determined by ISA and compiler.
CPI (Cycles per Instruction) and Cycle Time: Determined by CPU hardware.
2. Instruction Execution
1. Fetch instruction from memory.
2. Decode and read registers.
3. Perform operation via ALU.
4. Access memory if needed.
5. Write results to the register file.
3. Key Processor Components
Datapath: Registers, ALU, multiplexers, memory.
Control Unit: Directs operations and control signals.
Multiplexers: Select among alternative data sources feeding the datapath.
4. Building a Datapath
Instruction Fetch: Increment PC by 4.
R-Format Instructions: Perform arithmetic/logical operations.
Load/Store: Use ALU to compute memory address.
Branch Instructions: Calculate target address, check Zero flag.
5. Pipelining
Stages:
1. IF: Fetch instruction.
2. ID: Decode and read registers.
3. EX: Execute or calculate address.
4. MEM: Access memory.
5. WB: Write back result.
Pipeline Performance: Improves throughput; CPI approaches 1.
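Worked example (assumed stage latencies): if the five stages take 200, 100, 200, 200, and 100 ps, a single-cycle design needs an 800 ps clock, while the pipelined clock is set by the slowest stage at 200 ps. Once full, the pipeline completes one instruction every 200 ps, about a 4x throughput gain.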
6. Hazards in Pipelining
Structural Hazards: Resource conflicts (resolved with separate instruction/data memories).
Data Hazards: Dependency on previous instruction results.
Solutions: Forwarding, stalls, reordering instructions (sketch after this list).
Control Hazards: Branch prediction (static or dynamic).
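A minimal data-hazard sketch (illustrative registers): the sub reads $s0 one instruction after the add produces it, and forwarding routes the ALU result back so no stall is needed:
add $s0, $t0, $t1 # $s0 produced in the EX stage
sub $t2, $s0, $t3 # $s0 consumed; forwarded from the add’s EX/MEM result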
7. Branch Prediction
Static: Predict ‘not taken’, or predict from typical loop behavior (backward branches usually taken).
Dynamic: Use hardware to track branch history and update predictions.
Penalty: Incorrect predictions stall the pipeline.
8. Exception Handling
Exceptions: Events requiring change in control flow (e.g., overflow).
Interrupts: External events requiring service.
Handling: Save PC in EPC register.
Identify cause (Cause register).
Jump to exception handler.
9. Instruction-Level Parallelism (ILP)
Multiple Issue:
Static: Compiler schedules instructions.
Dynamic: CPU schedules instructions, avoiding hazards.
Speculation: Execute instructions ahead of control dependencies, roll back if incorrect.
10. Performance Optimizations
Pipeline Depth: More stages reduce per-stage work but increase hazards.
Loop Unrolling: Expose parallelism by replicating loop bodies.
Register Renaming: Avoid data hazards with temporary registers.
11. Key Challenges
Dependencies: Data, structural, control.
Power Efficiency: Trade-offs between complexity and power consumption.
Instruction Set Design: Must align with pipeline needs.
12. Conclusion
Pipelining: Enhances throughput via parallelism.
Hazards: Addressed with advanced techniques like forwarding, prediction, and dynamic scheduling.
Future Directions: Focus on power-efficient designs and maximizing ILP.
Chapter 5
1. Principle of Locality
Temporal Locality: Recently accessed items are likely to be accessed again soon.
Spatial Locality: Items near recently accessed ones are likely to be accessed soon.
2. Memory Hierarchy Levels
Cache Memory: Fast, close to CPU (SRAM).
Main Memory: DRAM.
Disk Storage: Large, slow.
Key Metrics:
Hit: Data found in upper level.
Miss: Data not found in the upper level; the block is copied in from the lower level.
Miss Penalty: Time to retrieve a block from a lower level.
3. Cache Basics
Direct Mapped Cache: One location per block.
Associative Cache: Block can be stored in multiple locations.
Set Associative Cache: A block maps to one set and may go in any of that set’s N locations (N-way).
Replacement Policies:
Least Recently Used (LRU): Replace the block unused for the longest time.
Random: Simpler, approximates LRU.
4. Improving Cache Performance
Average Memory Access Time (AMAT): AMAT = Hit Time + (Miss Rate x Miss Penalty); worked example after this list.
Techniques: Increase cache size, block size, and associativity.
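Worked AMAT example (assumed numbers): hit time 1 cycle, miss rate 5%, miss penalty 20 cycles gives AMAT = 1 + 0.05 x 20 = 2 cycles per access.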
5. Write Strategies
Write-Through: Update cache and lower levels.
Write-Back: Update lower levels only when block is replaced.
Write Allocation: Fetch block on a write miss.
6. Multilevel Caches
Primary Cache (L1): Small, fast, minimal hit time.
Secondary Cache (L2): Larger, reduces main memory access.
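Worked example (assumed numbers): a 4 GHz CPU with base CPI 1, a 2% L1 miss rate, and 100 ns (400-cycle) main memory has effective CPI = 1 + 0.02 x 400 = 9 with L1 alone. Adding a 5 ns (20-cycle) L2 that cuts misses to main memory to 0.5% gives CPI = 1 + 0.02 x 20 + 0.005 x 400 = 3.4, roughly a 2.6x speedup.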
7. Virtual Memory
Uses main memory as a cache for secondary storage.
Pages: Fixed-size blocks in virtual memory.
Page Fault: Occurs when data is not in main memory.
8. Dependability
Fault Types:
Soft Errors: Non-permanent.
Hard Errors: Permanent.
Error-Correcting Codes (ECC): Detect and correct bit errors.
9. Cache Coherence in Multiprocessors
Snooping Protocols: Monitor shared memory operations.
Directory-Based Protocols: Use a directory to track block sharing.
10. Advanced Techniques
Blocking: Optimize memory accesses in algorithms.
Prefetching: Load data before it is needed.
Non-Blocking Caches: Allow multiple cache misses to overlap.
11. Key Challenges
Power Consumption: Balancing performance and efficiency.
Memory Access Patterns: Optimizing software for cache usage.
12. Concluding Remarks
Caching: Key to achieving fast, large memories.
Principle of Locality: Core to memory hierarchy design.
Important Keywords and Definitions
Memory Hierarchy
Temporal Locality: The concept that recently accessed memory locations are likely to be accessed again soon.
Spatial Locality: The concept that memory locations near recently accessed addresses are likely to be accessed in the near future.
Cache Memory
Cache Memory: A small, fast memory located close to the CPU to store frequently accessed data.
Direct Mapped Cache: A cache structure where each block of main memory maps to only one location in the cache.
Associative Cache: A cache structure where a block can be placed in any location, providing flexibility but increasing complexity.
Set Associative Cache: A hybrid cache organization that divides the cache into sets, where a block can be mapped to any location within a set.
Least Recently Used (LRU): A cache replacement policy that removes the least recently accessed block.
Write Strategies
Write-Through: A cache write strategy where data is written to both the cache and main memory simultaneously.
Write-Back: A cache write strategy where data is written to main memory only when it is evicted from the cache.
Write Allocation: A policy where a block is loaded into the cache on a write miss before writing to it.
Virtual Memory
Virtual Memory: A memory management technique where secondary storage (e.g., disk) is used to extend the apparent size of main memory.
Page Fault: An event that occurs when a program accesses a page not currently in main memory, requiring it to be loaded from secondary storage.
Dependability
Error-Correcting Codes (ECC): Techniques used to detect and correct errors in memory or data transmission.
Cache Coherence
Snooping Protocols: Methods used in multiprocessor systems to monitor and maintain consistency of shared cache lines.
Performance Techniques
Blocking: A technique to optimize memory accesses in algorithms by processing blocks of data that fit into the cache.
Prefetching: The process of loading data into the cache before it is requested by the processor.
Non-Blocking Caches: Caches that allow multiple memory operations to proceed simultaneously.
Processor Performance
Instruction-Level Parallelism (ILP): A measure of how many instructions can be executed simultaneously in a pipeline.
Multicore Processors: CPUs with multiple processing cores on a single chip to enhance parallelism and performance.
Pipeline Optimizations
Branch Prediction: A technique to guess the outcome of a branch instruction to reduce pipeline stalls.
Pipeline Stalls: Delays in the pipeline caused by dependencies or unresolved instructions.
Forwarding: A technique to resolve data hazards by using intermediate results directly from the pipeline.
Answer to the Questions:
Fastest and Slowest Memory in Traditional Computer Architecture:
- Fastest Memory: Cache Memory (especially L1 cache) because it is small and located close to the CPU, designed for rapid data retrieval.
- Slowest Memory: Disk Storage (e.g., Hard Disk Drives) due to its mechanical nature and significant latency compared to electronic memories like RAM or cache.
Difference Between Direct Mapping and Associative Mapping of Memory:
- Direct Mapping: Each block of main memory maps to exactly one cache line. It is simple and fast but can lead to frequent conflicts if multiple blocks map to the same line.
- Associative Mapping: A block of main memory can map to any cache line. This approach reduces conflicts and improves cache utilization but increases the complexity of searching for a block in the cache.
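MIPS Program: Triangle Area
The program below reads a base (B) and a height (H), computes A = (B x H) / 2, and prints the result, using the SPIM/MARS syscall conventions.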
.data
promptB: .asciiz "Enter the base (B) of the triangle: "
promptH: .asciiz "Enter the height (H) of the triangle: "
result: .asciiz "The area of the triangle is: "
.text
.globl main
main:
# Prompt the user for the base (B)
li $v0, 4 # syscall for printing string
la $a0, promptB # load address of promptB into $a0
syscall
li $v0, 5 # syscall for reading an integer
syscall
move $t0, $v0 # store the base (B) in $t0
# Prompt the user for the height (H)
li $v0, 4 # syscall for printing string
la $a0, promptH # load address of promptH into $a0
syscall
li $v0, 5 # syscall for reading an integer
syscall
move $t1, $v0 # store the height (H) in $t1
# Calculate the area: A = (B * H) / 2
mul $t2, $t0, $t1 # $t2 = B * H
srl $t3, $t2, 1 # $t3 = (B * H) / 2; shift right by 1 halves the product (valid for the non-negative values assumed here)
# Print the result
li $v0, 4 # syscall for printing string
la $a0, result # load address of result message into $a0
syscall
li $v0, 1 # syscall for printing integer
move $a0, $t3 # move the calculated area to $a0
syscall
# Exit program
li $v0, 10 # syscall for exiting program
syscall