I/O System Organization and Performance

Part 2 – Interface Between Processors and Peripherals

I/O Organization (T.1)

Introduction (1.1)

A computer system comprises three subsystems: CPU, memory, and I/O. The I/O system moves data between external devices and the CPU and memory. It includes:

  • I/O devices: Peripherals that connect the computer to the outside world, supporting user interaction (e.g., mouse, keyboard) and device-to-device communication (e.g., network, storage).
  • Interconnections: These are the physical connections between components, including transmission mechanisms, I/O interfaces, and the I/O organization.

I/O system characteristics are often determined by current technology and influenced by other system components.

I/O Features:

  • Diverse devices with varying transfer speeds.
  • Peripheral speeds often much slower than CPU or memory.
  • Different data formats and word sizes.

I/O System Design Parameters:

  • Performance
  • Scalability and expandability
  • Fault tolerance

Performance Measures (1.2)

Bandwidth, latency, and cost are interconnected. Generally, increasing bandwidth increases cost. Improving latency can be challenging without changing technology.

Latency (Response Time): Total time to complete a task, measured in time units or clock cycles.

  • TCPU = TUser + TSystem
  • Improved performance = Lower latency

Bandwidth (Throughput): Amount of work done in a given time, measured in quantity per unit time.

  • Data bandwidth (data rate): Data moved per unit time.
  • Operations bandwidth (I/O rate): I/O operations per unit time.
  • Improved performance = Higher bandwidth
  • Concurrency increases bandwidth beyond 1/latency.

Other Performance Indicators:

  • I/O interference with the processor (ideally minimized).
  • Device diversity.
  • Capacity/Scalability/Expandability.
  • Storage capacity (for storage devices).

Computer Performance: Analyses often focus on CPU performance alone, neglecting other components.

  • Time = Instruction Count x Cycles per Instruction x Clock Cycle Time

Optimizing CPU time isn’t the only way to improve performance; memory and I/O also play significant roles.

Improvement Options:

  • Optimize CPU (speed and efficiency).
  • Optimize memory (access efficiency).
  • Optimize I/O (operation efficiency).

Systems can be CPU-limited, memory-limited, or I/O-limited.

Speedup (after an upgrade): PerformanceAfter / PerformanceBefore = TimeBefore / TimeAfter

Amdahl’s Law: Calculates overall speedup based on:

  • FractionEnhanced: Fraction of the original execution time that benefits from the improvement.
  • SpeedupEnhanced: Speedup achieved if the improvement applied to the entire execution.
  • SpeedupOverall = 1 / [(1 – FractionEnhanced) + (FractionEnhanced / SpeedupEnhanced)]

I/O Device Model (1.3)

Two basic components:

  1. Physical Device: The mechanical part performing the peripheral’s tasks.
  2. Device Controller: The electronic interface between the device and the system. Functions include control/timing, data buffering, and error detection.

The CPU communicates with devices using I/O registers.

CPU-I/O Interface (Logical) (1.4)

For CPU access, I/O devices must be addressable. Two approaches:

  1. Memory-Mapped I/O: Reserves a portion of the memory map for I/O. The CPU accesses I/O registers like memory locations, with ordinary load/store instructions, simplifying design.
  2. Isolated (Port-Mapped) I/O: Uses a separate I/O address space accessed through dedicated instructions (e.g., IN/OUT on x86), keeping the full memory map available for memory.

Cache Line Search:

  • Index: Selects a cache set.
  • Tag: Compared with all tags in the set.
  • Valid bit: Indicates if the entry has a valid address.
  • Higher associativity increases tag size.

Cache Line Replacement:

  • Direct-mapped: Only one choice.
  • Set-associative/Fully associative: Choose from a set or all lines.
  • Strategies:
    • Random
    • Least Recently Used (LRU)

Write Operations

Two basic options:

  1. Write-Through: Data is written to both the cache and the lower-level memory. Advantages: simple, read misses are cheaper (evicted lines never need writing back), and memory is always up to date.
  2. Write-Back: Data is written only to the cache; memory is updated when the line is replaced. Advantages: writes complete at cache speed, and multiple writes to a line cost a single memory write.

Modified (Dirty) Bit: Marks a write-back line that has changed since it was loaded; only dirty lines are written back on replacement, reducing memory writes.

Write Miss Options:

  1. Write Allocate (Fetch on Write): Load the line into the cache, then write. Subsequent writes to the line will hit.
  2. No-Write Allocate (Write-Around): Write directly to the lower level without loading the line. Subsequent writes to the line may miss again.

Cache Performance (3.3)

Average Memory Access Time = Hit Time + (Miss Rate x Miss Penalty)

Cache behavior significantly impacts performance, especially in CPUs with low CPI and faster clocks.

Sources of Misses (3.4)

  1. Compulsory: First access to a line.
  2. Capacity: Cache too small.
  3. Conflict: Occur in direct-mapped and set-associative caches when too many lines map to the same set; a fully associative cache of the same size would avoid them.

Data and Instruction Caches (3.5)

Unified caches can be a bottleneck. Separate caches offer higher bandwidth, independent optimization, and eliminate instruction/data conflicts, but can have lower hit rates and consistency issues.

Miss Rate Reduction (3.6)

Larger Line Size: Reduces compulsory misses but can increase conflict misses and miss penalty.

Higher Associativity: Reduces conflict misses. 8-way set-associative is often as effective as fully associative.