I/O System Organization and Performance
Part 2 – Interface Between Processors and Peripherals
I/O Organization (T.1)
Introduction (1.1)
A computer system comprises three subsystems: CPU, memory, and I/O. The I/O system moves data between external devices and the CPU and memory. It includes:
- I/O devices: The peripherals through which the computer communicates with the outside world, supporting user interaction (e.g., mouse, keyboard) and machine-to-machine communication (e.g., network, storage).
- Interconnections: These are the physical connections between components, including transmission mechanisms, I/O interfaces, and the I/O organization.
I/O system characteristics are often determined by current technology and influenced by other system components.
I/O Features:
- Diverse devices with varying transfer speeds.
- Peripherals are often much slower than the CPU or memory.
- Different data formats and word sizes.
I/O System Design Parameters:
- Performance
- Scalability and expandability
- Fault tolerance
Performance Measures (1.2)
Bandwidth, latency, and cost are interconnected. Generally, increasing bandwidth increases cost. Improving latency can be challenging without changing technology.
Latency (Response Time): Total time to complete a task, measured in time units or clock cycles.
- T_CPU = T_user + T_system (CPU time is the sum of user-mode and system/OS time)
- Improved performance = Lower latency
Bandwidth (Throughput): Amount of work done in a given time, measured in quantity per unit time.
- Data bandwidth (data rate): Data moved per unit time.
- Operations bandwidth (I/O rate): I/O operations per unit time.
- Improved performance = Higher bandwidth
- Concurrency (keeping several operations in flight) lets bandwidth exceed 1/latency.
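For example (with made-up numbers): if each I/O operation has a latency of 10 ms, a single outstanding operation limits throughput to 1/0.010 = 100 operations per second, but keeping four operations in flight can push throughput toward 400 operations per second even though each individual operation still takes 10 ms.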
Other Performance Indicators:
- I/O interference with the processor (ideally minimized).
- Device diversity.
- Capacity/Scalability/Expandability.
- Storage capacity (for storage devices).
Computer Performance: Often focuses on CPU performance, neglecting other components.
- CPU Time = Instruction Count x Cycles per Instruction (CPI) x Clock Cycle Time
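For example (illustrative figures only): a program executing 10^9 instructions at a CPI of 1.5 on a 2 GHz clock (0.5 ns cycle time) takes 10^9 x 1.5 x 0.5 ns = 0.75 s of CPU time.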
Optimizing CPU time isn’t the only way to improve performance; memory and I/O also play significant roles.
Improvement Options:
- Optimize CPU (speed and efficiency).
- Optimize memory (access efficiency).
- Optimize I/O (operation efficiency).
Systems can be CPU-limited, memory-limited, or I/O-limited.
Speedup (after an upgrade) = Performance_after / Performance_before
Amdahl’s Law: Gives the overall speedup from an improvement in terms of:
- Fraction_enhanced: Fraction of the original execution time that benefits from the improvement.
- Speedup_enhanced: Factor by which the enhanced portion runs faster.
- Speedup_overall = 1 / [(1 – Fraction_enhanced) + (Fraction_enhanced / Speedup_enhanced)]
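As a quick illustration with made-up numbers: suppose 40% of the original execution time benefits from an enhancement that makes that part five times faster. A minimal sketch of the calculation:

```c
#include <stdio.h>

/* Amdahl's Law with illustrative (made-up) numbers:
 * 40% of execution time benefits from an enhancement that is 5x faster. */
int main(void) {
    double fraction_enhanced = 0.40;   /* Fraction_enhanced */
    double speedup_enhanced  = 5.0;    /* Speedup_enhanced  */

    double speedup_overall =
        1.0 / ((1.0 - fraction_enhanced) +
               fraction_enhanced / speedup_enhanced);

    printf("Overall speedup: %.2f\n", speedup_overall);   /* ~1.47 */
    return 0;
}
```

Even a 5x improvement applied to 40% of the work yields less than a 1.5x overall speedup, which is the central point of the law.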
I/O Device Model (1.3)
Two basic components:
- Physical Device: The mechanical or electrical part that actually performs the peripheral’s task.
- Device Controller: The electronic interface between the device and the system. Functions include control/timing, data buffering, and error detection.
The CPU communicates with devices using I/O registers.
CPU-I/O Interface (Logical) (1.4)
For CPU access, I/O devices must be addressable. Two approaches:
- Memory-Mapped I/O: Reserves a portion of the memory map for I/O registers. The CPU accesses them with ordinary load/store instructions, like memory locations, which simplifies the design.
- Isolated (Port-Mapped) I/O: Uses a separate I/O address space accessed with dedicated I/O instructions (e.g., IN/OUT on x86).
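A rough sketch of how memory-mapped I/O looks to software, assuming a bare-metal environment; the register addresses, bit positions, and the uart_putc helper are invented for illustration and do not correspond to any real device:

```c
#include <stdint.h>

/* Hypothetical device registers: the base address, offsets, and bit
 * layout are made up for illustration only.                          */
#define UART_BASE    0x40001000u
#define UART_STATUS  (*(volatile uint32_t *)(UART_BASE + 0x0))  /* status register */
#define UART_DATA    (*(volatile uint32_t *)(UART_BASE + 0x4))  /* data register   */
#define TX_READY     (1u << 0)     /* assumed "transmitter ready" bit */

/* Busy-wait until the device is ready, then hand it one byte.
 * Ordinary loads and stores reach the device because its registers
 * occupy part of the memory address space; volatile keeps the
 * compiler from optimizing the accesses away.                      */
static void uart_putc(char c) {
    while ((UART_STATUS & TX_READY) == 0)
        ;                           /* poll the status register */
    UART_DATA = (uint32_t)c;        /* store goes to the device, not RAM */
}
```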
Cache Line Search:
- Index: Selects a cache set.
- Tag: Compared with all tags in the set.
- Valid bit: Indicates whether the entry holds valid data.
- Higher associativity (for a fixed cache size) means fewer sets, fewer index bits, and therefore larger tags; the lookup is sketched below.
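A minimal sketch of the lookup, assuming an invented geometry (64-byte lines, 128 sets, 2 ways); real caches differ, but the offset/index/tag split works the same way:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative geometry only: 64-byte lines, 128 sets, 2 ways. */
#define OFFSET_BITS 6                     /* log2(64)  */
#define INDEX_BITS  7                     /* log2(128) */
#define NUM_SETS    (1u << INDEX_BITS)
#define WAYS        2

typedef struct {
    bool     valid;                       /* line holds valid data          */
    uint32_t tag;                         /* remaining high-order addr bits */
} cache_line_t;

static cache_line_t cache[NUM_SETS][WAYS];

/* Split the address into offset / index / tag, then compare the tag
 * against every way of the selected set; a hit needs valid + match.  */
static bool cache_lookup(uint32_t addr) {
    uint32_t index = (addr >> OFFSET_BITS) & (NUM_SETS - 1);
    uint32_t tag   =  addr >> (OFFSET_BITS + INDEX_BITS);

    for (int way = 0; way < WAYS; way++) {
        if (cache[index][way].valid && cache[index][way].tag == tag)
            return true;                  /* hit  */
    }
    return false;                         /* miss */
}
```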
Cache Line Replacement:
- Direct-mapped: Only one choice.
- Set-associative/Fully associative: Choose from a set or all lines.
- Strategies:
- Random
- Least Recently Used (LRU)
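A minimal sketch of LRU victim selection within one set, assuming each line carries a counter recording its last use; real hardware usually approximates LRU rather than storing full timestamps:

```c
#include <stdbool.h>
#include <stdint.h>

#define WAYS 4                    /* illustrative 4-way set */

typedef struct {
    bool     valid;
    uint64_t last_used;           /* access counter value at last use */
} line_t;

/* Pick the replacement victim in one set: prefer an invalid line,
 * otherwise evict the line whose last access is oldest (LRU).     */
static int choose_victim(const line_t set[WAYS]) {
    int victim = 0;
    for (int w = 0; w < WAYS; w++) {
        if (!set[w].valid)
            return w;                                 /* free slot: no eviction   */
        if (set[w].last_used < set[victim].last_used)
            victim = w;                               /* older access => evict it */
    }
    return victim;
}
```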
Write Operations
Two basic options:
- Write-Through: Data written to cache and lower-level memory. Advantages: Simple, less costly read misses, memory always updated.
- Write-Back: Data written to cache; updated in memory upon line replacement. Advantages: Faster cache writes, efficient memory writes.
Modified (Dirty) Bit: Marks a line that has been changed while in the cache (write-back only); only modified lines must be written back on replacement, reducing memory writes.
Write Miss Options:
- Write Allocate/Fetch on Write: Load the line into the cache, then write. Subsequent writes to that line will hit.
- No-Write Allocate/Write-Around: Modify the lower level without loading the line into the cache. Subsequent writes to that line may also miss. (Both write policies are contrasted in the sketch below.)
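A rough, self-contained sketch contrasting the two write-hit policies, pairing write-back with write-allocate and write-through with no-write-allocate (a common combination); the one-line "cache", the addresses, and the counters are invented for illustration:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Toy one-line cache used only to contrast the two write policies. */
struct cache_line { bool valid, dirty; uint32_t addr, data; };

static struct cache_line line;
static uint32_t memory[1024];      /* tiny backing store            */
static int memory_writes;          /* counts traffic to lower level */

/* Write-through, no-write-allocate: the lower level is always updated;
 * on a miss the line is not brought into the cache.                    */
static void write_through(uint32_t addr, uint32_t data) {
    if (line.valid && line.addr == addr)
        line.data = data;          /* update the cache only on a hit */
    memory[addr] = data;           /* every write reaches memory     */
    memory_writes++;
}

/* Write-back, write-allocate: only the cache is updated and the line is
 * marked modified; memory is written when the line is later replaced.   */
static void write_back(uint32_t addr, uint32_t data) {
    if (!(line.valid && line.addr == addr)) {      /* write miss       */
        if (line.valid && line.dirty) {            /* evict dirty line */
            memory[line.addr] = line.data;
            memory_writes++;
        }
        line = (struct cache_line){ .valid = true, .addr = addr,
                                    .data = memory[addr] };  /* fetch on write */
    }
    line.data  = data;
    line.dirty = true;             /* modified bit set; no memory write yet */
}

int main(void) {
    for (uint32_t i = 0; i < 5; i++) write_through(7, i);
    printf("write-through: %d memory writes\n", memory_writes);  /* 5 */

    memory_writes = 0; line.valid = false;
    for (uint32_t i = 0; i < 5; i++) write_back(7, i);
    printf("write-back:    %d memory writes\n", memory_writes);  /* 0 (deferred) */
    return 0;
}
```

Five repeated writes to the same word cost five memory writes under write-through but none (yet) under write-back, which is exactly the traffic reduction the modified bit enables.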
Cache Performance (3.3)
Average Memory Access Time = Hit Time + (Miss Rate x Miss Penalty)
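For example (illustrative values): with a 1-cycle hit time, a 5% miss rate, and a 100-cycle miss penalty, the average memory access time is 1 + 0.05 x 100 = 6 cycles, so even a small miss rate dominates the average.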
Cache behavior significantly impacts performance, especially in CPUs with low CPI and faster clocks.
Sources of Misses (3.4)
- Compulsory: First access to a line.
- Capacity: Cache too small.
- Conflict: Occur in direct-mapped or set-associative caches when too many lines map to the same set.
Data and Instruction Caches (3.5)
Unified caches can be a bottleneck. Separate caches offer higher bandwidth, independent optimization, and eliminate instruction/data conflicts, but can have lower hit rates and consistency issues.
Miss Rate Reduction (3.6)
Larger Line Size: Reduces compulsory misses but can increase conflict misses and miss penalty.
Higher Associativity: Reduces conflict misses. 8-way set-associative is often as effective as fully associative.