I/O System Organization and Performance

Part 2 – Interface Between Processors and Peripherals

I/O Organization (T.1)

Introduction (1.1)

A computer system comprises three subsystems: CPU, memory, and I/O. The I/O system moves data between external devices and the CPU and memory. It includes:

  • I/O devices: Peripherals that connect the computer to the outside world, supporting user interaction (e.g., mouse, keyboard) and device-to-device communication (e.g., network, storage).
  • Interconnections: These are the physical connections between components, including transmission mechanisms, I/O interfaces, and the I/O organization.

I/O system characteristics are often determined by current technology and influenced by other system components.

I/O Features:

  • Diverse devices with varying transfer speeds.
  • Peripheral speeds often much slower than CPU or memory.
  • Different data formats and word sizes.

I/O System Design Parameters:

  • Performance
  • Scalability and expandability
  • Fault tolerance

Performance Measures (1.2)

Bandwidth, latency, and cost are interconnected. Generally, increasing bandwidth increases cost. Improving latency can be challenging without changing technology.

Latency (Response Time): Total time to complete a task, measured in time units or clock cycles.

  • TCPU = TUser + TSystem
  • Improved performance = Lower latency

Bandwidth (Throughput): Amount of work done in a given time, measured in quantity per unit time.

  • Data bandwidth (data rate): Data moved per unit time.
  • Operations bandwidth (I/O rate): I/O operations per unit time.
  • Improved performance = Higher bandwidth
  • Concurrency increases bandwidth beyond 1/latency.

Other Performance Indicators:

  • I/O interference with the processor (ideally minimized).
  • Device diversity.
  • Capacity/Scalability/Expandability.
  • Storage capacity (for storage devices).

Computer Performance: Analyses often focus on CPU performance alone, neglecting other components.

  • Time = Instruction Count x Cycles per Instruction x Clock Cycle Time

Optimizing CPU time isn’t the only way to improve performance; memory and I/O also play significant roles.

Improvement Options:

  • Optimize CPU (speed and efficiency).
  • Optimize memory (access efficiency).
  • Optimize I/O (operation efficiency).

Systems can be CPU-limited, memory-limited, or I/O-limited.

Speedup (after an upgrade): PerformanceAfter / PerformanceBefore = TimeBefore / TimeAfter

Amdahl’s Law: Calculates overall speedup based on:

  • FractionEnhanced: Fraction of the original execution time that benefits from the improvement.
  • SpeedupEnhanced: Speedup achieved if the improvement applied to the entire execution.
  • SpeedupOverall = 1 / [(1 – FractionEnhanced) + (FractionEnhanced / SpeedupEnhanced)]

I/O Device Model (1.3)

Two basic components:

  1. Physical Device: The mechanical part performing the peripheral’s tasks.
  2. Device Controller: The electronic interface between the device and the system. Functions include control/timing, data buffering, and error detection.

The CPU communicates with devices using I/O registers.

CPU-I/O Interface (Logical) (1.4)

For CPU access, I/O devices must be addressable. Two approaches:

  1. Memory-Mapped I/O: Reserves a portion of the memory map for I/O. The CPU accesses I/O registers like memory locations, with ordinary load/store instructions, simplifying design.
  2. Isolated (Port-Mapped) I/O: Uses a separate I/O address space accessed through dedicated instructions (e.g., IN/OUT on x86), keeping the full memory map available for memory.

Cache Line Search:

  • Index: Selects a cache set.
  • Tag: Compared with all tags in the set.
  • Valid bit: Indicates if the entry has a valid address.
  • Higher associativity increases tag size.

Cache Line Replacement:

  • Direct-mapped: Only one choice.
  • Set-associative/Fully associative: Choose from a set or all lines.
  • Strategies:
    • Random
    • Least Recently Used (LRU)

Write Operations

Two basic options:

  1. Write-Through: Data is written to both the cache and the lower-level memory. Advantages: simple, read misses are cheaper (evicted lines never need writing back), and memory is always up to date.
  2. Write-Back: Data is written only to the cache; memory is updated when the line is replaced. Advantages: writes complete at cache speed, and multiple writes to a line cost a single memory write.

Modified (Dirty) Bit: Marks a write-back line that has changed since it was loaded; only dirty lines are written back on replacement, reducing memory writes.

Write Miss Options:

  1. Write Allocate (Fetch on Write): Load the line into the cache, then write. Subsequent writes to the line will hit.
  2. No-Write Allocate (Write-Around): Write directly to the lower level without loading the line. Subsequent writes to the line may miss again.

Cache Performance (3.3)

Average Memory Access Time = Hit Time + (Miss Rate x Miss Penalty)

Cache behavior significantly impacts performance, especially in CPUs with low CPI and faster clocks.

Sources of Misses (3.4)

  1. Compulsory: First access to a line.
  2. Capacity: Cache too small.
  3. Conflict: Occur in direct-mapped and set-associative caches when too many lines map to the same set; a fully associative cache of the same size would avoid them.

Data and Instruction Caches (3.5)

Unified caches can be a bottleneck. Separate caches offer higher bandwidth, independent optimization, and eliminate instruction/data conflicts, but can have lower hit rates and consistency issues.

Miss Rate Reduction (3.6)

Larger Line Size: Reduces compulsory misses but can increase conflict misses and miss penalty.

Higher Associativity: Reduces conflict misses. 8-way set-associative is often as effective as fully associative.