Parallel Computer Architectures and Optimization Techniques

Parallel Computer Architectures

Instruction and Data Streams

SISD (Single Instruction Single Data): The classical von Neumann architecture, where a single data stream is processed by one instruction stream (uniprocessor).

SIMD (Single Instruction Multiple Data): A single instruction stream is applied simultaneously to multiple data elements by different processing units. This is commonly used in array processors and vector processors.

MISD (Multiple Instructions Single Data): Multiple processing units operate on a single data stream. This is largely a theoretical category with no common practical implementations.

MIMD (Multiple Instructions Multiple Data): The most common type of parallel machine, where each processor has a separate program and instruction stream, operating on different data. This includes shared memory (tightly coupled) and multicomputer (loosely coupled) systems.

Analogy: Airport Check-in Desks

  • SISD: A single check-in desk.
  • SIMD: Many desks with a supervisor giving instructions that every desk follows.
  • MIMD: Many desks working independently, synchronized through a central database.
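The SISD/SIMD distinction can be sketched in Python, using NumPy's vectorized operations as a rough stand-in for SIMD hardware (the data values here are illustrative):

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0])

# SISD style: one instruction stream processes one data element at a time.
sisd_result = np.empty_like(data)
for i in range(len(data)):
    sisd_result[i] = data[i] * 2.0 + 1.0

# SIMD style: a single operation is applied to the whole array at once.
simd_result = data * 2.0 + 1.0

assert np.array_equal(sisd_result, simd_result)
```

Both compute the same values; the difference is that the vectorized form expresses one operation over many data elements, which is what SIMD hardware exploits.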

GPU/CPU

GPU Computing: A type of heterogeneous computing that combines many-core GPUs with multicore CPUs for higher performance. GPUs excel in data-intensive, parallel computations.

Differences between GPU and CPU Threads

  • GPU Threads: Extremely lightweight; thousands are needed for full efficiency.
  • CPU Threads: Heavyweight; only a few are needed to keep a multicore CPU fully utilized.
  Feature                 | CPU                                            | GPU
  ------------------------|------------------------------------------------|------------------------------------------------
  Memory                  | Large, directly accessible                     | Relatively small, managed by the CPU
  Control logic           | Independent for each core                      | Shared by groups of compute cores
  Execution               | Independent per core                           | Shared within groups
  Cache & synchronization | Coherent caches between cores, shared & synced | Shared within groups, none between groups

Hazards in Pipelining

Structural Hazards: Resource conflicts where multiple instructions request the same resource.

Data Hazards: An instruction depends on the result of a previous instruction that has not yet completed.

Control Hazards: Branch instructions change the program counter (PC), so the pipeline may fetch the wrong instructions until the branch outcome is known.
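A data hazard can be made concrete with a toy detector for read-after-write (RAW) dependences. The 3-address instruction format and the `window` parameter below are illustrative assumptions, not a real ISA:

```python
def find_raw_hazards(instructions, window=2):
    """Report pairs (i, j) where instruction j reads a register that
    instruction i writes, within `window` later slots (no forwarding assumed)."""
    hazards = []
    for i, (dest, _, _) in enumerate(instructions):
        for j in range(i + 1, min(i + 1 + window, len(instructions))):
            _, src1, src2 = instructions[j]
            if dest in (src1, src2):
                hazards.append((i, j))
    return hazards

# add r1, r2, r3 ; sub r4, r1, r5  -> the sub reads r1 before add completes
prog = [("r1", "r2", "r3"), ("r4", "r1", "r5"), ("r6", "r7", "r8")]
print(find_raw_hazards(prog))  # [(0, 1)]
```

A real pipeline resolves such hazards by stalling or by forwarding results between stages; the detector only shows where they arise.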

Types of Cache Misses

Compulsory Miss: First reference to a block, unavoidable regardless of cache size.

Capacity Miss: The working set exceeds the cache size, so blocks are evicted and must be fetched again.

Conflict Miss: Multiple blocks map to the same cache location, causing evictions even when the cache is not full.
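The miss types can be demonstrated with a minimal direct-mapped cache model. The mapping rule (block address mod number of sets) and the access pattern below are illustrative:

```python
def simulate(accesses, num_sets):
    """Tiny direct-mapped cache: one block per set, set index = addr % num_sets."""
    cache = [None] * num_sets
    seen = set()                    # block addresses referenced before
    stats = {"hit": 0, "compulsory": 0, "conflict_or_capacity": 0}
    for block in accesses:
        idx = block % num_sets
        if cache[idx] == block:
            stats["hit"] += 1
        elif block not in seen:
            stats["compulsory"] += 1            # first reference: unavoidable
        else:
            stats["conflict_or_capacity"] += 1  # was evicted, fetched again
        cache[idx] = block
        seen.add(block)
    return stats

# Blocks 0 and 4 collide in a 4-set cache (0 % 4 == 4 % 4), so
# alternating between them keeps evicting the other block.
print(simulate([0, 4, 0, 4], num_sets=4))
# {'hit': 0, 'compulsory': 2, 'conflict_or_capacity': 2}
```

The cache is far from full (2 blocks in 4 sets), so the repeated misses are conflict misses, not capacity misses.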

Optimization Techniques

Techniques covered in the course include:

  • Branch prediction
  • GPU acceleration
  • Cache and memory system optimization
  • Pipelining
  • Bus width enhancement
  • VLIW (Very Long Instruction Word)
  • Shared memory architecture
  • Multiprocessors
  • Vector machines
  • Instruction design
  • Multithreading

Topologies in Multiprocessor Systems

Topology: The pattern of connections between processors, impacting performance and cost.

Key Characteristics:

  • Diameter: The maximum, over all pairs of processors, of the shortest-path distance (in links) between them.
  • Total Bandwidth: Capacity of a communication link multiplied by the number of links.
  • Bisection Bandwidth: The total bandwidth across the narrowest cut that divides the network into two equal halves — the bottleneck for traffic between the halves.
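The diameter of a topology can be computed from its adjacency lists with breadth-first search; the 6-node ring below is an illustrative example:

```python
from collections import deque

def diameter(adj):
    """Maximum over all node pairs of the shortest-path length (in links)."""
    best = 0
    for start in adj:
        dist = {start: 0}
        q = deque([start])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        best = max(best, max(dist.values()))
    return best

n = 6
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
print(diameter(ring))  # a ring of n nodes has diameter n // 2 -> 3
```

The same function works for any of the topologies below once their adjacency lists are written out.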

Common Topologies

Shared Bus Topology: Processors communicate via a single bus, handling one transmission at a time.

Ring Topology: Each processor connects directly to two neighbors, forming a ring. Multiple messages can be in flight simultaneously, but data may have to travel through intermediate processors.

Tree Topology: Processors are connected in a tree, each with up to three links (one parent and two children), providing a single unique path between any two processors.

Mesh Topology: Each processor connects to its immediate neighbors (above, below, left, and right).

Hypercube Topology: A multi-dimensional mesh where two processors are connected when their binary labels differ in exactly one bit.
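The hypercube connection rule is just a bit flip, which makes its neighbor set easy to sketch (the dimension and node labels below are illustrative):

```python
def hypercube_neighbors(node, d):
    """Neighbors of `node` in a d-dimensional hypercube: flip each of d bits."""
    return [node ^ (1 << bit) for bit in range(d)]

d = 3                                          # 2**3 = 8 processors
print(sorted(hypercube_neighbors(0b000, d)))   # [1, 2, 4]
print(sorted(hypercube_neighbors(0b101, d)))   # [1, 4, 7]
```

Each node has exactly d neighbors, and the diameter is d: in the worst case every bit differs, and each hop fixes one bit.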