Parallel Computer Architectures and Optimization Techniques

Parallel Computer Architectures

Instruction and Data Streams

SISD (Single Instruction Single Data): The classical von Neumann architecture, where a single data stream is processed by one instruction stream (uniprocessor).

SIMD (Single Instruction Multiple Data): A single instruction stream is applied simultaneously to multiple data elements by different processing units. This is commonly used in array processors and vector processors.

MISD (Multiple Instructions Single Data): Multiple processing units operate on a single data stream. This is largely a theoretical category with no common practical implementations.

MIMD (Multiple Instructions Multiple Data): The most common type of parallel machine, where each processor has a separate program and instruction stream, operating on different data. This includes shared memory (tightly coupled) and multicomputer (loosely coupled) systems.

Analogy: Airport Check-in Desks

  • SISD: A single check-in desk.
  • SIMD: Many desks with a supervisor giving instructions that every desk follows.
  • MIMD: Many desks working independently, synchronized through a central database.
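The SISD/SIMD distinction can be sketched in Python, using NumPy's vectorized operations as a rough stand-in for SIMD hardware (the data values here are illustrative):

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0])

# SISD style: one instruction stream processes one data element at a time.
sisd_result = np.empty_like(data)
for i in range(len(data)):
    sisd_result[i] = data[i] * 2.0 + 1.0

# SIMD style: a single operation is applied to the whole array at once.
simd_result = data * 2.0 + 1.0

assert np.array_equal(sisd_result, simd_result)
```

Both compute the same values; the difference is that the vectorized form expresses one operation over many data elements, which is what SIMD hardware exploits.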

GPU/CPU

GPU Computing: A type of heterogeneous computing that combines many-core GPUs with multicore CPUs for higher performance. GPUs excel in data-intensive, parallel computations.

Differences between GPU and CPU Threads

  • GPU Threads: Extremely lightweight; thousands are needed for full efficiency.
  • CPU Threads: Heavyweight; only a few are needed to keep a multicore CPU fully utilized.
  Feature                 | CPU                                            | GPU
  ------------------------|------------------------------------------------|------------------------------------------------
  Memory                  | Large, directly accessible                     | Relatively small, managed by the CPU
  Control logic           | Independent for each core                      | Shared by groups of compute cores
  Execution               | Independent per core                           | Shared within groups
  Cache & synchronization | Coherent caches between cores, shared & synced | Shared within groups, none between groups

Hazards in Pipelining

Structural Hazards: Resource conflicts where multiple instructions request the same resource.

Data Hazards: An instruction depends on the result of a previous instruction that has not yet completed.

Control Hazards: Branch instructions change the program counter (PC), so the pipeline may fetch the wrong instructions until the branch outcome is known.
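A data hazard can be made concrete with a toy detector for read-after-write (RAW) dependences. The 3-address instruction format and the `window` parameter below are illustrative assumptions, not a real ISA:

```python
def find_raw_hazards(instructions, window=2):
    """Report pairs (i, j) where instruction j reads a register that
    instruction i writes, within `window` later slots (no forwarding assumed)."""
    hazards = []
    for i, (dest, _, _) in enumerate(instructions):
        for j in range(i + 1, min(i + 1 + window, len(instructions))):
            _, src1, src2 = instructions[j]
            if dest in (src1, src2):
                hazards.append((i, j))
    return hazards

# add r1, r2, r3 ; sub r4, r1, r5  -> the sub reads r1 before add completes
prog = [("r1", "r2", "r3"), ("r4", "r1", "r5"), ("r6", "r7", "r8")]
print(find_raw_hazards(prog))  # [(0, 1)]
```

A real pipeline resolves such hazards by stalling or by forwarding results between stages; the detector only shows where they arise.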

Types of Cache Misses

Compulsory Miss: First reference to a block, unavoidable regardless of cache size.

Capacity Miss: The working set exceeds the cache size, so blocks are evicted and must be fetched again.

Conflict Miss: Multiple blocks map to the same cache location, causing evictions even when the cache is not full.
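The miss types can be demonstrated with a minimal direct-mapped cache model. The mapping rule (block address mod number of sets) and the access pattern below are illustrative:

```python
def simulate(accesses, num_sets):
    """Tiny direct-mapped cache: one block per set, set index = addr % num_sets."""
    cache = [None] * num_sets
    seen = set()                    # block addresses referenced before
    stats = {"hit": 0, "compulsory": 0, "conflict_or_capacity": 0}
    for block in accesses:
        idx = block % num_sets
        if cache[idx] == block:
            stats["hit"] += 1
        elif block not in seen:
            stats["compulsory"] += 1            # first reference: unavoidable
        else:
            stats["conflict_or_capacity"] += 1  # was evicted, fetched again
        cache[idx] = block
        seen.add(block)
    return stats

# Blocks 0 and 4 collide in a 4-set cache (0 % 4 == 4 % 4), so
# alternating between them keeps evicting the other block.
print(simulate([0, 4, 0, 4], num_sets=4))
# {'hit': 0, 'compulsory': 2, 'conflict_or_capacity': 2}
```

The cache is far from full (2 blocks in 4 sets), so the repeated misses are conflict misses, not capacity misses.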

Optimization Techniques

Techniques covered in the course include:

  • Branch prediction
  • GPU acceleration
  • Cache and memory system optimization
  • Pipelining
  • Bus width enhancement
  • VLIW (Very Long Instruction Word)
  • Shared memory architecture
  • Multiprocessors
  • Vector machines
  • Instruction design
  • Multithreading

Topologies in Multiprocessor Systems

Topology: The pattern of connections between processors, impacting performance and cost.

Key Characteristics:

  • Diameter: The maximum, over all pairs of processors, of the shortest-path distance (in links) between them.
  • Total Bandwidth: Capacity of a communication link multiplied by the number of links.
  • Bisection Bandwidth: The total bandwidth across the narrowest cut that divides the network into two equal halves — the bottleneck for traffic between the halves.
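The diameter of a topology can be computed from its adjacency lists with breadth-first search; the 6-node ring below is an illustrative example:

```python
from collections import deque

def diameter(adj):
    """Maximum over all node pairs of the shortest-path length (in links)."""
    best = 0
    for start in adj:
        dist = {start: 0}
        q = deque([start])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        best = max(best, max(dist.values()))
    return best

n = 6
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
print(diameter(ring))  # a ring of n nodes has diameter n // 2 -> 3
```

The same function works for any of the topologies below once their adjacency lists are written out.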

Common Topologies

Shared Bus Topology: Processors communicate via a single bus, handling one transmission at a time.

Ring Topology: Each processor connects directly to two neighbors, forming a ring. Multiple messages can be in flight simultaneously, but data may have to travel through intermediate processors.

Tree Topology: Processors are connected in a tree, each with up to three links (one parent and two children), providing a single unique path between any two processors.

Mesh Topology: Each processor connects to its immediate neighbors (above, below, left, and right).

Hypercube Topology: A multi-dimensional mesh where two processors are connected when their binary labels differ in exactly one bit.
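The hypercube connection rule is just a bit flip, which makes its neighbor set easy to sketch (the dimension and node labels below are illustrative):

```python
def hypercube_neighbors(node, d):
    """Neighbors of `node` in a d-dimensional hypercube: flip each of d bits."""
    return [node ^ (1 << bit) for bit in range(d)]

d = 3                                          # 2**3 = 8 processors
print(sorted(hypercube_neighbors(0b000, d)))   # [1, 2, 4]
print(sorted(hypercube_neighbors(0b101, d)))   # [1, 4, 7]
```

Each node has exactly d neighbors, and the diameter is d: in the worst case every bit differs, and each hop fixes one bit.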