Memory Hierarchy and Main Memory Types in Computers
PART 1: Hierarchy of Memory – (T.1) Basics
LOCATION:
- a) Internal memory: Main memory, CPU registers, and control unit memory in the CPU (used with microprogrammed control)
- b) External memory: Storage devices and peripherals such as disk and tape
CAPACITY:
Typically expressed in bytes or words for internal memory, and usually in bytes for external memory.
UNIT OF TRANSFER:
- For internal memory: the number of bits read from or written to memory at a time; this equals the number of data lines into and out of the memory module.
- For external memory: data is transferred in units much larger than a word, called BLOCKS.
- Related concepts:
- Word: the "natural" unit of organization of memory; its size often coincides with the number of bits used to represent numbers and with the length of the instructions.
- Addressable units: words or bytes. The relationship between the length A of an address and the number N of addressable units is 2^A = N.
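The relation 2^A = N above can be checked with a short sketch (Python; the 64 KiB figure is an assumed illustration, not from the text):

```python
import math

def address_bits(n_units: int) -> int:
    """Smallest A such that 2**A >= n_units addressable units."""
    return math.ceil(math.log2(n_units))

# A byte-addressable 64 KiB memory has N = 65536 addressable units,
# so an address must be A = 16 bits long (2**16 = 65536).
print(address_bits(65536))  # -> 16
print(address_bits(2048))   # -> 11
```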
Access Method:
- A) SEQUENTIAL ACCESS:
- The memory is organized in units of data called records.
- Access is made in a specific linear sequence.
- Stored addressing information is used to separate records and assist in retrieval.
- A shared read/write mechanism is used.
- Access time to a record is variable.
- Example: Tape drives
- B) DIRECT ACCESS:
- Shared read/write mechanism.
- The blocks or records have a unique address based on their physical location.
- Access is accomplished by a jump to the general vicinity, followed by a sequential search.
- Variable access time.
- Example: Disk drives
- C) RANDOM ACCESS:
- Each addressable location in memory has a unique, physically wired-in addressing mechanism.
- Constant access time.
- Example: Main memory
- D) ASSOCIATIVE ACCESS:
- A type of random access that enables one to make a comparison of desired bit locations within a word for a specified match, and to do this for all words simultaneously.
- Access time is constant.
- Example: Cache memory
PERFORMANCE:
- A) Access time:
- For random-access and associative memory, it is the time it takes to perform a write or read operation.
- For other access types, it is the time it takes to position the read/write mechanism at the desired location.
- B) MEMORY CYCLE TIME (applicable to random-access memory): Access time plus any additional time required before a second access can commence.
- Additional time may be required for transients to die out on signal lines or to regenerate data if they are read destructively.
- C) Transfer rate: The rate at which data can be transferred into or out of a memory unit.
- For random-access memories, it is the inverse of the cycle time.
- For other memory types: T_N = T_A + N/R, where T_N = average time to read or write N bits, T_A = average access time, N = number of bits, and R = transfer rate in bits per second.
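The formula T_N = T_A + N/R can be evaluated directly; a minimal sketch with assumed (hypothetical) disk parameters:

```python
def average_rw_time(t_a: float, n_bits: int, rate_bps: float) -> float:
    """T_N = T_A + N / R: average time (s) to read or write N bits."""
    return t_a + n_bits / rate_bps

# Hypothetical disk: 10 ms average access time, 80 Mbit/s transfer
# rate, reading a 4096-bit block. Access time dominates.
t = average_rw_time(10e-3, 4096, 80e6)
print(round(t * 1e3, 4))  # -> 10.0512 (milliseconds)
```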
PHYSICAL DEVICE:
Semiconductor memories (LSI and VLSI technologies) and magnetic surface memories (disks and tapes).
PHYSICAL DATA STORAGE:
Volatile/nonvolatile • Erasable/non-erasable • Read-only memories (ROM) are non-erasable semiconductor memories.
ORGANIZATION:
(Key design aspect in random-access memory) is the physical arrangement of bits to form words.
(1.1) Memory Hierarchy:
There are 3 basic constraints in memory system design: capacity, cost, and access time, and a compromise among the three is needed. Whatever the technology available at any given time, the following trade-offs hold:
- 1) Faster access time, greater cost per bit.
- 2) Greater capacity, smaller cost per bit.
- 3) Greater capacity, slower access time.
How to choose the memory system? Do not use a single component but a hierarchy of memories. The key to the success of this choice is the decreasing frequency of access by the CPU as one descends the hierarchy. The hierarchy is a pyramid with secondary memory at the base, main memory above it, cache memory above that, and the CPU registers at the top. As one descends the hierarchy:
- (a) decreasing cost per bit
- (b) increasing capacity
- (c) increasing access time
- (d) decreasing frequency of access to the memory by the CPU
The design strategy will work if conditions (a) and (d) hold. Condition (d) can generally be satisfied because of the PRINCIPLE OF LOCALITY OF REFERENCE: during the execution of a program, memory references by the processor tend to be clustered. Locality takes two forms:
- A) TEMPORAL LOCALITY: If an element of memory is referenced, it will tend to be referenced again soon (e.g., iterative loops and subroutines).
- B) SPATIAL LOCALITY: If an item is referenced in memory, nearby elements will tend to be referenced soon (example: operations on matrices).
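Spatial locality can be made concrete with a toy cache model (a sketch; the line size, cache size, and access patterns are assumed for illustration). Sequential access hits in the cache almost always, because each miss loads a whole line of neighbouring bytes; a scattered pattern does not:

```python
LINE = 16    # bytes per cache line
NLINES = 8   # lines in a tiny direct-mapped cache

def hit_rate(addresses):
    """Fraction of byte accesses served by the toy cache."""
    cache = {}  # line index -> tag of the line currently stored
    hits = 0
    for a in addresses:
        index, tag = (a // LINE) % NLINES, a // (LINE * NLINES)
        if cache.get(index) == tag:
            hits += 1
        else:
            cache[index] = tag  # miss: the whole line is loaded
    return hits / len(addresses)

sequential = list(range(1024))  # walk 1024 bytes in order
# Four interleaved streams 256 bytes apart: no spatial locality.
strided = [(i * 256) % 1024 + i // 4 for i in range(1024)]
print(hit_rate(sequential))  # -> 0.9375 (15 of every 16 accesses hit)
print(hit_rate(strided))     # -> 0.0
```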
Data can be organized across the hierarchy so that the percentage of accesses to each successively lower level is substantially less than to the level above it. Another form of memory that can be included in the hierarchy is the disk cache: a portion of main memory used as a temporary buffer for data that will be dumped to disk. This yields two benefits: 1) disk writes are clustered and can be done in large, efficient transfers; 2) some data destined for output may be referenced again by the program before being dumped to disk, and can then be served from the fast buffer.
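The payoff of the hierarchy can be quantified with the classic two-level average-access-time formula (a sketch; the hit ratio and timings below are assumed values, not from the text):

```python
def avg_access_time(hit_ratio: float, t1: float, t2: float) -> float:
    """Average access time of a two-level hierarchy.

    A hit is served by level 1 in t1; a miss costs t1 (the failed
    level-1 lookup) plus t2 (the access to level 2).
    """
    return hit_ratio * t1 + (1 - hit_ratio) * (t1 + t2)

# Assumed values: 10 ns cache, 100 ns main memory, 95% hit ratio.
# The average stays close to the cache speed, which is the point
# of the hierarchy.
print(avg_access_time(0.95, 10e-9, 100e-9))  # -> 1.5e-08 (15 ns)
```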
(T.2) MAIN MEMORY
(2.1) MAIN MEMORY:
Main memory was once built with magnetic-core technology, but today semiconductor technology is universal for main memory.
TYPES OF RANDOM ACCESS MEMORY SEMICONDUCTORS:
- 1) RAM (Random Access Memory): Fast reading and writing of data.
- (1a) Static RAM: Stores binary values using flip-flops.
- (1b) Dynamic RAM: Stores data as charges on capacitors. Refresh circuitry is required to maintain data storage. Its cells are simpler than static RAM cells, allowing higher density and lower cost. Used for large memory sizes, where the fixed cost of the refresh circuitry is offset by the lower cost per cell.
- 2) ROM (Read Only Memory): Applications include microprogramming, subroutine libraries for frequently used functions, system programs, and function tables. Its main drawback is the relatively high fixed cost of manufacturing.
- 3) PROM (Programmable ROM): A cheaper alternative for cases where the number of chips needed is low. Its writing process is electrical and is a post-manufacturing process. Greater flexibility and convenience.
- 4) Read-Mostly Memories (EPROM, EEPROM, FLASH): Useful for applications where reads are much more frequent than writes, but nonvolatile storage is required.
- (4a) EPROM (Erasable Programmable ROM): More expensive than a PROM, with the advantage of being able to update the content multiple times.
- (4b) EEPROM (Electrically Erasable PROM): Can be written at any time without erasing its previous contents. It is nonvolatile and field-upgradable. More expensive and less dense (fewer bits per chip) than EPROMs. The write operation is much slower than the read operation.
- (4c) Flash Memory: Uses electrical erasure like EEPROM, but erasure is much faster and is done a block at a time rather than byte by byte; its density is higher than EEPROM, on the order of EPROM.
(2.2) ORGANIZATION:
The memory cell is the basic element of a semiconductor memory and usually has 3 terminals:
- Terminal for selection: Selects the cell for write or read operation.
- Control terminal: Indicates the type of operation.
- Third terminal: Introduces the signal that sets the state to 0 or 1 (write).
MEMORY CHIP LOGIC
A chip containing an array of memory cells (typical encapsulated chip sizes: 4 Mbits, 16 Mbits). A fundamental design aspect is the number of data bits that can be read/written at a time; there are two possibilities:
- a) The physical layout of the cells matches the logical arrangement of words: the array is organized into W words of B bits each.
- b) One-bit-per-chip structure: data is written or read bit by bit.
TYPICAL 16-MBIT DRAM ORGANIZATION
In this case, 4 bits are written or read at a time. The memory is structured as 4 square submatrices of 2048×2048 elements.
- Address lines (A's) supply the address of the word to select. The number of lines required is log2(W) = 11 (2^11 = 2048).
- 4 data lines (D's) for input/output through a data buffer. To read or write a full word of memory, several DRAMs must be connected to the memory bus, together with the memory controller.
- Multiplexing the address lines (A0-A10) saves pins: the row address is supplied first, then the column address, with selection signalled via RAS (Row Address Select) and CAS (Column Address Select).
- The refresh circuitry disables the chip while it refreshes all cells. A refresh counter steps through all row values; its output is connected to the row decoder and activates the RAS line.
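The row/column multiplexing can be sketched as follows (illustrative only: a 22-bit cell address for one 2048×2048 submatrix, split into the two 11-bit halves sent over the shared A0-A10 pins):

```python
ROW_BITS = COL_BITS = 11  # 2**11 = 2048 rows and 2048 columns

def split_address(addr: int) -> tuple:
    """Split a 22-bit cell address into (row, column).

    The row half is driven on A0-A10 first (latched by RAS),
    then the column half (latched by CAS).
    """
    row = (addr >> COL_BITS) & (2**ROW_BITS - 1)
    col = addr & (2**COL_BITS - 1)
    return row, col

# Row 1025, column 7 packed into one 22-bit address:
print(split_address((1025 << 11) | 7))  # -> (1025, 7)
```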
Module Organization
If a RAM chip contains only 1 bit of each word, as many chips are needed as there are bits per word. This arrangement suffices when the number of words in memory equals the number of bits stored per chip.
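The chip count for a module follows from the word and chip geometries; a minimal sketch with assumed sizes (256K × 8-bit memory built from 256K × 1-bit chips):

```python
def chips_needed(words: int, bits_per_word: int,
                 chip_words: int, chip_bits: int) -> int:
    """Chips required to build a `words` x `bits_per_word` memory
    out of chips holding `chip_words` x `chip_bits` each."""
    assert words % chip_words == 0 and bits_per_word % chip_bits == 0
    return (words // chip_words) * (bits_per_word // chip_bits)

# 256K x 8-bit memory from 256K x 1-bit chips: one chip per bit
# of the word, i.e. 8 chips.
print(chips_needed(256 * 1024, 8, 256 * 1024, 1))  # -> 8
```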
(2.3) RAM:
Processor speed grows much faster year over year than memory speed, so the processor-memory gap keeps widening; a number of solutions address this:
- Increased tolerance to latency: Dynamic scheduling of instructions, speculative memory accesses, multithreaded processing.
- Improving the performance of the memory system: Memory hierarchy, interlacing and technologies, buses, and faster protocols.
TYPES OF RAM MEMORY
- 1) SRAM (Static RAM): Each cell is constituted by a flip-flop (requires approximately 6 transistors/bit). Fast but expensive.
- 2) DRAM (Dynamic RAM): Each cell consists of a capacitor (tendency to discharge, requires refresh circuitry). Simpler cell, higher density, lower consumption, and cheaper than SRAMs. They are used in large memory sizes (the cost of refresh circuitry is offset by the lower cost of cells). They are the main computer memory. Slower than static RAM.
- COMPARISONS: An SRAM memory of the same capacity as a DRAM memory would require up to 16 times as many chips. The cost per bit of SRAM is between 8 and 16 times that of DRAM. The access time of DRAM is between 8 and 16 times that of SRAM. DRAM capacity doubles roughly every 2 years.
- IMPROVEMENT OF DRAM PERFORMANCE: DRAM access patterns show a lot of spatial locality, because a cache miss involves reading a full cache line of consecutive positions. The techniques below therefore aim to speed up access to consecutive positions, and they deliver broadly similar data-access performance.
- ASYNCHRONOUS DRAM MEMORY FAMILY:
- a) FPM DRAM (Fast Page Mode): Keeps the row address constant while reading data from multiple consecutive columns. This access mode is maintained in the following architectures.
- b) EDO (Extended Data Out): Adds a latch between the sense amplifiers and the output pins, so the output data remains valid (even after CAS is deasserted) while the access to the next column begins.
- c) BEDO RAM (Burst EDO): Addition of a counter that generates successive column numbers for access. Avoids generating successive CAS signals.
- SDRAM (Synchronous DRAM): Synchronous interface: exchanges control signals with the memory controller synchronized with a clock signal, allowing the processor to perform other tasks while performing memory access operations instead of waiting.
- a) DDR SDRAM (Double Data Rate): Transfers data on both the rising and falling edges of the clock signal (double bandwidth with the same signal) and reduces voltage by 30% compared to SDRAM.
- b) QBM DDR SDRAM (Quad Band Memory): Several out-of-phase DDR banks in the module share the output pins; since each DDR bank allows 2 transfers per cycle, the module achieves 4. The scheme resembles interleaving. Latency similar to DDR but much greater bandwidth.
- c) DDR2 SDRAM (Double Data Rate 2): The I/O buffers work at twice the core frequency. In each cycle, 4 transfers are made. Much higher bandwidth than DDR. Almost double the latencies of DDR. Reduces voltage by approximately 50%.
- Rambus DRAM (RDRAM): Proprietary design, high price. Greater bandwidth than any other technology of its time, though initially with higher latency than DDR. Evolved into Direct Rambus DRAM (wider bus and segmented transfer) and XDR RAM (running at 3.2 GHz with 2-byte-wide channels, eventually scaling to 8 GHz).
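The SDR/DDR/DDR2 family differences reduce to transfers per clock cycle; peak bandwidth is clock rate × transfers per cycle × bus width. A sketch with assumed figures (100 MHz memory clock, 64-bit module bus):

```python
def peak_bandwidth(clock_hz: float, transfers_per_cycle: int,
                   bus_bytes: int) -> float:
    """Peak transfer rate in bytes per second."""
    return clock_hz * transfers_per_cycle * bus_bytes

# Assumed 100 MHz clock and 8-byte (64-bit) bus:
sdr = peak_bandwidth(100e6, 1, 8)   # SDRAM: one transfer per cycle
ddr = peak_bandwidth(100e6, 2, 8)   # DDR: both clock edges
ddr2 = peak_bandwidth(100e6, 4, 8)  # DDR2: 4 transfers per cycle
print(sdr / 1e6, ddr / 1e6, ddr2 / 1e6)  # -> 800.0 1600.0 3200.0 MB/s
```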
(2.4) Interleaved Memory: