Understanding MTBF, Availability, Speedup, and CPU Time
Understanding Key Performance Metrics
Mean Time Between Failures: MTBF = MTTF + MTTR.
Availability = MTTF / MTBF = MTTF / (MTTF + MTTR) (reliability itself is measured by MTTF).
Failure rate = 1 / MTTF.
Speedup of X relative to Y = Execution time_Y / Execution time_X.
Speedup = Execution time for entire task without using enhancement / Execution time for entire task using enhancement when possible.
Speedup_overall = Exectime_old / Exectime_new = 1 / ((1 - Fraction_enhanced) + (Fraction_enhanced / Speedup_enhanced)).
CPU time = CPU clock cycles * clock cycle time = CPU clock cycles / clock rate.
CPU time = IC * CPI * clock cycle time = (instructions / program) * (clock cycles / instruction) * (seconds / clock cycle).
CPI = CPU clock cycles for a program / IC.
CPU clock cycles = SUM i=1:n (IC_i * CPI_i).
CPU time = CPU clock cycles (above) * clock cycle time.
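The formulas above can be checked with a few small helpers. This is only a sketch: the function names and the sample numbers are made up for illustration and do not come from any exercise. Note that the Amdahl's law denominator is the whole sum (1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced.

```python
def availability(mttf, mttr):
    """Availability = MTTF / MTBF, where MTBF = MTTF + MTTR."""
    return mttf / (mttf + mttr)


def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup = 1 / ((1 - F) + F / S)."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time = IC * CPI * clock cycle time = IC * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical inputs, chosen only to exercise the formulas.
print(availability(mttf=1000.0, mttr=10.0))
print(amdahl_speedup(fraction_enhanced=0.4, speedup_enhanced=10.0))
print(cpu_time(instruction_count=1e9, cpi=2.0, clock_rate_hz=1e9))
```

A useful sanity check: with Fraction_enhanced = 0 the overall speedup is 1, and with Fraction_enhanced = 1 it equals Speedup_enhanced.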
Instruction classes for the CPI calculation: ALU instructions (add, sub, mul, shift, and, or); conditional branches; FP add (add FP, sub FP); load/store FP; other FP (reg-reg move FP, compare FP, conditional move FP, other FP); FP multiply; FP divide. Count load immediate under load/store. For sub FP, add 1 more cycle to the CPI.
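Once instructions are grouped into classes, the total cycle count is the sum of IC_i * CPI_i over the classes, and the average CPI is that sum divided by the total instruction count. A minimal sketch follows; the instruction counts and per-class CPI values in `example_mix` are hypothetical and are not taken from any table in these notes.

```python
def total_cycles(mix):
    """CPU clock cycles = sum over classes of IC_i * CPI_i.

    mix maps a class name to a (instruction_count, cpi) pair.
    """
    return sum(ic * cpi for ic, cpi in mix.values())


def average_cpi(mix):
    """Average CPI = CPU clock cycles / total instruction count."""
    total_ic = sum(ic for ic, _ in mix.values())
    return total_cycles(mix) / total_ic


# Hypothetical mix: (instruction count, CPI) per class.
example_mix = {
    "ALU": (50, 1),
    "load/store": (30, 2),
    "conditional branch": (20, 3),
}

print(total_cycles(example_mix))  # 50*1 + 30*2 + 20*3 = 170
print(average_cpi(example_mix))
```

Multiplying `total_cycles(example_mix)` by the clock cycle time gives the CPU time for this mix.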
Cache Memory and Write Operations
2) A write miss in the L1 cache writes directly to L2, since L1 is write-through. The contents of L1 do not change, since L1 uses no-write-allocate. There is no danger of replacing a dirty block in L1, as L1 blocks are only replaced on read accesses and writes are write-through. A write miss in L2 loads the block containing the address to be written into L2 (writing the replaced block back to main memory if it is dirty) and then updates the contents of L2. The contents of the write address in main memory are not updated until the block is replaced in the L2 cache.
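The write path described in 2) can be traced with a toy model. This is a sketch under stated assumptions, not a real cache simulator: L2 is modeled as direct-mapped, the block size and number of sets are hypothetical, and the class and method names are invented for illustration. Only stores are modeled, so L1 never allocates anything.

```python
BLOCK_SIZE = 64  # bytes per block; hypothetical value


class TwoLevelCache:
    """L1: write-through, no-write-allocate. L2: write-back, write-allocate."""

    def __init__(self, l2_sets=4):
        self.l1 = set()   # block numbers currently held in L1
        self.l2 = {}      # L2 set index -> (block number, dirty flag)
        self.l2_sets = l2_sets
        self.log = []     # human-readable trace of what happened

    def write(self, addr):
        block = addr // BLOCK_SIZE
        if block in self.l1:
            self.log.append("L1 hit: update L1")
        else:
            # No-write-allocate: a write miss never brings the block into L1.
            self.log.append("L1 miss: L1 unchanged")
        # Write-through: every store is forwarded to L2, hit or miss.
        self._l2_write(block)

    def _l2_write(self, block):
        index = block % self.l2_sets
        resident = self.l2.get(index)
        if resident is not None and resident[0] == block:
            self.log.append("L2 hit: update L2")
        else:
            if resident is not None and resident[1]:
                # The victim is dirty, so it goes to main memory first.
                self.log.append(f"L2 miss: write back dirty block {resident[0]}")
            self.log.append(f"L2 miss: allocate block {block}")
        # Write-back: the line is now dirty; memory is updated only on eviction.
        self.l2[index] = (block, True)


cache = TwoLevelCache(l2_sets=1)
cache.write(0)    # L1 miss, L2 miss, allocate block 0
cache.write(64)   # L1 miss, L2 miss, evict dirty block 0, allocate block 1
print("\n".join(cache.log))
```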
3) For a multi-level exclusive cache (a block can only reside in one of the L1 and L2 caches) configuration, describe the procedure of handling an L1 write-miss, considering the component involved and the possibility of replacing a dirty block. On an L1 write miss, the block to be written to is loaded into L2 as outlined in the previous exercise. Since L1 is write-through, the multilevel cache maintains exclusivity.
Critical Word First and Early Restart
Problem 2-11: (a) With critical word first, the miss service would require 120 cycles. Without critical word first, it would require 120 cycles for the first 16B and 16 cycles for each of the next 3 16B blocks, or 120 + (3*16) = 168 cycles. (b) It depends on the contribution to Average Memory Access Time (AMAT) of the level-1 and level-2 cache misses and the percent reduction in miss service times provided by critical word first and early restart. If the percentage reduction in miss service times provided by critical word first and early restart is roughly the same for both level-1 and level-2 miss service, then if level-1 misses contribute more to AMAT, critical word first would likely be more important for level-1 misses.
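The arithmetic in part (a) can be written out as a short sketch. The function and parameter names are illustrative; the numbers (120 cycles for the first 16B, 16 cycles for each of the 3 remaining 16B chunks) are the ones used in the answer above.

```python
def miss_service_cycles(first_chunk_cycles, extra_chunk_cycles,
                        chunks_per_block, critical_word_first):
    """Cycles until the processor can resume after a miss.

    With critical word first, the requested chunk arrives first, so only
    the first-chunk latency is seen. Without it, the whole block must
    arrive before the processor continues.
    """
    if critical_word_first:
        return first_chunk_cycles
    return first_chunk_cycles + (chunks_per_block - 1) * extra_chunk_cycles


with_cwf = miss_service_cycles(120, 16, 4, critical_word_first=True)
without_cwf = miss_service_cycles(120, 16, 4, critical_word_first=False)
print(with_cwf, without_cwf)  # 120 168
```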
Write Buffers and Cache Performance
Problem 2-12: (a) 16B, to match the level-2 data cache write path. (b) Assume merging write buffer entries are 16B wide. Since each store can write 8B, a merging write buffer entry would fill up in 2 cycles. The level-2 cache will take 4 cycles to write each entry. A non-merging write buffer would take 4 cycles to write the 8B result of each store. This means the merging write buffer would be 2 times faster. (c) With blocking caches, the presence of misses effectively freezes progress made by the machine, so whether there are misses or not doesn’t change the required number of write buffer entries. With non-blocking caches, writes can be processed from the write buffer during misses, which may mean fewer entries are needed.
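The factor of 2 in part (b) comes from how many level-2 writes are needed to drain the same stores. A minimal sketch, assuming all stores go to consecutive addresses so that they always merge; the function name and the choice of 8 stores are illustrative only.

```python
def drain_cycles(num_stores, store_bytes, entry_bytes, l2_write_cycles, merging):
    """Cycles the level-2 cache spends writing out num_stores stores.

    Assumes stores are to consecutive addresses, so a merging buffer packs
    entry_bytes // store_bytes stores into each entry.
    """
    if merging:
        stores_per_entry = entry_bytes // store_bytes
        entries = -(-num_stores // stores_per_entry)  # ceiling division
    else:
        entries = num_stores  # one entry per store
    return entries * l2_write_cycles


merged = drain_cycles(8, store_bytes=8, entry_bytes=16, l2_write_cycles=4, merging=True)
unmerged = drain_cycles(8, store_bytes=8, entry_bytes=16, l2_write_cycles=4, merging=False)
print(merged, unmerged, unmerged / merged)  # 16 32 2.0
```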
ALU Operations and Memory Access
The longest-latency path for ALU operations is through I-Mem, Regs, Mux (to select the ALU operand), ALU, and Mux (to select the value for register write), followed by the register write. The only other path of interest is the PC-increment path through Add (PC + 4) and Mux, which is much shorter. So for the I-Mem, Regs, Mux, ALU, Mux path we have…
The longest-latency path for LW is through I-Mem, Regs, Mux (to select the ALU input), ALU, D-Mem, and Mux (to select what is written to the register), followed by the register write. The only other interesting paths are the PC-increment path (which is much shorter) and the path through the Sign-extend unit in address computation instead of through Registers. However, Regs has a longer latency than Sign-extend, so for the I-Mem, Regs, Mux, ALU, D-Mem, and Mux path we have…