CS641 Class 15: hw4 Solution, Midterm Review
Handout: hw4 #6 memtest results
memtest: Although everyone wrote the expected loop, with proper timing before and after, they were frustrated by not seeing any slowdown at any array size. It turns out that to see the slowdown, you need to compile with optimization (-O2). The handout shows runs with and without optimization. For more info, see the posted hw4 solution. Also, it’s important to have a warm-up loop before the measurement starts, so that the array is already in cache (if it fits) when timing begins.
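The measurement described above can be sketched roughly as follows. This is only an illustration, not the posted hw4 solution: the names sum_words and memtest_rate, the 32-bit word size, and the use of C11 timespec_get for timing are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Read every word of the array once. Returning the sum gives the loop
   a visible result, so the compiler cannot discard it at -O2. */
static uint64_t sum_words(const uint32_t *a, size_t nwords)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < nwords; i++)
        sum += a[i];
    return sum;
}

/* Elapsed microseconds between two timestamps. */
static double elapsed_us(struct timespec t0, struct timespec t1)
{
    return (double)(t1.tv_sec - t0.tv_sec) * 1e6
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e3;
}

/* Hypothetical helper: read an nbytes array reps times and return the
   read rate in bytes/us, or -1.0 if the allocation fails.
   Assumes nbytes >= 4 and reps >= 1. */
static double memtest_rate(size_t nbytes, int reps)
{
    size_t nwords = nbytes / sizeof(uint32_t);
    uint32_t *a = malloc(nwords * sizeof *a);
    if (a == NULL)
        return -1.0;
    for (size_t i = 0; i < nwords; i++)
        a[i] = (uint32_t)i;

    /* Warm-up pass: brings the array into cache (if it fits) before timing. */
    volatile uint64_t sink = sum_words(a, nwords);

    struct timespec t0, t1;
    timespec_get(&t0, TIME_UTC);
    for (int r = 0; r < reps; r++) {
        a[0] = (uint32_t)r;   /* make each pass differ so it is not hoisted */
        sink += sum_words(a, nwords);
    }
    timespec_get(&t1, TIME_UTC);

    free(a);
    (void)sink;
    return (double)nwords * sizeof(uint32_t) * reps / elapsed_us(t0, t1);
}
```

A driver would call memtest_rate over a range of array sizes (say, a few KB up to several MB) and print the rate for each, once built with -O2 and once without, to compare against the handout.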
We see that with optimization, the program can read data from cache at 6500 bytes/us = 6.5 bytes/ns = 6.5 GB/s, an impressive rate.
Then, when the array is too big to fit in cache (over 1MB), the rate falls to about 3 GB/s, still an impressive rate but only about half the rate above. This is the rate of reading data from memory.
Since the rate has fallen to about half, each byte takes about twice as long to read, so roughly half the time is spent waiting for memory in this case.
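The arithmetic behind these statements can be spelled out as a small sketch. The function names are made up for illustration, and the calculation assumes that the time per byte measured from cache is pure processing time that still applies when the data comes from memory, with everything beyond that counted as waiting.

```c
/* Convert bytes per microsecond to GB/s (decimal GB, 1e9 bytes). */
static double bytes_per_us_to_gb_per_s(double bytes_per_us)
{
    return bytes_per_us * 1e6 / 1e9;
}

/* Time to read one byte, in ns, at a given rate in GB/s
   (1 GB/s is the same as 1 byte/ns). */
static double ns_per_byte(double gb_per_s)
{
    return 1.0 / gb_per_s;
}

/* Fraction of the per-byte time spent waiting for memory, assuming the
   cache-resident time per byte is the time needed without any waiting. */
static double wait_fraction(double cache_gb_per_s, double mem_gb_per_s)
{
    double t_cache = ns_per_byte(cache_gb_per_s);
    double t_mem = ns_per_byte(mem_gb_per_s);
    return (t_mem - t_cache) / t_mem;
}
```

With the handout's numbers, wait_fraction(6.5, 3.0) comes out to about 0.54, which matches the "half the time waiting" estimate.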
In the unoptimized run, the rate is about 1.5 GB/s, even from cache. This is about ¼ the rate from cache in the optimized run, so we conclude that the unoptimized loop executes roughly 4 times as many instructions per byte as the optimized loop, showing the importance of optimization here. With the big slowdown caused by all the “extra” instructions in the loop, the memory has time to deliver the next cache line while the previous one is being processed, which is why the unoptimized run shows no drop in rate at large array sizes. Prefetching is going on here. We have not officially covered prefetching, however. Our “official CPU” doesn’t do prefetching. It sends an address out on the bus and then waits for the cache line to finish coming in before doing the next bus cycle.
hw4 #5: Note that this problem should have given a cache line size as part of its setup. We looked at the computations assuming a 64-byte cache line (as is the case on sf06). It’s important to keep in mind that memory access is always done in cache lines. See the hw4 solution.
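As a reminder of what "memory access is always done in cache lines" means for this kind of computation, here is a small sketch that counts how many 64-byte lines a byte range touches. It is a generic illustration, not the hw4 #5 setup; the names LINE_SIZE, line_index and lines_touched are made up for the example.

```c
#include <stdint.h>

#define LINE_SIZE 64u   /* assumed cache line size in bytes (as on sf06) */

/* Index of the cache line containing a given byte address. */
static uint64_t line_index(uint64_t addr)
{
    return addr / LINE_SIZE;
}

/* Number of distinct cache lines touched by the byte range
   [addr, addr + len). Each of these lines is transferred whole,
   even if only a few of its bytes are used. */
static uint64_t lines_touched(uint64_t addr, uint64_t len)
{
    if (len == 0)
        return 0;
    return line_index(addr + len - 1) - line_index(addr) + 1;
}
```

For example, an 8-byte access starting at address 60 straddles a line boundary and touches two lines, so it moves 128 bytes rather than 8.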
Earlier hw trouble area: bits and masks in C. See hw3 #1 solution.
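For practice with bits and masks, here is a generic sketch of extracting and replacing a bit field in a 32-bit word. It is not the hw3 #1 solution; the helper names field_mask, get_field and set_field are made up for illustration, and the helpers assume 1 <= width <= 31 and lo + width <= 32.

```c
#include <stdint.h>

/* Mask with `width` one-bits starting at bit position `lo`. */
static uint32_t field_mask(unsigned lo, unsigned width)
{
    return ((UINT32_C(1) << width) - 1) << lo;
}

/* Extract the field: keep only the field's bits, then shift down. */
static uint32_t get_field(uint32_t word, unsigned lo, unsigned width)
{
    return (word & field_mask(lo, width)) >> lo;
}

/* Replace the field: clear the old bits, then OR in the new value. */
static uint32_t set_field(uint32_t word, unsigned lo, unsigned width,
                          uint32_t value)
{
    uint32_t mask = field_mask(lo, width);
    return (word & ~mask) | ((value << lo) & mask);
}
```

As a check, the MIPS instruction add $t0, $s1, $s2 encodes as 0x02324020: get_field(0x02324020, 21, 5) gives 17 (rs = $s1), get_field(0x02324020, 16, 5) gives 18 (rt = $s2), and get_field(0x02324020, 11, 5) gives 8 (rd = $t0).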
Midterm Review
The exam is open book, notes, handouts. You can bring printouts of hw solutions, etc.
Note that the syllabus and lecture list (on the class web page) have been updated to show the actual sections in P&H that we covered and when. The syllabus shows the subsets of pages where applicable.
We went over the sections. Some comments relative to the exam:
--don’t worry about interpreting assembler output from the cross compiler (it’s too messy)
--I won’t ask about how SPIM does things, just about the code/memory itself.
--you’ll need to write only snippets of assembly code, but should be able to read somewhat larger portions.