Martin Brown
I can't say I've run any tests, but I don't see how they are getting
more processing other than adding to the cache sizes. Pipelining and
speculative execution should have been mature some 10 years ago. What
exactly is left to improve on?
I had to think about that since I largely agree. Although register
colouring and other tricks are comparatively recent refinements that
keep the execution pipeline from stalling so easily.
Taking the CPUs I have most experience of tormenting (benchmark score
first):
9630  i7-3770K  TDP  77W @ 3.5GHz (peak 3.9GHz), 4 x 2 cores
8962  i7-2700K  TDP  95W @ 3.5GHz, 4 x 2 cores
7130  i5-3570K  TDP  77W @ 3.4GHz (peak 3.8GHz), 4 cores
6402  i5-2500K  TDP  95W @ 3.3GHz, 4 cores
2962  Q6600     TDP 105W @ 2.4GHz, 4 cores
Back-of-the-envelope calculations suggest that the most recent
benchmark improvements have come from on-demand turbo boost - that is,
the difference between the 3770 & 2700 and the 3570 & 2500 can be
largely explained by the roughly 10% faster peak clock when asked to
work really hard.
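If you want to sanity-check that, the arithmetic is simple enough to
script (a minimal sketch - it assumes the scores above are comparable
runs of the same benchmark):

    # Does the peak-clock ratio account for the benchmark score ratio?
    # Scores and clock speeds are taken from the list above.
    pairs = [
        ("i7-3770K vs i7-2700K", 9630, 8962, 3.9, 3.5),
        ("i5-3570K vs i5-2500K", 7130, 6402, 3.8, 3.4),
    ]
    for name, new, old, peak, base in pairs:
        print("%s: score ratio %.3f, clock ratio %.3f"
              % (name, new / old, peak / base))

That gives score ratios of about 1.07 and 1.11 against clock ratios of
1.11 and 1.12 - close enough to support the turbo explanation.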
And even my old Q6600, with a toasty 105W TDP, would improve by the
ratio of the clock speeds plus a factor of two for hyperthreading
(optimistic) if it could be scaled up to the same hardware spec as the
new CPU:
2962 x (3.9/2.4) = 4813, and double that for hyperthreading = 9626
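Or as a quick script (hyperthreading counted as a flat 2x, which as
noted is optimistic):

    # Scale the Q6600 score by the clock ratio, then double it for
    # hyperthreading, and compare with the measured 3770K score.
    scaled = 2962 * (3.9 / 2.4)
    print(int(scaled), int(scaled * 2))   # -> 4813 9626 (vs 9630)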
Suspiciously close agreement! So arguably they are now gaming the
standard benchmarks to make new chips look more attractive. Underlying
performance is rather similar except when heavily loaded by parallel
algorithms carefully designed to use all the cores at once.
That assumption is a big one! The study I read said the point of
diminishing returns came below 4 CPUs. Many apps just won't see much
improvement even with two processors. The observed increase in
performance is because the OS needs elbow room, so a second processor
helps get it out of the way of the user app.
It depends on the application. Some things benefit whereas others don't.
Hyatt did a lot of work on optimising chess search algorithms on
multiple processors (and that is a tricky algorithm to parallelise).
N_CPUS   1     2     4     8     16
Naive    1.0   1.8   3.0   4.1   4.6
EVP      1.0   1.9   3.4   5.4   6.0
DTS      1.0   2.0   3.7   6.6   11.1
Taken from his paper
http://www.cis.uab.edu/hyatt/search.html
Naive is typical of what happens if you try to parallelise without
thinking very carefully about the bottlenecks, and DTS is more typical
of what you get with a streamlined, optimised multiprocessor
algorithm. Most decent multicore code is somewhere between these two
extremes - around EVP, where you do OK on up to 4 cores.
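For a feel of why those curves flatten, the textbook model is Amdahl's
law - not Hyatt's method, just the standard speedup formula - where a
serial fraction s caps the speedup at 1/s no matter how many cores you
throw at it. A minimal sketch, with serial fractions I eyeballed to
roughly match the table (my guesses, not figures from the paper):

    # Amdahl's law: speedup(n) = 1 / (s + (1 - s)/n), where s is the
    # serial (non-parallelisable) fraction of the work.
    def amdahl(n, s):
        return 1.0 / (s + (1.0 - s) / n)

    # Serial fractions are rough eyeball fits to the table above,
    # not numbers from Hyatt's paper.
    for label, s in [("Naive", 0.15), ("EVP", 0.08), ("DTS", 0.02)]:
        print("%-6s" % label,
              " ".join("%.1f" % amdahl(n, s) for n in (1, 2, 4, 8, 16)))

The fit gets loose at 8-16 cores because real implementations also pay
growing synchronisation costs that the simple model ignores, but the
shape is right: even a few percent of serial work means the gain from
each extra core shrinks fast beyond 4.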
I'm taking a look at the combined (hybrid SSD/HDD) drives now. I'm not
going to pay an arm and a leg for one. I can get an SSD for under $200
that is bigger than what I have now, so I'm thinking a combined drive
should be close to $100.
I found the Samsung drives performed better on incompressible data,
which was important to me.
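That gap is down to controllers (SandForce, notably) that compress
data on the fly, so zero-filled test files write much faster than
random ones on those drives. A minimal sketch if you want to check a
drive yourself (path and size are placeholders - point it at the drive
under test):

    import os, time

    SIZE = 256 * 1024 * 1024   # 256 MB test file; adjust to taste
    PATH = "testfile.bin"      # placeholder path on the drive under test

    def write_speed(data):
        t0 = time.time()
        with open(PATH, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # force the data out to the drive
        return len(data) / (time.time() - t0) / 1e6   # MB/s

    print("compressible:   %.0f MB/s" % write_speed(b"\0" * SIZE))
    print("incompressible: %.0f MB/s" % write_speed(os.urandom(SIZE)))
    os.remove(PATH)

A drive that compresses will show a big gap between the two numbers;
one that doesn't (the Samsungs, in my experience) stays close to flat.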