But the main point for desktop parallelism isn't about what cannot be
For some fairly trivial meaning of the word "important". While the
game market was minuscule until a couple of decades back, it has now
reached saturation. Yes, it dominates the benchmarketing, but it
doesn't dominate CPU design, and there are very good economic reasons
for that. Sorry, that one won't fly.
My view is that benchmarketing WILL dominate consumer CPUs. Twice the
cores, twice the performance: that is what Joe Consumer will see. What
I was saying is what matters from the HOME USER perspective, which is
a substantial part of the x86 market, don't you think? Normal
businesses have already stopped looking for top performance in their
desktop PCs and go for the low-cost options. The workstation market is
different, but many workstation apps do parallelize to at least a
couple of threads. My view is that what sells is what they will
deliver. It's important because those numbers will determine whose
computer is faster in the home-user market, and business desktops gave
up the question of whose is faster long ago. They just look at the OEM
name, the price, and a few other variables, such as whether it is
Intel Inside, and go for a Celeron.
Besides, people who write software will typically have TWO years
between each doubling of the number of cores.
[Except that there will probably be one shrink that goes toward
increasing cache instead of the number of cores, and that one will
more probably come earlier than later.]
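To put rough numbers on that cadence, here is a minimal sketch in C.
The 2005 dual-core baseline and the choice of which shrink goes to
cache are my assumptions, purely for illustration:

    #include <stdio.h>

    /* Back-of-envelope projection: cores double every two years,
     * except one shrink whose transistor budget goes to cache
     * instead.  All inputs are assumptions, not roadmap data. */
    int main(void)
    {
        int cores = 2;                /* assume dual core in 2005 */
        for (int year = 2005; year <= 2013; year += 2) {
            printf("%d: ~%d cores mainstream\n", year, cores);
            if (year == 2007)
                continue;             /* say this shrink's budget
                                         goes to cache, not cores */
            cores *= 2;
        }
        return 0;
    }

Under those assumptions you land at ~16 cores around 2013, which
matches the 6-10 year guess later in this post.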
Hmm. I have been told that more times than I care to think, over a
period of 30+ years. It's been said here, in the context of 'cheap'
computers at least a dozen times over the past decade. That one
won't even start moving.
Paying an exponential amount of die area for a logarithmic performance
increase, versus increasing the number of cores, is something that
should tip over once the on-die caches are big enough. The reason it
should happen now is NOT that software people want it to happen; it's
just that doubling the number of cores versus gaining 20% single-thread
performance is a really important trade-off. First, do you really
think an x86 core could go much wider and get a big performance gain
from that? Do you think lengthening the pipeline for better clock
speed would be possible (beyond the P4)? No: there is not enough ILP
in x86 code, and the cost of the circuitry extracting ILP goes up so
much faster than the ILP gained that it's a dead end too. So they have
to turn to what they can increase: caches, on-die memory controllers,
and multiple cores. But after dual core, what are they going to do?
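To make that trade-off concrete: one commonly cited rule of thumb,
Pollack's rule, says single-thread performance grows only about as the
square root of core area. A hedged sketch, taking that exponent as an
assumption rather than a measurement:

    #include <stdio.h>
    #include <math.h>

    /* Spend a doubling of die area on one bigger core, or on a
     * second core?  Pollack's rule (perf ~ sqrt(area)) is an
     * approximation, not a law; the ideal 2x for two cores
     * assumes the workload parallelizes perfectly. */
    int main(void)
    {
        double area      = 2.0;          /* 2x the core area    */
        double big_core  = sqrt(area);   /* ~1.41x, one thread  */
        double two_cores = area;         /* up to 2x throughput */
        printf("bigger core: %.2fx single-thread\n", big_core);
        printf("two cores:   %.2fx throughput (ideal)\n", two_cores);
        return 0;
    }

(compile with -lm)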
It's the ILP vs. clock speed vs. core size kind of question.
Interconnect delays and power density issues hurt, so that after a
certain point a bigger core extracts less ILP than it loses in clock
speed. What does that have to do with this? Well, interconnect delays
relative to transistor speed will increase, AND that reduces the
optimal size of a core heavily. The trends are there; I can give you
figures that Mr. DeMone deduced, which have to be taken with a grain
of salt.
http://www.realworldtech.com/page.cfm?ArticleID=RWT062004172947&p=7
But if this happens the way it looks, then at 0.045u (45 nm; that's
2007 on Intel's roadmap) the optimal core size would be about 20mm²;
the rest is L2 cache, other cores, and whatever else they bring onto
the die. And Intel seems to keep desktop CPU die sizes at about
100-200mm², so that's 4 cores and their caches.
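As a back-of-envelope check on those numbers (the core and die sizes
are the figures above; the rest is just my arithmetic):

    #include <stdio.h>

    /* Die budget at 45 nm, using the figures above: ~20 mm^2 per
     * core, 100-200 mm^2 per desktop die.  160 mm^2 is picked
     * arbitrarily from inside that range. */
    int main(void)
    {
        double die_mm2  = 160.0;
        double core_mm2 = 20.0;
        int    cores    = 4;
        printf("%d cores: %.0f mm^2, leaving %.0f mm^2 for caches "
               "and the rest\n",
               cores, cores * core_mm2, die_mm2 - cores * core_mm2);
        return 0;
    }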
The reason for multicore is not that multithreading becomes extremely
useful, but that gaining single-threaded performance becomes MUCH
harder.
There was essentially NO progress in the 1970s, and the 'progress'
since then has been AWAY FROM parallelism. With the minor exception
of Fortran 90/95.
There are already companies that use internal parallel languages for
their consumer products to cope with SSE, 3DNow!, and SMP. There ARE
parallel languages that are easy to use for application development.
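As an example of what such tools ultimately emit, here is plain C with
Intel's standard SSE intrinsics. The function itself is hypothetical;
the intrinsics are real:

    #include <xmmintrin.h>   /* SSE intrinsics */

    /* Hypothetical kernel: add two float arrays four lanes at a
     * time, the kind of data parallelism SSE/3DNow targets.  To
     * keep the sketch short, n is assumed to be a multiple of 4
     * and the pointers 16-byte aligned. */
    void add_arrays(float *dst, const float *a, const float *b, int n)
    {
        for (int i = 0; i < n; i += 4) {
            __m128 va = _mm_load_ps(a + i);
            __m128 vb = _mm_load_ps(b + i);
            _mm_store_ps(dst + i, _mm_add_ps(va, vb));
        }
    }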
I'd say that when there are >500 million desktops with multicore CPUs
(n>2), and a highly competitive software market running on them that
needs performance as a differentiator, SOMEONE will see a business
opportunity. What I see is that there are millions of coders out there
looking for solutions, and on the desktop the synchronization
latencies will make their problem much easier than the supercomputer
folks', as multicore systems will have synchronization latencies way
lower than main-memory latency. Progress in the other direction comes
out of opportunity and necessity, not out of whatever the previous
trend was. When two cores are mainstream and 4 cores are on the
roadmap, people who need the power on the DESKTOP will go looking at
how to use more threads. And at some point there will be a parallel
language, out of necessity. Perhaps when there are 16 cores or more in
the mainstream. But for 16 cores to happen, there has to be a
situation where going from 8 to 16 cores gives more performance, in
the average case, than doubling the L2 or L3 cache would. Yes, that's
the real reason: there is not much available for improving
single-threaded performance while keeping the x86 ISA, and the scaling
trends hurt even more.
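And "using more threads" does not need a new language to get started.
Here is a minimal sketch in C with POSIX threads, splitting
independent work across cores; the array, the thread count, and the
doubling "work" are all placeholders of mine:

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4           /* e.g. one thread per core */
    #define N        (1 << 20)

    static float data[N];

    struct slice { int begin, end; };

    /* Each worker owns its slice: the semi-independent-threads
     * case, with no sharing and therefore no locks. */
    static void *worker(void *arg)
    {
        struct slice *s = arg;
        for (int i = s->begin; i < s->end; i++)
            data[i] *= 2.0f;     /* placeholder work */
        return NULL;
    }

    int main(void)
    {
        pthread_t    tid[NTHREADS];
        struct slice part[NTHREADS];

        for (int t = 0; t < NTHREADS; t++) {
            part[t].begin = t * (N / NTHREADS);
            part[t].end   = (t + 1) * (N / NTHREADS);
            pthread_create(&tid[t], NULL, worker, &part[t]);
        }
        for (int t = 0; t < NTHREADS; t++)
            pthread_join(tid[t], NULL);

        printf("done\n");
        return 0;
    }

(compile with -pthread)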
Your first sentence is optimistic, but not impossible. Your second
is what most of the experienced people have been saying. Multiple
cores will be used to run multiple processes (or semi-independent
threads) on desktops, for the foreseeable future.
What I'm saying is that performance-limited applications on HIGH-END
systems will have semi-independent threads for 8 cores as soon as
there is motivation to utilize that. People keep looking at how to
make things semi-independent, but at some point there has to be a
better way to write the parallel code, or there will be nothing left
for the extra transistors to do for performance beyond doubling the
on-die caches. In 6-10 years there will be 16 cores on the desktop,
unless some really disruptive technology gives us a MUCH better use
for the transistors. Like Intel making a 4-core EV8 for the desktop.
Or quantum computing gets to the desktop and makes normal
semiconductor devices obsolete. [Very improbable.]
Jouni Osmala