65nm news from Intel

  • Thread starter: Yousuf Khan
Indeed. One thing we noticed in the RISC revolution (may it rest in
peace) was that a dual processor workstation did not get an application
done any faster, but it made the person interacting with the application
a lot happier!

Hmmm. My experience was a bit different; back in the early 1990's, based
on watching SMP clients' experiences, I proposed a rule something along
the lines that "disappointment with multiprocessor systems scales at
least linearly with the number of CPUs". But that was when a lot of the
clients were engineering-savvy....

Hamish
 
In comp.arch Jan Vorbrüggen said:
What kind of transaction - by itself - would take long enough to warrant
that?

any transaction that goes off and does some data mining in the middle?
I wonder myself. I put it down to general incompetence - in particular,
because so much data is unnecessarily slung around over none-too-fast
networks. Of course, anything XML-based will only make things worse.

Just a symptom of database centric things being designed and run by
people who just want to add another tier to fix all problems. Oh,
and clusters are cool and should be used at all cost.
 
Protein folding comes in two forms - which I would describe as

i) "Hamiltonian evolution of a (semi-)classical approximation"
- this is similar to galactic dynamics etc... and necessarily
serialises in time.

ii) Statistical mechanical prediction of the probability
distribution of a protein configuration.
This does not have the time problem, and can be
done using exact Monte-Carlo algorithms, such as Metropolis,
Hybrid Monte-Carlo, etc...

The latter has very nice parallelisation and numerical error
insensitivity properties. It also accounts better for quantum effects.
No doubt, it is much more numerically expensive.
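
For illustration only, here is a minimal Python sketch of a Metropolis
step on a toy one-dimensional coordinate; energy() below is a made-up
stand-in for a real protein force field, not any particular folding code:

import math
import random

def energy(x):
    # Hypothetical double-well potential standing in for a real force field.
    return (x * x - 1.0) ** 2

def metropolis(steps=100000, beta=2.0):
    x = 0.0
    samples = []
    for _ in range(steps):
        trial = x + random.uniform(-0.5, 0.5)       # propose a move
        dE = energy(trial) - energy(x)
        # Accept with probability min(1, exp(-beta * dE)).
        if dE <= 0.0 or random.random() < math.exp(-beta * dE):
            x = trial
        samples.append(x)
    return samples

if __name__ == "__main__":
    chain = metropolis()
    print(sum(chain) / len(chain))

Independent chains like this can run on separate processors with no
communication at all, which is where the nice parallelisation comes from.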

Peter

Speaking of which: It seems to me that a big problem with protein
folding and similar jobs (e.g. simulating galaxy collisions) is:

- If you want N digits of accuracy in the numerical calculations, you
just need to use N digits of numerical precision, for O(N^2)
computational effort.

- However, quantizing time produces errors; if you want to reduce
these to N digits of accuracy, you need to use exp(N) time steps.

Is this right? Or is there any way to put a bound on the total error
introduced by time quantization over many time steps?

(Fluid dynamics simulation has this problem too, but in both the space
and time dimensions; I suppose there's definitely no way of solving it
for the space dimension, at least, other than by brute force.)
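
As a rough check of the time-step part of that question, here is a toy
Python experiment with forward Euler on dx/dt = -x (exact solution
exp(-t)); it is only a stand-in for a real N-body or folding integrator.
For a first-order method the global error shrinks roughly linearly with
the step size, so each extra decimal digit costs about 10x more steps,
i.e. a number of steps exponential in the number of digits; higher-order
methods improve the base of that exponential but not its general shape.

import math

def euler_error(n_steps, t_end=1.0):
    dt = t_end / n_steps
    x = 1.0
    for _ in range(n_steps):
        x += dt * (-x)            # forward Euler update
    return abs(x - math.exp(-t_end))

if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, euler_error(n))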

Peter Boyle (e-mail address removed)
 
Protein folding comes in two forms - which I would describe as

i) "Hamiltonian evolution of a (semi-)classical approximation"
- this is similar to galactic dynamics etc... and necessarily
serialises in time.

Like any other form of "extended ODE" with a time component.
ii) Statistical mechanical prediction of the probability
distribution of a protein configuration.
This does not have the time problem, and can be
done using exact Monte-Carlo algorithms, such as Metropolis,
Hybrid Monte-Carlo, etc...

The latter has very nice parallelisation and numerical error
insensitivity properties. It also accounts better for quantum effects.
No doubt, it is much more numerically expensive.

It also doesn't deal with the multiple minimum problem - which is
critical for at least some proteins, such as prions. You need to
know how the protein folds to be sure that there isn't a barrier,
and/or to estimate the probability and conditions for folding into
different configurations.


Regards,
Nick Maclaren.
 
Peter said:
Protein folding comes in two forms - which I would describe as

i) "Hamiltonian evolution of a (semi-)classical approximation"
- this is similar to galactic dynamics etc... and necessarily
serialises in time.

By the way that massive parallelism is currently most commonly done,
this is a communication problem masquerading as a computational problem.

The real problem here is that you need to take a large number of time
steps (~10^11, in the example from IBM's Blue Gene document) and,
because of long range forces, you need nearly global communication at
every step (since every particle needs to know where every other
particle within any arbitrary cutoff for long-range forces is). The
expensive sum over particles can be made parallel, but at the cost of
putting a copy of nearly every particle position into the memory space
of every processor over which the sum is made parallel (you can take out
the "nearly" if there is no long-range cutoff).

The last time this problem was mentioned, the ops count (estimated 1000
machine instructions per force calculation) from Table 1 of Allen et al.

www.research.ibm.com/journal/sj/402/allen.pdf

was discussed here. There are other ways you can organize the
calculation, but if you follow the naive sum-over-particles calculation
implied by Table 1, then the ops count should include the cost of moving
the particle position for that particular particle from the memory space
of the processor where it was most recently updated to the place where
its position will be used to update the position of another particle.

If you have a processor fast enough (possibly by being
multiply-threaded) to handle updating multiple particles in a single
memory space, you can amortize that communication cost over the number
of particles being updated. The usual issues with shared memory don't
arise because the shared variables can be read-only. A large number of
threads or cores on a chip with a shared L3 would be just dandy. The
entire shared space (six degrees of freedom for 32000 particles) would
probably fit into L3.
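
To make that data layout concrete, here is a toy Python sketch of the
naive sum-over-particles step, with positions held in one shared
read-only array and each worker writing only its own slice of the force
array. Sizes and names are made up for illustration (and in CPython the
GIL means the threads show the sharing pattern, not actual speedup):

import math
from concurrent.futures import ThreadPoolExecutor

N = 2000            # kept small so the toy runs quickly; the 32000 particles
                    # x 6 doubles mentioned above is still only about 1.5 MB
positions = [(math.cos(i), math.sin(i), i * 1e-3) for i in range(N)]
forces = [None] * N               # each worker writes only its own entries

def accumulate(start, stop):
    # Reads the shared, read-only positions list; writes forces[start:stop].
    for i in range(start, stop):
        xi, yi, zi = positions[i]
        fx = fy = fz = 0.0
        for j in range(N):                    # naive O(N^2) pair sum
            if j == i:
                continue
            xj, yj, zj = positions[j]
            dx, dy, dz = xj - xi, yj - yi, zj - zi
            r2 = dx * dx + dy * dy + dz * dz + 1e-12
            w = 1.0 / (r2 * math.sqrt(r2))    # 1/r^3, gravity-like force
            fx += dx * w
            fy += dy * w
            fz += dz * w
        forces[i] = (fx, fy, fz)

workers = 8
chunk = N // workers
with ThreadPoolExecutor(max_workers=workers) as pool:
    list(pool.map(lambda k: accumulate(k * chunk, (k + 1) * chunk),
                  range(workers)))
print(forces[0])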

RM
 
Paul Repacholi said:
In fact, Emacs IS a good candidate. Very little context that is not
buffer or window/frame local. Going into that swamp is another issue!

I have a better reason why Emacs is a great candidate for
parallelization.
It's written in Lisp, and in reality it's a Lisp operating system with
an embedded word processor included as a major app in it. Now the Lisp
code could be auto-parallelized by an auto-parallelizing compiler, so you
would need to do some work to improve the underlying Lisp compiler/OS
to handle multiprocessing needs. BTW: I think that EMACS is going to
be one of the desktop applications that gets parallelized
well. [If it hasn't been already.] Simply because parallelizing it is a
geeky enough trick that someone in OSS development may want to do it just
for the kicks, and most of it is written in a parallelizable language.

Jouni Osmala
 
In comp.arch Nick Maclaren said:
It also doesn't deal with the multiple minimum problem - which is
critical for at least some proteins, such as prions. You need to
know how the protein folds to be sure that there isn't a barrier,
and/or to estimate the probability and conditions for folding into
different configurations.

No current method adequately deals with the multiple minimum problem
because they can't adequately explore the space to find even a single
minimum.

I think there are very useful things that could be done by molecular
dynamics on a large ensemble of starting configurations for fairly
modest-length trajectories. That would be trivially parallelizable
with current codes.
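
For what it's worth, the embarrassingly parallel structure looks roughly
like this Python sketch; run_trajectory() is a made-up stand-in for a
real MD code, not any particular package:

import random
from multiprocessing import Pool

def run_trajectory(seed, n_steps=10000):
    # Hypothetical stand-in for an MD run: jiggle a scalar "configuration".
    rng = random.Random(seed)
    x = rng.uniform(-1.0, 1.0)          # a starting configuration
    for _ in range(n_steps):
        x += 0.01 * rng.gauss(0.0, 1.0)
    return x                            # final "configuration"

if __name__ == "__main__":
    with Pool() as pool:
        finals = pool.map(run_trajectory, range(64))   # 64 independent runs
    print(min(finals), max(finals))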

-- Dave
 
It won't surprise me in the least if 15 years from now, when the
Nor would it surprise me. Raymond makes one good point, though he
gets it slightly wrong!

There is effectively NO chance of automatic parallelisation working
on serial von Neumann code of the sort we know and, er, love. Not
in the near future, not in my lifetime and not as far as anyone can
predict. Forget it.

This has the consequence that large-scale parallelism is not a viable
general-purpose architecture until and unless we move to a paradigm
that isn't so intractable. There are such paradigms (functional
programming is a LITTLE better, for a start), but none have taken
off as general models. The HPC world is sui generis, and not relevant
in this thread.

So he would be right if he replaced "beyond 2 cores" by "beyond a
small number of cores". At least for the next decade or so.


Regards,
Nick Maclaren.

There are a few hints as to why parallel CPUs are beneficial on the HOME
desktop.
Most programs already run fast enough. Look at the exceptions:
a) Games
b) 3D editing software
c) Video editing

Now, what about the future and the parallelization of said things?

a) Well, one thread for the UI, a couple of threads for physics [split by
area], and a HUGE number of threads available for AI. [Think of each
monster only READING the shared area while it writes to its own specific
area, so there is no sharing, only synchronization with the physics and
game-mechanics threads, and those synchronizations could be handled by
keeping the LAST frame intact for AI threads that work with one frame of
delay; see the sketch after this list.] I have reason to believe that the
main benefit of adding more than 4 cores is improving AI algorithms in
games.
b) Already showing some parallelism; I don't know how much inherent
parallelism is reasonably easy for the CPU to get [for real gains].
c) Same here.
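
A rough Python sketch of that one-frame-delay scheme in (a), with all
names made up for illustration: AI workers read only the frozen previous
frame and each returns just its own monster's move, so the only
synchronization is the once-per-frame swap.

from concurrent.futures import ThreadPoolExecutor

def decide(monster_id, last_frame):
    # Read-only view of the last frame; trivial "AI" for illustration.
    player_x = last_frame["player_x"]
    my_x = last_frame["monsters"][monster_id]
    return +1 if player_x > my_x else -1          # move toward the player

def game_loop(n_frames=3, n_monsters=8):
    last_frame = {"player_x": 0.0,
                  "monsters": [float(i) for i in range(n_monsters)]}
    with ThreadPoolExecutor() as pool:
        for frame in range(n_frames):
            # AI threads work on the frozen last_frame, one frame behind.
            moves = list(pool.map(lambda m: decide(m, last_frame),
                                  range(n_monsters)))
            # Physics/game-mechanics builds the next frame, then swap.
            next_frame = {"player_x": last_frame["player_x"] + 0.5,
                          "monsters": [x + dx for x, dx
                                       in zip(last_frame["monsters"], moves)]}
            last_frame = next_frame

game_loop()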

But the main point about desktop parallelism isn't what cannot be
parallelized, but whether there are enough important applications that
CAN be parallelized. And I'm saying yes, there are, and the wall this
hits is that at some point no desktop apps need more computing power,
not that the extra cores couldn't be used in parallel. You don't need to
parallelize every task for the desktop to embrace the multicore paradigm,
just the games.

Besides, people who write software will typically have TWO years for each
doubling of the number of cores ;) [Except there is probably one shrink
that goes to increasing cache instead of the number of cores, and that is
more probably earlier than later.]

And for languages, hmmm. They will evolve, simply because using C to
write an app for 16 cores doesn't look promising as a general case. SOME
things, like games, are parallel. [No, you don't need to parallelize every
single small task, just run different TASKS in parallel (physics,
ai1, ai2...ai[n], ui), and if some task takes more than 1/60th of a
second then there is a need to parallelize that.]
But in many cases things will adapt. Secondly, there are multiple
processes to keep the CPUs utilized. For instance, the OS, a P2P app and
an MP3 player in the background while the actual game runs in the
foreground.
If it's 2 cores in 2005, it's 4 cores in 2007 or 2009 and 8 in 2011,
assuming the Intel roadmap holds for processes, and on this time scale I
think there is reasonable parallelism available to the cores within a
year of their introduction. For two cores it's mostly background
processes; for 4 cores there are applications that do work in 3
threads, plus background stuff; and for 8 cores there will have been
several years for desktop application writers to deal with it. Some are
fast enough anyway and don't need the parallelism, while those who do
need it will have to find a way to use it or their competitors will, on
the desktop. Yes, there are problems that are inherently
non-parallelizable, but as long as the tasks that drive the sales
are parallelizable there will be an exponential trend of increasing
numbers of cores per die. And no, what we learned from supercomputers
won't hold for the desktop.
a) The desktop runs n applications and background processes at the same
time, and there is a benefit to each having a CPU so that the foreground
is not stalled; beyond that, the foreground process may have some
coarse-grained parallelism available, as games do...
b) Faster desktop processors are sold in volume to the brainwashed
masses.
c) There is a need for more processors as long as there is ANY task
that could utilize more processors.
d) There is NO long-latency communication problem between the nodes as
with supercomputers; your inter-process communication latency is within
a single die, which makes it a LOT faster than the supercomputers, so
you DON'T need to replicate the read-only data for different
processes.


Jouni Osmala
-I know I should write more software and write less in comp.arch ...
 
Jouni Osmala wrote:

[SNIP]
There are a few hints as to why parallel CPUs are beneficial on the HOME
desktop.
Most programs already run fast enough. Look at the exceptions:
a) Games
b) 3D editing software
c) Video editing

Now, what about the future and the parallelization of said things?

a) Well, one thread for the UI, a couple of threads for physics [split by
area], and a HUGE number of threads available for AI. [Think of each
monster only READING the shared area while it writes to its own specific
area, so there is no sharing, only synchronization with the physics and
game-mechanics threads, and those synchronizations could be handled by
keeping the LAST frame intact for AI threads that work with one frame of
delay.] I have reason to believe that the main benefit of adding more
than 4 cores is improving AI algorithms in games.

[SNIP]

None of this is new. One of the early apps written for the transputer
was in fact the fabled "Flight Simulator". The thing is these days
there is a lot of work off-loaded to specialised hardware (eg: your
shiny new Nvidia 6800), the question I have to ask is : How much do
CPUs hold games back now ?

My suspicion is : In the general case, not much because developers
target median hardware in order to maximise their potential market.

AI as it's done in games now is basically scripting, I haven't seen
many signs of that changing. Game developers really don't seem to be
much interested in anything else, genuine adaptive AI would be a
complete bastard to test (and they do play-test quite heavily).

We shall see how it pans out, the next cut of XBox may well confirm
your hypothesis.

Cheers,
Rupert
 
None of this is new. One of the early apps written for the transputer
was in fact the fabled "Flight Simulator". The thing is these days
there is a lot of work off-loaded to specialised hardware (eg: your
shiny new Nvidia 6800), the question I have to ask is : How much do
CPUs hold games back now ?

Lots. Pathfinding alone would in many cases be quite adequate to keep
tens of CPUs pegged at 100% load.
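
As a toy illustration of why: each unit's path query is an independent
search over a read-only map, so the queries themselves parallelize
trivially. Python sketch with made-up names; a plain breadth-first
search stands in for A* here.

from collections import deque
from concurrent.futures import ThreadPoolExecutor

GRID_W, GRID_H = 64, 64
BLOCKED = {(x, 32) for x in range(10, 60)}        # a wall with gaps

def neighbours(cell):
    x, y = cell
    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= nx < GRID_W and 0 <= ny < GRID_H and (nx, ny) not in BLOCKED:
            yield (nx, ny)

def find_path(query):
    # Independent search; only reads the shared map, writes nothing shared.
    start, goal = query
    frontier, seen = deque([start]), {start: None}
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            break
        for nxt in neighbours(cell):
            if nxt not in seen:
                seen[nxt] = cell
                frontier.append(nxt)
    # Walk the parent links back to recover the path length.
    steps, cell = 0, goal
    while cell is not None and cell in seen:
        cell = seen[cell]
        steps += 1
    return steps

queries = [((0, 0), (63, 63 - i)) for i in range(32)]     # 32 units
with ThreadPoolExecutor(max_workers=8) as pool:
    lengths = list(pool.map(find_path, queries))
print(lengths[:4])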
 
But the main point about desktop parallelism isn't what cannot be
parallelized, but whether there are enough important applications that
CAN be parallelized. And I'm saying yes, there are, and the wall this
hits is that at some point no desktop apps need more computing power,
not that the extra cores couldn't be used in parallel. You don't need to
parallelize every task for the desktop to embrace the multicore paradigm,
just the games.

For some fairly trivial meaning of the word "important". While the
game market was minuscule until a couple of decades back, it has now
reached saturation. Yes, it dominates the benchmarketing, but it
doesn't dominate CPU design, and there are very good economic reasons
for that. Sorry, that one won't fly.
Besides, people who write software will typically have TWO years for each
doubling of the number of cores ;) [Except there is probably one shrink
that goes to increasing cache instead of the number of cores, and that is
more probably earlier than later.]

Hmm. I have been told that more times than I care to think, over a
period of 30+ years. It's been said here, in the context of 'cheap'
computers at least a dozen times over the past decade. That one
won't even start moving.
And for languages, hmmm. They will evolve, simply because using C to
write an app for 16 cores doesn't look promising as a general case. ...

Ditto, but redoubled in spades. If that were an aircraft, it would be
a Vickers Viscount that has been sitting at Kinshasa airport since
the Belgians left.

There was essentially NO progress in the 1970s, and the 'progress'
since then has been AWAY FROM parallelism. With the minor exception
of Fortran 90/95.
But in many cases things will adapt. Secondly, there are multiple
processes to keep the CPUs utilized. For instance, the OS, a P2P app and
an MP3 player in the background while the actual game runs in the
foreground.

Your first sentence is optimistic, but not impossible. Your second
is what most of the experienced people have been saying. Multiple
cores will be used to run multiple processes (or semi-independent
threads) on desktops, for the foreseeable future.


Regards,
Nick Maclaren.
 
Nick said:
Your first sentence is optimistic, but not impossible. Your second
is what most of the experienced people have been saying. Multiple
cores will be used to run multiple processes (or semi-independent
threads) on desktops, for the foreseeable future.

Well, Sun is supposedly working on Throughput Computing. The only example
they gave was speeding up network stacks. I'm not sure how much will
come out of it, since Sun is off working in their own la-la land. I
suspect it will be closer to what someone said earlier: if you build
it, the apps will come. That is, the creative ideas will appear when
you get these multi-core SMP machines into the hands of people other than
those with preconceived ideas about the applications of parallel and
concurrent programming.

Joe Seigh
 
Well, Sun is supposedly working on Throughput Computing. The only example
they gave was speeding up network stacks. I'm not sure how much will
come out of it, since Sun is off working in their own la-la land. I
suspect it will be closer to what someone said earlier: if you build
it, the apps will come. That is, the creative ideas will appear when
you get these multi-core SMP machines into the hands of people other than
those with preconceived ideas about the applications of parallel and
concurrent programming.

Obviously, I can't speak for Sun. But I am pretty sure that their
intent with that is to kick-start some radical rethinking, and they
are hoping to shake up the industry rather than follow a predicted
path.


Regards,
Nick Maclaren.
 
Nick said:
There was essentially NO progress in the 1970s, and the 'progress'
since then has been AWAY FROM parallelism. With the minor exception
of Fortran 90/95.

So, how bad is it? As bad as hot fusion, on which the US has finally
given up (except in the context of international cooperation, which is a
sure sign the US feels it has no future)?

You don't count Ada as at least a feint in the right direction? Some
people actually use it, and it can be formally analyzed. If people
aren't using better tools than C, it's not because those tools aren't
available.

RM
 
Nick said:
Obviously, I can't speak for Sun. But I am pretty sure that their
intent with that is to kick-start some radical rethinking, and they
are hoping to shake up the industry rather than follow a predicted
path.

I agree that's probably what their intent is. The problem is that Sun is
stuck in that proprietary hardware and software business model, and that
kind of narrows their viewpoint.

What they should be doing is creating an open API that runs well enough
on current hardware that it gets widespread adoption, but runs even better
on their proprietary hardware, so that Sun has a competitive advantage over
commodity hardware.

That's sort of what hw vendors do with existing APIs, but with new APIs
you have the advantage of patenting all the more obvious implementations
before they can occur to anyone else.

It's a bit of a timing thing. You want general adoption of the API before
everyone realizes you've locked up the competitive advantage.

Joe Seigh
 
Rupert Pigott said:
The thing is these days
there is a lot of work off-loaded to specialised hardware (eg: your
shiny new Nvidia 6800), the question I have to ask is : How much do
CPUs hold games back now ?

Quite a bit. See results at

http://www.complang.tuwien.ac.at/anton/umark/

Well, I better reproduce the results here:

--------------Machines-------------------
qual Resolut. Markus Anton Franz calis5 calis2a calis2b
Low 1280x1024 59.2 67.9 44.6 39.4
Low 800x600 62.0 72.0 46.5 44.7 37.4
High 1280x1024 40.0 21.3 31.7 18.0 10.8 22.7
High 800x600 47.2 44.1 33.9 32.2 23.5 25.3

Machines:
Markus: Pentium 4 3000MHz (512 KB L2), Gforce FX5900 256MB, i875, 1GB DDR400 dual channel
Anton: Athlon 64 3200+ (2000MHz 1MB L2), Gforce4Ti4200 64MB, K8T800, 512MB DDR333 ECC
Franz: Athlon XP 2800+ (2083MHz 512KB L2), Radeon 9600 XT 128MB?, KT400A, 512MB DDR333
calis5: Athlon XP 2700+ (2166MHz, 256KB L2), Gforce FX5600
calis2a: Athlon XP 1900+ (1600MHz, 256KB L2), Gforce FX 5600, KT266A, 256MB RAM
calis2b: Athlon XP 1900+ (1600MHz, 256KB L2), Radeon 9600, KT266A, 256MB RAM

All the low-quality results seem to be CPU-limited (little difference
between resolutions). And even for the high-quality results, the
results with a Radeon 9600, 9600XT, and Gforce FX5900 are mostly
CPU-limited, and only those with Gforce 4Ti4200 and FX5600 are
graphics-card-limited; and comparing "calis5" to "Franz" and "calis2a"
to "calis2b" at "High 800x600", these machines are probably also
CPU-limited at these settings. And that's with machines with pretty
fast CPUs (there is certainly less difference between these CPUs than
between a Radeon 9600 and a Gforce FX5900).

One other interesting point in these results is that the Athlon 64
outdoes similarly-clocked Athlon XPs by a factor >1.5 on the
CPU-limited low-quality settings. Looking at the Doom3 results at
<http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2149>, a similar
thing happens for Doom 3 and that's not due to the cache size.

Followups set to comp.arch

- anton
 
Rupert said:
AI as it's done in games now is basically scripting, I haven't seen
many signs of that changing. Game developers really don't seem to be

I think that's at least partly wrong. Even Quake3 had some man-year++
effort in developing different kinds of robots that you could play with
or against, and that was 4-5 (?) years ago.
much interested in anything else, genuine adaptive AI would be a
complete bastard to test (and they do play-test quite heavily).

According to John Cash (who wrote most of that Q3 code), it was indeed a
lot of work to test it, not least since John Carmack tended to tear
apart all the internals of his engine every two or three weeks. :-(

Terje
 
So, how bad is it? As bad as hot fusion, on which the US has finally
given up (except in the context of international cooperation, which is a
sure sign the US feels it has no future)?

Yes, precisely.

The approaches of trying to autoparallelise arbitrary serial von
Neumann code are probably closer to cold fusion, though ....
You don't count Ada as at least a feint in the right direction? Some
people actually use it, and it can be formally analyzed. If people
aren't using better tools than C, it's not because those tools aren't
available.

I suppose so, but it is negligibly better than many of the late
1960s languages. And, even if it were wholly positive, it is
outweighed by the C/C++/Java/etc. regressions.


Regards,
Nick Maclaren.
 
Rupert Pigott said:
None of this is new. One of the early apps written for the
transputer was in fact the fabled "Flight Simulator". The thing is
these days there is a lot of work off-loaded to specialised hardware
(eg: your shiny new Nvidia 6800), the question I have to ask is :
How much do CPUs hold games back now ?

If I remember the discussions on the PS2 here when details first came
out, there was concern expressed by some that the non-video CPU was
marginal for some game play. That would have got a lot worse, I suspect,
with time.
My suspicion is : In the general case, not much because developers
target median hardware in order to maximise their potential market.

Plus they have to predict what the mainstream WILL BE :(

--
Paul Repacholi 1 Crescent Rd.,
+61 (08) 9257-1001 Kalamunda.
West Australia 6076
comp.os.vms,- The Older, Grumpier Slashdot
Raw, Cooked or Well-done, it's all half baked.
EPIC, The Architecture of the future, always has been, always will be.
 
Robert said:
So, how bad is it? As bad as hot fusion, on which the US has finally
given up (except in the context of international cooperation, which is a
sure sign the US feels it has no future)?

There's been a great deal of progress in magnetic fusion in the last
few decades. Confinement parameters for current machines are orders
of magnitude ahead of where they were in the 1970s; understanding of
plasmas has also greatly advanced.

Whether tokamaks are going to be economically competitive is another
matter. Fortunately, there are exciting ideas for more compact
reactors.

Paul
 