Robert said:
> On the other hand, the suggestion was recently made here that maybe we
> should just banish SMP as an unacceptable programming style (meaning, I
> think, that multiprocessor programming should not be done in a
> globally-shared memory space, or at least that the shared space should
> be hidden behind something like MPI).
I wonder how much SMP style, and the uniform address spaces that
go with it, can be hidden under VM, pointer swizzling and layers
of software-based caching. Probably not much, really.
> The situation is _so_ bad that it doesn't seem embarrassing, apparently,
> for Orion Multisystems to take a lame processor, to hobble it further
> with a lame interconnect, and to call it a workstation. If the future
> of computing really is slices of Wonder Bread in a plastic bag and not a
> properly cooked meal, then the Orion box makes some sense. Might as
> well get used to it and start programming on an architecture that at
> least has the right topology and instruction set, as I believe Andrew
> Reilly is suggesting.
Well, I think that the specific instruction set is probably a red
herring. I reckon that an object code specifically designed to be
a target for JIT compilation to a register-to-register VLIW engine
of indeterminate dimensions will turn out to be better ultimately.
There are projects moving in that direction:
http://llvm.cs.uiuc.edu/, and, from long, long ago, the TAO Group's
VM. Stack-based VMs like the JVM and MSIL might or might not be
the right answer. I guess we'll find out soon enough.
Code portability and density are important, of course, but the main
thing is winning back, with dynamic recompilation, some of the
unknowables that a plain in-order VLIW or RISC design visits on code.
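
To make that concrete, here's a toy sketch in Python (purely
illustrative -- not real JVM bytecode or LLVM IR): the same
expression written for a stack machine and for a register-to-register
form. The register form leaves the two independent multiplies
explicit for a JIT to pack into whatever VLIW width it finds; the
stack form serialises everything through the top of the stack and
the JIT has to rediscover the parallelism.

def run_stack(code, env):
    """Minimal stack-machine interpreter."""
    stack = []
    for op, *args in code:
        if op == "push":
            stack.append(env[args[0]])
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
    return stack.pop()

def run_regs(code, env):
    """Minimal register-machine interpreter over three-address code."""
    regs = dict(env)
    for dst, op, x, y in code:
        regs[dst] = regs[x] * regs[y] if op == "mul" else regs[x] + regs[y]
    return regs[dst]

# d = a*b + e*f, in both encodings.
env = {"a": 2.0, "b": 3.0, "e": 4.0, "f": 5.0}

stack_code = [("push", "a"), ("push", "b"), ("mul",),
              ("push", "e"), ("push", "f"), ("mul",), ("add",)]

reg_code = [("t1", "mul", "a", "b"),   # independent of t2: could share a VLIW bundle
            ("t2", "mul", "e", "f"),
            ("t3", "add", "t1", "t2")]

assert run_stack(stack_code, env) == run_regs(reg_code, env) == 26.0
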
The Transmeta Efficeon is just the first widely available
processor with embedded levels of integration (a memory
controller, some peripheral interfaces, and HyperTransport for
other peripherals) and power consumption that can do pipelined
double-precision floating-point multiply/adds at two flops/clock
at an interesting clock rate. 1.5 GHz is significantly faster than
the DSP competitors: the TI C6700 tops out at 300 MHz and only does single
precision at the core rate. PowerPC+AltiVec doesn't have the
memory controller or the peripheral interconnect to drive up the
areal density. The BlueGene core is about the right shape, but I
haven't seen any industrial/embedded boxes with a few dozen of
them inside, yet. The MIPS and ARM processors that have the
integration don't have the floating-point chops. Modern versions
of the VIA C3 might be getting interesting (or not: I haven't
looked at their double-precision performance), but they have
neither the memory controller nor the HyperTransport link, nor
quite the MHz. Of course, Opterons fit that description too, and
clock much faster, but I thought that they consumed considerably
more power, too. Maybe their MIPS/watt is closer than I've given
them credit for.
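
For what it's worth, the peak arithmetic implied by those numbers
is easy to run (a back-of-the-envelope sketch only: the clock rates
and the two flops/clock come from above, sustained rates will be
lower, and I deliberately don't guess the C6700's flops/clock):

def peak_gflops(clock_ghz, flops_per_clock):
    """Peak rate = clock rate x floating-point operations retired per clock."""
    return clock_ghz * flops_per_clock

# Efficeon as described above: 2 DP flops/clock at 1.5 GHz.
print("Efficeon DP peak:", peak_gflops(1.5, 2.0), "GFLOPS")   # 3.0

# TI C6700 at 300 MHz: whatever its flops/clock, the clock alone is
# 5x slower, and it only reaches its core rate in single precision.
print("clock ratio, Efficeon vs C6700:", 1.5 / 0.3)           # 5.0
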
> If big computers are to be used to solve problems, they are inevitably
> going to fall into the hands of people who are more interested in
> solving problems than they are in the computers...as should be. If we
> really can't conjure tools for programming them that are reliable in the
> hands of relative amateurs, I see it as a more pressing issue than not
> being able to do hot fusion (the prospects for wind and solar having
> come along very nicely).
For such people, I suspect that the appropriate level of
programming is that of science fiction starship bridge computers:
"here's what I want: make it so". I wonder if anyone has looked
at something like simulated annealing or genetic optimisation to
drive memory access patterns revealed by problems expressed at an
APL or Matlab (or higher) level. For most of the "big science"
problems, I suspect that the "what I want" is not terribly
difficult to express (once you've done the science-level thinking,
of course). The tricky part, at the moment, is having a human
understand the redundancies, the dataflows, and the
numerical-stability issues well enough to map the direct form of
the solution onto something efficient (on one processor, or on a
bunch of them). I think that from a sufficient altitude, that looks
like an annealing problem, with dynamic recompilation being the
lower-tier mechanism of the optimisation target. The lucky thing
about "big science" problems is that by definition they have big
data, and run for a long time. That time and that amount of data
might as well be used by the machine itself to try to speed the
process up, rather than by a bunch of humans attempting the same
thing without such intimate access to the actual values in the
data sets and computations.
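
As a toy version of that idea, here's a little simulated-annealing
sketch in Python. Everything in it is hypothetical: the "program"
is just a permutation of data-block accesses and the cost function
is a crude stand-in for cache behaviour, where in the scheme above
the cost would instead be the measured runtime of dynamically
recompiled code on the real data:

import math, random

def cost(order):
    """Crude locality metric: total distance between consecutive block accesses."""
    return sum(abs(a - b) for a, b in zip(order, order[1:]))

def anneal(n_blocks=32, steps=20000, t0=10.0):
    """Anneal an access order to minimise cost(); the ideal is n_blocks - 1."""
    order = list(range(n_blocks))
    random.shuffle(order)
    current = cost(order)
    best, best_cost = order[:], current
    for step in range(steps):
        temp = t0 * (1.0 - step / steps) + 1e-9      # simple linear cooling
        i, j = random.sample(range(n_blocks), 2)     # propose: swap two accesses
        order[i], order[j] = order[j], order[i]
        proposed = cost(order)
        if proposed < current or random.random() < math.exp((current - proposed) / temp):
            current = proposed                       # accept (sometimes uphill, early on)
            if current < best_cost:
                best, best_cost = order[:], current
        else:
            order[i], order[j] = order[j], order[i]  # reject: undo the swap
    return best, best_cost

best_order, best_cost = anneal()
print("annealed access-order cost:", best_cost, "(ideal is 31)")
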
It's late, I've had a few glasses of a nice red and I'm rambling.
Sorry about that. Hope the ramble sparks some other ideas.