65nm news from Intel

  • Thread starter: Yousuf Khan
Stefan said:
That statement is simply hilarious,

Stefan "an Emacs maintainer"

Well I don't much care either way, myself, but in the interests of
maintaining a standard of debate in this newsgroup, would you care
to rebut the supporting arguments?
 
|> >> I think that EMACS is going to be one of the desktop applications
|> >> that are going to be parallelized well.
|> >
|> > That statement is simply hilarious,
|> >
|> > Stefan "an Emacs maintainer"
|>
|> Well I don't much care either way, myself, but in the interests of
|> maintaining a standard of debate in this newsgroup, would you care
|> to rebut the supporting arguments?

Yes. No problem. I humbly submit the source of Emacs as evidence,
and claim that the conclusion is obvious.

Note that Stefan Monnier did not say that Emacs could not be
parallelised well, at least in theory, but was responding to a
comment that it was going to be.


Regards,
Nick Maclaren.
 
I think that EMACS is going to be one of the desktop applications
Well I don't much care either way, myself, but in the interests of
maintaining a standard of debate in this newsgroup, would you care
to rebut the supporting arguments?

Well, as Nick points out, there's the source code, riddled with global
variables, dynamic data structures, indirections, ...

Then there's the Lisp, its dynamic scoping and its interaction with
buffer-local variables. Most of the Lisp code doesn't rely on the precise
semantics of the current implementation of dynamic scoping (which relies
extensively on global variables), but some does, and it's extremely
difficult to figure out which parts do.
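The hazard here can be sketched outside of Lisp. Emacs implements dynamic scoping by shallow binding: `let` on a special variable saves the single global cell, overwrites it, and restores it on exit. Below is a minimal Python model of that mechanism; the names `GLOBALS` and `dynamic_let` are illustrative stand-ins, not Emacs internals.

```python
from contextlib import contextmanager

# One global cell per special variable, as in shallow dynamic binding.
GLOBALS = {"case-fold-search": True}

@contextmanager
def dynamic_let(name, value):
    """Model of `let` on a special variable: save, overwrite, restore."""
    saved = GLOBALS[name]
    GLOBALS[name] = value
    try:
        yield
    finally:
        GLOBALS[name] = saved

with dynamic_let("case-fold-search", False):
    # Everything called from here, however deep, sees the new value...
    assert GLOBALS["case-fold-search"] is False
# ...and the old value comes back on exit.
assert GLOBALS["case-fold-search"] is True
```

Because the binding lives in one shared cell, two threads entering `dynamic_let` on the same variable would see and clobber each other's bindings, which is exactly why code that depends on this implementation is hard to parallelise.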

Then there's the display semantics: to redisplay a window, you need to walk
the buffer sequentially, interpreting each char and its associated
text-properties in sequence. Why is that? Because the current char might be
displayed as one big image, so you can't know whether the next char will
need to be displayed (and where) until you've processed the current char.
Of course, it can still be parallelised, using speculation (which might
work very well here).
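That speculation can be sketched concretely. In the toy model below (all names hypothetical; 'I' stands for a char displayed as one big image that swallows the next two chars), each chunk of the buffer is laid out speculatively as if it started at a clean boundary, and a cheap sequential pass reuses each chunk's result whenever the speculation turns out to have been right.

```python
def walk(chars, start, limit):
    """Sequential layout walk: 'I' is an image covering itself plus the
    next two chars, so where the next char lands depends on this one."""
    out, i = [], start
    while i < limit:
        if chars[i] == 'I':
            out.append(('image', i))
            i += 3
        else:
            out.append(('char', i))
            i += 1
    return out, i

def speculative_layout(chars, nchunks=4):
    n = len(chars)
    bounds = [n * k // nchunks for k in range(nchunks + 1)]
    # Phase 1 (each chunk independent, hence parallelisable): speculate
    # that every chunk begins exactly at its boundary.
    specs = [walk(chars, lo, hi) for lo, hi in zip(bounds, bounds[1:])]
    # Phase 2 (sequential but cheap): stitch, redoing only the chunks
    # whose speculation failed because an image straddled the boundary.
    out, pos = [], 0
    for k in range(nchunks):
        if pos == bounds[k]:              # speculation was right: reuse it
            chunk_out, pos = specs[k]
        else:                             # mispredicted: re-walk this chunk
            chunk_out, pos = walk(chars, pos, bounds[k + 1])
        out.extend(chunk_out)
    return out

buffer = "abIcdefIghij"
assert speculative_layout(buffer) == walk(buffer, 0, len(buffer))[0]
```

When images rarely straddle chunk boundaries, most chunks are accepted as-is and the sequential dependence costs almost nothing.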

...

BTW, does anyone know of any work on parallelizing regexp-matching?


Stefan
 
|>
|> Then there's the display semantics: to redisplay a window, you need to walk
|> the buffer sequentially, interpreting each char and its associated
|> text-properties in sequence. Why is that? Because the current char might be
|> displayed as one big image, so you can't know whether the next char will
|> need to be displayed (and where) until you've processed the current char.
|> Of course, it can still be parallelised, using speculation (which might
|> work very well here).

It would for me - one line to show the style of my .emacs:

(setq auto-mode-alist '((".*" . fundamental-mode)))

|> BTW, does anyone know of any work on parallelizing regexp-matching?

I believe that I saw some once. Anyway, I thought about it, and
felt that it should be straightforward. In particular, it arose
out of the algorithm that I developed to check if two regular
expressions overlapped.

Unfortunately, that was only parallelising the NFA to give the same
sort of performance as the DFA, and needed a VASTLY more lightweight
parallel model than any I know of :-(

If you can give me a more detailed description of what sort of use
you are interested in parallelising, I may be able to help.


Regards,
Nick Maclaren.
 
Nick said:
Yes. No problem. I humbly submit the source of Emacs as evidence,
and claim that the conclusion is obvious.

Note that Stefan Monnier did not say that Emacs could not be
parallelised well, at least in theory, but was responding to a
comment that it was going to be.

I disagree. Jouni's post began...

I have a better reason why emacs is a great candidate for
parallelization.

...which is certainly starting from a "could" rather than "would"
viewpoint.

It's written in lisp, and in reality it's a lisp operating system
with an embedded word processor included as a major app in it. Now
the lisp code could be auto-parallelized by an auto-parallelizing compiler.
So you would need to do some work to improve the underlying lisp
compiler/OS to handle multiprocessing needs.

Here he makes a specific supporting argument for his claim. When I
asked for rebuttals, I was rather hoping that someone would address
this one. Auto-parallelisation of Lisp may be significantly easier
than the same task for C (which I happily accept hasn't really
happened yet, despite efforts) so emacs may be much better placed
than "the average app".

BTW: I think that EMACS is going to be one of the desktop
applications that are going to be parallelized well. [If it
hasn't already.]

OK, here he switches to "could" mode, but if he blows both ways in the
same post I think it's unfair to claim he went in just one direction.

Simply because parallelizing it is a geeky enough trick that someone
in OSS development may want to do it just for the kicks [...]

Here's a second line of argument, differentiating emacs from the average
app. It is surely undeniable that "cult" OSS software gets ported and
twisted in far more ways than its intrinsic quality would justify. If
I had to place money on which applications would get ported first and
best to any new architecture, I'd bet on emacs and GNU C.
 
There are some problems which cannot be made parallel. As in "can be
proved not to be parallelizable" rather than "we don't know how yet."
But the world is not full of those problems, and desktops are REALLY not
running them. So in a practical sense, SMP is as good as a faster CPU
*IF* you have multiple tasks or threads, and the overhead doesn't eat the
extra power.

20-25 years ago there was a company called Convex which had a killer C
compiler which did parallelizing. For many problems it would give
Cray-like results for far fewer bucks. I have no idea why they
concentrated on hardware when they had the best software of the time.

The focus on hardware, at least early on, probably made some sense.
They seemed to fill a niche for a "low-cost" mini-super, which coupled
with great compilers (and VAX and CRAY source compatibility), could
attract VAX users (for example) wanting more performance without
CRAY price. By the early 1990's maybe it wasn't so sensible anymore...
And getting started on a microprocessor-based parallel machine
(SMP-like nodes, but with a crossbar) so late (and with little
experience with such machines) was the end-- or, if you see the
bright side of things, a new beginning after being acquired by HP.

I agree that the compiler technology was a great strength
of Convex, though ultimately I'm not sure we would have had much more
success as a software company. Looking back, we had great technical people,
but I'm not so sure we had the right high-level decision makers to
take the company any further.

Interestingly, on the back patio of our Richardson campus, there
used to be the "graveyard". Whenever one of our "competitors"
(Alliant, etc) flopped, they'd get a tombstone. I bailed before I
ever learned whether we received our own tombstone in the graveyard.
 
Stefan said:
BTW, does anyone know of any work on parallelizing regexp-matching?

Last I knew, from a while back (running a Google query and getting a
million PhD theses), everybody was working on non-deterministic FSM-based
regexp matching. Unfortunately just the theses, no libraries. You sure
can tell when you hit a fad for PhD dissertations. The NDFSMs were
interesting since there was no backtracking, which was useful if you
wanted to do Expect-like regexp matching without having to rescan the
entire buffer every time you got input. It's parallelization, but not
the way you are thinking.
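The no-backtracking property falls out of simulating all live NFA states at once, as in Thompson's construction. A minimal sketch, with the NFA for the regexp a(b|c)*d written out by hand rather than compiled (a real matcher would generate this table from the pattern):

```python
# Hand-built NFA for a(b|c)*d: state 0 --a--> 1, 1 --b/c--> 1, 1 --d--> 2.
NFA = {
    (0, 'a'): {1},
    (1, 'b'): {1},
    (1, 'c'): {1},
    (1, 'd'): {2},
}
ACCEPT = {2}

def nfa_match(nfa, accept, text, start=0):
    """Track every live state simultaneously: one left-to-right pass
    over the input, no backtracking and hence no rescanning."""
    states = {start}
    for ch in text:
        states = set().union(*(nfa.get((s, ch), set()) for s in states))
        if not states:          # no state can consume this char
            return False
    return bool(states & accept)

assert nfa_match(NFA, ACCEPT, "abbcd")
assert not nfa_match(NFA, ACCEPT, "abb")
```

This is "parallel" in the sense of advancing all NFA states in lockstep, giving DFA-like single-pass behaviour; the per-character state-set update is itself a candidate for further data parallelism.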

Joe Seigh
 
Ken Hagan said:
I disagree. Jouni's post began...

I have a better reason why emacs is a great candidate for
parallelization.

...which is certainly starting from a "could" rather than "would"
viewpoint.

It's written in lisp, and in reality it's a lisp operating system
with an embedded word processor included as a major app in it. Now
the lisp code could be auto-parallelized by an auto-parallelizing compiler.
So you would need to do some work to improve the underlying lisp
compiler/OS to handle multiprocessing needs.

Here he makes a specific supporting argument for his claim. When I
asked for rebuttals, I was rather hoping that someone would address
this one. Auto-parallelisation of Lisp may be significantly easier
than the same task for C (which I happily accept hasn't really
happened yet, despite efforts) so emacs may be much better placed
than "the average app".

I don't buy this: alias analysis for lisp is not significantly easier (to
implement) than for C. The results might be slightly more precise (you
don't have to worry about some of the weird tricks you can play with
pointers and memory in C), but I doubt that it makes much difference in
practice (anybody know of a comparative study?). Without good alias
analysis, you're not going to do much auto-parallelisation of an imperative
language (yes, lisp is an imperative language before somebody claims
otherwise).
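The aliasing point is easy to illustrate. The loop below looks embarrassingly parallel, but whether its iterations are independent depends entirely on whether `dst` and `src` can refer to the same storage, and proving that is exactly the job of alias analysis. A toy Python illustration; the compiler concern applies equally to Lisp cells and C pointers.

```python
def shift_add(dst, src):
    # Each iteration looks independent: dst[i] depends only on src[i-1].
    for i in range(1, len(dst)):
        dst[i] = src[i - 1] + 1

a = [0, 0, 0, 0]
shift_add(a, list(a))   # no aliasing: iterations really are independent
assert a == [0, 1, 1, 1]

b = [0, 0, 0, 0]
shift_add(b, b)         # dst aliases src: each iteration feeds the next
assert b == [0, 1, 2, 3]
```

Run in parallel under the aliased call, the iterations would race; a compiler that cannot rule the aliasing out has to assume the worst and keep the loop sequential.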
Here's a second line of argument, differentiating emacs from the average
app. It is surely undeniable that "cult" OSS software gets ported and
twisted in far more ways than its intrinsic quality would justify. If
I had to place money on which applications would get ported first and
best to any new architecture, I'd bet on emacs and GNU C.

Speaking as someone who has also extensively hacked emacs, I'll have to
agree with Stefan. It won't happen (it would be easier to reimplement it
from scratch if you wanted a parallelised version). And what do you
mean by GNU C? An auto-parallelising gcc? Or a parallelised version of
gcc? (and why would you want the latter, when make -j gives you lots
of parallelism already?)
 
Nick Maclaren said:
|> >
|> > By whom is it expected? And how is it expected to appear? Yes,
|> > someone will wave a chip at IDF and claim that it is a Montecito,
|> > but are you expecting it to be available for internal testing,
|> > to all OEMS, to special customers, or on the open market?
|>
|> In November 2003, Intel's roadmap claimed Montecito would appear in
|> 2005. 6 months later, Otellini mentioned 2005 again. In June 2004, Intel
|> supposedly showcased Montecito dies, and claimed that testing had begun.
|>
|> Perhaps Intel is being overoptimistic, but, as far as I understand, they
|> claim Montecito will be ready in 2005.

I am aware of that. Given that Intel failed to reduce the power
when going to 90 nm for the Pentium 4, that implies it will need 200
watts. Given that HP have already produced a dual-CPU package,
they will have boards rated for that. Just how many other vendors
will have?

This is wrong. As described by Paul Otellini during his keynote speech at
IDF yesterday, and documented here (watch for URL wrap):

ftp://download.intel.com/pressroom/kits/events/idffall_2004/otellini_presentation.pdf#page=38

the dual-core, multithreaded, Montecito package actually consumes *less*
power than current Itanium 2 processors.

-- Jim Hull
Itanium Processor Architect at HP
 
This is wrong. As described by Paul Otellini during his keynote speech at
IDF yesterday, and documented here (watch for URL wrap):

ftp://download.intel.com/pressroom/kits/events/idffall_2004/otellini_presentation.pdf#page=38

Most interesting. Unfortunately, that failed to download.

It is amusing that, at the time I posted my statement, it was based
on the best available information, but that was negated within days.
I don't suppose that you can say WHY the Montecito manages to make
good use of the 90 nm process and the Prescott failed to?

I have a very similar confusion over the IBM G5, with reliable reports
of 200 watts and other ones of (if I recall) 50 watts.


Regards,
Nick Maclaren.
 
In comp.arch Nick Maclaren said:
Most interesting. Unfortunately, that failed to download.
It is amusing that, at the time I posted my statement, it was based
on the best available information, but that was negated within days.
I don't suppose that you can say WHY the Montecito manages to make
good use of the 90 nm process and the Prescott failed to?

Read this..

http://whatever.org.ar/~module/resources/computers/computer-arch/ia-64/vail_slides_2003.pdf

Re-posted from RWT.

http://www.realworldtech.com/forums...PostNum=2668&Thread=1&entryID=37912&roomID=11
I have a very similar confusion over the IBM G5, with reliable reports
of 200 watts and other ones of (if I recall) 50 watts.

Perhaps system power draw versus CPU typical power draw.
 
Jim Hull said:
This is wrong. As described by Paul Otellini during his keynote speech at
IDF yesterday, and documented here (watch for URL wrap):

ftp://download.intel.com/pressroom/kits/events/idffall_2004/otellini_presentation.pdf#page=38

the dual-core, multithreaded, Montecito package actually consumes *less*
power than current Itanium 2 processors.

-- Jim Hull
Itanium Processor Architect at HP

There is also:
http://www28.cplan.com/cbi_export/MA_OSAS002_266814_68-1_v2.pdf
which gives the specific quote:
2 cores, 2 threads, 26.5MByte of cache, and 1.72 billion
transistors at 100W
(2 threads means "2 threads per core", in case it is not clear. A slide
elsewhere indicates SMT.)

(The crypto folks will appreciate y'all adding an extra shifter per
core too. It's the little extra touches that count :-))

-Z-
 
Jim Hull said:
This is wrong. As described by Paul Otellini during his keynote speech at
IDF yesterday, and documented here (watch for URL wrap):

ftp://download.intel.com/pressroom/kits/events/idffall_2004/otellini_presentation.pdf#page=38

the dual-core, multithreaded, Montecito package actually consumes *less*
power than current Itanium 2 processors.

-- Jim Hull
Itanium Processor Architect at HP

So all this stuff came good then ? ;-)

http://whatever.org.ar/~module/resources/computers/computer-arch/ia-64/vail_slides_2003.pdf
 
Sigh. You are STILL missing the point. Spaghetti C++ may be about
as bad as it gets, but the SAME applies to the cleanest of Fortran,
if it is using the same programming paradigms. I can't get excited
over factors of 5-10 difference in optimisability, when we are
talking about improvements over decades.
Simple...

Let's all dust off our old APL manuals, and then practically ALL of
our code will be vectorizable/parallel.

GDR,
Dale Pontius
 
[...]

Has anyone even done JIT to native code for elisp yet? That would be
much easier, and would provide more broadly applicable performance
gains. (At the cost of portability, though there are some fairly
portable JIT systems now. And it is an active area for research.)

As to Ken's kvetch, Stefan did excerpt a very specific line from
Jouni's post which he indicated as "hilarious." Also having had a fair
bit of experience with the emacs C source and elisp code, I found
Stefan's post dead on target.

To someone who knows emacs internals, Jouni's post comes across as
naively optimistic. Much of the performance critical stuff in emacs is
in C (e.g. the regexp matcher). And even if one takes the Amdahl's law
hit and ignores that, many functions written in native code have
interesting side-effects such as filesystem modifications. So just
getting correctness takes a lot of work.

Beyond the emacs-specific part, lisp dialects vary in how amenable
they are to automatic parallel execution. Even in the best of cases,
completely automatic exploitation of multiprocessor hardware has not
been widely used. Usually some sort of programmer visible concurrency
is exposed, such as futures.
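Futures are the kind of programmer-visible concurrency meant here: the programmer marks which computations may overlap, rather than a compiler discovering it. A minimal sketch using Python's `concurrent.futures`; counting words per line of a buffer is just a stand-in workload.

```python
from concurrent.futures import ThreadPoolExecutor

lines = ["the quick brown fox", "jumps over", "the lazy dog"]

def count_words(line):
    return len(line.split())

# Each submit returns a future immediately; results are claimed later,
# so the per-line work is free to overlap.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(count_words, line) for line in lines]
    total = sum(f.result() for f in futures)

assert total == 9
```

The parallelism is explicit in the source: nothing here required the runtime to prove the per-line computations independent.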

There are likely designs for editors and word processors that can
profitably use multiprocessors, e.g. partitioning user input, disk I/O,
source parsing, and display tasks into threads. Though this type of
structure is not going to happen via automatic parallelization of
dusty deck codes.

-Z-
 
Nick said:
|>
|> > Not merely do people sweat blood to get such parallelism, they
|> > often have to change their algorithms (sometimes to ones that are
|> > less desirable, such as being less accurate), and even then only
|> > SOME problems can be parallelised.
|>
|> I think you are looking at huge problems, when the money is on the
|> desktop. ...

You weren't following the thread. I and others were pointing out
that small-scale process-level parallelism is useful on the desktop,
but serious parallelisation of applications is a wide blue yonder
project. The context of the above is where I was telling someone
that the fact that HPC applications have been parallelised does not
mean that desktop ones can easily follow.

Why do they need to be? The typical desktop is running multiple threads
most of the time. At a minimum the application and the kernel, but
things like browsers do multiple things (I don't know if IE uses this,
other browsers do). Clearly not always running many threads, but if you
have multiple processes many CPUs are useful, even virtual ones.

Don't just think of making a single application run faster (the browser
is the only low-hanging fruit there); many things happen at once, and can
run as separate processes as well as threads.

I follow what you say, but just breaking up a single app is not the only
benefit.
 
Bill said:
I follow what you say, but just breaking up a single app is not the only
benefit.

One (possibly) significant benefit is the increase in net L1 (and
L2?) cache size, one per core, which will spend a higher portion
of its time hot with the application/OS/whatever. Assuming the OS
knows about processor affinity.
 
|>
|> > >ftp://download.intel.com/pressroom/kits/events/idffall_2004/otellini_presentation.pdf#page=38
|>
|> > Most interesting. Unfortunately, that failed to download.

I have now seen it. Without Jim Hull's statement, I would have
regarded "lower power" as being normal executive waffle - i.e.
it didn't say what it was lower than ....

|> http://whatever.org.ar/~module/resources/computers/computer-arch/ia-64/vail_slides_2003.pdf

I am extremely impressed. Foil 7 gives the same order of magnitude
as I got to, but my current understanding is that the power has
been reduced by 2.5-3 times below that.

From my point of view, that changes the IA64 line from something
that we would simply rule out of consideration to something that
we shall have to consider seriously.

|> > I have a very similar confusion over the IBM G5, with reliable reports
|> > of 200 watts and other ones of (if I recall) 50 watts.
|>
|> Perhaps system power draw versus CPU:typical power draw.

Perhaps. It makes a LOT of difference for the HPC people, where
the 'idle' mode savings typically don't help.


Regards,
Nick Maclaren.
 
|> >
|> > You weren't following the thread. I and others were pointing out
|> > that small-scale process-level parallelism is useful on the desktop,
|> > ...
|>
|> Don't just think of making a single application run faster, the browser
|> is the only low hanging fruit, but many things happen at once, and can
|> run as separate processes as well as threads.

Yes, that's what several of us had said earlier in the thread.

The consequence is that SMALL-SCALE parallelism (i.e. 2-8 way)
will be nearly universal within a few years. LARGE-SCALE
parallelism is another matter.


Regards,
Nick Maclaren.
 
|> >
|> > Sigh. You are STILL missing the point. Spaghetti C++ may be about
|> > as bad as it gets, but the SAME applies to the cleanest of Fortran,
|> > if it is using the same programming paradigms. I can't get excited
|> > over factors of 5-10 difference in optimisability, when we are
|> > talking about improvements over decades.
|> >
|> Simple...
|>
|> Let's all dust off our old APL manuals, and then practically ALL of
|> our code will be vectorizable/parallel.

Hmm. Do you have a good APL Dirichlet tessellation code handy?


Regards,
Nick Maclaren.
 