65nm news from Intel

  • Thread starter: Yousuf Khan
Del Cecchi wrote:

[SNIP]
It is interesting the way that some posters post as if the chip and system
designers working for large computer companies were all a bunch of shuffling
morons. Duh, you mean folks actually run code on these here thangs? Hyuk
hyuk.

You could be forgiven for thinking the worst if you have
used a pre-PCI PC.

Cheers,
Rupert
 
Has anyone even done JIT to native code for elisp yet?

I don't believe that has been done. However, a number of people have
been attempting to implement an Emacs Lisp-compatible dialect on top of
Common Lisp. In doing so, it would be possible to compile Emacs with a
Common Lisp compiler such as the Python compiler in CMU/CL.

I believe, however, that most of the current activity has been
targeted at running Emacs Lisp code in Hemlock, the Emacs-like editor
that comes with CMU/CL.

It would be interesting to rewrite most of the C editing primitives in
Emacs in Common Lisp and then, using for instance Ingvar Mattson's
existing elisp implementation, to move more of Emacs over to a CL
environment.
 
It is interesting the way that some posters post as if the chip and system
designers working for large computer companies were all a bunch of shuffling
morons. Duh, you mean folks actually run code on these here thangs? Hyuk
hyuk.

Sometimes I wonder. Please note that I never, EVER wonder whether 99%
of software developers and customers are shuffling morons - the evidence
is overwhelming - and I am one of them :-(

However, none of this is relevant to this thread, and nor is your
remark. The thread was about QUOTED power requirements and system
designs. The quoted requirements may start off with the engineers,
but are heavily edited by the marketdroids, sometimes changing context
enough to make them effectively false.

SURELY you remember the furore about numerous mobiles with "energy
saving" features, where the claimed maximum performance could be
delivered only for a minute or so, and the claimed power consumption
was achieved only when staring at a nearly inactive computer?

I could also name systems with a very impressive density in the
marketing documents but, when it came to the actual planning, racks
could be only half filled if the boards were fully populated. And
that was because of cooling.


Regards,
Nick Maclaren.
 
Sometimes I wonder. Please note that I never, EVER wonder whether 99%
of software developers and customers are shuffling morons - the evidence
is overwhelming - and I am one of them :-(

Well, just for the record: I'm not. I worked in operating system
(kernel, for those who have the MS mindset that the OS encompasses
everything thrown on the HD) development for seven years before I (being
a physicist) turned back to physics, to wit: Meteorology.

I understand OS (-kernels). I don't need lectures on that subject.

I admit that I do not know all the ins and outs of every piece of hardware
that comes to market - that's why I initiate market research if I need
information on what current hardware is able to offer us.

I know hardware vendors run codes on their stuff - they're asking us, so
this should be trivially true.

*However*, a hardware vendor cannot decide whether a certain computation
that gives its product headaches really needs to be as complicated
as the original coder made it.

That's where my competence comes in.
 
I believe I've found the answer as to how MS is going to handle
this: a service-oriented programming model. They're essentially going
to make the desktop work like a server, taking the web services model
and moving it into not only inter-process but also in-process messaging.
The technology is codenamed Indigo, to be used under the new API
called WinFX, due to be released in 2006, beta in 2005.

The general idea is loose coupling of inter-related logic. Whether it
succeeds, i.e. is efficient in a multi-processor environment, is another matter.
But you know MS, right? They can make new technology
more efficient by making the old technology less efficient.
 
Stefan Monnier said:
In fact, Emacs IS a good candidate. Very little context that is not
buffer or window/frame local. Going into that swamp is another issue!

If text editing a huge file is the answer, then that's an awfully weird
question (other tools may be better [having used ed/vi and emacs on 40
MBs on a Cray]).

--
 
Yeah, but that's not bad.
2nd CPUs are cheap these days. The IBM TF-1 really doubled the counted
CPUs for fault tolerance and I/O. Garbage collection is also a candidate
as well as compatibility.
One of the big benefits of a dual processor that is difficult to measure
is the improvement in hand-eye coordination with the application. Let's
say a heavy CAD application is using 10% of a CPU for keyboard and mouse
activity, and 100% of the other CPU for application processing. This dual
processor arrangement gives much better hand->(KB->app->graphics)->eye
coordination than a single CPU with 110% the processing power.

Maybe.
Might be better to have better I/O processors.

--
 
Yeah, but that's not bad.
2nd CPUs are cheap these days.

You may think the second is "cheap", but I don't. The second CPU and the
board that goes with it are certainly *not* "cheap".
The IBM TF-1 really doubled the counted
CPUs for fault tolerance and I/O.

Hmm, TF-1 was what, a decade or two ago?
Garbage collection is also a candidate
as well as compatibility.

I'm not against SMP at all, if it's free I'll take it (and have predicted
multiple core processors here for at least five years), but to say it's
somehow "free" today, is *nutz*. Even a short few years ago I stated that
two complete systems were better than one dual. I think the line is
crossing soon to the dual-processor, but I'd rather have two systems.
....both duals soon. ;-)
Maybe.
Might be better to have better I/O processors.

Ah, back to the /360. ;-)

Note that we do have GPUs and DMA masters. SMP doesn't solve all ills.
 
In comp.sys.ibm.pc.hardware.chips keith said:
I'm not against SMP at all, if it's free I'll take it (and
have predicted multiple core processors here for at least five
years), but to say it's somehow "free" today, is *nutz*. Even
a short few years ago I stated that two complete systems
were better than one dual. I think the line is crossing
soon to the dual-processor, but I'd rather have two systems.
...both duals soon. ;-)

I think you're a bit behind the times :)

I've been running an Abit BP6 (dual OC Celerons) as my main
machine since July 1999. Current uptime 195 days. IIRC when
I built it, the premium for dual was $75. Effectively zero,
especially considering the life extension.

But two complete systems are still better for some things
(backup, MS-Windows) and always will be.

-- Robert
 
I think you're a bit behind the times :)

Well, I was talking about single-chip SMP. Even at that it was rather
obvious (I believe I argued with Fleger over this). What else to do with
infinite transistor budgets after caches? Actually *designing* a way of
using transistors is exponentially difficult. Doubling caches is more or
less linear, as is another processor.
I've been running an Abit BP6 (dual OC Celerons) as my main machine
since July 1999. Current uptime 195 days. IIRC when I built it, the
premium for dual was $75. Effectively zero, especially considering the
life extension.

When I looked (a few months ago) a decent dual AthlonMP board was around
$400, with the processors at rather a premium too. I was *considering* a
dual K7 at the time, rather than a single K8. The duals lost because of
the cost. It would have been cheaper to upgrade the second system than go
SMP.
But two complete systems are still better for some things (backup,
MS-Windows) and always will be.

....particularly when Linux is on this one. ;-)
 
|> > Stefan Monnier wrote:
|>
|> >>> > Your second CPU will be mostly idle, of course, but so is the first CPU
|> >>> > anyway ;-)
|> >
|> > Yeah, but that's not bad.
|> > 2nd CPUs are cheap these days.
|>
|> You may think the second is "cheap", but I don't. The second CPU and the
|> board that goes with it are certainly *not* "cheap".

What board?

The cost difference is far more marketing than production. Dual
CPU boards are sold as 'servers' and as 'performance workstations',
both at a premium. They could equally well be sold with the same
margin as the 'economy' boards.


Regards,
Nick Maclaren.
 
In comp.sys.ibm.pc.hardware.chips keith said:
Well, I was talking about single-chip SMP.

Sorry, I missed that upthread.
What else to do with infinite transistor
budgets after caches?

A very good point. SMT is a fairly simple thing.
Orthogonal to other efforts to improve performance.
Actually *designing* a way of using
transistors is exponentially difficult.

True enough. You run out of orthogonalities :)
When I looked (a few months ago) a decent dual AthlonMP board
was around $400, with the processors at rather a premium too.

Decent? What do you classify as decent? I see 'em around $200,
and surely you don't shy away from fixing painted jumpers?
I figure the dual premium is around $200 now.
...particularly when Linux is on this one. ;-)

Oh, I see you're still running the K6-3.
No reason to stop.

-- Robert
 
|>
|> > Well, I was talking about single-chip SMP.
|>
|> Sorry, I missed that upthread.
|>
|> > What else to do with infinite transistor
|> > budgets after caches?
|>
|> A very good point. SMT is a fairly simple thing.
|> Orthogonal to other efforts to improve performance.

Boggle. If it were either, let alone both, it would be vastly
more effective.


Regards,
Nick Maclaren.
 
In comp.sys.ibm.pc.hardware.chips Nick Maclaren said:
|> A very good point. SMT is a fairly simple thing.
|> Orthogonal to other efforts to improve performance.

Boggle. If it were either, let alone both, it would be
vastly more effective.

SMT is simple in that "all" that needs be done is create
duplicate state machines (register sets) to create "virtual
CPUs". Add some (not too much) fairness to the hardware
scheduler and thread through the retirement unit. The main
execution pipeline (ROB, ports, exec units) remains unchanged.

"Vastly more effective" is a comparative term. What do you
expect? SMT won't match SMP under most circumstances. You
don't have the ports or exec units! It'll be particularly
lame on the P7 because that throwback is short of issue ports.

Code type matters. SMT is best for continuing work during
the ~300 clock memory fetch latency. You'd rather the CPU
just stall? But most optimized code has already done
prefetching and is either bandwidth or compute limited.
SMT will help with neither. SMP will only help the latter.
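
A back-of-envelope sketch of the latency-hiding argument above, in Python. It assumes an idealised model that is not from this thread: each thread alternates a burst of compute with one memory stall, stalls overlap perfectly with the other threads' compute, and the threads never fight over ports, exec units or cache. The function name and parameters are made up for illustration.

```python
def pipeline_utilization(compute_cycles, stall_cycles, hw_threads=1):
    """Fraction of cycles a shared pipeline does useful work (idealised).

    compute_cycles: cycles of work a thread does between memory stalls
    stall_cycles:   cycles lost per stall (the post quotes ~300)
    hw_threads:     hardware threads sharing the one pipeline
    """
    if compute_cycles <= 0 or stall_cycles < 0 or hw_threads < 1:
        raise ValueError("need compute_cycles > 0, stall_cycles >= 0, hw_threads >= 1")
    # One thread keeps the pipeline busy for this share of the time.
    per_thread = compute_cycles / (compute_cycles + stall_cycles)
    # Extra threads fill the stall slots, but the pipeline cannot exceed 100%.
    return min(1.0, hw_threads * per_thread)


if __name__ == "__main__":
    for threads in (1, 2):
        print(threads, pipeline_utilization(100, 300, threads))
```

Under these assumptions, 100 cycles of work per 300-cycle miss gives 25% utilisation with one thread and 50% with two, while code with no stalls (stall_cycles=0) is already at 100% and gains nothing from a second hardware thread. Real hardware would do worse than the model, since the threads share cache and issue ports.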

-- Robert
 
|> > |> A very good point. SMT is a fairly simple thing.
|> > |> Orthogonal to other efforts to improve performance.
|> >
|> > Boggle. If it were either, let alone both, it would be
|> > vastly more effective.
|>
|> SMT is simple in that "all" that needs be done is create
|> duplicate state machines (register sets) to create "virtual
|> CPUs". Add some (not too much) fairness to the hardware
|> scheduler and thread through the retirement unit. The main
|> execution pipeline (ROB, ports, exec units) remains unchanged.

That is wrong, completely so.

You DON'T just create duplicate register sets, but have to "dual
port" every execution unit - possible by creating a single set
of double the length, and create some new scheduling to manage it.
You have to move some privileged registers and state from out of
(logically) the execution units to the register sets.

You have to mangle any performance counters and many privileged
registers fairly horribly, because their meanings and constraints
change. Similarly, you have to add logic for CPU state change
synchronisation, because some changes must affect only the current
thread and some must affect both. And you have to handle the case
of the two threads attempting incompatible operations simultaneously.

Oh, of course, none of this affects the main flow of control,
but all forms of real engineering (as distinct from academic
demonstrations and marketing) are as much or more about the problem
cases as the normal ones.

|> "Vastly more effective" is a comparative term. What do you
|> expect? SMT won't match SMP under most circumstances. ...

My suspicion is that it wouldn't match CMP, with the same amount
of real estate, under most circumstances. But that is pure
speculation AS IS THE CLAIM OF THE CONVERSE until and unless
someone does some proper analysis.


Regards,
Nick Maclaren.
 
Code type matters. SMT is best for continuing work during
the ~300 clock memory fetch latency.

What is the evidence to back up this claim?

Not theories, but _evidence_ of bigger speed up compared to,
for example, switch on event multi-threading, or CMP with simpler
and smaller processors, but not sharing L1 cache.

Note that I'm not claiming evidence the other way, but as far as
I can tell the jury is out on the best organisation for concurrency
on chip.

I would however claim that functional units are almost free,
and that the best organisation will win in the long run, not
necessarily the one that best uses a finite number of functional units.
But most optimized code has already done
prefetching and is either bandwidth or compute limited.
SMT will help with neither. SMP will only help the latter.

CMP will also help the former.

Peter
 
In comp.sys.ibm.pc.hardware.chips Nick Maclaren said:
That is wrong, completely so.

Interesting. Do you have specific specialised knowledge?
Or some reference to exactly how SMT has been implemented?
You DON'T just create duplicate register sets, but have to "dual
port" every execution unit - possible by creating a single set
of double the length, and create some new scheduling to manage it.

This is an awful lot of work compared to simply tagging each
instruction with a thread number which indicates which register
set to operate upon. Then letting everything run through with
the extra bits catching dependencies.
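
A toy sketch of the tagging idea just described, in Python. It is only an illustration of the principle, not a description of how any real SMT core is built: the Instr type, the register names and the add-only "instruction set" are all hypothetical, and the dependency check covers read-after-write only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    thread: int  # hardware thread tag carried along with the instruction
    dst: str     # destination register name
    src1: str    # first source register name
    src2: str    # second source register name


def depends(earlier, later):
    """Read-after-write check: does `later` read what `earlier` writes?"""
    if earlier.thread != later.thread:
        return False  # different register sets, so equal names never clash
    return earlier.dst in (later.src1, later.src2)


def execute(program, num_threads=2):
    """Run toy add instructions in order, one register set per thread tag."""
    regs = [{} for _ in range(num_threads)]
    for ins in program:
        bank = regs[ins.thread]  # the tag selects the register set
        # Registers that were never written read as 0 in this toy model.
        bank[ins.dst] = bank.get(ins.src1, 0) + bank.get(ins.src2, 0)
    return regs
```

With the tag folded into the comparison, an instruction from thread 1 that reads r1 does not depend on a thread 0 instruction that writes r1, which is the "extra bits catching dependencies" point. Privileged state, counters and the problem cases raised elsewhere in the thread are outside this sketch.
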
You have to mangle any performance counters and many
privileged registers fairly horribly, because their meanings
and constraints change. Similarly, you have to add logic for
CPU state change synchronisation, because some changes must
affect only the current thread and some must affect both.

I wouldn't expect SMT to _always_ run multi-threaded.
The name is _Symmetrical_ Multi Threading. The moment the
execution environment is driven asymmetrical, I expect
failures. Some changes might require an IPI to restart.
And you have to handle the case of the two threads
attempting incompatible operations simultaneously.

Usually this is handled by the OS.
Oh, of course, none of this affects the main flow of control,
but all forms of real engineering (as distinct from academic
demonstrations and marketing) are as much or more about
the problem cases as the normal ones.

It's still engineering if it works 99% of the time so long
as it doesn't fail catastrophically in the other 1%.

I see SMT as a simple, cheap way to use fetch wait cycles.
It just needs to work in the common case, two+ pmode threads
(maybe multiple rings) with different pagemaps. Of course
you can probably make it break. Then you deserve what you get.

-- Robert
 
Nick Maclaren said:
|> > |> A very good point. SMT is a fairly simple thing.
|> > |> Orthogonal to other efforts to improve performance.
|> >
|> > Boggle. If it were either, let alone both, it would be
|> > vastly more effective.
|>
|> SMT is simple in that "all" that needs be done is create
|> duplicate state machines (register sets) to create "virtual
|> CPUs". Add some (not too much) fairness to the hardware
|> scheduler and thread through the retirement unit. The main
|> execution pipeline (ROB, ports, exec units) remains unchanged.

That is wrong, completely so.

You DON'T just create duplicate register sets, but have to "dual
port" every execution unit - possible by creating a single set
of double the length, and create some new scheduling to manage it.
You have to move some privileged registers and state from out of
(logically) the execution units to the register sets.

I think that Nick is muddled on this one. If the base implementation is
already OoO then there will normally be many more physical registers than
architected ones. To go two-way SMT may not involve adding any physical
registers, but rather involve changes to renaming. "dual port" every
execution unit doesn't make much sense to me. Access to execution units from
either virtual processor is essentially free - they are after all virtual
processors, not real. What is required is that every bit of *architected*
processor state be renamed or duplicated, perhaps that's what Nick is
getting at?
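
A minimal sketch of the renaming point, in Python, assuming a very simplified out-of-order model: one shared pool of physical registers and one rename map per virtual processor. The class and method names are hypothetical, and freeing registers at retirement is left out to keep it short.

```python
class Renamer:
    """Toy rename stage: shared physical registers, per-thread rename maps."""

    def __init__(self, num_physical, num_threads=2):
        self.free = list(range(num_physical))          # shared free list
        self.maps = [{} for _ in range(num_threads)]   # arch reg -> phys reg

    def rename_dest(self, thread, arch_reg):
        """Allocate a fresh physical register for a destination write."""
        if not self.free:
            # A real core would stall the rename stage until a register
            # is released at retirement; this sketch just reports it.
            raise RuntimeError("no free physical registers")
        phys = self.free.pop(0)
        self.maps[thread][arch_reg] = phys
        return phys

    def lookup(self, thread, arch_reg):
        """Physical register currently holding `arch_reg` for `thread`."""
        return self.maps[thread].get(arch_reg)
```

Both virtual processors can write "eax" and get different physical registers out of the same pool, so going two-way need not add physical registers, only a second map. The cost is that the two threads now compete for the pool.
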
You have to mangle any performance counters and many privileged
registers fairly horribly, because their meanings and constraints
change. Similarly, you have to add logic for CPU state change
synchronisation, because some changes must affect only the current
thread and some must affect both. And you have to handle the case
of the two threads attempting incompatible operations simultaneously.

What operations are incompatible? SMT as implemented in the Pentium 4, say,
allows either virtual processor to do what it likes. One can transition from
user to kernel and back while the other services interrupts or exceptions or
whatever. The only coordination needed for proper operation is what is
needed for two processors - of course the performance may suffer though.
Oh, of course, none of this affects the main flow of control,
but all forms of real engineering (as distinct from academic
demonstrations and marketing) are as much or more about the problem
cases as the normal ones.

|> "Vastly more effective" is a comparative term. What do you
|> expect? SMT won't match SMP under most circumstances. ...

My suspicion is that it wouldn't match CMP, with the same amount
of real estate, under most circumstances. But that is pure
speculation AS IS THE CLAIM OF THE CONVERSE until and unless
someone does some proper analysis.

Yes, lots of speculation. The difference here is that two CMP processors take
about twice the silicon of one, while with SMT you have the option to use
1.5 cores' worth of silicon. Perhaps once more than two cores are cheap and
easy, SMT will die because it's more effort than it's worth, but my bet is
that chips will go both routes with SMT and CMP. Just one more little problem
for the OS developers to deal with :)
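
To put the silicon argument into numbers, a small Python sketch. The area figures (2 cores' worth for CMP, 1.5 for SMT) are the ones quoted above; the speedups are hypothetical inputs, since nobody in this thread has measured them, and the function names are made up.

```python
def throughput_per_area(speedup, area_in_cores):
    """Speedup over one plain core, divided by silicon used (in cores)."""
    return speedup / area_in_cores


def smt_break_even(cmp_speedup, cmp_area=2.0, smt_area=1.5):
    """SMT speedup needed to match CMP on throughput per unit of silicon."""
    return cmp_speedup * smt_area / cmp_area


if __name__ == "__main__":
    # If two cores gave a perfect 2x, SMT would need 1.5x to tie per area.
    print(smt_break_even(2.0))
```

On these figures SMT only has to reach three quarters of the CMP speedup to be the better use of silicon, which is why the answer hangs on measured speedups rather than on the area numbers alone.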
Regards,
Nick Maclaren.

Peter
 