The end of Netburst in 2006

  • Thread starter: YKhan
Robert said:
AFAIK, the only case where SMT is a win is when a thread
stalls, e.g. waiting for uncached data, IO or frequent branch
misprediction. Otherwise it is a loss because of lower cache
hit rates (the caches are split). Some apps, like relational
databases, are pointer-chasing exercises and need a lot of
uncached data. I think compilers suffer a lot of misprediction.

It's amazing that this topic just won't die. The only thing that's new
about, say, Pentium-M and HT is that Pentium-M's shorter pipeline makes
hyperthreading less valuable than on NetBurst, where it is already
marginal: it seems to cost about as much in extra power and
transistors as it returns in performance. If you had no other way of
jamming more throughput onto the die and you could swallow the hit in
power, HT is almost always a clear win. If you count the cost in power
or transistors, it's almost always a wash.

The biggest win I remember seeing for HT (about 35%, IIRC) was on a
chess-playing program, where I assume the win came from stalled
pointer-chasing threads. Server applications, which also typically
spend a significant fraction (~50%) of their time stalled on memory,
should benefit significantly as well.
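
To make the pattern concrete, here is a minimal C sketch (my own
illustration, not from the post) of the serially dependent loads that
leave a core idle and give SMT something to hide:

    /* Pointer chasing: each load depends on the previous one, so a
       cache miss stalls this thread for the full memory latency.
       Under SMT, a second thread can use the idle execution units. */
    #include <stddef.h>

    struct node { struct node *next; long payload; };

    long chase(const struct node *n) {
        long sum = 0;
        while (n != NULL) {    /* each iteration waits on a load...  */
            sum += n->payload; /* ...that often misses the cache     */
            n = n->next;       /* serially dependent: no ILP to mine */
        }
        return sum;
    }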

HT may, in practice, do little more than reduce the hit that Intel
takes in latency from having the memory controller off the die, but it
does do that (up to whatever effect cache-thrashing has in the other
direction).

HT does give Intel marketeers a feature to talk about that AMD doesn't
have. The fact that AMD has much less need for SMT, because its memory
latency is lower and its pipeline is shorter, isn't something you'd
really expect Intel to emphasize in its advertising. The way HT is
used in Intel advertising is just market babble. As a part of the
design philosophy of Intel microprocessors, HT actually does make
sense.

RM
 
Very true, especially on a CPU like the iP7 (Pentium 4), which has
lots of execution units but very few issue ports.

IIRC, the P4 can only issue from one thread in a given cycle, which
reduces the benefit of INTC's version of SMT.
AFAIK, the only case where SMT is a win is when a thread stalls, e.g.
waiting for uncached data, IO or frequent branch misprediction.

I thought I said that. ;-) It has *nothing* to do, as the OP proposed,
with execution units.
Otherwise it is a loss because of lower cache hit rates (the caches
are split). Some apps, like relational databases, are pointer-chasing
exercises and need a lot of uncached data. I think compilers suffer a
lot of misprediction.

....particularly with the minuscule P4 I-cache.
 
David Schwartz wrote:
I assume you mean he's correct in his technical statement, and not that
you agree I ever said any such thing...
Yes.

Thank you, I'm not sure I've seen *huge* gains, but 10-30% for free is a
nice bonus. I've never seen a negative on real work, although there was a
benchmark showing one. Gains appear larger on threaded applications than
in general use, probably because of more shared code and data in cache.

The huge gains aren't really measurable; they're in usability and
interactive responsiveness. Benchmark gains do tend to be modest.

I think a lot of the usability gains are due to design problems with the
hardware and software. On a single CPU system without HT, for example, an
interrupt that takes too long to service makes the system non-responsive and
frustrates the user. On a system with either multiple CPUs or HT, the system
remains responsive.

DS
 
I think you're kind of hitting the nail on the head with the second
option. My understanding is that SMT added only a very small number
of transistors to the core (the numbers I've heard floated around are
5-10%, though I have no firm quote and I'm not sure if that's for
Northwood or Prescott). With IBM's Power5, where the performance
boost from SMT is much larger, I understand that they were looking at
a 25% increase in the transistor count.

That actually brings up a rather interesting point though. At some
point SMT may become counter-productive vs. multi-core. In the case
of the Power5, if you need to increase your transistor count by 25%
per core for SMT, you only need 4 cores before you've got enough
extra transistors for another full-fledged core. That of course leads
to the question: are you better off with 4 cores with SMT or 5 cores
without? My money is on 5 cores without.
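
Spelling out that break-even arithmetic (using the hypothetical 25%
figure from above):

    SMT overhead on 4 cores:  4 x 0.25 = 1.00 core's worth of transistors
    4 SMT cores:              4 x 1.25 = 5.00 core-equivalents
    5 plain cores:            5 x 1.00 = 5.00 core-equivalents

So at 4 cores the two designs cost the same transistors, and the
question is purely which buys more throughput.
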
....snip...

Now a question: why didn't we see dual cores many years
earlier? The main (OK, an overly simplistic view) difference between
the 486 and the P5 (AKA Pentium) was the second integer pipeline. While I
don't have the transistor count per pipeline in the P5, or its proportion
of the total count, I may suppose (just hypothetically) that making
a dual core out of the 486-style single pipeline (minus branch-prediction
logic, plus an extra FPU for the second core) would not be much more
complicated than a single-core P5, using the same logic as above.
Besides, the 486 was even easier to crank up the clock on - when the
Pentium was around 100 (or 120 - too ancient history to remember exactly),
AMD had its 5x86 (a 486 with a bigger L1 cache) at 133, easily
overclockable to 160 - did it myself. So why did nobody back then jump
to make dual cores?
My answer - no software to take advantage of it, at least at the
consumer level. Win95 had nothing in it to take advantage of SMP.
Ditto Quake 1 ;-). And in the corporate world, it was more than a year
before the introduction of NT4, which might (or might not) have
benefited.
Any other answers?
Just another 'what if' speculation with no practical meaning...
 
Now a question: why didn't we see dual cores many years
earlier?

There is a simple answer to that. There were other, better things to
do with transistors.
The main (OK, an overly simplistic view) difference between
the 486 and the P5 (AKA Pentium) was the second integer pipeline.

I think there was a tad more than that, but...
While I
don't have the transistor count per pipeline in the P5, or its proportion
of the total count, I may suppose (just hypothetically) that making
a dual core out of the 486-style single pipeline (minus branch-prediction
logic, plus an extra FPU for the second core) would not be much more
complicated than a single-core P5, using the same logic as above.
Besides, the 486 was even easier to crank up the clock on - when the
Pentium was around 100 (or 120 - too ancient history to remember exactly),
AMD had its 5x86 (a 486 with a bigger L1 cache) at 133, easily
overclockable to 160 - did it myself. So why did nobody back then jump
to make dual cores?
My answer - no software to take advantage of it, at least at the
consumer level. Win95 had nothing in it to take advantage of SMP.
Ditto Quake 1 ;-). And in the corporate world, it was more than a year
before the introduction of NT4, which might (or might not) have
benefited.
Any other answers?

Yes! Caches were a better use of transistors.
Just another 'what if' speculation with no practical meaning...

What if the Earth were flat sorta thing? ;-)
 
Now a question: why didn't we see dual cores many years
earlier? The main (OK, an overly simplistic view) difference between
the 486 and the P5 (AKA Pentium) was the second integer pipeline. While I
don't have the transistor count per pipeline in the P5, or its proportion
of the total count, I may suppose (just hypothetically) that making
a dual core out of the 486-style single pipeline (minus branch-prediction
logic, plus an extra FPU for the second core) would not be much more
complicated than a single-core P5, using the same logic as above.
Besides, the 486 was even easier to crank up the clock on - when the
Pentium was around 100 (or 120 - too ancient history to remember exactly),
AMD had its 5x86 (a 486 with a bigger L1 cache) at 133, easily
overclockable to 160 - did it myself. So why did nobody back then jump
to make dual cores?
My answer - no software to take advantage of it, at least at the
consumer level. Win95 had nothing in it to take advantage of SMP.
Ditto Quake 1 ;-). And in the corporate world, it was more than a year
before the introduction of NT4, which might (or might not) have
benefited.
Any other answers?
Just another 'what if' speculation with no practical meaning...

A dual-core 486 would have performed no better than a 486 on any single
program. The paradigm of use in that era was a single process on a
single processor. Dual core would have required two programs to be
running. Adding a second core would have added no value for customers.
On the other hand, adding additional decode and execution pipes makes a
single program go faster, something every customer was screaming at
Intel for. The choice was obvious, and correct at the time.

Alex
 
(e-mail address removed) wrote:

A dual-core 486 would have performed no better than a 486 on any single
program. The paradigm of use in that era was a single process on a
single processor. Dual core would have required two programs to be
running. Adding a second core would have added no value for customers.
On the other hand, adding additional decode and execution pipes makes a
single program go faster, something every customer was screaming at
Intel for. The choice was obvious, and correct at the time.

There *were* multi-tasking environments available at the time. It's even
remotely possible that if dual cores had been available at a reasonable
price, DesqView, which did real pre-emptive multi-tasking for 386 Protected
Mode progs, would not have disappeared into oblivion. Then again.....:-)
 
There *were* multi-tasking environments available at the time. It's even
remotely possible that if dual cores had been available at a reasonable
price, DesqView, which did real pre-emptive multi-tasking for 386 Protected
Mode progs, would not have disappeared into oblivion. Then again.....:-)

Not to mention OS/2. ...but the "official" view from Mt. Redmond was
that no one needed to multi-task. ...which is obvious because Win
*couldn't*.

No, the real reason was that caches, OoO, and speculation were a better
use of the real estate until quite recently.
 
Now a question: why didn't we see dual cores many years
earlier? The main (OK, an overly simplistic view) difference between
the 486 and the P5 (AKA Pentium) was the second integer pipeline.

I'd say that is a grossly over-simplistic view which ignores the
improvements in cache, memory bus, FPU, branch prediction, pipelining,
etc. The two chips were really quite different, to the extent that
the Pentium was easily twice as fast, clock for clock, as a 486.
While I
don't have the transistor count per pipeline in the P5, or its proportion
of the total count, I may suppose (just hypothetically) that making
a dual core out of the 486-style single pipeline (minus branch-prediction
logic, plus an extra FPU for the second core) would not be much more
complicated than a single-core P5, using the same logic as above.

The 486 weighed in at 1.2M transistors; the P5 had 3.1M transistors.
I don't know the exact breakdown of the transistor count, but it is
certainly quite reasonable to assume you could build a dual-core 486
for no more than (and probably less than) a single-core P5.

Of course, the 486 would be a LOT slower. In fact, in '93 when the
Pentium was released, a dual-core 486 would really struggle to be more
than 5% faster than a single-core 486 in any application at all; most
would end up being slower. The first problem was lack of software,
but it didn't end there.
Besides, the 486 was even easier to crank up the clock on - when the
Pentium was around 100 (or 120 - too ancient history to remember exactly),
AMD had its 5x86 (a 486 with a bigger L1 cache) at 133, easily
overclockable to 160 - did it myself.

Of course the 100MHz Pentium was MUCH faster than the 160MHz AMD 486.
So why did nobody back then jump to make dual cores?

Because they could get more performance in 99% of the cases by going
with a beefier single-core.
My answer - no software to take advantage of it, at least at the
consumer level. Win95 had nothing in it to take advantage of SMP.

Win95 (and '98 and Me) didn't support SMP at all. If you booted Win95
on a dual-processor system (either dual-core or two separate
processors), the second processor would simply be disabled because
there was absolutely zero support for it.

Oh, and the Pentium pre-dated Win95 by 2 years, so really we're
talking about Win3.1 timeframe.
Ditto Quake 1 ;-). And in the corporate world, it was more than a year
before the introduction of NT4, which might (or might not) have
benefited.

WinNT 4.0 at least supported multiple processors, but most of the
software would have made little to no use of them. I suppose you could
have done OK in OS/2 as well, though really only the weirdos like
Keith ran OS/2 :>
Any other answers?
Just another 'what if' speculation with no practical meaning...

Back with the 486 vs. the Pentium, doubling the transistors resulted in
roughly doubling the performance of single-core chips. Since this
gave you twice as much performance in ALL situations, going
dual-core, which only gave you a small increase in performance in the
very few applications that could use it at the time, made no sense.
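
That trade-off is essentially Amdahl's law. A minimal C sketch (my
illustration; the parallel fractions are made-up numbers, not from the
post) of why a second core bought so little back then:

    /* Amdahl's law: speedup from n cores when only a fraction p of
       the work can run in parallel. */
    #include <stdio.h>

    static double amdahl(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    int main(void) {
        /* Early-90s desktop app: almost nothing ran in parallel. */
        printf("p=0.05, 2 cores: %.2fx\n", amdahl(0.05, 2)); /* ~1.03x */
        /* Well-threaded modern workload, for contrast. */
        printf("p=0.95, 2 cores: %.2fx\n", amdahl(0.95, 2)); /* ~1.90x */
        return 0;
    }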

Now the tables are very much turned. With the Northwood P4 vs.
Prescott P4, Intel more than doubled their transistor count. The
result was basically a negligible increase in performance. On the
other hand a LOT more software is available now that can take
advantage of multiple processing cores and we do a lot more
multitasking than we did back in the Win3.1 or even Win9x days.

Processor design is always a question of trade-offs, what feature
gives you the most performance for most users with a given number of
transistors and/or power consumption. Back in the 486 -> Pentium days
it was pretty clear that adding logic transistors was the best way to
go. This held true when going from the Pentium to the PPro as well,
though after that time the trend shifted to adding cache transistors.
Now we're starting to see the benefits of extra cache trailing off,
while multiple cores are becoming more interesting. Eventually it's
likely that adding more cores will no longer buy you much and people
will have to think up something altogether new to do with their
transistors.
 
Alex said:
A dual-core 486 would have performed no better than a 486 on any single
program. The paradigm of use in that era was a single process on a
single processor. Dual core would have required two programs to be
running. Adding a second core would have added no value for customers.
On the other hand, adding additional decode and execution pipes makes a
single program go faster, something every customer was screaming at
Intel for. The choice was obvious, and correct at the time.

Also let's not forget that the majority of operating systems that could
run on it at the time were single-processor oriented, such as Windows
3.x and Windows 95.

Yousuf Khan
 
George said:
There *were* multi-tasking environments available at the time. It's even
remotely possible that if dual cores had been available at a reasonable
price, DesqView, which did real pre-emptive multi-tasking for 386 Protected
Mode progs, would not have disappeared into oblivion. Then again.....:-)

Desqview disappeared because nobody wanted to do DOS apps anymore; they
wanted to do nice and easy GUIs from that point forward.

Yousuf Khan
 
I'd say that is a grossly over-simplistic view which ignores the
improvements in cache, memory bus, FPU, branch prediction, pipelining,
etc. The two chips were really quite different, to the extent that
the Pentium was easily twice as fast, clock for clock, as a 486.

Twice as fast? Nothing we ran on a P5 ran twice as fast as on the 486,
including numerically intensive stuff. The first P5 chipsets were crap,
and the general reaction to the first P5-60/66 boxes was disappointment -
they were good for keeping your feet warm in the winter.
The 486 weighed in at 1.2M transistors; the P5 had 3.1M transistors.
I don't know the exact breakdown of the transistor count, but it is
certainly quite reasonable to assume you could build a dual-core 486
for no more than (and probably less than) a single-core P5.

Of course, the 486 would be a LOT slower. In fact, in '93 when the
Pentium was released, a dual-core 486 would really struggle to be more
than 5% faster than a single-core 486 in any application at all; most
would end up being slower. The first problem was lack of software,
but it didn't end there.


Of course the 100MHz Pentium was MUCH faster than the 160MHz AMD 486.

I don't think so - my Cyrix 5x86/120 ran about on par with a P5-90...
except maybe for FP. Remember that the motherboards used in the DIY
5x86 systems had accessible BIOS settings for tuning, even before
overclocking... vs. the detuned, dumbed-down BIOSes which came in the
vendor P5 boxes. Hell, the dumb schmucks were trying to charge an arm &
a leg for 60ns EDO memory vs. their "standard" 70ns FPM, which is why a
bunch of us upgraded 486/33 boxes to Cyrix/AMD 5x86s for $500 instead
of giving them the $3000 or so they were asking.
 
Desqview disappeared because nobody wanted to do DOS apps anymore; they
wanted to do nice and easy GUIs from that point forward.

.... even if they didn't need them.:-) ISTR some talk of a hook-up with
GEM which never got anywhere, but you're right, eventually Windows 3.11
killed Desqview.
 
Tony said:
The 486 weighed in at 1.2M transistors; the P5 had 3.1M transistors.
I don't know the exact breakdown of the transistor count, but it is
certainly quite reasonable to assume you could build a dual-core 486
for no more than (and probably less than) a single-core P5.

Another big stumbling block is that the 486 had no concept of multiple
processors. The bus didn't support it. The cache didn't support it.
If you wanted multiple 486 cores you'd have to re-engineer the 486
pretty drastically. In this day of all CPUs being designed to support
MP, people forget that. Multicore has the same requirements as
multiprocessor, and the 486 was never multiprocessor.
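
For a feel of what "the cache didn't support it" breaks, here is a
minimal C11 sketch (my illustration, not Alex's) of the classic
producer/consumer handshake that only works when the cores' caches are
kept coherent by the bus (e.g. by MESI-style snooping, which the 486
bus lacked):

    /* Core A publishes data, core B waits for the flag.  Without
       hardware cache coherence, B can spin forever on a stale copy
       of `ready`, or see the flag without seeing the payload. */
    #include <stdatomic.h>

    static int payload;       /* written by core A, read by core B */
    static atomic_int ready;  /* zero-initialized flag             */

    void producer(void) {     /* runs on core A */
        payload = 42;
        atomic_store_explicit(&ready, 1, memory_order_release);
    }

    int consumer(void) {      /* runs on core B */
        while (!atomic_load_explicit(&ready, memory_order_acquire))
            ;                 /* spin until A's store is visible */
        return payload;       /* coherence guarantees we see 42  */
    }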

Alex
 
George said:
... even if they didn't need them.:-) ISTR some talk of a hook-up with
GEM which never got anywhere, but you're right, eventually Windows 3.11
killed Desqview.

Desqview could've survived if it could've done what's being done now by
programs like VMware and Xen - that is, if it could've been an OS
virtualizer.

Quarterdeck did come up with the Desqview/X program, which was Desqview
with an X Window System interface. From what I can remember of it, it
was extremely pretty, but Quarterdeck didn't have the marketing muscle
of Microsoft (what else is new?) to compel developers to develop for
their environment.

Yousuf Khan
 
In comp.sys.intel George Macdonald said:
... even if they didn't need them.:-) ISTR some talk of a hook-up with
GEM which never got anywhere, but you're right, eventually Windows 3.11
killed Desqview.

And of course they had their own GUI, Desqview/X, which implemented an
X server.
 
Desqview could've survived if it could've done what's being done now by
programs like VMware and Xen - that is, if it could've been an OS
virtualizer.

Back then virtualization was not that well known at the desktop level - I
don't see how that could have helped them much. They were also making
money on QEMM386, because it took M$ 7-8 years to figure it out, and they
never quite got there.
Quarterdeck did come up with the Desqview/X program, which was Desqview
with an X Window System interface. From what I can remember of it, it
was extremely pretty, but Quarterdeck didn't have the marketing muscle
of Microsoft (what else is new?) to compel developers to develop for
their environment.

Ahead of their time.:-)
 
Back then virtualization was not that well known at the desktop level - I
don't see how that could have helped them much.

Dunno. I know a few thousand people who knew all about virtualization
long before (VM/370). ...and while the processor wasn't on their desktop
the glass was. I'm quite tickled when I see all these "new revelations"
about virtualization and multi-tasking.

<snip>
 
Also let's not forget that the majority of operating systems that could
run on it at the time were single-processor oriented, such as Windows
3.x and Windows 95.

Not only single-processor, but effectively single *task*. There was no
preemption at all in 3.x, and Win95 only preemptively scheduled 32-bit
apps; 16-bit apps still multi-tasked cooperatively.
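
A toy C sketch (mine, not the poster's) of the cooperative model those
systems used, and its failure mode - one task that never yields hangs
everything, which is exactly what preemption fixes:

    /* Round-robin cooperative "scheduler": each task runs until it
       voluntarily returns (the yield).  Swap hog() into the table
       and control never comes back - the whole "system" freezes. */
    #include <stdio.h>

    typedef void (*task_fn)(void);

    static void task_a(void) { puts("task A: slice done, yielding"); }
    static void task_b(void) { puts("task B: slice done, yielding"); }
    static void hog(void)    { for (;;) ; }  /* never yields */

    int main(void) {
        task_fn tasks[] = { task_a, task_b };
        (void)hog;  /* unused here; swap into the table to see the hang */
        for (int round = 0; round < 3; round++)
            for (int i = 0; i < 2; i++)
                tasks[i]();  /* no timer interrupt to force a switch */
        return 0;
    }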
 
Dunno. I know a few thousand people who knew all about virtualization
long before (VM/370). ...and while the processor wasn't on their desktop
the glass was. I'm quite tickled when I see all these "new revelations"
about virtualization and multi-tasking.

Not on the desktop? Well, not for everybody, but there was VM/PC on an
AT/370 card... or was that just a name? I guess it didn't really do VM,
did it? BTW, any idea if there is still such a card as the AT/370 for a
modern machine, how one obtains one, and at what cost?
 