Intel engineer discusses their dual-core design

  • Thread starter: YKhan
keith said:
The logic *is* there, though perhaps not brought out to the user.
Regardless, Intel *has* SMP verification capability. The only
excuse for such a stupid statement is that Intel has NIH buried so far
up their ass that even the execs can't whiff the nonsense. They *have*
the test-cases.

You keep saying this, please post a picture of an Intel P4 desktop chip
with some arrows pointing to the SMP logic you state is present, or stop
making the claim. How can Intel have the capability to verify a non-existent
feature? Or provide a link to an official Intel statement that the
capability for SMP is present in non-Xeon chips.
Irrelevant. They have SMP test and verification capability. If they
needed it for dual-core, they didn't have to invent new material. To
suggest such is simply silly.

I say there are no capabilities to test and you say that's irrelevant,
Intel can still test them. Please clarify.
HT is a bust. Everyone knew 64bits was needed five years ago. Intel
tried to derail x86 and go with Itanic only. AMD decided otherwise.
Where is Itanic?
They sure have sold you that "you need 64 bit" hype, haven't they?
That's why Intel couldn't use P-M, AMD convinced people it was necessary
to have 64 bits. HT is a free way to get 15-30% more performance out of
a CPU, if that's a bust I wish someone would bust my gas mileage.
 
Bill Davidsen said:
They sure have sold you that "you need 64 bit" hype, haven't they?
That's why Intel couldn't use P-M, AMD convinced people it was
necessary to have 64 bits. HT is a free way to get 15-30% more
performance out of a CPU, if that's a bust I wish someone would bust
my gas mileage.

however, kernel smp overhead has been notorious for adding 15-30%
overhead ... which can result in a wash.

from 30+ years ago ... there was a project to add a second i-stream to
370/195. the issue was that 195 drained the pipeline on branches and
most codes ran at around half pipeline and half peak thruput. the hope
was that 2nd i-stream could get close to double hardware thruput (for
wide-range of codes) ... more than enuf to compensate for any
incremental kernel overhead going to smp kernel.

it was never produced. originally 370/195 was targeted at national
labs. and numerical intensive supercomputer type stuff. however, it
started to see some uptake in the TPF market (transaction processing
facility ... the renamed ACP ... airline control program ... which, in
addition to being used in large airline res. systems ... was
starting to see some deployment in large financial transaction
networks, hence its name change). the codes in this market segment
were much more commercially oriented ... so a 2nd thread/i-stream might
benefit the customers' workload.

the high-end TPF financial transaction market was seeing some uptake
of 195 as growth from 370/168 (a 195, even running commercial codes at
half peak, could still be around twice the thruput of a 168). a big
stumbling block was that TPF (the operating system) didn't have SMP
support and didn't get it until after the 3081 time-frame in the 80s
(the 3081 product was smp only ... but eventually they were forced to
produce a reduced-priced, single-cpu 3083 for the TPF market).

the other issue was that 3033 eventually showed up on the scene which
was nearly the thruput of half-peak 195 (about even on commercial
codes).

for some folklore drift sjr was still running 370/195 and made it
available to internal shops for numerical intensive operations.
however, the batch backlog could be extremely long. Somebody at PASC
claimed that they were getting 3-month turn-arounds (they eventually
setup a background and checkpoint process on their own 370/145 that
would absorb spare cycles offshift ... and started getting slightly
better than 3 month turn-around for the same job).

one of the applications being run by the disk division on research's
195 was air-bearing simulation ... working out the details for
floating disk heads. they were also getting relatively poor turn
around.

the product test lab across the street in bldg. 15 got an early 3033
engineering machine for use in disk/processor/channel product test.

the disk engineering labs (bldg 14) and the product test labs (bldg
15) had been running all their testing "stand-alone" ... scheduled
machine time for a single "testcell" at a time. the problem was that
standard operating system tended to quickly fail in an environment
with a lot of engineering devices being tested (MVS had a MTBF of 15
minutes in this environment).

I had undertaken to rewrite the i/o supervisor, making the operating
system bulletproof so that they could concurrently test multiple
testcells w/o requiring dedicated, scheduled stand-alone machine time
(and w/o failing):
http://www.garlic.com/~lynn/subtopic.html#disk

when the 3033 went into bldg. 15, this operating system was up and
running and heavy concurrent product test was consuming something
under 4-5 percent of the processor. so we setup an environment to make
use of these spare cpu cycles. one of the applications we setup to
provide thousands of cpu hrs processing was air-bearing simulation (in
support of working out details for floating disk heads).
 
keith said:
To address the original point YK raised, Intel still has the clear lead
in low power for mobile. [...]
They're focusing on mobile, there's no doubt there. At the same time
Intel is leaving the high performance market to AMD.

I think one important change is that the laptop market (like high
performance) used to be a high margin area. Now that laptop prices
have really plummeted, and Intel's laptop lead isn't as exclusive as
its Xeon business used to be (sure, Centrino is nice, but you can
always put together something similar based on AMD), it seems highly
unlikely that it will give anything near the same margins.

No argument here. The only "issue" *may* be addressed by the
antitrust suits. Certainly AMD has shown itself able to compete
here. ...at least in the technical sense.
 
You keep saying this, please post a picture of an Intel P4 desktop chip
with some arrows pointing to the SMP logic you state is present, or stop
making the claim. How can Intel have the capability to verify a non-existent
feature? Or provide a link to an official Intel statement that the
capability for SMP is present in non-Xeon chips.

Oh, please! The Xeons aren't that different than P4's, of any stripe. The
PIIIs came in both SMP and non-SMP stripes. The PPro was SMP, so the P6 bus
*is* SMP capable. Intel is well known to have SMP x86 capability. They
have had the verification infrastructure for SMP for well over a decade.
If there is a minor bus change that's a *small* issue, compared with
verification. Product test is done by systems that don't care what the
widget is. They only care about test vectors, which drop out of the
architecture verification.
I say there are no capabilities to test and you say that's irrelevant,
Intel can still test them. Please clarify.

Then I say, that you have no clue how these things are done. ...or you
and Intel are in the same clueless boat. Somehow I doubt the latter.

They sure have sold you that "you need 64 bit" hype, haven't they?

It is needed. Perhaps not today, but sooner than I'm going to replace
this system. My Win-system is a Y2K replacement.
That's why Intel couldn't use P-M, AMD convinced people it was necessary
to have 64 bits. HT is a free way to get 15-30% more performance out of
a CPU, if that's a bust I wish someone would bust my gas mileage.

Oh, and Intel's marketeering is so grand, but AMD sucks? In reality Intel
didn't want to go to 64b, except in Itanic. They were brought along
kicking and screaming. ...as Itanic sank beneath the waves. I wonder why
the architecture is called AMD64? Hmm.

==
Keith
 
keith said:
Wasn't the 3033 a 3168 on steroids? ...complete with dual I-streams?

303x machines were organized to use the 303x channel director.

the 370/158 had integrated channels ... i.e. the 158 engine was
shared between the microcode that executed 370 instructions and
the microcode that executed channel (I/O) programs.

for the 303x channel director they took the 370/158 and removed
the microcode that executed 370 instructions leaving just
the channel execution microcode.

a 3031 was a 370/158 remapped to use a channel director box
i.e. a 3031 was a 370/158 that had the channel program microcode
removed ... leaving the engine dedicated to executing the
370 instruction microcode.

in some sense a single processor 3031 was a two 158-engine smp system
.... but with one of the 158 engines dedicated to running 370
instruction microcode and the other processor dedicated to running the
channel program microcode.

a 3032 was a 370/168 modified to use the 303x channel director.

a 3033 started out being a 370/168 wiring diagram remapped to new chip
technology. the 168 chip technology was 4 circuits per chip. the 3033
chip technology was about 20% faster but had about ten times as many
circuits per chip. the initial straight wiring remap would have
resulted in the 3033 being about 20% faster than 168-3 (using only 4
circuits/chip). somewhere in the cycle, there was a decision to
redesign critical sections of the machine to better utilize the higher
circuit density ... which eventually resulted in the 3033 being about
50% faster than the 168-3.

basic 3033 was single processor (modulo having up to three 303x
channel directors for 16 channels). you could get two-processor
(real) 3033 smp systems (not dual i-stream).

there were some internal issues with the 3031 being in competition with
the 4341 ... with the 4341 having significantly better price/performance.
A cluster of six 4341s was also cheaper than a 3033, with much
better price/performance and higher aggregate thruput. Each 4341 could
have 16mbytes of real storage and 6 channels, for an aggregate of
96mbytes of real storage and 36 i/o channels. A single processor 3033
was still limited to 16 i/o channels and 16mbytes.

somewhat in recognition of the real storage constraint on 3033 thruput
.... a hack was done to support 32mbytes of real storage even tho the
machines had only 16mbyte addressing. A standard page table entry had
16bits: a 12bit page number (with 4k pages giving 24bit real storage
addressing), 2 defined bits, and two undefined bits. The undefined bits
were remapped on the 3033 to be used in specifying real page
numbers. That allowed up to a 14bit page number ... with 4k pages giving
up to 26bit real storage addressing (64mbytes). channel program idals
had been introduced with 370 ... allowing for up to 31bit real storage
addressing (even tho only 24bits were used). This allowed the
operating system to do page I/O into and out of storage above the
16mbyte line.

around the same time that the product test lab (bldg. 15) got their
3033, they also got a brand new engineering 4341 (for the same purpose
doing channel disk i/o testing). we could co-opt the 4341 in much
the same way that the 3033 was co-opted. In fact, for a period of
time, I had better access to the 4341 for running tests than 4341
product people in endicott did; as a result I got asked to run some
number of benchmarks on the bldg. 15 4341 for the endicott 4341
product people. minor past refs:
http://www.garlic.com/~lynn/2000d.html#0 Is a VAX a mainframe?
http://www.garlic.com/~lynn/2000d.html#7 4341 was "Is a VAX a mainframe?"
http://www.garlic.com/~lynn/2001l.html#32 mainframe question
http://www.garlic.com/~lynn/2001m.html#15 departmental servers
http://www.garlic.com/~lynn/2002b.html#0 Microcode?
http://www.garlic.com/~lynn/2002d.html#7 IBM Mainframe at home
http://www.garlic.com/~lynn/2002f.html#8 Is AMD doing an Intel?
http://www.garlic.com/~lynn/2002i.html#7 CDC6600 - just how powerful a machine was it?
http://www.garlic.com/~lynn/2002i.html#19 CDC6600 - just how powerful a machine was it?
http://www.garlic.com/~lynn/2002i.html#22 CDC6600 - just how powerful a machine was it?
http://www.garlic.com/~lynn/2002i.html#37 IBM was: CDC6600 - just how powerful a machine was it?
http://www.garlic.com/~lynn/2002k.html#4 misc. old benchmarks (4331 & 11/750)
http://www.garlic.com/~lynn/2003.html#10 Mainframe System Programmer/Administrator market demand?
http://www.garlic.com/~lynn/2005m.html#25 IBM's mini computers--lack thereof


some number of past posts on 303x channel director:
http://www.garlic.com/~lynn/95.html#3 What is an IBM 137/148 ???
http://www.garlic.com/~lynn/97.html#20 Why Mainframes?
http://www.garlic.com/~lynn/99.html#7 IBM S/360
http://www.garlic.com/~lynn/2000c.html#69 Does the word "mainframe" still have a meaning?
http://www.garlic.com/~lynn/2000d.html#7 4341 was "Is a VAX a mainframe?"
http://www.garlic.com/~lynn/2000d.html#11 4341 was "Is a VAX a mainframe?"
http://www.garlic.com/~lynn/2000d.html#12 4341 was "Is a VAX a mainframe?"
http://www.garlic.com/~lynn/2000d.html#21 S/360 development burnout?
http://www.garlic.com/~lynn/2000g.html#11 360/370 instruction cycle time
http://www.garlic.com/~lynn/2000.html#78 Mainframe operating systems
http://www.garlic.com/~lynn/2001b.html#69 Z/90, S/390, 370/ESA (slightly off topic)
http://www.garlic.com/~lynn/2001b.html#83 Z/90, S/390, 370/ESA (slightly off topic)
http://www.garlic.com/~lynn/2001j.html#3 YKYGOW...
http://www.garlic.com/~lynn/2001l.html#24 mainframe question
http://www.garlic.com/~lynn/2001l.html#32 mainframe question
http://www.garlic.com/~lynn/2002d.html#7 IBM Mainframe at home
http://www.garlic.com/~lynn/2002f.html#8 Is AMD doing an Intel?
http://www.garlic.com/~lynn/2002.html#36 a.f.c history checkup... (was What specifications will the standard year 2001 PC have?)
http://www.garlic.com/~lynn/2002i.html#23 CDC6600 - just how powerful a machine was it?
http://www.garlic.com/~lynn/2002n.html#58 IBM S/370-168, 195, and 3033
http://www.garlic.com/~lynn/2002p.html#59 AMP vs SMP
http://www.garlic.com/~lynn/2003g.html#22 303x, idals, dat, disk head settle, and other rambling folklore
http://www.garlic.com/~lynn/2003g.html#32 One Processor is bad?
http://www.garlic.com/~lynn/2003.html#39 Flex Question
http://www.garlic.com/~lynn/2004d.html#12 real multi-tasking, multi-programming
http://www.garlic.com/~lynn/2004d.html#65 System/360 40 years old today
http://www.garlic.com/~lynn/2004e.html#51 Infiniband - practicalities for small clusters
http://www.garlic.com/~lynn/2004f.html#21 Infiniband - practicalities for small clusters
http://www.garlic.com/~lynn/2004g.html#50 Chained I/O's
http://www.garlic.com/~lynn/2004.html#9 Dyadic
http://www.garlic.com/~lynn/2004.html#10 Dyadic
http://www.garlic.com/~lynn/2004m.html#17 mainframe and microprocessor
http://www.garlic.com/~lynn/2004n.html#14 360 longevity, was RISCs too close to hardware?
http://www.garlic.com/~lynn/2004o.html#7 Integer types for 128-bit addressing
http://www.garlic.com/~lynn/2005b.html#26 CAS and LL/SC
http://www.garlic.com/~lynn/2005d.html#62 Misuse of word "microcode"
http://www.garlic.com/~lynn/2005h.html#40 Software for IBM 360/30
http://www.garlic.com/~lynn/2005m.html#25 IBM's mini computers--lack thereof


in the late 70s and early 80s, 4341 competed with and sold into the
same market segment as vax machines
http://www.garlic.com/~lynn/2000c.html#76 Is a VAX a mainframe?
http://www.garlic.com/~lynn/2000c.html#83 Is a VAX a mainframe?
http://www.garlic.com/~lynn/2000d.html#0 Is a VAX a mainframe?
http://www.garlic.com/~lynn/2000d.html#7 4341 was "Is a VAX a mainframe?"
http://www.garlic.com/~lynn/2000d.html#9 4341 was "Is a VAX a mainframe?"
http://www.garlic.com/~lynn/2000d.html#10 4341 was "Is a VAX a mainframe?"
http://www.garlic.com/~lynn/2000d.html#11 4341 was "Is a VAX a mainframe?"
http://www.garlic.com/~lynn/2000d.html#12 4341 was "Is a VAX a mainframe?"
http://www.garlic.com/~lynn/2000d.html#13 4341 was "Is a VAX a mainframe?"
http://www.garlic.com/~lynn/2001m.html#15 departmental servers
http://www.garlic.com/~lynn/2002h.html#52 Bettman Archive in Trouble
http://www.garlic.com/~lynn/2002i.html#30 CDC6600 - just how powerful a machine was it?
http://www.garlic.com/~lynn/2002k.html#1 misc. old benchmarks (4331 & 11/750)
http://www.garlic.com/~lynn/2002k.html#3 misc. old benchmarks (4331 & 11/750)
http://www.garlic.com/~lynn/2003c.html#17 diffence between itanium and alpha
http://www.garlic.com/~lynn/2003c.html#19 diffence between itanium and alpha
http://www.garlic.com/~lynn/2003d.html#0 big buys was: Tubes in IBM 1620?
http://www.garlic.com/~lynn/2003d.html#33 Why only 24 bits on S/360?
http://www.garlic.com/~lynn/2003d.html#61 Another light on the map going out
http://www.garlic.com/~lynn/2003d.html#64 IBM was: VAX again: unix
http://www.garlic.com/~lynn/2003e.html#56 Reviving Multics
http://www.garlic.com/~lynn/2003f.html#48 Alpha performance, why?
http://www.garlic.com/~lynn/2003g.html#22 303x, idals, dat, disk head settle, and other rambling folklore
http://www.garlic.com/~lynn/2003.html#14 vax6k.openecs.org rebirth
http://www.garlic.com/~lynn/2003.html#15 vax6k.openecs.org rebirth
http://www.garlic.com/~lynn/2003i.html#5 Name for this early transistor package?
http://www.garlic.com/~lynn/2003p.html#38 Mainframe Emulation Solutions
http://www.garlic.com/~lynn/2004f.html#39 Who said "The Mainframe is dead"?
http://www.garlic.com/~lynn/2004g.html#24 |d|i|g|i|t|a|l| questions
http://www.garlic.com/~lynn/2004.html#46 DE-skilling was Re: ServerPak Install via QuickLoad Product
http://www.garlic.com/~lynn/2004j.html#57 Monster(ous) sig (was Re: Vintage computers are better
http://www.garlic.com/~lynn/2004l.html#10 Complex Instructions
http://www.garlic.com/~lynn/2004m.html#59 RISCs too close to hardware?
http://www.garlic.com/~lynn/2004m.html#63 RISCs too close to hardware?
http://www.garlic.com/~lynn/2004q.html#71 will there every be another commerically signficant new ISA?
http://www.garlic.com/~lynn/2005f.html#30 Where should the type information be: in tags and descriptors
http://www.garlic.com/~lynn/2005f.html#58 Where should the type information be: in tags and descriptors
http://www.garlic.com/~lynn/2005f.html#59 Where should the type information be: in tags and descriptors
http://www.garlic.com/~lynn/2005m.html#8 IBM's mini computers--lack thereof
http://www.garlic.com/~lynn/2005m.html#12 IBM's mini computers--lack thereof
http://www.garlic.com/~lynn/2005m.html#25 IBM's mini computers--lack thereof
http://www.garlic.com/~lynn/2005n.html#10 Code density and performance?
http://www.garlic.com/~lynn/2005n.html#11 Code density and performance?
http://www.garlic.com/~lynn/2005n.html#12 Code density and performance?
http://www.garlic.com/~lynn/2005n.html#16 Code density and performance?
http://www.garlic.com/~lynn/2005n.html#47 Anyone know whether VM/370 EDGAR is still available anywhere?
 
On Wed, 31 Aug 2005 12:30:44 -0600, Anne & Lynn Wheeler wrote:

the other issue was that 3033 eventually showed up on the scene which
was nearly the thruput of half-peak 195 (about even on commercial
codes).

Wasn't the 3033 a 3168 on steroids? ...complete with dual I-streams?
for some folklore drift sjr was still running 370/195 and made it
available to internal shops for numerical intensive operations. however,
the batch backlog could be extremely long. Somebody at PASC claimed that
they were getting 3month turn-arounds (they eventually setup a
background and checkpoint process on their own 370/145 that would
absorb spare cycles offshift ... and started getting slightly better
than 3 month turn-around for the same job).

3mos? Wow. I had a friend that had a bank of '85s doing ASTAP (circuit
sim) for 72 hours at a crack over the weekends. His boss didn't much like
the bill, but the "customer" (internal) wanted statistical transient runs.
....something today that could be done on this computer with some
version of Spice in a few hours, no doubt.

one of the applications being run by the disk division on research's 195
was air-bearing simulation ... working out the details for floating disk
heads. they were also getting relatively poor turn around.

the product test lab across the street in bldg. 15 got an early 3033
engineering machine for use in disk/processor/channel product test.

....and you swiped it. ;-) Aside: I worked on the 3033.
the disk engineering labs (bldg 14) and the product test labs (bldg 15)
had been running all their testing "stand-alone" ... scheduled machine
time for a single 'testcell" at a time. the problem was that standard
operating system tended to quickly fail in an environment with a lot of
engineering devices being tested (MVS had a MTBF of 15 minutes in this
environment).

At least you didn't *smoke* 'em. Customers *hated* that. Execs seemed to
hate us for it. ;-)

when the 3033 went into bldg. 15, this operating system was up and
running and heavy concurrent product test was consuming something under
4-5 percent of the processor. so we setup an environment to make use of
these spare cpu cycles. one of the applications we setup to provide
thousands of cpu hrs processing was air-bearing simulation (in support
of working out details for floating disk heads).

You "stole" the CPU cycles. We had a ton of the systems on site, but they
were off-limits to users (customer machines can't be toyed with). Indeed
I killed one customer's ES9000. I pushed it over the power-on cycles
looking for an intermittent bug. My boss had to buy all the TCMs. He
wasn't happy, but they wouldn't buy (or rent) me a $10 logic analyzer
either. So we spent weeks tracking down the bug and a few more patching
it.
 
Under anything resembling a typical situation, Intel processors
don't need thermal throttling either, even if you are running some
application that has CPU use pegged at 100% for long periods of time.
Intel's TDP numbers are rather pessimistic and there are VERY few
situations where actual power consumption ever exceeds that number,
even for just a short peak.

What's more, it's a sort of worst-case from the process standpoint.
As you probably know, processors of a given speed grade and stepping
can and do have different power consumption figures for the same load.
Minor differences on the process side of things dictate this.

I thought that was what binning was for - any resulting difference should
be minimal... unless Intel is binning too close to the ragged edge.
And
finally, any normal cooler sold with Intel systems doesn't JUST meet
the specs with no room to spare, there is always at least a small
margin of error.

Not what anecdotal evidence here is saying - just a coupla weeks ago we had
someone who could not get his P4 to quit throttling with the Intel
heatsink: [email protected] and the resolution:
[email protected] Now it's always possible that he did not fit
Intel's sink properly, *twice*... OTOH he had no trouble with the
Zalman.<shrug>

I'm sure Rob Stow has commented here about Intel's heaters, the inference
being that Intel is pushing the temp envelope to err, keep up.
Long story short, if your processor is throttling, regardless of what
you're running on the PC, there's probably something wrong with your
setup. Either that or ambient temp is well above normal.

My processor is not throttling - it's an Athlon64 3500+:-) and it's running
~15C below max temp at full load in a warmish 26C ambient. This is a real
load, FP an' all. Though I've no recent experience with Intel, the stories
that they are running closer to the thermal limits do seem to have some
substance.
 
keith said:
On Wed, 31 Aug 2005 12:30:44 -0600, Anne & Lynn Wheeler wrote:

Wasn't the 3033 a 3168 on steroids? ...complete with dual I-streams?

Story I heard was it was a card for card remap from MST into HPCL-F MS255.
snip
 
Story I heard was it was a card for card remap from MST into HPCL-F MS255.

That was the plan, but it didn't turn out that way. They ended up
using the "amazingly dense" 25 gates per module. Lynn has some links to
this end, IIRC. BTW, I did a few "specials" (clock drivers, mostly) in
MS255. It was amazingly fast for its time.
 
already there. Years? Please! They don't simulate/verify in
multi-processor environments? *Amazing*!

Like the article says, Intel didn't have the test vectors
or BIST for testing the interconnects between the two cores.
That's understandable, as Intel had been fabricating single-core parts before.
Intel has accumulated a large suite of software diagnostics
to verify SMP scenarios (#LOCK, APIC, cache MESI, etc.) going
back to when the Pentium II introduced built-in SMP logic (and before).
 