65nm news from Intel

  • Thread starter Yousuf Khan
Yes, but there are SUPPOSED to be checksums and sequence counts.
If those were used properly, the chances of error are low. Yes,
I know that regrettably many systems don't check them correctly,
or even run with no checking by default, but still ....

Let's back up a step - were you saying that the transfers were not
completing, or that the transfers were completing, but have corrupt
files? I (perhaps mistakenly) thought I read you writing that the
transfers were not completing.
In a previous task, I had to investigate FTP, and that is nothing
like as solid. In particular, it makes it too easy to truncate a
transfer early and think that was EOF. I think that was what was
happening - the last window wasn't being pushed, and so the last
few KB of the file were never arriving.

So the transfers were not completing?

rick jones
 
Auth requirements on the presentations seem to be inconsistent. Some
require authentication, some don't. Some of the ones you can download
without authorization have the username and password on one of the
last few slides. You can get to the list of presentations here:
http://www.cplan.com/idfafa04/sys/catalog1
(just click the search button without filling in any fields.)
Multi-threaded: yes. SMT: no. Montecito uses a different version of
multithreading than SMT. I know that's been discussed before. Search
for it if you want details.

Rereading the slides, I conclude that my original comment was just
wrong. The presentation says thread switches happen on "long latency
operations." There isn't really any detailed information on this in
the presentation. My apologies for the error. (They have slides
showing something that looks like SMT, but I think it is just for an
overview, not what Montecito does.)
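
As a rough illustration of the distinction (a toy model of my own,
not Montecito's actual mechanism): with switch-on-event
multithreading the core runs a single thread and swaps to another
only when the current one hits a long-latency operation, whereas SMT
issues from several threads in the same cycle.

    # Toy model (Python) of switch-on-event multithreading; illustration only.
    from collections import deque

    def switch_on_event(threads, max_steps=20):
        # threads: iterators yielding "op" or "miss" for each instruction.
        ready = deque(range(len(threads)))
        trace = []
        current = ready.popleft()
        for _ in range(max_steps):
            try:
                event = next(threads[current])
            except StopIteration:        # this thread has finished
                if not ready:
                    break
                current = ready.popleft()
                continue
            trace.append((current, event))
            if event == "miss":          # long-latency operation: switch threads
                ready.append(current)
                current = ready.popleft()
        return trace

    # Thread 0 stalls on a cache miss; thread 1 runs straight through.
    t0 = iter(["op", "op", "miss", "op", "op"])
    t1 = iter(["op", "op", "op", "op"])
    print(switch_on_event([t0, t1]))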

-Z-
 
Zalman said:
Auth requirements on the presentations seem to be inconsistent. Some
require authentication, some don't. Some of the ones you can download
without authorization have the username and password on one of the
last few slides. You can get to the list of presentations here:
http://www.cplan.com/idfafa04/sys/catalog1
(just click the search button without filling in any fields.)

Rereading the slides, I conclude that my original comment was just
wrong. The presentation says thread switches happen on "long latency
operations." There isn't really any detailed information on this in
the presentation. My apologies for the error. (They have slides
showing something that looks like SMT, but I think it is just for an
overview, not what Montecito does.)

The google search

speculative slice "delinquent loads"

yields a cornucopia of what I believe are the relevant links,

http://www.intel.com/research/mrl/library/148_collins_j.pdf

in particular.

RM
 
|>
|> Let's back up a step - were you saying that the transfers were not
|> completing, or that the transfers were completing, but have corrupt
|> files? I (perhaps mistakenly) thought I read you writing that the
|> transfers were not completing.

Acrobat was gagging. I didn't check the files in detail.

|> So the transfers were not completing?

Let me explain the issue.

TCP/IP (in its general sense) and its 'Internet' interfaces specify
actions on closing a connexion cleanly, but assume that connexions
will be kept open until they are closed cleanly. There is thus no
architected way of indicating an unsuccessful close. It isn't quite
like that, but that is the effect.

Most semi-decent systems (i.e. Unix, not Microsoft) do pass that
information back up to the application, but all of them sometimes
get it wrong and the indecent ones USUALLY get it wrong. In
particular, there is no way of passing that information through a
filter which does not have a suitable out-of-band messaging system.
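
To make that concrete, here is a minimal sketch (mine, not Nick's) of
what a receiving application sees when the stack does report it: an
orderly close arrives as a zero-length read, an abortive one as a
reset error.

    import socket

    def read_until_close(sock: socket.socket) -> bytes:
        # Read until the peer closes.  An orderly close (FIN) shows up
        # as recv() returning b''; an abortive close (RST) raises
        # ConnectionResetError.  Whether the application looks at either
        # signal is another matter, which is the point made above.
        data = bytearray()
        while True:
            try:
                chunk = sock.recv(65536)
            except ConnectionResetError:
                raise IOError("peer aborted the connection (RST)")
            if not chunk:                # zero-length read: peer sent FIN
                return bytes(data)
            data.extend(chunk)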

FTP does not have a post-close checking 'flag', so cannot tell the
difference between a break and a close. Its assumption is that a
transfer completes or hangs. And that is the misdesign I was
referring to.

The effect is that it is quite common for transfers to appear to
have completed successfully but actually to have failed. In this
case, at least almost all of the file transferred (i.e. it got its
size right to the nearest few KB), but it could have been a last
block problem.


Regards,
Nick Maclaren.
 
Nick Maclaren said:
|>
|> Let's back up a step - were you saying that the transfers were not
|> completing, or that the transfers were completing, but have corrupt
|> files? I (perhaps mistakenly) thought I read you writing that the
|> transfers were not completing.

Acrobat was gagging. I didn't check the files in detail.

|> So the transfers were not completing?

Let me explain the issue.

TCP/IP (in its general sense) and its 'Internet' interfaces specify
actions on closing a connexion cleanly, but assume that connexions
will be kept open until they are closed cleanly. There is thus no
architected way of indicating an unsuccessful close. It isn't quite
like that, but that is the effect.

Most semi-decent systems (i.e. Unix, not Microsoft) do pass that
information back up to the application, but all of them sometimes
get it wrong and the indecent ones USUALLY get it wrong. In
particular, there is no way of passing that information through a
filter which does not have a suitable out-of-band messaging system.

FTP does not have a post-close checking 'flag', so cannot tell the
difference between a break and a close. Its assumption is that a
transfer completes or hangs. And that is the misdesign I was
referring to.

The effect is that it is quite common for transfers to appear to
have completed successfully but actually to have failed. In this
case, at least almost all of the file transferred (i.e. it got its
size right to the nearest few KB), but it could have been a last
block problem.

Most ftp clients and servers support passing information about file sizes to
the client, so that the client knows if it has successfully downloaded the
whole file (although it can't tell if the download has been corrupted in
some way). One particular client is very poor at this - internet explorer.
It will happily tell the user that a download has completed successfully,
when in fact it has stopped halfway. There are some ftp servers that don't
give proper size information, so that even the best clients can only guess
as to whether the transfer has been completed successfully or has failed.
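
A minimal sketch of that size check (a hypothetical helper using
Python's ftplib; it assumes the server implements the optional SIZE
command, which, as noted above, not all of them do):

    import os
    from ftplib import FTP, error_perm

    def fetch_with_size_check(host, remote_path, local_path):
        # Download remote_path and compare the local length with the
        # server's SIZE reply; a mismatch means a short (failed) transfer.
        ftp = FTP(host)
        ftp.login()                      # anonymous login
        ftp.voidcmd("TYPE I")            # SIZE is defined for binary mode
        try:
            expected = ftp.size(remote_path)
        except error_perm:               # server does not implement SIZE
            expected = None
        with open(local_path, "wb") as f:
            ftp.retrbinary("RETR " + remote_path, f.write)
        ftp.quit()
        actual = os.path.getsize(local_path)
        if expected is not None and actual != expected:
            raise IOError("short transfer: got %d of %d bytes"
                          % (actual, expected))
        return actual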

The reason why ftp does not have proper completion signalling is that it
does not, as you previously stated, run on TCP/IP, but on UDP/IP. TCP
communications establish a two-way link through a series of handshake
telegrams, and every data telegram must be acknowledged by the receiver.
UDP, on the other hand, is basically a one-way link (or, for ftp, two
one-way links in anti-parallel), and telegrams are not acknowledged. This
reduces the overhead (which is why it is used for ftp - the assumption is
that a higher level checking mechanism such as md5sum will be used to verify
the transfer), but means that there is no way to distinguish between a lost
packet and no packet.
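
Whatever the transport, the "higher level checking mechanism such as
md5sum" mentioned above just means hashing the received file and
comparing the result with a checksum published alongside it; a
minimal sketch:

    import hashlib

    def md5_of(path, chunk_size=1 << 20):
        # Hash the downloaded file so it can be compared with a
        # published md5sum; corruption or truncation changes the digest.
        digest = hashlib.md5()
        with open(path, "rb") as f:
            while True:
                block = f.read(chunk_size)
                if not block:
                    break
                digest.update(block)
        return digest.hexdigest()

    # Usage: compare md5_of("slides.pdf") with the published checksum
    # ("slides.pdf" is just a placeholder name).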
 
David said:
The reason why ftp does not have proper completion signalling is that it
does not, as you previously stated, run on TCP/IP, but on UDP/IP.

Best to quit while you're ahead. FTP does not use UDP.
 
David said:
The reason why ftp does not have proper completion signalling is that it
does not, as you previously stated, run on TCP/IP, but on UDP/IP. TCP
communications establish a two-way link through a series of handshake
telegrams, and every data telegram must be acknowledged by the receiver.
UDP, on the other hand, is basically a one-way link (or, for ftp, two
one-way links in anti-parallel), and telegrams are not acknowledged. This
reduces the overhead (which is why it is used for ftp - the assumption is
that a higher level checking mechanism such as md5sum will be used to verify
the transfer), but means that there is no way to distinguish between a lost
packet and no packet.

When did this happen? I implemented an FTP client for both active and passive TCP
transfers and it seemed to work okay. Of course shutting down the TCP connections
was fun, as the FTP implementations varied widely. Some shut down only if you
did a QUIT and others ignored the QUIT, so you had to unilaterally close the
connection, in which case the former would complain. HTTP implementations were
almost as bad and were just as likely to ignore the keepalive attribute no matter
what its proper default value was. You basically cannot write a conforming
FTP or HTTP implementation, because if you did, you wouldn't be able to talk
to most of the other servers/clients out there.

Joe Seigh
 
In comp.arch David Brown said:
The reason why ftp does not have proper completion signalling is
that it does not, as you previously stated, run on TCP/IP, but on
UDP/IP.

I believe you have confused FTP, which does indeed use TCP for its
transport, with TFTP, which uses UDP. While there is a considerable
substring match on their acronyms, they are _very_ different beasts.

rick jones
 
_ said:
Best to quit while you're ahead. FTP does not use UDP.

Ok, I quit (even though I'm not ahead on this one). As another poster said,
I probably mixed it up with TFTP. Sorry.
 
Jan said:
Worked for me. The line break at the underline is unfortunate, however.

I've tried multiple times and the server says it can't find the file.
Has it been pulled?
 
Zalman said:
There is also:
http://www28.cplan.com/cbi_export/MA_OSAS002_266814_68-1_v2.pdf
which gives the specific quote:
2 cores, 2 threads, 26.5MByte of cache, and 1.72 billion
transistors at 100W
(2 threads means "2 threads per core" in case it is not clear. A slide
elsewhere indicates SMT.)

(The crypto folks will appreciate y'all adding an extra shifter per
core too. It's the little extra touches that count :-))

Undoubtedly they will appreciate that the file requires a login to D/L
it, as well. I'm getting suspicious when server after server can't
provide data for one reason or another.
 
I have two main memories of APL, both about 2.5 decades old.

To the APL programmer, every problem looks like a vector/matrix.
(To the man with a hammer, every problem looks like a nail.)

You can apply every monadic operator, in the correct sequence, to
zero, and the result is 42. (HHGTG reference)

So *that's* where the 42 came from. Neat! I always wondered why 42.
 
Jan said:
Worked for me. The line break at the underline is unfortunate, however.

If you go to download.intel.com/pressroom/kits/events/ you will see that
the content was removed Sept 13... is there an alternate location for
this material?
 
While we're on this subject, maybe one of you
geniuses can fix my FTP issue. I'm sure it's
an IE issue.

If I click on any FTP link, or open it in a new
window, the FTP doesn't happen, nothing
happens. It just sits there. However, I can
browse FTP folders and then download
manually from there.

This is not a problem in Netscape on the same
computer. I have also tried changing any relevant
IE settings to no avail, including the Passive FTP toggle
in Advanced. Security is on default, though I
tried lower, which didn't help. Running no virus scans
either, never have, got a router, but that's not the
culprit either.

IE is fully updated on this fully updated Win2k machine.
Actually I suspect one of the updates some months back
caused this, although I would've expected many more
people to have this problem, which doesn't seem to be
the case.

Any ideas?
 
Catching up on comp.arch...

Nick Maclaren said:
|> http://whatever.org.ar/~module/resources/computers/computer-arch/ia-64/vail_slides_2003.pdf

I am extremely impressed. Foil 7 gives the same order of magnitude
as I got to, but my current understanding is that the power has
been reduced by a factor of 2.5-3 below that.

Pretty impressive, isn't it? And just to be clear, I had practically
nothing to do with this - the author of that Vail presentation, and a host
of others who actually implemented it, deserve all the credit.
From my point of view, that changes the IA64 line from something
that we would simply rule out of consideration to something that
we shall have to consider seriously.

Excellent! Shall I put you in touch with your local HP salesrep? :) I
believe Intel has said Montecito parts will be available next year, so
you'll want to buy some Madison-based machines right away, to get your
apps ported, etc.

-- Jim
 
Pretty impressive, isn't it? And just to be clear, I had practically
nothing to do with this - the author of that Vail presentation, and a host
of others who actually implemented it, deserve all the credit.


I agree that there are some pretty impressive advances in there, in
particular the fact that it isn't monitoring temperature but actual
power usage -- and does it on a real time basis, rather than something
controlled by the OS or BIOS making changes happen much more slowly.

However, I'm unclear on how much of the savings in getting the 100w TDP
specced for Montecito was done merely with the "guardband removal" by
defining TDP using some sort of "average case" power usage, measured
using SPEC2000 or similar test software. Since the TDP of McKinley
listed as 130w is measured using a different method, they really aren't
comparable. What would be the TDP of the McKinley measured the way
Montecito intends to? Suddenly its 130w might be 90w or something,
making the improvements in Montecito somewhat less impressive.

I'm thinking more here in terms of actual power usage and heat production,
i.e., what datacenter people are thinking of, rather than the problems
that board designers face in terms of ensuring there is sufficient power
to the CPU socket for worst case power usage. Up until now, CPUs have
solved that by speccing the measured worst case (like AMD does, referred
to as a "power virus" in this slideset) or taking 90% of theoretical max
power (like Intel does, at least for x86; for IA64 I believe TDP was
specced as 100% of theoretical max power).

When you look at the power usage of McKinley on SPEC2000 on page 9, you
could quite reasonably define TDP as 97.5w based on the max power usage
of 75% within that suite. If you were willing to give up a little bit
of performance in exchange for power (as you might if you wanted to cram
two cores onto a die) you might define it as 65% and your TDP is now 84.5w.
Add the 90nm shrink and some Pentium M-like tweaks with lower power and
less leaky transistors on non-critical paths, etc. and 100w with two cores
is suddenly well within reach, but most of it has been reached by defining
TDP differently, and applying existing techniques to IA-64 for the first
time, and not by some amazing leap that reduced power to a third of where it
would have been with a simple shrink, as the slides wish to imply.
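
For concreteness, the arithmetic behind those figures, taking the
130w McKinley TDP quoted earlier as the baseline (back-of-envelope
only, not official numbers):

    # Assumed baseline from the discussion above: McKinley's 130w TDP.
    mckinley_tdp_w = 130.0

    # Redefine TDP at the SPEC2000 power fractions mentioned above.
    for fraction in (0.75, 0.65):
        print("TDP at %.0f%% of 130w -> %.1fw"
              % (fraction * 100, mckinley_tdp_w * fraction))
    # Prints 97.5w and 84.5w, the figures used in the post.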

Given that you can define the TDP as almost anything you like using this
power-control capability, you could use the same part as a 100w TDP normal
Montecito, and as a 50w LV version, 25w ULV version, etc. Given that
board designers for the kind of high end systems running Montecito are
likely to over-spec their designs, seeing boards capable of delivering
130w+ to Montecito despite the 100w TDP seems reasonable. So I wouldn't
be surprised if, in addition to being able to toggle between normal mode,
LV mode and ULV mode, you were also able to toggle to a turbo mode
that undoes that "guardband removal" that gets used for SPEC.
Might be worth checking the kernel parameters/firmware settings section on
the SPEC disclosures for Montecito quite carefully!
 
|>
|> I'm thinking more here in terms of actual power usage and heat production,
|> i.e., what datacenter people are thinking of, rather than the problems
|> that board designers face in terms of ensuring there is sufficient power
|> to the CPU socket for worst case power usage. Up until now, CPUs have
|> solved that by speccing the measured worst case (like AMD does, referred
|> to as a "power virus" in this slideset) or taking 90% of theoretical max
|> power (like Intel does, at least for x86, for IA64 I believe TDP was
|> specced as 100% of theoretical max power)

Not entirely. Some of the 'mobile' designs tend to use an average
that assumes that they are being used primarily by slack-jawed
executives staring gormlessly at marketing charts. Start using
them for anything computationally intensive, and the power usage
goes through the roof.

This is a problem in HPC - CPUs are NOT sitting idle most of the
time, and averages that assume that are useless.


Regards,
Nick Maclaren.
 
Nick Maclaren said:
|>
|> I'm thinking more here in terms of actual power usage and heat production,
|> i.e., what datacenter people are thinking of, rather than the problems
|> that board designers face in terms of ensuring there is sufficient power
|> to the CPU socket for worst case power usage. Up until now, CPUs have
|> solved that by speccing the measured worst case (like AMD does, referred
|> to as a "power virus" in this slideset) or taking 90% of theoretical max
|> power (like Intel does, at least for x86, for IA64 I believe TDP was
|> specced as 100% of theoretical max power)

Not entirely. Some of the 'mobile' designs tend to use an average
that assumes that they are being used primarily by slack-jawed
executives staring gormlessly at marketing charts. Start using
them for anything computationally intensive, and the power usage
goes through the roof.

This is a problem in HPC - CPUs are NOT sitting idle most of the
time, and averages that assume that are useless.

Nick and Douglas:

I'm afraid that to adequately answer your questions would require me to
disclose things about how this really works in Montecito that HP/Intel are
not prepared to make public at this time. Sorry.

Let me just say that the Montecito designers are well aware that servers
(both HPC and commercial) don't spend all their time in the kernel's idle
loop, as well as the kind of power and thermal issues that datacenter
designers face.

-- Jim
 
I agree that there are some pretty impressive advances in there, in
particular the fact that it isn't monitoring temperature but actual
power usage -- and does it on a real time basis, rather than something
controlled by the OS or BIOS making changes happen much more slowly.

However, I'm unclear on how much of the savings in getting the 100w TDP
specced for Montecito was done merely with the "guardband removal" by
defining TDP using some sort of "average case" power usage, measured
using SPEC2000 or similar test software. Since the TDP of McKinley
listed as 130w is measured using a different method, they really aren't
comparable. What would be the TDP of the McKinley measured the way
Montecito intends to? Suddenly its 130w might be 90w or something,
making the improvements in Montecito somewhat less impressive.

Note that with McKinley Intel defined TDP in exactly the same way that
AMD has defined it for their Opterons, i.e., the highest power consumption
of any chip they will ever produce for the entire line. No McKinley
chip ever reached that 130W power consumption figure; the highest
power consumption of any McKinley, as measured by executing a
worst-case piece of code, was 107W for the 1.5GHz/6MB cache chip.

You can find this info in the Electrical Specifications section of the
Itanium2 datasheets:

ftp://download.intel.com/design/Itanium2/datashts/25094502.pdf

I'm thinking more here in terms of actual power usage and heat production,
i.e., what datacenter people are thinking of, rather than the problems
that board designers face in terms of ensuring there is sufficient power
to the CPU socket for worst case power usage. Up until now, CPUs have
solved that by speccing the measured worst case (like AMD does, referred
to as a "power virus" in this slideset) or taking 90% of theoretical max
power (like Intel does, at least for x86, for IA64 I believe TDP was
specced as 100% of theoretical max power)

Reading Intel's Itanium2 datasheets, I see they list both a worst case
a la thermal virus and a "thermal design power" for the board
designer types.

I have not yet seen how Intel plans on defining TDP for their next
generation of Itanium. As you correctly state, the definition of
"Thermal Design Power" has always been a rather fluid thing.
 
Jim Hull said:
Nick and Douglas:

I'm afraid that to adequately answer your questions would require me to
disclose things about how this really works in Montecito that HP/Intel are
not prepared to make public at this time. Sorry.

Let me just say that the Montecito designers are well aware that servers
(both HPC and commercial) don't spend all their time in the kernel's idle
loop, as well as the kind of power and thermal issues that datacenter
designers face.

-- Jim

It is interesting the way that some posters post as if the chip and system
designers working for large computer companies were all a bunch of shuffling
morons. Duh, you mean folks actually run code on these here thangs? Hyuk
hyuk.

del cecchi
 