Student said:
Is HyperTransport faster, or InfiniBand?
http://139.95.253.214/SRVS/CGI-BIN/...00000000220593058,K=9241,Sxi=1,Case=obj(4650)
How much faster is HyperTransport™ than other technologies like PCI,
PCI-X or Infiniband?
Traditional PCI transfers data at 133 MB/sec, PCI-X at 1 GB/sec, and
InfiniBand at about 4 GB/sec in the 12-channel implementation and 1.25
GB/sec in the more popular 4-channel version. HyperTransport transfers
data at 6.4 GB/sec. It is about 50 times faster than PCI, 6 times faster
than PCI-X, and 5 times faster than 4-channel InfiniBand. It is
important to remember that InfiniBand is not an alternative to
HyperTransport technology. Each
HyperTransport I/O bus consists of two point-to-point unidirectional
links. Each link can be from two bits to 32 bits wide. Standard bus
widths of 2, 4, 8, 16, and 32 bits are supported. Asymmetric
HyperTransport I/O buses are permitted in situations
requiring different upstream and downstream bandwidths. Commands,
addresses, and data (CAD) all use the same bits. So, a simple, low-cost
HyperTransport I/O implementation using two CAD bits in each direction
is designed to provide a raw bandwidth of up to 400 Megabytes per second
in each direction (at the highest signaling rate of 1.6 Gbit/sec per bit). Two
directions combined give almost 8 times the peak bandwidth of PCI 32/33.
A larger implementation using 16 CAD bits in each direction is designed
to provide bandwidth up to 3.2 Gigabytes per second both ways - 48 times
the peak bandwidth of 32-bit PCI running at 33MHz.
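The FAQ's numbers are easy to sanity-check: raw link bandwidth is just link width times the per-bit signaling rate. A quick sketch, assuming the 1.6 Gbit/sec per-bit rate quoted above:

```python
# Back-of-the-envelope check of the HyperTransport figures quoted above.
# Assumes each CAD bit signals at 1.6 Gbit/s (the FAQ's "highest possible
# speed"); this is a sanity check, not the spec.

BIT_RATE_GBPS = 1.6  # Gbit/s per CAD bit, per direction

def ht_bandwidth_gb_per_s(width_bits: int) -> float:
    """Raw bandwidth of one unidirectional link, in GB/s."""
    return width_bits * BIT_RATE_GBPS / 8  # 8 bits per byte

for width in (2, 4, 8, 16, 32):
    print(f"{width:2d}-bit link: {ht_bandwidth_gb_per_s(width):4.1f} GB/s per direction")
```

The 2-bit case gives the 400 MB/sec figure, the 16-bit case the 3.2 GB/sec figure, and a 32-bit link the 6.4 GB/sec headline number.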
George said:
It doesn't matter - they are not targeted at the same "problem". I must
say InfiniBand's proponents have not helped here by announcing it as a
"do-all" "solution" for on-board as well as off-board links... so bloody
confusing. The way I see it, InfiniBand, despite claims, is an off-board
wired or back-plane transport which possibly has better error
detection/recovery.
On that last point, I keep reading that HyperTransport suffers from a lack of
error detection/recovery, but CRC checking and packet retries are clearly in
the specs, so I don't know what the full story is there yet. To put things
in perspective, HyperTransport links on currently available motherboards run at
approximately the same speed as current PCI Express, which on a x16 link
has a peak bandwidth of 4.1GB/s (B=Byte); a 16/16 HT link has a peak
bandwidth of 4GB/s in each direction.
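For what it's worth, George's comparison can be reproduced from first principles. A sketch assuming PCIe 1.x signaling (2.5 GT/s per lane with 8b/10b encoding) and a 16-bit HT link at 1 GHz double data rate (2 GT/s) - both parameter choices are my assumptions, not stated in the post:

```python
# Rough check of the PCIe-x16-vs-HT-16/16 comparison above. Parameter
# values (PCIe 1.x lane rate, 8b/10b overhead, HT clock) are assumed.

def pcie_x16_gb_per_s(lanes=16, gt_per_s=2.5, encoding=8 / 10):
    """Peak payload bandwidth of one direction of a PCIe link, in GB/s."""
    return lanes * gt_per_s * encoding / 8   # 8 bits per byte

def ht_16bit_gb_per_s(width_bits=16, gt_per_s=2.0):
    """Peak bandwidth of one direction of a HyperTransport link, in GB/s."""
    return width_bits * gt_per_s / 8

print(f"PCIe x16 : {pcie_x16_gb_per_s():.1f} GB/s per direction")
print(f"HT 16/16 : {ht_16bit_gb_per_s():.1f} GB/s per direction")
```

Both come out at 4 GB/s per direction, which is consistent with the "approximately the same speed" claim.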
It may be in the spec, but the retry is recent. So what do you do in a
pc when you get an error and don't find out about it for 512 bytes? Do
you retry the last 512 bytes' worth of transactions? If you checked
into it, you would find that the actual result is a crash of some sort.
I don't know anyone that has been pushing IB as a "do all" solution.
Clearly it is not a FSB. And IB does for sure have better recovery and
detection. HT is trying, but it has an installed-base problem with the
networking extensions.
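To make the retry debate concrete, here is a toy model of a CRC-protected link where the sender retransmits a packet whose CRC fails at the receiver. The framing, packet size, and retry mechanism are illustrative assumptions, not the HyperTransport spec:

```python
import random
import zlib

# Toy model of link-level CRC + retry, as debated above. The 512-byte
# packet, single-bit-flip error model, and retry loop are all my
# assumptions for illustration - not the actual HT protocol.

def send_with_retry(payload: bytes, channel, max_retries=5) -> bytes:
    """Transmit payload; receiver recomputes CRC32 and rejects on mismatch."""
    for _ in range(max_retries):
        crc = zlib.crc32(payload)
        received, received_crc = channel(payload, crc)
        if zlib.crc32(received) == received_crc:
            return received                    # CRC matches: accept
        # mismatch detected: sender retries the whole packet
    raise IOError("link error persisted past retry limit")

def noisy_channel(error_rate=0.3, seed=42):
    """Channel that occasionally flips one bit of the payload."""
    rng = random.Random(seed)
    def channel(data, crc):
        if rng.random() < error_rate:
            i = rng.randrange(len(data))
            data = data[:i] + bytes([data[i] ^ 1]) + data[i + 1:]
        return data, crc
    return channel

packet = b"x" * 512
delivered = send_with_retry(packet, noisy_channel())
print("delivered intact:", delivered == packet)
```

The point of Del's objection survives the sketch: the retry only helps if the error is detected before dependent transactions have moved on.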
Del said:
It may be in the spec, but the retry is recent. So what do you do in a
pc when you get an error and don't find out about it for 512 bytes? Do
you retry the last 512 bytes' worth of transactions? If you checked
into it, you would find that the actual result is a crash of some sort.
So how does that work in practice? One gathers from Stone and
Partridge's work on Ethernet checksum vs. CRC errors that undetected
errors are probably much more common than anyone would have cared to
think. Does anybody know about HT-type traffic? If someone bothers to
do a study, will we find out that computers have turned into random
number generators?
Robert said:
So how does that work in practice? One gathers from Stone and
Partridge's work on Ethernet checksum vs. CRC errors that undetected
errors are probably much more common than anyone would have cared to
think. Does anybody know about HT-type traffic? If someone bothers to
do a study, will we find out that computers have turned into random
number generators?
As it is, the Stone and Partridge stuff doesn't seem to have created
much more than some interesting exchanges on comp.arch. Does anybody
care anymore? I'm sure that IBM does, but can it afford to?
RM
Del said:
I don't know about their work, and I know little about ethernet. I do
know from experience in the lab that a 32 bit crc, properly chosen, with
retry can cope with quite high error rates without any problem with the
system. And I would believe that the systems in question would not
tolerate very many undetected errors, because the disks for virtual
memory, and the coherence traffic if any, were carried over the network
in question along with all the other I/O traffic.
the wonderful person Robert Myers said:
In overclocking tests, I've found that PCs will tolerate significant
memory errors without giving any immediate indication of a problem.
Short of a crash, I don't know how you'd know anything was wrong
without application-level checking.
I think you did participate in the discussion of this subject on
comp.arch:
Stone, J., Partridge, C.: "When The CRC and TCP Checksum Disagree",
Proceedings of the ACM conference on Applications, Technologies,
Architectures, and Protocols for Computer Communication (SIGCOMM'00),
Stockholm, Sweden, August/September 2000, pp. 309-319
Abstract
"Traces of Internet packets from the past two years show that between 1
packet in 1,100 and 1 packet in 32,000 fails the TCP checksum, even on
links where link-level CRCs should catch all but 1 in 4 billion errors.
For certain situations, the rate of checksum failures can be even
higher: in one hour-long test we observed a checksum failure of 1
packet in 400. We investigate why so many errors are observed, when
link-level CRCs should catch nearly all of them. We have collected
nearly 500,000 packets which failed the TCP or UDP or IP checksum. This
dataset shows the Internet has a wide variety of error sources which
can not be detected by link-level checks. We describe analysis tools
that have identified nearly 100 different error patterns. Categorizing
packet errors, we can infer likely causes which explain roughly half
the observed errors. The causes span the entire spectrum of a network
stack, from memory errors to bugs in TCP. After an analysis we conclude
that the checksum will fail to detect errors for roughly 1 in 16
million to 10 billion packets. From our analysis of the cause of
errors, we propose simple changes to several protocols which will
decrease the rate of undetected error. Even so, the highly non-random
distribution of errors strongly suggests some applications should
employ application-level checksums or equivalents."
It may not be a good model for the possibility of other link-level
errors, but it does make you wonder.
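The abstract's headline range can be roughly reconstructed: if a 16-bit checksum misses a random error about once in 65,536 tries, the undetected-error rate is the damaged-packet rate divided by 65,536. A sketch of that arithmetic (the uniform-randomness assumption is mine, and the paper itself shows real error patterns are less forgiving):

```python
# Rough reconstruction of the abstract's "1 in 16 million to 10 billion"
# figure. Assumes a damaged packet slips past a 16-bit checksum with
# probability ~1/65536 (true only for uniformly random damage).

CHECKSUM_MISS = 65_536   # 2**16 possible checksum values

for n in (1_100, 32_000):                    # damaged-packet rates from the abstract
    undetected = n * CHECKSUM_MISS
    print(f"1 damaged packet in {n:,} -> ~1 undetected in {undetected:,}")
```

The naive estimate, roughly 1 in 72 million to 1 in 2.1 billion, lands inside the paper's reported 1-in-16-million to 1-in-10-billion range.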
Robert Myers said:
I think you did participate in the discussion of this subject on
comp.arch: Stone, J., Partridge, C.: "When The CRC and TCP Checksum
Disagree", SIGCOMM'00 (cited in full above).
It may not be a good model for the possibility of other link-level
errors, but it does make you wonder. In overclocking tests, I've found
that PCs will tolerate significant memory errors without giving any
immediate indication of a problem. Short of a crash, I don't know how
you'd know anything was wrong without application-level checking.
RM
Del said:
OK, thanks for the reminder. As I recall, claiming that the checksums
were missing the errors was a mild distortion. The errors were
transpositions of data blocks being fetched to the adapter, so the data
was bad when it got there. Is that not the case?
As for PCs not being affected by memory errors, how many do you estimate
it took to crash the system? The lab system I was referring to was
seeing many errors per second.
Robert Myers said:
That was one of the explanations. I wasn't convinced there was any one
single explanation that would have dominated.
I concluded that, if you really had to know that your data are
reliable, you should probably do your own end-to-end error checking.
Oh, a few errors per hour will generally let a system run, IIRC. The
difference in speed between running on the ragged edge like that and
not running at all is so small that it isn't worth running on the
ragged edge.
RM
Del Cecchi said:
The lab system was in the range of 10**5 errors/sec and
still ran perfectly with 32-bit CRC and retry.
A bad cable can really prove your recovery mechanism.
So it sounds like the protocol or the software or something
for the Ethernet systems in question was broken.
Yes, this is barely possible on a 100 Mbit/s system.
A 64 byte packet has a 60% chance of arriving error free.
Unfortunately, a 1500 byte packet has only a 0.0006% chance,
assuming random error distribution. So acks get through,
but data in will be bad.
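The arithmetic above can be checked directly: from the 60% clean-arrival figure for a 64-byte packet, derive the implied per-byte success rate, then apply it to a 1500-byte packet:

```python
# Checking the packet-survival arithmetic above. If a 64-byte packet has
# a 60% chance of arriving error-free, the per-byte success rate p
# satisfies p**64 = 0.6 (assuming independent, random errors).

p = 0.6 ** (1 / 64)                 # implied per-byte success probability
p1500 = p ** 1500                   # chance a 1500-byte packet is clean

print(f"per-byte success:        {p:.5f}")
print(f"1500-byte packet clean:  {p1500:.6%}")
```

This gives about 0.0006% for the 1500-byte packet, matching the figure quoted: short ACKs get through, full-size data frames essentially never do.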
Yep! Beware of newbies with crimpers! RJ45s are hard to do,
and not just because the correct pattern is counter-intuitive.
All the intuitive patterns split a pair which often gives
some connectivity but poor performance. There are 40,320 ways
of wiring the 8 conductor cable straight-thru. All but 1,152
split at least one pair necessary for 10baseT or 100baseTX.
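The 40,320 and 1,152 figures can be verified by brute force: 40,320 is 8!, the number of ways to assign 8 conductors to 8 pins, and the sketch below counts the wirings in which both pin pairs that 10baseT/100baseTX actually use (1-2 and 3-6) each carry an intact twisted pair:

```python
from itertools import permutations

# Brute-force check of the wiring counts above. 10baseT/100baseTX signal
# on the pin pairs (1,2) and (3,6); a straight-through wiring avoids
# splitting a pair iff each of those pin pairs carries both conductors of
# one twisted pair. Label the 8 wires 0..7, with twisted pairs
# {0,1},{2,3},{4,5},{6,7} - which physical pair is which doesn't matter.

def twisted_pair(wire):
    return wire // 2

good = 0
for perm in permutations(range(8)):          # all 8! = 40,320 wirings
    ok = (twisted_pair(perm[0]) == twisted_pair(perm[1])    # pins 1-2
          and twisted_pair(perm[2]) == twisted_pair(perm[5]))  # pins 3-6
    good += ok

print(f"total: {40320}, unsplit: {good}, split: {40320 - good}")
```

The count comes out to exactly 1,152 unsplit wirings (4 pair choices x 2 orders for pins 1-2, times 3 x 2 for pins 3-6, times 4! for the remaining pins), so all but 1,152 of the 40,320 wirings split a needed pair, as stated.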
George Macdonald said:
Switches with just a Web based interface, which allow you
to collect error rates and mirror ports, are cheap now.
All the Cat5 that I put in is now running 1Gb/s Full Duplex,
with maybe 5-6 errors/week/port due, I believe, to speed
ramping at PC power on.
While there is undoubtedly bad cable around, much of it
was done by "professionals" or taken off the shelf... or
even just caused by physical abuse or misrouting in the
wall or ceiling/floor cavity. I also tend to think much
of "bad cable" is due to legacy "telephone" mentality,
equipment practices and personnel. To me the punch-down
block is a scary and dangerous place.
Robert Redelmeier said:
Yes, this is barely possible on a 100 Mbit/s system.
A 64 byte packet has a 60% chance of arriving error free.
Unfortunately, a 1500 byte packet has only a 0.0006% chance,
assuming random error distribution. So acks get through,
but data in will be bad.
Yep! Beware of newbies with crimpers! RJ45s are hard to do,
and not just because the correct pattern is counter-intuitive.
All the intuitive patterns split a pair which often gives
some connectivity but poor performance. There are 40,320 ways
of wiring the 8 conductor cable straight-thru. All but 1,152
split at least one pair necessary for 10baseT or 100baseTX.
I would hope any system with anything near 0.1% error rates
was using ECC, not just CRC.
-- Robert
Del Cecchi said:
This was a parallel source synchronous link (RIO) running
at a GB/sec. And the error rate was packet errors.
I didn't have any way to collect statistics on bit errors.
CRC with retry is the moral equivalent of ECC.