AMD to integrate PCIe into CPU

  • Thread starter: YKhan
Kai said:
While a serial (and encoded) link is way easier to handle, the sky is
not the limit. Consider that at 10Gb/s standard FR-4 board material
has quite frightening losses, which limit the length you can send
it. The several meters that Del talks about are on cables, I think.

And just exactly why would you want to go several meters on a CPU to
CPU interconnect (at least in the x86 mass-market)?

Sure, the parallel link has other problems, also pointed out by Del,
but my point here is that blindly claiming that either technology is
the "right thing" is not a good idea.

Latency, bandwidth, die area, power consumption, and maximum trace
length should all be considered.




Definitely - and as we know: money can buy you bandwidth, but latency
is forever.

Think of the performance of SDRAMs - while the DDRs have awesome peak
BW numbers, they rarely translate into real-world benefits that are
worth talking about.


Kai
PCI Express is an I/O expansion network; I don't know where the CPU-to-CPU
talk came from. Architecturally it is sort of master/slave.

As for length, it is limited by the loss budget to 8 dB of loss at
1.25 GHz as I recall, and by ISI to about 100 ps. HT, on the other hand,
transmits data in parallel with a clock to provide timing, with no
alignment, so distance is limited primarily by skew as defined in the HT
spec.

The 2.5 Gbit interfaces like PCI-e can go a couple of feet on a
backplane. IB used 20 inches as an objective.
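As a rough sanity check on that loss budget, the arithmetic can be sketched as below. The per-inch FR-4 loss figure is an assumption for illustration (a plausible value for stripline near 1.25 GHz), not a number from either spec:

```python
# Rough reach estimate from a loss budget, in the spirit of Del's numbers.
# LOSS_PER_INCH_DB is an assumed figure for a typical FR-4 trace around
# 1.25 GHz (the Nyquist frequency of a 2.5 Gb/s link), not a spec value.

LOSS_BUDGET_DB = 8.0        # allowed channel loss at 1.25 GHz (per the post)
LOSS_PER_INCH_DB = 0.3      # assumed FR-4 trace loss at that frequency

reach_inches = LOSS_BUDGET_DB / LOSS_PER_INCH_DB
print(f"estimated reach: {reach_inches:.0f} in (~{reach_inches / 12:.1f} ft)")
```

Under that assumption the budget runs out somewhere past two feet, which lines up with "a couple of feet on a backplane."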

The serdes based standards do indeed have somewhat longer latency due to
the 10 bit penalty on each end. But HT also has to make the data wider
for convenient handling.
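The "10 bit penalty" is the 8b/10b symbol each serdes has to clock through. A sketch of the overhead arithmetic, assuming a 2.5 Gb/s lane:

```python
# 8b/10b coding overhead and the serialization latency it implies on a
# 2.5 Gb/s lane. Illustrative arithmetic only, not from either spec.

line_rate_gbps = 2.5
payload_fraction = 8 / 10                  # 10 line bits carry 8 data bits
effective_gbps = line_rate_gbps * payload_fraction   # 2.0 Gb/s of payload

bit_time_ns = 1 / line_rate_gbps           # 0.4 ns per line bit
# a full 10-bit symbol must be clocked through the serdes on each end
symbol_latency_ns = 10 * bit_time_ns       # 4 ns per end, before decode
```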

This kind of stuff is what I do. So I am familiar with the various
limitations.
 
On 26 Jul 2005 11:05:21 -0700, "David Kanter" <[email protected]>
wrote:
...snip previous msgs...
One never knows what the future holds. Anyway, it's pretty obvious
that parallel transmission (read HT) is the way of the past. If you
look at any high performance interconnect, they are all serial. Talk
to the Rambus guys, they know what they are doing...
Surely they know - they sue everyone and their mother-in-law (pun
intended)
 
I didn't realize that PCI-E and HTX had similar connectors. Is one an
extension of the other (e.g. HTX is a few extra slots beyond the PCIE
slots, like VESA was compared to ISA), or something like EISA was to
ISA, with somewhat deeper slots? Or are they totally incompatible but
they look similar?

More like Slot 1 vs. Slot A. Same physical connector but turned
backwards (or at least that is my understanding of it). The
electrical specs are, not surprisingly, totally incompatible.
 
While a serial (and encoded) link is way easier to handle, the sky is
not the limit. Consider that at 10Gb/s standard FR-4 board material
has quite frightening losses, which limit the length you can send
it. The several meters that Del talks about are on cables, I think.

I don't really know anything about board engineering, but I think that
one thing we might want to consider is that in the near future some CPU
interconnects will simply be on-die; and you can afford to have
ridiculously fast and nice interconnects there.

Also, does PCB have worse loss than cables? That's what you seem to
be implying.
And just exactly why would you want to go several meters on a CPU to
CPU interconnect (at least in the x86 mass-market)?

Sure, the parallel link has other problems, also pointed out by Del,
but my point here is that blindly claiming that either technology is
the "right thing" is not a good idea.

Latency, bandwidth, die area, power consumption, and maximum trace
length should all be considered.

Absolutely, and for a CPU interconnect you probably end up sacrificing
the latter. But as I understand, serial encoding schemes can have a
very small difference in latency from a parallel one (when designed
properly).
Definitely - and as we know: money can buy you bandwidth, but latency
is forever.

Hehehe. We should start selling latency to compete with diamonds,
maybe we can piggyback off of all those De Beers commercials...
Think of the performance of SDRAMs - while the DDRs have awesome peak
BW numbers, they rarely translate into real-world benefits that are
worth talking about.

CPUs always need more bandwidth though, and I think the benefits are
pretty obvious, especially when you start talking about servers with
multiple processors.

David
 
Yousuf said:
Not quite, HT is a set of multiple serial interfaces. You can go from
one to 16 unidirectional links, one to 16 in the other direction too.
Exactly the same as PCI-e.

If I may quote from http://www.hypertransport.org/tech/tech_faqs.cfm:

"Serial technologies such as PCI Express and RapidIO require
serial-deserializer interfaces and have the burden of extensive
overhead in encoding parallel data into serial data, embedding clock
information, re-acquiring and decoding the data stream. The parallel
technology of HyperTransport needs no serdes and clock encoding
overhead making it far more efficient in data transfers."

Try to ignore the PR-speak in there, and focus on this part "The
parallel technology of HyperTransport".

HT is bit parallel and delivers at least 2 bits per cycle in parallel;
it's about as parallel as a PCI bus, it just happens to be much more
intelligently designed for the task at hand (and thankfully
unidirectional, and not multidrop).

Now, let me quote someone who knows quite a bit about CPU<->CPU
interconnects:

http://www.realworldtech.com/forums...stNum=3546&Thread=328&roomID=11&entryID=53843

"Using equivalent technology the bit serial scheme will have 2X+ the
datarate per pin. The latency differential is at worst 2 bit times but
can be exactly the same if not better depending on the actual protocol
being used."

PCIe is bit serial; HT, as I explained above, is not. Yes, the latency
is a little worse, but the amount of time it takes to transmit two
bits is pretty darn negligible for double the bandwidth.
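The trade Aaron describes can be put in toy numbers. The parallel-pin rate below is an assumed HT-class figure, purely for illustration:

```python
# Toy numbers behind the claim: with equivalent circuit technology a
# bit-serial link runs its pins ~2x faster, at a worst-case cost of
# ~2 bit times of latency. The 1.4 Gb/s parallel-pin rate is an
# assumed HT-class figure, not from any spec.

parallel_pin_gbps = 1.4
serial_pin_gbps = 2 * parallel_pin_gbps    # the "2X+ datarate per pin"

bit_time_ns = 1 / serial_pin_gbps
worst_case_penalty_ns = 2 * bit_time_ns    # well under a nanosecond
```

At these rates the two-bit-time penalty is around 0.7 ns, which is small next to memory or I/O latencies.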

David
 
I don't really know anything about board engineering, but I think that
one thing we might want to consider is that in the near future some CPU
interconnects will simply be on-die; and you can afford to have
ridiculously fast and nice interconnects there.

Also, does PCB have worse loss than cables? That's what you seem to
be implying.

Cabling? I dunno if this is still cabling but these guys seem to think so:
http://ap.pennnet.com/Articles/Arti...Articles&Subsection=Display&ARTICLE_ID=220939
 

Knock the colon off to get to this.
"Serial technologies such as PCI Express and RapidIO require
serial-deserializer interfaces and have the burden of extensive
overhead in encoding parallel data into serial data, embedding clock
information, re-acquiring and decoding the data stream. The parallel
technology of HyperTransport needs no serdes and clock encoding
overhead making it far more efficient in data transfers."

Try to ignore the PR-speak in there, and focus on this part "The
parallel technology of HyperTransport".

HT is bit parallel and delivers at least 2 bits per cycle in parallel;
it's about as parallel as a PCI bus, it just happens to be much more
intelligently designed for the task at hand (and thankfully
unidirectional, and not multidrop).

Selective quoting is never a good idea. Just above, read: "Thus, the
HyperTransport Technology eliminates the problems associated with high
speed parallel buses with their many noisy bus signals (multiplexed
data/address, and clock and control signals) while providing scalable
bandwidth wherever it is needed in the system."

No, HT is not "about as parallel as a PCI bus".
Now, let me quote someone who knows quite a bit about CPU<->CPU
interconnects:

http://www.realworldtech.com/forums...stNum=3546&Thread=328&roomID=11&entryID=53843

"Using equivalent technology the bit serial scheme will have 2X+ the
datarate per pin. The latency differential is at worst 2 bit times but
can be exactly the same if not better depending on the actual protocol
being used."

PCIe is bit serial; HT, as I explained above, is not. Yes, the latency
is a little worse, but the amount of time it takes to transmit two
bits is pretty darn negligible for double the bandwidth.

Where do you get double the bandwidth? Both are currently running at
~4GB/s at x16. From the same article, next para, as you quoted above:
"RapidIO defines a data rate of 3.125 gigabit/second, while PCI Express
defines a 2.5 gigabit/second data rate. The latest 2.0 HyperTransport
specification defines a 2.8 gigatransfers/second data rate".
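The "~4 GB/s at x16" figure does check out arithmetically, per direction. A sketch, with the HT clock assumption labeled:

```python
# One-direction bandwidth check of the "~4 GB/s at x16" claim.
# PCIe 1.x: 2.5 Gb/s per lane with 8b/10b coding. HT: no coding
# overhead, 16 bit lanes; the 2.0 GT/s case assumes the then-common
# 1 GHz DDR clock, and 2.8 GT/s is the HT 2.0 peak from the article.

pcie_x16_GBps = 16 * 2.5 * (8 / 10) / 8    # 4.0 GB/s of payload
ht_x16_2gt_GBps = 16 * 2.0 / 8             # 4.0 GB/s at 1 GHz DDR
ht_x16_28gt_GBps = 16 * 2.8 / 8            # 5.6 GB/s at the 2.0-spec peak
```

So at the clock rates common at the time the two really are comparable; an HT link running at the full 2.8 GT/s would pull ahead.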
 
Yousuf said:
Knock the colon off to get to this.

Done, sorry about that.
Selective quoting is never a good idea. Just above, read: "Thus, the
HyperTransport Technology eliminates the problems associated with high
speed parallel buses with their many noisy bus signals (multiplexed
data/address, and clock and control signals) while providing scalable
bandwidth wherever it is needed in the system."

No, HT is not "about as parallel as a PCI bus".

Actually HT is just as parallel as a PCI bus WRT the bit lanes, which
is all I was speaking about. The reason I didn't bother quoting the
rest is that it doesn't deal with whether the data transmission is
parallel or serial. I'll be the first to admit that HT is alright, but
it could be better if it were serial.
Where do you get double the bandwidth? Both are currently running at
~4GB/s at x16. From the same article, next para, as you quoted above:
"RapidIO defines a data rate of 3.125 gigabit/second, while PCI Express
defines a 2.5 gigabit/second data rate. The latest 2.0 HyperTransport
specification defines a 2.8 gigatransfers/second data rate".

Perhaps you didn't understand the quote, but it was referring to
abstract, theoretical serial vs. parallel communications, not PCIe vs.
HT. I am not arguing that PCIe has more bandwidth than HT, I am
arguing that bit-serial interconnects are better than bit-parallel ones.
I am further stating (because it is a fact) that HT is bit-parallel.
This limits the speed of HT.

David
 
George said:
Knock the colon off to get to this.




Selective quoting is never a good idea. Just above, read: "Thus, the
HyperTransport Technology eliminates the problems associated with high
speed parallel buses with their many noisy bus signals (multiplexed
data/address, and clock and control signals) while providing scalable
bandwidth wherever it is needed in the system."

No, HT is not "about as parallel as a PCI bus".

Knowledge and reading the specifications in question is an even better
idea. HT sends 1 to 32 bits in parallel (the most common
instantiations seem to be 8 or 16 bits of data), accompanied by a clock
for every 8 bits, which is used to latch all the bits on the receiving
chip, and a framing signal used to mark the start of 4-byte words and
distinguish data from commands. No alignment of the clock and data is
performed, so all of the jitter and skew and timing tolerance must be
accommodated within the width of the data bit time. The HT spec provides
a detailed allocation of the time in question.
This isn't true. There is an unavoidable latency penalty associated
with serializing the bytes and deserializing them on the other end.
I would wonder who said that above. What is his name?

The real latency difference comes in error control. If you are going to
wait until the data is known good, you have to wait for 512 bytes in HT,
and to the end of the packet in PCI-E. HT doesn't, so an error causes a
crash. PCI-E and IB do, and retry, so errors are transparent.
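The PCI-E/IB behavior Del describes (CRC-check each packet, ack good ones, replay bad ones from a retry buffer) can be sketched in a few lines. This is a stop-and-wait toy with made-up function names, not the actual data-link-layer protocol of either standard:

```python
# Minimal sketch of link-level retry: each packet carries a CRC, the
# receiver acks good packets, and the sender replays on a bad one, so
# errors never reach software. Illustration only, not PCI-E's DLLP.

import zlib

def send_link(packets, corrupt_once_at=None):
    """Deliver packets over a lossy 'link' with stop-and-wait retry."""
    delivered = []
    for seq, payload in enumerate(packets):
        attempt = 0
        while True:
            data = payload
            # inject a single bit error on the first attempt of one packet
            if seq == corrupt_once_at and attempt == 0:
                data = bytes([data[0] ^ 0x01]) + data[1:]
            frame = (data, zlib.crc32(payload))   # CRC computed at sender
            # receiver side: check CRC, NAK on mismatch
            if zlib.crc32(frame[0]) == frame[1]:
                delivered.append(frame[0])        # ACK: sender frees buffer
                break
            attempt += 1                          # NAK: replay from buffer
    return delivered

msgs = [b"read req", b"completion", b"write req"]
assert send_link(msgs, corrupt_once_at=1) == msgs  # error is transparent
```

The latency cost Del mentions comes from holding each packet until its CRC has been checked, rather than forwarding data as it arrives.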
Where do you get double the bandwidth? Both are currently running at
~4GB/s at x16. From the same article, next para, as you quoted above:
"RapidIO defines a data rate of 3.125 gigabit/second, while PCI Express
defines a 2.5 gigabit/second data rate. The latest 2.0 HyperTransport
specification defines a 2.8 gigatransfers/second data rate".
Go download the spec and tell us how many picoseconds are allowed for
skew and tolerance at the board level.

I would but a dialup is a little slow for that.
 
David Kanter said:
I don't really know anything about board engineering, but I think that
one thing we might want to consider is that in the near future some CPU
interconnects will simply be on-die; and you can afford to have
ridiculously fast and nice interconnects there.

For on-die interconnects, why would you want to go serial at all?
Also, does PCB have worse loss than cables? That's what you seem to
be implying.

Standard FR-4 PCB material is quite lousy, so yes, cables have lower
loss (or rather: tighter margins, which translate into lower loss)
than FR-4.

Regards,


Kai
 
For on-die interconnects, why would you want to go serial at all?

Good point, I don't know what I was thinking with that line of
thought.
Standard FR-4 PCB material is quite lousy, so yes, cables have lower
loss (or rather: tighter margins, which translate into lower loss)
than FR-4.

Thanks for that info, I didn't realize that. I had always figured it
must have been the other way around.

David
 
[snip]
Knowledge and reading the specifications in question is an even better
idea. HT sends 1 to 32 bits in parallel (the most common
instantiations seem to be 8 or 16 bits of data), accompanied by a clock
for every 8 bits, which is used to latch all the bits on the receiving
chip, and a framing signal used to mark the start of 4-byte words and
distinguish data from commands. No alignment of the clock and data is
performed, so all of the jitter and skew and timing tolerance must be
accommodated within the width of the data bit time. The HT spec provides
a detailed allocation of the time in question.

This isn't true. There is an unavoidable latency penalty associated
with serializing the bytes and deserializing them on the other end.

Right, but I believe the point Aaron Spink was making was that a bad
parallel protocol can be worse than a good serial protocol.
I would wonder who said that above. What is his name?

See above, he's an old friend from comp.arch : ) If you have a
contrary view, you can weigh in back at www.realworldtech.com.
The real latency difference comes in error control. If you are going to
wait until the data is known good, you have to wait for 512 bytes in HT,
and to the end of the packet in PCI-E. HT doesn't, so an error causes a
crash. PCI-E and IB do, and retry, so errors are transparent.

Interesting. So PCIe has 'better' RAS at the cost of
latency...interesting.
Go download the spec and tell us how many picoseconds are allowed for
skew and tolerance at the board level.

I would but a dialup is a little slow for that.

According to the PDF (page 325):

Uncertainty in CADIN relative to CLKIN due to PCB trace length
mismatch: 10ps

Within pair differential skew of CAD/CTL and CLK due to PCB trace
length mismatch: 5ps

Uncertainty in CADIN relative to other CADIN due to PCB trace length
mismatch: 20ps

PCB interconnect induced jitter caused by reflections, ISI, and
crosstalk CAD/CTL setup side of CLK: 22ps

PCB interconnect induced jitter caused by reflections, ISI, and
crosstalk CAD/CTL hold side of CLK: 22ps

Can you translate that into something useful for me?

David
 
Kai said:
For on-die interconnects, why would you want to go serial at all?




Standard FR-4 PCB material is quite lousy, so yes, cables have lower
loss (or rather: tighter margins, which translate into lower loss)
than FR-4.

Regards,


Kai
FR406 isn't bad up to a couple of Gb/s. Above that the dielectric
absorption loss tangent gets bad. And cables have larger conductors
than PWB, mostly.

On-die is a whole 'nother thing because you aren't pin-limited.
 
David said:
[snip]

Knowledge and reading the specifications in question is an even better
idea. HT sends 1 to 32 bits in parallel (the most common
instantiations seem to be 8 or 16 bits of data), accompanied by a clock
for every 8 bits, which is used to latch all the bits on the receiving
chip, and a framing signal used to mark the start of 4-byte words and
distinguish data from commands. No alignment of the clock and data is
performed, so all of the jitter and skew and timing tolerance must be
accommodated within the width of the data bit time. The HT spec provides
a detailed allocation of the time in question.


This isn't true. There is an unavoidable latency penalty associated
with serializing the bytes and deserializing them on the other end.


Right, but I believe the point Aaron Spink was making was that a bad
parallel protocol can be worse than a good serial protocol.

I would wonder who said that above. What is his name?


See above, he's an old friend from comp.arch : ) If you have a
contrary view, you can weigh in back at www.realworldtech.com.

The real latency difference comes in error control. If you are going to
wait until the data is known good, you have to wait for 512 bytes in HT,
and to the end of the packet in PCI-E. HT doesn't, so an error causes a
crash. PCI-E and IB do, and retry, so errors are transparent.


Interesting. So PCIe has 'better' RAS at the cost of
latency...interesting.

Go download the spec and tell us how many picoseconds are allowed for
skew and tolerance at the board level.

I would but a dialup is a little slow for that.


According to the PDF (page 325):

Uncertainty in CADIN relative to CLKIN due to PCB trace length
mismatch: 10ps

Within pair differential skew of CAD/CTL and CLK due to PCB trace
length mismatch: 5ps

Uncertainty in CADIN relative to other CADIN due to PCB trace length
mismatch: 20ps

PCB interconnect induced jitter caused by reflections, ISI, and
crosstalk CAD/CTL setup side of CLK: 22ps

PCB interconnect induced jitter caused by reflections, ISI, and
crosstalk CAD/CTL hold side of CLK: 22ps

Can you translate that into something useful for me?

David
Happy to. PCB delay is about 70 ps/cm. Tracking on PCB delay is 5 to
10 percent. CADIN is a data bit; CLK is the clock. So the traces must
match in length between CLK and any of the 9 data/CTL bits to within
1.5 mm. Actually, since the tolerance on delay is 5 percent, if the
length mismatch were 0, the overall length could only be 3 cm (210 ps)
before the 5% eats your 10 ps.
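Del's arithmetic can be checked directly, taking the ~70 ps/cm figure and 5% tracking from his post and the 10 ps CADIN-to-CLKIN budget from the spec table David quoted:

```python
# Verifying the trace-length arithmetic: ~70 ps/cm FR-4 propagation
# delay and 5% delay tracking are the figures from the post; the 10 ps
# skew budget is the spec line David quoted.

DELAY_PS_PER_CM = 70.0
SKEW_BUDGET_PS = 10.0
TRACKING = 0.05                      # 5% delay tolerance between traces

# mismatch alone: length difference that eats the whole 10 ps budget
max_mismatch_mm = SKEW_BUDGET_PS / DELAY_PS_PER_CM * 10        # ~1.4 mm

# tracking alone: with zero mismatch, 5% of the total delay must stay
# under the budget, capping the total trace length
max_length_cm = SKEW_BUDGET_PS / (TRACKING * DELAY_PS_PER_CM)  # ~2.9 cm
```

Both results round to the ~1.5 mm and ~3 cm Del cites.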

ISI is the change in transition time due to different pulse widths in
the data, caused by frequency-dependent loss in the interconnect. So if
the rise time is degraded enough to put the transitions of 0101010 in a
different place than those of 00000000011111111100000000 by 22 ps, you
are also in violation.

And the within-pair skew is between the two conductors of the diff
pair. Tolerance gets them too - and the budget there is only 5 ps.

I didn't know Aaron was a PHY type guy. I didn't go to the page due to
dialup and only one phone line.
 
Done, sorry about that.


Actually HT is just as parallel as a PCI bus WRT the bit lanes, which
is all I was speaking about. The reason I didn't bother quoting the
rest is that it doesn't deal with whether the data transmission is
parallel or serial. I'll be the first to admit that HT is alright, but
it could be better if it were serial.

You have to define "better" - go serialized and you lose on latency. I see HT
as more of a hybrid interconnect, with its packetization and lack of
control signals or multiplexing. It's somewhat similar to the P4 FSB with
clock signals allocated to 8-bit widths instead of 16-bit. I believe some
confusion has arisen here since I'm almost sure that in its original form,
LDT was said to be a serial interconnect.
Perhaps you didn't understand the quote, but it was referring to
abstract, theoretical serial vs. parallel communications, not PCIe vs.
HT. I am not arguing that PCIe has more bandwidth than HT, I am
arguing that bit-serial interconnects are better than bit-parallel ones.
I am further stating (because it is a fact) that HT is bit-parallel.
This limits the speed of HT.

Horses for courses I guess - HT has to do the job of CPU interconnect which
includes non-local memory accesses. Would it really be better to take the
latency hit on those?

I'm afraid your para above starting "PCIe is bit serial, HT, as I...." on
re-read still indicates to me that you were suggesting that PCIe has
"double the bandwidth" of HT.
 
Knowledge and reading the specifications in question is an even better
idea. HT sends 1 to 32 bits in parallel (the most common
instantiations seem to be 8 or 16 bits of data), accompanied by a clock
for every 8 bits, which is used to latch all the bits on the receiving
chip, and a framing signal used to mark the start of 4-byte words and
distinguish data from commands. No alignment of the clock and data is
performed, so all of the jitter and skew and timing tolerance must be
accommodated within the width of the data bit time. The HT spec provides
a detailed allocation of the time in question.

Yeah the spec is better but looooong and I'm lazy and not able to see PCIe
specs anyway, since they cost $$.:-)
This isn't true. There is an unavoidable latency penalty associated
with serializing the bytes and deserializing them on the other end.

I would wonder who said that above. What is his name?

If you mean the quote from RealWorldTech it was Aaron Spink.
The real latency difference comes in error control. If you are going to
wait until the data is known good, you have to wait for 512 bytes in HT,
and to the end of the packet in PCI-E. HT doesn't, so an error causes a
crash. PCI-E and IB do, and retry, so errors are transparent.

Are you saying that HT implementations just ignore the CRC, or that it
doesn't work right with the CRC arriving 64 bits into the next window?
Go download the spec and tell us how many picoseconds are allowed for
skew and tolerance at the board level.

From a Rev 2.00 I DL'd last December there are several numbers listed for
relative skews, but the tightest is for "Within pair differential skew of
CAD/CTL and CLK due to PCB trace length mismatch"; the value listed for a
1 GHz clock, which is common currently, is 5 ps.
 
You have to define "better" - go serialized and you lose on latency.

Aaron's point was that the loss on latency isn't that bad if you do
things right.
I see HT
as more of a hybrid inter-connect, with its packetization and lack of
control signals or multiplexing. It's somewhat similar to the P4 FSB with
clock signals allocated to 8-bit widths instead of 16-bit. I believe some
confusion has arisen here since I'm almost sure that in its original form,
LDT was said to be a serial interconnect.

Horses for courses I guess - HT has to do the job of CPU interconnect which
includes non-local memory accesses. Would it really be better to take the
latency hit on those?

If the latency hit is small, quite possibly.
I'm afraid your para above starting "PCIe is bit serial, HT, as I...." on
re-read still indicates to me that you were suggesting that PCIe has
"double the bandwidth" of HT.

Well, that wasn't the case.

David
 