Bidirectional PCIe

  • Thread starter: Greg Berchin

Greg Berchin

I need an ATX or microATX Athlon 64 motherboard that supports true
bidirectional 8x PCI Express. I have read that the PCIe x16 slots on many
motherboards are x16 outbound only, and x1 inbound, so I don't want to make a
mistake here. Suggestions welcome.

Thanks,
Greg
 
The only way to refute that statement would be to check the PCI Express
spec. Unfortunately, it costs money, so I don't have a copy.

My expectation would be that the link training sequence requires
matched TX/RX interfaces - a lane would only train up if the protocol
can work bidirectionally.

The Intel document below only hints at that. I wouldn't really expect to find a lot
of free technical info derived from the official spec, so I have to
search for "table scraps" instead. I got real lucky when
Tomshardware posted the pinout for the x16 slot. If they hadn't done
that, I'd know very little about PCI Express at all.

http://download.intel.com/design/chipsets/specupdt/30304203.pdf

"MCH fails to train when non-TS1/TS2 training sequences are received

Problem: During the PCI Express training sequence, if a broken endpoint
or a good endpoint on a broken board has correct receiver termination
on any lane and transmits signals on that lane that can be seen at the
MCH and are not valid TS1/TS2 training sequences, the MCH will fail to
train that link at all.

Implication: The PCI Express specification intends that, if some lanes
are transmitting bogus data instead of valid training sequences, those
lanes should be treated as broken, and the link should fail down to
an acceptable width (such as x1). If lane 0 were failing in this manner,
the link would fail to train per the PCI Express specification. If a
higher-numbered lane were failing in this manner, the PCI Express
specification requires that the link attempt to train as a x1 on
lane 0 - the MCH will not train in this scenario.

Failures are anticipated to occur because of a broken transmitter/receiver
path, or a silent transmitter. None of those failure modes will cause the
MCH to fail to train, since either the receiver termination will be missing,
or the transmitted signals will not be seen at the MCH. In order to see
invalid transmitted signals at the MCH, either a logic bug in the other
PCI Express endpoint would be required, or a signal integrity issue so
severe as to make operation impossible."

So that, to me, hints at a need for symmetry in the number of transmit/receive
interfaces.

A valid concern is "subpopulation" of lanes: installing a PCI Express x16
connector and only connecting 8 or 4 lanes to it. You can detect this
by counting the number of surface mount capacitors next to the connector. PCI
Express is high speed differential, with capacitive coupling for the signals. Thus,
each end needs capacitors, and on some motherboards you can see the capacitors
right next to the PCI Express x16 slot. If the number of caps seems small,
suspect a subpopulated set of lanes.

(Both ends use caps. The others are on the video card.)
http://web.archive.org/web/20070322...om/2004/11/22/sli_is_coming/pcie-slot-big.gif
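
As a rough sketch of the cap-counting check: each lane's transmit direction is one differential pair, so I'd expect two coupling caps per lane on the motherboard side. The function names and the two-caps-per-lane figure are my own assumptions for illustration; a board may also hide caps on the back side or further along the trace, so treat a low count as a hint, not proof.

```python
def lanes_from_caps(cap_count, caps_per_lane=2):
    """Estimate wired lanes from the coupling caps seen next to the slot.

    Assumption: one cap on each wire of the transmit differential pair,
    i.e. two caps per lane on the motherboard side.
    """
    if cap_count < 0:
        raise ValueError("cap_count must be non-negative")
    return cap_count // caps_per_lane


def looks_subpopulated(cap_count, slot_width=16):
    """True if the caps counted suggest fewer lanes than the slot's physical width."""
    return lanes_from_caps(cap_count) < slot_width


if __name__ == "__main__":
    # 16 caps next to a physical x16 slot would suggest only 8 wired lanes.
    print(lanes_from_caps(16), looks_subpopulated(16))
```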

It is still possible for lanes to be populated dynamically. The SLI boards
with a paddle card, have x16/x1 in one position of the paddle card, and
x8/x8 in the other. You still see enough caps for 16 lanes on the main
connector, but depending on the orientation of the paddle card, there
might only be signal on 8 of them. Generally, users are aware, when a
paddle card is present, that in the SLI situation, the slots are only
operating at half bandwidth due to that.
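
To put numbers on "half bandwidth": first-generation PCI Express signals at 2.5 GT/s per lane in each direction with 8b/10b encoding, which works out to 250 MB/s per lane per direction before protocol overhead. A minimal sketch of that arithmetic, assuming first-generation signalling on the boards discussed here (the function name is made up for illustration):

```python
GEN1_TRANSFERS_PER_SECOND = 2_500_000_000  # 2.5 GT/s per lane, per direction


def raw_mb_per_second(lanes):
    """Raw payload bandwidth in MB/s for one direction of a Gen1 link.

    8b/10b encoding carries 8 data bits in every 10 bits on the wire.
    Packet and protocol overhead are ignored, so real throughput is lower.
    """
    data_bits_per_second = GEN1_TRANSFERS_PER_SECOND * 8 // 10 * lanes
    return data_bits_per_second // 8 // 1_000_000


if __name__ == "__main__":
    for width in (16, 8, 1):
        print(f"x{width}: {raw_mb_per_second(width)} MB/s each direction")
```

Since the figure is per direction, a symmetric x8 link would carry that much outbound and inbound at the same time.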

Another way to cross-check what is possible is to look up the chipset
summary on the chipset manufacturer's web site. For example, Nvidia
has summary pages for various generations of chips, and if you saw,
say, 16,1,1,1,1 as an entry for a chipset, and yet the motherboard
had two x16 PCI Express slots, you'd know that the thing would have
to be wired x8/x8 to work.
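
The lane-budget reasoning can be sketched like this. It assumes the chipset's graphics lanes are split evenly between the physical slots, which is not the only possible wiring (the x16/x1 paddle card position is a counterexample), and the function name is just for illustration:

```python
STANDARD_WIDTHS = (16, 8, 4, 2, 1)  # common link widths; x12 and x32 left out


def widest_even_split(graphics_lanes, slot_count):
    """Widest standard link each slot can get if the lanes are shared evenly."""
    if slot_count < 1:
        raise ValueError("slot_count must be at least 1")
    lanes_per_slot = graphics_lanes // slot_count
    for width in STANDARD_WIDTHS:
        if width <= lanes_per_slot:
            return width
    return 0  # not even one lane per slot


if __name__ == "__main__":
    # A chipset listed as 16,1,1,1,1 has 16 graphics lanes; two x16 slots share them.
    print(widest_even_split(16, 2))
```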

There was at least one recent motherboard where they actually used
a lane switching chip to do some creative cheating. What they did is
put three PCI Express x1 slots on the board, and another slot which
offered x4 bandwidth. When you plug a card into the x4 slot, the
lane switch is used to disconnect all three PCI Express x1 slots.
The lanes are stolen and combined with one other lane to make the
four lanes needed to drive the other slot. I haven't seen this
approach used on other motherboards - yet. A strange solution,
and for at least one user, not the one he was expecting.

Paul
 
Greg said:

If I had a PCI Express spec, it might be possible to tell from the
state diagrams in there, whether the lanes function independently,
or work as a group. My comment about training is based on other
high speed serial interfaces I've worked with.

The idea that a video slot would have an unbalanced set of TX and
RX doesn't make much sense. The chipset comes with them in
equal numbers. The coupling caps are on the direction from chipset
to video slot, which the NI article claims is the direction they
would not shrink down. In the opposite direction, the video card
would have the caps (so no reason for fewer lanes on the motherboard,
as no component cost would be saved by the motherboard maker).

The only resource this would save is routing resources. Making an
asymmetric TX/RX would mean fewer diff pairs to route. But the board
is full of stuff like that, so I fail to see a really big saving.

The premise is a bit skewed as well, when you consider the low
end video cards with Hypermemory and Turbocache. Those two methods
are used on low end video cards, to allow some texture memory to
be located in system memory, rather than video card memory. During
an operation that required movement of textures into or out of
the video card, information could be travelling in both directions,
according to whatever they use for a cache management algorithm.
Now, being low end cards, I don't know if you could tell whether the
slowness of the card was caused by the concept they're based on,
or on an imbalanced TX/RX lane configuration.

I really haven't seen mention of any anomalies in postings here,
to suggest a problem like this. There was one case of an SLI
board (might have been A8N-SLI family), where an Areca RAID
card placed in the second x16 slot, didn't work. A BIOS upgrade
fixed that, and then the card worked as far as I know. Not
too many people try stuff like that, so perhaps there just
aren't enough reports to uncover something like this.

I don't recollect any benchmark articles where there was a
strange difference between read and write to RAID arrays.

In terms of practices, HP is the last company I would suspect
of doing something like this. I can see a company like Asrock
perhaps, because they do all kinds of crazy stunts like that.
("AGI slot"). When Asus pulls a stunt, usually the user manual
contains an admission of guilt (like when a x4 slot only has
x2 lane wiring - another good reason to read the manual before
buying). Some of the Asus Intel boards, that have a couple x16
slots, will have x16 for one, and x4 wiring for the other,
but we can kind of figure that out from the known lane limits
of the chipsets being used (few chipsets have enough lanes to
do x16/x16, so no surprise there - the X38 might be the first
to do it). Some of the high end Nvidia SLI boards rely on both
the Northbridge and the Southbridge, having a video card interface
each. So that is how they manage it.

To answer your original question, I don't see an easy way
to detect this situation. If you had a high resolution picture
of the motherboard (and those are hard to find), you could
look for differential pairs on the B side of the slot, and
see if they look deficient in number. (The A side has all the
cap pairs, for the signals coming from the Northbridge towards
the slot.) The B side diff pairs would be returning to the Northbridge,
to give some idea of the direction they would be moving in.

I'll keep an eye out for this, but right now, I suspect at
least some of the entries in that table are caused by BIOS
issues. And that is based on the one Asus board I've heard
of, that wouldn't work with an Areca RAID card, until the
BIOS was updated.

OK, have a look at the Areca compatibility list. This needs
to be updated, but it has one hopeful entry.

http://www.areca.us//support/download/RaidCards/Documents/Hardware/MBCompatibilityList_011606.zip

"HP XW6200*1 INTEL7525 Tested by customer PCI-E X8 Intel Xeon

*1. Update latest BIOS xw9300 to (1.29),xw8200 to (?) and
xw6300 to(?) can solve the problem . Update controller
firmware to 1.39 also solve it."

The XW6200 was in the NI table as only doing x1.

Pages 4 and 5 of that document cover desktop boards with
SLI slot configs. Even the A8N-SLI family got honorable mention.
On the Asus site, the A8N-SLI Deluxe 1013 2005/08/10 BIOS lists:

"Fixed system cannot detect ARC 12xx Serial ATA RAID Host adapter."

And if you do have a problem with whatever you purchase,
don't forget to post about it :-) So other people get
properly warned.

Paul
 
Paul said:
The idea that a video slot would have an unbalanced set of TX and
RX doesn't make much sense.

The only resource this would save, is routing resources. Making an
asymmetric TX/RX would mean less diff pairs to route. But the board
is full of stuff like that, so I fail to see a really big saving.

I did more Web searching yesterday, and the ONLY reference to asymmetry that
I found was the one that I already mentioned in my second post. I'm not
certain whether that just came from observations of some early
implementations, or whether it was just plain bogus, but the "problem"
appears to be a "non-problem" from everything I've been able to find
recently. I have seen some more recent tests of PCIe implementations where
x16 was only able to negotiate x4 consistently, but that's not the same
thing. Apparently all of the manufacturers are actually creating physical
x16 (bidirectional) implementations; whether those actually achieve x16
throughput is another matter.
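
For anyone who wants to check what width a card actually negotiated: on a Linux system, each PCI device directory in sysfs exposes `current_link_width` and `max_link_width` files. A minimal sketch, assuming a Linux install with those files present (the helper names and the device address in the usage note are only examples, not anything from the boards above):

```python
from pathlib import Path


def read_link_widths(device_dir):
    """Return (current, maximum) link width read from one PCI device directory."""
    device_dir = Path(device_dir)
    current = int((device_dir / "current_link_width").read_text().strip())
    maximum = int((device_dir / "max_link_width").read_text().strip())
    return current, maximum


def trained_below_max(device_dir):
    """True if the link came up narrower than the device says it can run."""
    current, maximum = read_link_widths(device_dir)
    return current < maximum
```

Usage would look like `read_link_widths("/sys/bus/pci/devices/0000:01:00.0")`, with the address replaced by your own card's. Note that the maximum is what the device advertises, so a x16 card in a slot wired x8 would show up as trained below its maximum even though nothing is wrong.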

My thanks for your comments and insights.

Greg
 