Robert said:
On Aug 19, 4:45 am, Sebastian Kaliszewski
The names of the people who have actually been responsible for the
entire design (or enough of it to speak with authority) of a modern,
full-featured processor could probably be written on one sheet of a
yellow legal pad. The rest of us are observers, no matter what
classes you have taken at university.
Dictated is such a strong word that I suspect this statement to be
false. From the discussions that take place among those who know much,
much more than I do or ever could, I conclude that details of memory
ordering are somewhat arbitrary and aren't even fully specified by the
ISA. As you would say, it doesn't matter until it does, and, when it
does, as in concurrency, I don't think anyone completely understands
what is guaranteed to work and what isn't.
Stuff like memory ordering and retirement ordering is part of the ISA,
whether it is explicitly specified somewhere or just implied by existing
implementations.
Then there is stuff like availability of certain ISA features like
predication/conditional operations, fused operations (like FMAC), etc.
Then there are statistical properties like localised vs non-localised
memory accesses, localised memory vs architectural registers accessed,
etc. For example, x86 requires more memory access resources than other
architectures to get equivalent performance on average.
[...]
I can't even imagine how an ISA would dictate retirement logic. What
a computer must do, no matter if it's System/360 or x86, is to make it
appear that instructions have been executed in order.
But what does "in order" mean? It can have, and does have, various
meanings -- there are in fact at least 3 possible ones:
1 in order issue + in order retirement (e.g. x86)
2 in order issue + out of order retirement (e.g. Alpha)
3 out of order issue + in order retirement (strange but theoretically
possible)
For example, the Alpha spec explicitly allowed instructions to retire in
a different order than they were issued, while it required that they be
issued in order. For example, the 21164, being an in-order processor,
used the feature.
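To make the difference between cases 1 and 2 concrete, here is a toy
sketch in Python. It is only an illustration under loud assumptions: one
instruction issues per cycle in program order, each has a made-up fixed
latency, and the function name retire_order is hypothetical. It does not
model the 21164 or any real pipeline.

```python
def retire_order(latencies, in_order_retire):
    """Toy pipeline: one instruction issues per cycle, in program order.

    latencies[i] is a hypothetical execution latency for instruction i.
    Returns a list of (retire_cycle, instruction_index) in retirement order.
    """
    # Instruction i issues at cycle i and finishes latency cycles later.
    finish = [issue + lat for issue, lat in enumerate(latencies)]
    if in_order_retire:
        # An instruction may not retire before every earlier one has.
        retired, last = [], 0
        for i, f in enumerate(finish):
            last = max(last, f)
            retired.append((last, i))
        return retired
    # Out-of-order retirement: each result retires as soon as it finishes.
    return sorted((f, i) for i, f in enumerate(finish))


program = [4, 1, 1]  # e.g. one slow operation followed by two fast ones
print(retire_order(program, in_order_retire=True))
print(retire_order(program, in_order_retire=False))
```

With in-order retirement the two fast instructions wait behind the slow
one; without it they retire first, even though all three issued in
program order.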
Now add to that memory ordering rules (which might still be different
from instruction ordering) -- for example, this bit Oracle when their
DB software started misbehaving after moving from 21064 or 21164 to
21264 machines (which actually exploited the weak memory ordering of
the ISA).
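I don't know the details of the Oracle case, but the classic way weak
ordering bites is the message-passing pattern: a writer stores data and
then sets a flag, and a reader checks the flag and then reads the data.
The sketch below is a toy enumeration, not a model of the Alpha memory
model: it only considers reordering of the writer's two stores, and the
names visible and observable_outcomes are hypothetical.

```python
from itertools import permutations


def visible(order, n):
    """Memory contents after the first n stores of `order` became visible."""
    mem = {"data": 0, "flag": 0}
    mem.update(order[:n])
    return mem


def observable_outcomes(allow_store_reorder):
    """Enumerate the (flag, data) pairs a reader can observe.

    Writer program order: data = 42, then flag = 1.
    Reader program order: load flag, then load data.
    """
    writer = (("data", 42), ("flag", 1))
    orders = list(permutations(writer)) if allow_store_reorder else [writer]
    outcomes = set()
    for order in orders:
        # flag_pos / data_pos: how many stores are visible at each load.
        for flag_pos in range(3):
            for data_pos in range(flag_pos, 3):  # the data load comes later
                seen_flag = visible(order, flag_pos)["flag"]
                seen_data = visible(order, data_pos)["data"]
                outcomes.add((seen_flag, seen_data))
    return outcomes


print(sorted(observable_outcomes(allow_store_reorder=False)))
print(sorted(observable_outcomes(allow_store_reorder=True)))
```

The outcome (1, 0), flag set but stale data, shows up only when the
stores are allowed to become visible out of program order. Code that
was silently relying on the stronger behaviour of an older
implementation breaks exactly there.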
[...]
Neither Intel nor IBM, implementing different ISAs, can dictate all
details, regardless of the ISA. Important C code has to work
across different processor lines.
That part belongs to compilers.
Then, for the dicey stuff, you are
into whether the behavior of C is well-defined, and the answer appears
to be that it isn't.
Of course it isn't. The C standard even explicitly states when the
behaviour is outright undefined and when it is just
implementation-defined.
[...]
The disconnect between what the processor appears to be on the outside
and what it is actually doing is so profound that I don't even know
how to discuss this kind of thing.
Well, you missed many important factors to begin with...
An architect has to make sure that
the architectural registers in the ISA appear actually to be there.
Behind every architectural register, there are now many, many
invisible registers that are dictated by the internal workings of the
processor and not by the ISA. Once again, even x86 is emulating
itself.
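The "invisible registers" idea can be sketched as a rename table. This
is a minimal toy, assuming an unlimited supply of physical registers and
no reclamation; the function name rename and the p0, p1, ... naming are
made up for illustration and do not describe any particular processor.

```python
def rename(instructions, arch_regs):
    """Rewrite instructions in terms of physical registers.

    instructions: list of (dest, src1, src2) architectural register names.
    arch_regs: the architectural registers visible in the ISA.
    """
    # Initial mapping: each architectural register has one physical home.
    table = {r: f"p{i}" for i, r in enumerate(arch_regs)}
    next_phys = len(arch_regs)
    renamed = []
    for dest, src1, src2 in instructions:
        s1, s2 = table[src1], table[src2]  # sources use the current mapping
        table[dest] = f"p{next_phys}"      # every write gets a fresh register
        next_phys += 1
        renamed.append((table[dest], s1, s2))
    return renamed


prog = [
    ("eax", "ebx", "ecx"),  # eax = ebx op ecx
    ("ebx", "eax", "eax"),  # ebx = eax op eax (true dependency on eax)
    ("eax", "ecx", "ecx"),  # eax = ecx op ecx (only reuses the name eax)
]
for line in rename(prog, ["eax", "ebx", "ecx"]):
    print(line)
```

After renaming, the third instruction writes a different physical
register than the first, so it no longer has to wait for the second one
to read the old eax. The program still sees only three registers.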
An implementation must be effective enough, and that effectiveness can't
be obtained without adjusting to a particular ISA's features. Adjusting
well, very well. Chips which were not adjusted well performed poorly.
Case example: AMD SSA5 (early K5) which, despite being a 4-way OoO core
based on the Am29000, performed worse clock for clock than the 2-way
in-order Pentium, and it clocked worse as well... And the Pentium had a
smaller cache... As AMD optimised the thing its performance improved
significantly (first to be equal clock for clock, then even faster) -- it
still had problems with clocking -- Pentium classic went to 200MHz while
K5 was practically stuck at 117MHz (some 133MHz parts, while officially
released, were virtually unavailable).
ARM is very different from x86. NVidia bought the rights to produce a
core not designed for x86 emulation. Unless NVidia performs a miracle,
expect *abysmal* performance.
Summary: there are plenty enough details to get wrong that
daytripper's advice seems sound to me, but most of them really aren't
dictated by the ISA but by an installed code base.
Daytripper's advice is sound because emulation above microcode level
works poorly, and it even works poorly at microcode level if the
execution backend is not carefully tuned to a particular ISA (case
example: IA64 and its 'native' x86 execution).
rgds
\SK