Easy for you to say. If anyone accurately foresaw the importance of
OoO, and exactly _why_ it would be so important, _before_ it came into
common usage, I would be much indebted to whoever can direct me to an
appropriate link (who knows, maybe such a link exists).
Run-time scheduling may or may not prove in the long run to
play the critical role that it currently does, so I'm not going to
make any emphatic statements that purport to be true for all time.
People have tried every conceivable scheme for scheduling, and right
now, on-die runtime scheduling appears to be the winner.
I don't claim to have forecast OoO, or anything about it. To be
perfectly honest, I don't know exactly when OoO hit the mainstream. I
seem to remember seeing an overview of the PentiumPro architecture that
had what I now think of as OoO structures, but I'm honestly not sure.
But OoO came to maturity while IA64 was in development. You don't have
to have forecast the future to see that, as it unfolds, you have to
review your current plans.
I've heard of a similar project, even bigger than IA64, getting stopped
in its tracks when reality smacked it in the nose. Well, one such
bigger project happened before my time, and one slightly smaller during
my tenure. Sometimes reality makes you change your plans.
In a slightly different, but still related, vein... A friend once
brought back one key piece of wisdom from a conference: "Software
is hard, and hardware is easy."
IA64 is saddled with an instruction set that makes OoO very hard, but
not impossible. OoO also makes nonsense of the premise of the IA64
ISA, which is that all the scheduling was to be preprogrammed, using
predicated instructions to make whatever run-time adjustments were
necessary. The scheme works _much_ better than people give it credit
for. The problem is that you only need a cache miss rate of less than
one percent to produce a factor of two slowdown in code execution
given the current mismatch between processor speed and memory latency.
Only the slightest miscalculation can bring you to ruin, and that's
why IA64 needs such a gigantic cache to perform decently.
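
A back-of-the-envelope sketch of that arithmetic, assuming the simplest
possible in-order stall model: every miss stalls the pipeline for the
full memory latency and nothing overlaps. The 200-cycle penalty, the
base CPI of 1, and one memory reference per instruction are illustrative
assumptions, not measured IA64 figures, and the function name is made up
for this example.

```python
def slowdown(miss_rate, miss_penalty_cycles, base_cpi=1.0, mem_refs_per_instr=1.0):
    """Ratio of effective CPI (including miss stalls) to the stall-free CPI.

    Assumes an in-order machine that stalls for the whole miss penalty on
    every miss, with no overlap between misses and useful work.
    """
    stall_cpi = mem_refs_per_instr * miss_rate * miss_penalty_cycles
    return (base_cpi + stall_cpi) / base_cpi


if __name__ == "__main__":
    # Assumed numbers: 0.5% of references miss, each miss costs 200 cycles.
    for rate in (0.001, 0.005, 0.01):
        print(f"miss rate {rate:.1%}: slowdown x{slowdown(rate, 200):.2f}")
```

Under those assumptions a 0.5% miss rate already doubles execution
time, which is the sense in which a sub-one-percent miss rate can cost a
factor of two. A machine that can overlap misses with other work would
see less than this.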
One further comment about compiler scheduling of instruction flow...
Isn't this one of the lessons of MIPS - that you'd have poor portability
of binaries from one generation to the next? We've really only seen
two generations of IA64, Merced and McKinley. All else has been shrinks
and cache enhancements. They've had to flog the compilers SOOOO hard to
get the levels of performance so far attained. What happens when the
next real IA64 architecture rev comes along? I truly doubt it will get
its best performance from even the best McKinley compiler, and what
will happen when code from the McKinley+1 compiler is used on McKinley?
Plus, how long will it take to reach a good McKinley+1 compiler? Will
it bring back the days of fat binaries?
Intel designed IA64 so that it would be very hard to clone. That, and
making sure that it could not be construed as subject to any of their
cross-licensing agreements, not performance, was their primary design
goal. As it stands _at_the_moment_, Intel seems to have succeeded
beyond its wildest expectations in those respects. It also happens to
have produced a world-beating processor for certain applications. It
can't be cloned, it isn't subject to cross-licensing agreements, and
it can be virtualized.
Here you hit the nail on the head.
You have to ask what problems Intel was trying to solve with IA64.
They're obviously *always* after performance. But in this case, IMHO
they were after clone-relief, too.
I may not be in the CPU business, but I've spent a lot of years in a
big company - a company once renowned for being self-absorbed. Maybe
I'm only starting to learn about caching issues in ccNUMA, but I've
seen internal corporate politics before, and I think I can recognize
the signs. IA64 reeks of it.
Comparison: Sometimes you get execs who want a magic bullet to solve
their problem. Sometimes you get someone pushing what should be an
academic solution, but they make a convincing claim that they have
the magic bullet. I've seen several projects of this sort started,
and at least two within my tenure went as far as hardware and
conferences. I even worked on one of them. (The other predates the
web.) Again, I see similarities in IA64.
What makes you think you're so smart?
Squat. George gives me too much credit. I'm a DRAM designer of many
years, though most of the time, including now, I can't or shouldn't
comment on what I'm really doing. (I don't even feel good about
telling any Rambus stories, and that was YEARS and projects ago.) I
just like to dabble in this stuff on the side.
I have some experience seeing technical proposals whose biggest merit
is that they solve a 'political' problem. My assessment of IA64 is
based more on that than on any technical expertise in the field, where
I'd have to defer to many others.
Dale Pontius
--