I don't think this is quite accurate. The PPro was not *supposed* to
execute 16-bit code, thus the architects didn't see any harm in making
segment register reloads expensive. They *added* the segment register
renaming (caching) to ameliorate this problem, in the PII. Anyway, if
you can wake up Felg, he has the real deal. I looked through my email
archives and couldn't find the info.
You're off base in this case.
The PPro was supposed to execute everything; there wasn't a
conscious effort to optimize "32 bit code" and leave out "16 bit". It
was just that the architects didn't realize how important some of
these things such as partial register usage were. The architects
came from a non-x86 background, and they were thinking about high
performance. There were talks about the segment registers during
the architecture phase of the processor, but the ball was dropped,
and there was some miscommunication about how important segment
register renaming was, and partial register usage too. I remember
Andy Glew talking quite a bit about the partial register usage,
and he took the blame for that as well IIRC.
The basic idea of the PPro was to make the common case fast, and
the not-so-common case, not-so-fast. Unfortunately, some of the
cases thought to be not-so-common turned out to be more common
than believed. This whole thing about "16 bit software" is just
a catch-all term for "legacy software that contained a bunch
of weird hand-coded stuff that the architects of P6 didn't think
would be common."