Nehalem also has a TLB bug

  • Thread starter: Yousuf Khan

Yousuf Khan

Looks like having a TLB bug is becoming a rite of passage for anybody
designing a quad-core processor with an L3 cache and an integrated
memory controller. AMD had it in its first revision of Barcelona, and
now Intel has it in its first revision of Nehalem.

Yousuf Khan

"We were told that Intel's Nehalem, the CPU that we know as Core i7 has
TLB. TLB, three letters that have destroyed the sales of Phenom and
Opterons based on 65nm K10 cores, stands for Translation Lookaside
Buffer, and Intel officially states in its Intel Core i7 Processor,
Extreme Edition Series and Intel Core i7 Processor - Specification
Update PDF, that the CPU has a TLB bug.

If you open Intel’s official document that is nicely stored here, on
page 37 AAJ1 Clarification of TRANSLATION LOOKASIDE BUFFERS (TLBS)
Invalidation part, you will see that Intel says that in some rare cases
improper TLB invalidation may result in unpredictable system behavior
and can hang your OS or result with incorrect data. Here is the word to
word quote: "In rare instances, improper TLB invalidation may result in
unpredictable system behavior, such as system hangs or incorrect data.
Developers of operating systems should take this documentation into
account when designing TLB invalidation algorithms. For the processors
affected, Intel has provided a recommended update to system and BIOS
vendors to incorporate into their BIOS to resolve this issue."

We are not sure if you should be concerned, but such a thing completely
destroyed K10’s reputation and we will certainly do a bit more
investigating about it, and ask Intel for a comment. We would like to
thank one of our readers for the tip."
http://www.fudzilla.com/index.php?option=com_content&task=view&id=10707&Itemid=1
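
For anyone wondering why a missed TLB invalidation can mean "incorrect data", here is a toy Python model. It is only a sketch of the general idea and says nothing about the actual mechanism behind Intel's or AMD's erratum. Every name in it (ToyMMU, map_page, translate, demo) is made up for illustration.

```python
# Toy model of a TLB in front of a page table.
# All names (ToyMMU, map_page, translate, demo) are hypothetical.

class ToyMMU:
    def __init__(self):
        self.page_table = {}  # virtual page -> physical frame (authoritative)
        self.tlb = {}         # cached copies of recent translations

    def map_page(self, vpage, frame, invalidate=True):
        """Change a mapping; a correct OS also drops the cached entry."""
        self.page_table[vpage] = frame
        if invalidate:
            self.tlb.pop(vpage, None)

    def translate(self, vpage):
        """Return the frame for vpage, using the TLB when it has an entry."""
        if vpage not in self.tlb:
            # TLB miss: consult the page table and cache the result.
            self.tlb[vpage] = self.page_table[vpage]
        return self.tlb[vpage]


def demo():
    mmu = ToyMMU()
    mmu.map_page(0x10, frame=1)
    first = mmu.translate(0x10)   # caches the translation 0x10 -> 1

    # Remap the page but skip the invalidation step.
    mmu.map_page(0x10, frame=2, invalidate=False)
    stale = mmu.translate(0x10)   # TLB still answers with the old frame

    # Remap again, this time invalidating the cached entry.
    mmu.map_page(0x10, frame=2)
    fresh = mmu.translate(0x10)   # TLB refills from the page table
    return first, stale, fresh
```

In this sketch, demo() returns the frame seen before the remap, the stale frame served after a remap without invalidation, and the correct frame once the entry is invalidated.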
 
Yousuf Khan said:
Looks like having a TLB bug is becoming a rite of passage for anybody
designing a quad-core processor with an L3 cache and an integrated
memory controller. AMD had it in its first revision of Barcelona, and
now Intel has it in its first revision of Nehalem.

The key sentence seems to be:

"Developers of operating systems should take this documentation into
account when designing TLB invalidation algorithms."

Quickly followed by

"For the processors affected, Intel has provided a recommended update
to system and BIOS vendors to incorporate into their BIOS to resolve
this issue."

In the case of the AMD bug, the issue was that the available
workarounds were costly (~10%) in terms of performance, which meant
that you had the unenviable choice between getting the performance
promised by early benchmarks or having a processor that wouldn't hang
unpredictably.

There is no clue in what you have offered, other than the situation
offers an amdroid the opportunity to say, "See, it happens to Intel,
too," as to whether this is a serious matter or merely a concern for
kernel and BIOS developers.

Robert.
 
Robert said:
The key sentence seems to be:

"Developers of operating systems should take this documentation into
account when designing TLB invalidation algorithms."

Why is that key? AMD said the same thing.

Robert said:
Quickly followed by

"For the processors affected, Intel has provided a recommended update
to system and BIOS vendors to incorporate into their BIOS to resolve
this issue."

In the case of the AMD bug, the issue was that the available
workarounds were costly (~10%) in terms of performance, which meant
that you had the unenviable choice between getting the performance
promised by early benchmarks or having a processor that wouldn't hang
unpredictably.

Until Intel provides those updated BIOSes, we won't know whether there
is a performance penalty as well. Besides, in the case of AMD, the
Linux community came up with a workaround that had less than a 1%
performance penalty. The BIOS solution was less aesthetic, as it
turned off the L3 cache entirely to avoid the problem. In the Linux
solution, the kernel just checked for conditions that might cause the
problem, and took care of them when necessary without turning the L3
off.

Robert said:
There is no clue in what you have offered, other than the situation
offers an amdroid the opportunity to say, "See, it happens to Intel,
too," as to whether this is a serious matter or merely a concern for
kernel and BIOS developers.

What clue do you need Robert?

Yousuf Khan
 
Yousuf Khan said:
What clue do you need Robert?

Some actual indication of what the consequences of dealing with this
will be:

1. Nearly invisible to the end user.

2. Enough to cause the same kinds of marketing problems for Intel that
AMD encountered.

At this point, there just isn't enough information to distinguish
between 1 and 2, which is the only distinction that matters.

The fact that Intel has a bug in the same class as encountered by AMD
may in itself be interesting if it is anything but pure coincidence.
If it isn't pure coincidence, that is to say, if it is something that
Intel should have been especially careful about because of AMD's
experience, then one has to wonder yet again about the competence of
Intel management.

Again, though, there simply isn't enough information to conclude
anything, other than that Intel had a bug in the same class as AMD.
Whether or not that should concern anyone other than BIOS and kernel
developers remains to be seen.

Robert.
 
Robert said:
The fact that Intel has a bug in the same class as encountered by AMD
may in itself be interesting if it is anything but pure coincidence.
If it isn't pure coincidence, that is to say, if it is something that
Intel should have been especially careful about because of AMD's
experience, then one has to wonder yet again about the competence of
Intel management.

Again, though, there simply isn't enough information to conclude
anything, other than that Intel had a bug in the same class as AMD.
Whether or not that should concern anyone other than BIOS and kernel
developers remains to be seen.

Finally! A situation where potential problems show up just *before*
I buy a unit! I had an X58 motherboard, i7 CPU and triple-channel RAM
all picked out and was about to order when I saw this thread. If my luck
had been running as usual, I would have just installed the hardware and *then*
this thread would've appeared. As you say, it may not make any difference
at all but I'm glad I can wait and see if that turns out to be the case.

Tom Lake
 
chrisv said:
I'm another happy C2D customer. I've been overclocking my E6400 (from
2.13GHz) to 2.66GHz for almost 2 1/2 years now, with not a hiccup.

I too am a happy Core 2 Duo owner. My E6550 has handled any task I have
wanted it to do and the desire to upgrade anything is NIL.
 