Robert Myers said:
> Your key claim (I believe) is that the benchmark software is a
> subterfuge by way of giving scheduling attention to the jobs on the
> hyperthreaded system but not on the Opteron system. That's an
> interesting theory, and it may well be correct, but your analysis
> rests on assumptions about the actual benchmark and about scheduling
> behavior that I don't know how to check.
To play my own devil's advocate, I'll list what we know about the
benchmark and what we are conjecturing. We _know_ that the benchmark is
Hyperthreading-aware, that it runs one real-world application thread
plus multiple synthetic load-generating threads, and that the synthetic
threads are disposable (i.e. their results are not saved or measured).
What we are _conjecturing_ is that the benchmark uses its Hyperthreading
awareness to create an unfair multitasking priority advantage for the
benchmarked application. We don't know this for sure; for all we know,
the benchmark makes no use of its Hyperthreading knowledge at all (i.e.
complete innocence).
The conjecture is based on the fact that it's easy to detect
Hyperthreading and to optimize for it. Detecting Hyperthreading can be
done entirely in user space; it doesn't require any privileged
instructions, just a couple of CPUID instructions and you're done. Intel
has specified that during bootup all physical processors are enumerated
first and all virtual processors last, so it's easy to figure out which
processors are real and which are virtual. And most OSes provide some
way for applications to specify which processors their threads should
run on.
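For instance, here's a minimal sketch of the detection step, assuming an
x86 Windows compiler with the MSVC <intrin.h> __cpuid intrinsic (the
benchmark's actual code is unknown to us, so take this as illustration
only):

#include <intrin.h>
#include <stdio.h>

int main(void)
{
    int regs[4];      /* EAX, EBX, ECX, EDX after CPUID */
    __cpuid(regs, 1); /* standard feature leaf; unprivileged */

    int htt = (regs[3] >> 28) & 1;        /* EDX bit 28: HTT flag */
    int logical = (regs[1] >> 16) & 0xff; /* EBX[23:16]: logical CPUs per package */

    printf("Hyperthreading: %s, logical processors per package: %d\n",
           htt ? "yes" : "no", logical);
    return 0;
}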
Since this was a dual-processor vs. dual-processor shootout, the non-HT
system will appear simply as two CPUs, whereas the HT system will appear
as four. CPUID will tell you how many of them are real, how many are
virtual, and which ones are which.
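Purely as an illustration, and assuming the enumeration order above
actually holds on the test machines, the two results combine to classify
every processor index:

#include <windows.h>
#include <intrin.h>
#include <stdio.h>

int main(void)
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    DWORD total = si.dwNumberOfProcessors; /* 4 on the HT box, 2 on the Opteron box */

    int regs[4];
    __cpuid(regs, 1);
    int per_pkg = ((regs[3] >> 28) & 1) ? ((regs[1] >> 16) & 0xff) : 1;

    DWORD physical = total / per_pkg;  /* count of real processors */
    for (DWORD i = 0; i < total; i++)  /* physical ones enumerated first (assumed) */
        printf("logical CPU %lu: %s\n", (unsigned long)i,
               i < physical ? "physical" : "virtual");
    return 0;
}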
> One can always, at least in theory, arrange job priorities so that
> background jobs interfere minimally with foreground jobs. Without any
> constraint on how the background jobs are hog-tied, you could probably
> get any result you wanted...if indeed you are fiddling with scheduling
> priorities.
Yeah, obviously they didn't want to appear to be fiddling with Windows'
own scheduling priorities, since that would be too obviously unfair, so
they worked around them through the HT loophole. Since each logical
processor appears to have its own separate run queue in Windows, they
didn't actually modify any run queue priorities; they just distributed
the workloads strategically, putting the most important threads on the
less busy logical processors. That way they can claim that all of the
individual run queues were left unchanged, which is true, but they had
twice as many run queues to choose from.
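Here's a hypothetical sketch of what that strategic distribution could
look like using the ordinary Win32 affinity call. To be clear, this is
my illustration of the conjecture, not anything we've actually seen from
the benchmark:

#include <windows.h>

DWORD WINAPI benchmark_thread(LPVOID arg) { (void)arg; /* the measured, real-world work */ return 0; }
DWORD WINAPI load_thread(LPVOID arg)      { (void)arg; /* disposable synthetic load */ return 0; }

int main(void)
{
    /* Assumed layout on the dual HT system: logical CPUs 0-1 are physical,
       2-3 are virtual (per the enumeration order discussed above). Pin the
       measured thread to a physical processor and push the synthetic loads
       onto the virtual siblings. No run queue's priorities are touched,
       but the measured thread's queue stays nearly empty. */
    HANDLE bench = CreateThread(NULL, 0, benchmark_thread, NULL,
                                CREATE_SUSPENDED, NULL);
    SetThreadAffinityMask(bench, (DWORD_PTR)1 << 0); /* logical CPU 0: physical */

    for (int i = 2; i <= 3; i++) {
        HANDLE load = CreateThread(NULL, 0, load_thread, NULL,
                                   CREATE_SUSPENDED, NULL);
        SetThreadAffinityMask(load, (DWORD_PTR)1 << i); /* virtual CPUs only */
        ResumeThread(load);
        CloseHandle(load); /* thread keeps running; we just drop the handle */
    }

    ResumeThread(bench);
    WaitForSingleObject(bench, INFINITE);
    CloseHandle(bench);
    return 0;
}

On the two-processor Opteron box there is no spare run queue, so the
same synthetic loads would unavoidably share a queue with the measured
thread; that asymmetry is the whole conjecture.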
In an actual multitasking environment, with real work being done both in
the foreground and the background, applications get distributed across
the run queues in round-robin fashion. So even with twice the run
queues, an HT processor ends up with more or less evenly loaded run
queues, no different from a non-HT processor.
> csaresearch.com has a skewed view of things resulting from a desire to
> sell advertising? The "Seeing double?" stuff right on the web page
> you linked to is probably a better clue than Randall Kennedy's c.v.
Perhaps it is a better clue. But I thought the fact that he himself says
he worked for an Intel marketing department was also a pretty good clue.
> Someone is influenced by his "strong recommendations" despite an
> apparent conflict of interest? Caveat emptor.
It's hard to say how much people will be influenced by this, since the
article published barely any of the benchmark results it says it ran.
Yousuf Khan