Hyperthreading and application response time


Robert Myers

Greetings,

Intel has put out a blurb about the effects of hyperthreading on
response time when performing multiple tasks:

http://www.intel.com/update/contents/dt09042.htm

Since the scenarios are described, if not in tremendous detail, one
could imagine trying to duplicate (or question) the results by
performing measurements of one's own. With a claimed improvement of 50-65%,
one imagines the effect would be noticeable. The claimed gains are somewhat
larger than the throughput improvement one might optimistically hope for
from hyperthreading (0-40%), but not so much larger that any but the most
fanatical wouldn't feel there are probably more interesting benchmarks to
look into in detail.

The hyperthreading vs. no hyperthreading is a much cleaner question, of
course, than the AMD vs. Intel comparison that was recently discussed at
such length in these forums.

A more decisive study (if not to many engineers) would be to compare
user perceptions of system responsiveness with and without
hyperthreading (and against AMD) in a double blind study.

RM
 
Robert Myers said:
Greetings,
A more decisive study (if not to many engineers) would be to compare user
perceptions of system responsiveness with and without hyperthreading (and
against AMD) in a double blind study.

RM

Interesting idea, I'd love to know the results of such a study!

Carlo
 
Robert Myers said:
Greetings,

Intel has put out a blurb about the effects of hyperthreading on
response time when performing multiple tasks:

http://www.intel.com/update/contents/dt09042.htm

Since the scenarios are described, if not in tremendous detail, one
could imagine trying to duplicate (or question) the results by
performing measurements of one's own. With a claimed improvement of 50-65%,
one imagines the effect would be noticeable. The claimed gains are somewhat
larger than the throughput improvement one might optimistically hope for
from hyperthreading (0-40%), but not so much larger that any but the most
fanatical wouldn't feel there are probably more interesting benchmarks to
look into in detail.

The hyperthreading vs. no hyperthreading is a much cleaner question, of
course, than the AMD vs. Intel comparison that was recently discussed at
such length in these forums.

A more decisive study (if not to many engineers) would be to compare
user perceptions of system responsiveness with and without
hyperthreading (and against AMD) in a double blind study.

RM

While the "study" of Principled Technologies, Inc. is apparent
baloney, there is an effect of hyperthreading processor on
"system responsiveness", I can attest to this, being duly sworn
and in sober mind.

The problem is not in some mysterious "hyper" features but in
a combination of Microsoft operating system, lack of interrupts
in PC (which forces the idiocy of "cooperative" interrupt sharing),
and some sloppy Windows APIs people have to use to create
interface to their applications. I can attest that if you are
running Xilinx synthesis tools, their place and routing tools,
or many other EDA tools, a single-processor system (Windows)
response is like calling your credit card account during rush hours,
it completely sucks. Once you migrate to a dual-processor system,
or a hyperthreaded processor, things are back to normal.

The resulting "response speedup" depends solely on how sloppily
your main application's interface is written. For example, in the very
popular Microsoft language "Visual Basic", there is a call "DoEvents()".
If you forget to insert this call within some heavy data-processing
loop, forget about "interactivity" altogether if you are running it on
uniprocessor Windows.
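
To make the mechanism concrete, DoEvents() essentially drains the thread's
pending window messages. A hand-rolled Win32 equivalent, dropped into the
heavy loop, would look roughly like the sketch below (ProcessChunk() is just
a hypothetical stand-in for the real work, not anything from VB or Xilinx):

    #include <windows.h>

    /* Hypothetical stand-in for one slice of the heavy computation. */
    static void ProcessChunk(int i) { (void)i; /* ... real work here ... */ }

    /* Drain pending window messages so the GUI stays responsive
       (roughly what VB's DoEvents() does for you). */
    static void PumpMessages(void)
    {
        MSG msg;
        while (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE)) {
            TranslateMessage(&msg);
            DispatchMessage(&msg);
        }
    }

    void HeavyLoop(int chunks)
    {
        int i;
        for (i = 0; i < chunks; i++) {
            ProcessChunk(i);   /* without the pump below, the UI freezes */
            PumpMessages();
        }
    }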

IMHO of course,

- aap
 
alexi said:
The resulting "response speedup" depends solely on how sloppily
your main application's interface is written. For example, in the very
popular Microsoft language "Visual Basic", there is a call "DoEvents()".
If you forget to insert this call within some heavy data-processing
loop, forget about "interactivity" altogether if you are running it on
uniprocessor Windows.

IMHO of course,

- aap

Not being familiar with VB, I'm assuming from the description that this function
is akin to a "release()"-type function you would find on a cooperative
multi-tasking system? If so, wouldn't the pre-emptive multi-tasking nature of
modern Windows help to mitigate this? Or does the VB runtime run at a high enough
priority that it would kill the system?

Carlo
 
alexi said:
While the "study" of Principled Technologies, Inc. is apparent
baloney, there is an effect of hyperthreading processor on
"system responsiveness", I can attest to this, being duly sworn
and in sober mind.

The problem is not in some mysterious "hyper" features but in
a combination of Microsoft operating system, lack of interrupts
in PC (which forces the idiocy of "cooperative" interrupt sharing),
and some sloppy Windows APIs people have to use to create
interface to their applications. I can attest that if you are
running Xilinx synthesis tools, their place and routing tools,
or many other EDA tools, a single-processor system (Windows)
response is like calling your credit card account during rush hours,
it completely sucks. Once you migrate to a dual-processor system,
or a hyperthreaded processor, things are back to normal.

The resulting "response speedup" depends solely on how sloppily
your main application's interface is written. For example, in the very
popular Microsoft language "Visual Basic", there is a call "DoEvents()".
If you forget to insert this call within some heavy data-processing
loop, forget about "interactivity" altogether if you are running it on
uniprocessor Windows.

No, just by having two CPUs to run the user and kernel threads at the
same time you will get some gain, no matter how well written your
application is. Not every interrupt need result in register saves, etc.

Note that true SMP will be somewhat better than HT, due to cache for
each CPU. But in some cases the system call will be faster because the
arguments are in L1 cache and no memory access is done. Predicting the
effects for multiple execution units is frequently a case of "it depends."
 
Greetings,

Intel has put out a blurb about the effects of hyperthreading on
response time when performing multiple tasks:

http://www.intel.com/update/contents/dt09042.htm

Since the scenarios are described, if not in tremendous detail, one
could imagine trying to duplicate (or question) the results by
performing measurements of one's own. With a claimed improvement of 50-65%,
one imagines the effect would be noticeable. The claimed gains are somewhat
larger than the throughput improvement one might optimistically hope for
from hyperthreading (0-40%), but not so much larger that any but the most
fanatical wouldn't feel there are probably more interesting benchmarks to
look into in detail.

The hyperthreading vs. no hyperthreading is a much cleaner question, of
course, than the AMD vs. Intel comparison that was recently discussed at
such length in these forums.

A more decisive study (if not to many engineers) would be to compare
user perceptions of system responsiveness with and without
hyperthreading (and against AMD) in a double blind study.

Hmm, well alexi's response intrigued me and maybe somebody else already
noticed this but it seems that www.principledtechnologies.com, which is
passworded access only, is registered to a law firm, www.wcsr.com, called
Womble, Carlyle, Sandridge & Rice. What the hell is Intel up to here? Is
Randall Kennedy involved here... again? Are lawyers a required entity for
the publishing of benchmarks now?

Rgds, George Macdonald

"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??
 
Bill Davidsen said:
No, just by having two CPUs to run the user and kernel threads at the
same time you will get some gain, no matter how well written your
application is. Not every interrupt need result in register saves, etc.

What do you mean "no"? :-) I am not talking here about "some" gain in
application performance (in the customary sense of faster Task Completion
Time). Actually, for many "classic" applications it is rather a "loss", not
a gain, when you migrate from a uniprocessor to a dual processor in a Windows
environment. The reasons are a) a different, more complex 2P kernel and HAL,
b) mandatory APIC engagement (I think), c) the scheduler will juggle
the task between the two processors, causing more severe cache thrashing,
d) what else?...

The topic is about _system response_ times. As I said, go download a
Xilinx free Webpack development tool, launch their "integrated environment",
then start some FPGA project, compile, place, and route some FPGA
design, and during the process, try to open any other Windows folder
on a uniprocessor PC.
You will see what I mean - opening a folder may require several seconds
if not _minutes_ in some cases. As I understand, their whole Integrated
Environment is written as some script-level launcher of original
console-based applications, and uses solely Windows API to communicate
parameters and provide interactive controls. I've seen similar effect
on some older Gerber viewers, and on some modelling packages.

What I am trying to say is that you can select a kludgy, poorly written
application and get a _tremendous_ gain in system response time in Windows
when moving to a dual-processor or hyperthreaded system. That's it.
I do not recall any problems of this kind on Unix-based operating systems,
although my experience there is much more limited.

Note that true SMP will be somewhat better than HT, due to cache for
each CPU.
Not necessarily true. As I already mentioned, if the application is
single-threaded and has no idea how to claim processor affinity, Windows
will alternate processors, which will likely cause a loss in performance
compared to a uniprocessor system, all other things being equal.
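
(For what it's worth, an application that does know about affinity can pin
itself with a single Win32 call; a minimal sketch, with the CPU mask chosen
arbitrarily:)

    #include <windows.h>

    int main(void)
    {
        /* Keep this thread on CPU 0 so the scheduler stops bouncing it
           between processors (error handling omitted). */
        SetThreadAffinityMask(GetCurrentThread(), 1);

        /* ... run the single-threaded workload as usual ... */
        return 0;
    }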

... But in some cases the system call will be faster because the
arguments are in L1 cache and no memory access is done. Predicting the
effects for multiple execution units is frequently a case of "it depends."

No argument here.

- aap
 
George Macdonald said:
Hmm, well alexi's response intrigued me and maybe somebody else already
noticed this but it seems that www.principledtechnologies.com, which is
passworded access only, is registered to a law firm, www.wcsr.com, called
Womble, Carlyle, Sandridge & Rice. What the hell is Intel up to here? Is
Randall Kennedy involved here... again? Are lawyers a required entity for
the publishing of benchmarks now?

Interesting observation indeed. Upon following your link, I noticed that
one of the firm's specialties is "Product Liability Litigation":

http://www.wcsr.com/FSL5CS/practiceareadescriptions/practiceareadescriptions280.asp

Maybe this is the key? All bases covered? :-)

- aap
 
alexi said:
What do you mean "no"? :-) I am not talking here about "some" gain in
application performance (in the customary sense of faster Task Completion
Time). Actually, for many "classic" applications it is rather a "loss", not
a gain, when you migrate from a uniprocessor to a dual processor in a Windows
environment. The reasons are a) a different, more complex 2P kernel and HAL,
b) mandatory APIC engagement (I think), c) the scheduler will juggle
the task between the two processors, causing more severe cache thrashing,
d) what else?...

Unless MS is lying (would they do THAT?) there should be affinity in
recent releases. And the point I was making is that even well-written
applications should see a benefit.
The topic is about _system response_ times. As I said, go download a
Xilinx free Webpack development tool, launch their "integrated environment",
then start some FPGA project, compile, place, and route some FPGA
design, and during the process, try to open any other Windows folder
on a uniprocessor PC.
You will see what I mean - opening a folder may require several seconds
if not _minutes_ in some cases. As I understand, their whole Integrated
Environment is written as some script-level launcher of original
console-based applications, and uses solely Windows API to communicate
parameters and provide interactive controls. I've seen similar effect
on some older Gerber viewers, and on some modelling packages.

What I am trying to say is that you can select a kludgy, poorly written
application and get a _tremendous_ gain in system response time in Windows
when moving to a dual-processor or hyperthreaded system. That's it.

Sure, the more room for improvement, the more improvement.
I do not recall any problems of this kind on Unix-based operating systems,
although my experience there is much more limited.

The old OpenServer did not have affinity, Linux (recent) does, and even
knows about SMT (aka HT) and does sane things in scheduling. Beyond that
I haven't enough experience to say for sure.
Not necessarily true. As I already mentioned, if the application is
single-threaded and has no idea how to claim processor affinity, Windows
will alternate processors, which will likely cause a loss in performance
compared to a uniprocessor system, all other things being equal.

Windows of old did very poorly with SMP; it has now improved to mediocre.
No argument here.

If you have the right two threads in a HT CPU it runs amazingly fast.
 
[snip]
Unless MS is lying (would they do THAT?) there should be affinity in
recent releases.

The affinity might be there, and likely is, but there is no
corresponding system call in applications compiled before
"recent releases." Which probably includes pretty much every
off-the-shelf application :-(
....... And the point I was making is that even well-written
applications should see a benefit.


And my counter-point was that even if an application is well-written
(well-written by uniprocessor standards, of course) and should see
benefits, in reality it doesn't, for the reasons I stated in
my previous post. A couple of years back I tried the single-P-compiled
SPEC_CPU benchmark suite on a dual-P system in the hope that any
system and other bookkeeping OS activity would be served by the
second processor and not thrash my main thread. I was wrong
- there was no measurable improvement in scores. Only when the
KAI parallelizing pre-processor was employed was I able to see
some shift in performance. Unfortunately, the shift was in
both directions for different individual benchmarks, and the net
was only slightly positive. As you might already know, KAI
is now an integral part of Intel's compiler technology, and
I am quite sure there have been vast improvements over the years.

-aap
 
alexi said:
[snip]
Unless MS is lying (would they do THAT?) there should be affinity in
recent releases.


The affinity might be there, and likely is, but there is no
corresponding system call in applications compiled before
"recent releases." Which probably includes pretty much every
off-the-shelf application :-(

I didn't realize it had to be set; Linux tracks it and uses the same CPU
where practical. You can set it by hand, but you don't need to in most
cases. Recent Linux knows enough to handle SMT and SMP in any mix as well.
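
(Setting it by hand from C is a one-liner on Linux too; a rough sketch,
assuming a reasonably recent glibc and kernel:)

    #define _GNU_SOURCE
    #include <sched.h>

    /* Pin the calling thread to CPU 0; returns 0 on success, -1 on error. */
    int pin_to_cpu0(void)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(0, &set);
        return sched_setaffinity(0, sizeof(set), &set);
    }

The same thing can be done without touching the source via the taskset
utility, e.g. "taskset -c 0 ./myapp" (myapp being whatever program you want
pinned).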
 