[Q]Hyperthreading...still don't get it...


redbrick

Okay, I've read up on what I think is all the marketing hype and all the
half-baked articles on the net and still they don't explain exactly how
hyperthreading works. ...yeah, yeah, I know the app's instructions can
be broken up into several small tasks for each processor...and that's
where they leave it for the reader to digest...

My question is...and please forgive my simplistic POV....I've purchased
one physical processor...but wouldn't splitting the processor into
two effectively slow the processor down...by say 50% if each of them were
to be completely filled with tasks to complete? I'm thinking of 'time
share.' Unless I actually purchased one CPU with two mini CPUs packaged
into one...is this the concept of hyperthreading???
Is this why we need the ATX12V???

I'm laughing as I type this cause it's rather embarrassing, but I'm asking
simply because I'm not familiar with the physical nature of hyperthreading...
I know what it can do...just not how it's physically done and how that
benefits me.

I'm curious cause I just purchased a P4 3 GHz and am running it on WinXP SP2
and I've been playing with applications with the Task Manager running just
to get a better idea of how the beast works...but still can't understand...
what is this 'logical' processor????

Please enlighten me...Thanks in advance..

Redbrick...who Loves his CLK
 

I don't claim to understand it either, but I can see that the HT hype has been
exploited somewhat by marketing people. First of all, multithreading is
nothing special; most software packages exploit multithreading in one form
or another, or the screen would freeze up every time you start a process.
Nevertheless, marketing likes to suggest that HT suddenly allows your computer
to multitask. The processor without HT is perfectly capable of switching in
and out between processes many times a second, but there is only one worker
employed inside the CPU. With HT there is still only one worker employed!
But this worker might sometimes have to wait for parts or tools, and his time
is better used by fitting other jobs in between.
 
Logically, a processor consists of its state registers and handles one
execution flow. Instructions fetched at the instruction pointer are decoded
and fed to the various execution units. Often, those execution units have to
idle, because the instruction stream doesn't need them, or because of
memory latency, or because of instruction interdependencies.

An HT processor can handle two execution flows at the same time, from two
independent instruction pointers, using the same set of execution units. It
has two sets of state registers; this is why it's logically two processors.
 
from the said:
Okay, I've read up on what I think is all the marketing hype and all the
half baked article on the net and still they don't explain exactly how
hyperthreading works. ...yeah, yeah, I know the apps's instructions can
be broken up into several small tasks for each processor...and that's
where they leave it for the reader to digest...

Ok, the real simple view .. a modern processor has lots of bits which
can be used pretty much independently (units which add numbers together,
multipliers, address calculators, floating point units, busses, etc.
etc.). Most programs won't be using all the bits all the time. The
theory with HT is that you can load some instructions from program B
which will use the hardware units that Program A was going to leave idle
right now.
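
A toy sketch of the idea above, under heavy assumptions: each program is a list of per-cycle needs (a unit name, or None for a stall), and the unit names and both function names are made up for illustration. This is not how the P4 actually schedules anything; it only shows how a second instruction stream can fill slots the first one leaves idle, and how the gain disappears when both streams want the same single unit.

```python
def run_sequential(streams):
    """Cycles needed when the programs run one after the other."""
    return sum(len(s) for s in streams)


def run_shared(streams):
    """Cycles needed when all streams share one set of units each cycle.

    Hypothetical model: every unit can serve one instruction per cycle,
    a None entry is a stall that burns a cycle for that stream only,
    and stream 0 always gets first pick (no fairness).
    """
    pos = [0] * len(streams)
    cycles = 0
    while any(p < len(s) for p, s in zip(pos, streams)):
        busy = set()  # units already claimed in this cycle
        for i, stream in enumerate(streams):
            if pos[i] >= len(stream):
                continue  # this stream has finished
            need = stream[pos[i]]
            if need is None:
                pos[i] += 1  # stalled: waits, uses no unit
            elif need not in busy:
                busy.add(need)
                pos[i] += 1  # unit was free: instruction issues
            # otherwise: contention, retry in the next cycle
        cycles += 1
    return cycles


program_a = ["alu", None, "alu", None]    # integer work with stalls
program_b = ["fpu", "fpu", None, "fpu"]   # floating-point work
hogs = [["alu"] * 4, ["alu"] * 4]         # both want the same unit

print(run_sequential([program_a, program_b]), run_shared([program_a, program_b]))
print(run_sequential(hogs), run_shared(hogs))
```

With complementary programs the shared run finishes in fewer cycles than running them back to back; with two programs fighting over the same unit it does not.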

Whether it works in practice depends on the two programs .. there are
still plenty of items which the CPU only has one of, where contention
can snarl things up. Branch prediction failure (where program A suddenly
goes off in a direction that the CPU was not expecting) can ruin your
whole microsecond. It is a pain anyway, but with a =long= queue of
things in the pipeline .. some from program A and some from program B ..
when the program A ones suddenly need to be flushed, it goes down
particularly badly.

The next step (AMD already announced it, and showed some product) =is=
two whole CPUs on one chip, which really will go nearly 2x as fast as
one (although they'll still be fighting over external memory access and
the like).

In the meantime HT is (In My Estimation) about 90% marketing hype and
10% useful performance gain.
 
redbrick said:
My question is...and please forgive my simplistic POV....I've purchased
one physical processor...but wouldn't splitting the processor into
two effectively slow the processor down...by say 50% if each of them were
to be completely filled with tasks to complete? I'm thinking of 'time
share.' Unless I actually purchased one CPU with two mini CPUs packaged
into one...is this the concept of hyperthreading???
Is this why we need the ATX12V???

Look at it this way: the Pentium 4 is hardly ever totally busy when it's
running one instruction stream. There are a lot of idle points and empty holes
inside its instruction pipeline, due to stalls. A single stream of
instructions isn't able to keep the P4 completely busy and productive all
of the time. So along comes Hyperthreading, and now you have two streams of
instructions, and hopefully if one stream can't keep it busy, then the other
stream will. It's simply using the inefficiency of the P4 architecture and
filling its idleness in with more work.

This is a situation mostly specific to the P4's unique design. Other x86
architectures, such as AMD's or even Intel's own Pentium M family, are much
more productive when running one instruction stream; there are hardly any
holes or idle points in their pipelines, so adding Hyperthreading to them
would not gain them anything, and perhaps would make them slower.

Yousuf Khan
 
My question is...and please forgive my simplistic POV....I've purchased
If there is just one thread being executed, there is only one
"stream" of instructions being executed. The CPU has functional units for
doing different work, like multiplying, adding, shifting, computing addresses,
et cetera. A single stream of instructions rarely gives enough
work for all the parts of the CPU to do, so parts of the CPU sit on their
ass doing nothing.

Modern OSes like Linux and Windows support threading: each task runs on
its own thread (an application can also split its work into multiple
threads; this is what being "multi-threaded" means for applications). In SMP
(Symmetric Multi-Processor) machines there is more than one CPU, and the OS
splits the work (threads) as evenly as it can between the processors.
This way more work can be done. Enter Hyper-Threading: a single CPU looks like
two processors to the OS, so more than one stream of instructions (threads)
can be processed in parallel. Now there is double the amount of work going
into the CPU, so it is likely that a larger % of the CPU is being used for
doing real computational work.

This in turn (theoretically) means that the total work will complete in less
time. This means your CPU is "faster", not slower. You are correct that the
work is split 50/50 (in the ideal case), but look at it this way: the work is
just split into smaller atoms, and a larger number of atoms go through the CPU
in a given time period. This should translate into more performance out of
your investment. Should and will are not synonymous in this context, of course.


A Logical Processor is a functional unit inside the CPU that looks like a
unique processor to the OS. Again, you are looking at this from the wrong
direction: one real CPU is split into a number of Logical Processors; in the
case of the P4 HT the number of logical processors "inside it" is two.
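
One way to see what the OS sees, as a minimal Python sketch: os.cpu_count() from the standard library reports the number of logical CPUs the OS exposes, not the number of physical chips. On a single P4 with HT enabled that would be expected to be 2, though this has not been checked on that hardware; the call may also return None if the count cannot be determined.

```python
import os

# Number of logical processors the OS exposes (None if it cannot tell).
logical_cpus = os.cpu_count()

if logical_cpus is None:
    print("The OS did not report a processor count.")
else:
    print(f"The OS sees {logical_cpus} logical processor(s).")
```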

nothing special, most software packages exploit multithreading in one form
or another, or the screen would freeze up every time you start a process.

Multithreading is good practice in user interfaces when you must handle
window messages and there would otherwise be a long delay between visits to
the message queue. For example, say we call a function which encodes a 1-hour
AVI file. If completing the function takes 20 minutes, it means for 20 minutes
we don't handle window messages and you get an "application not responding"
message. Not very nice, is it? The typical solution is to put the encoding
into its own thread and let it run in the background while the user interface
still handles messages normally (except the buttons might be disabled,
whatever, depends really). In this case the work thread was not made for
performance reasons but for the application to "behave" nicely and not appear
dead to the OS.
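
A minimal sketch of that pattern in Python, assuming a short sleep as a stand-in for the 20-minute encode and a simple loop as a stand-in for the window message pump; encode_video and run_ui_with_worker are hypothetical names, not part of any real GUI toolkit.

```python
import threading
import time


def encode_video(finished):
    """Hypothetical stand-in for a long-running encoding job."""
    time.sleep(0.2)   # pretend this is 20 minutes of work
    finished.set()    # tell the UI loop the job is done


def run_ui_with_worker():
    """Keep 'handling messages' while the encode runs in the background."""
    finished = threading.Event()
    worker = threading.Thread(target=encode_video, args=(finished,))
    worker.start()

    handled = 0
    while not finished.is_set():
        handled += 1      # stand-in for handling one window message
        time.sleep(0.01)  # the UI stays responsive between messages
    worker.join()
    return handled


messages_handled = run_ui_with_worker()
print(f"Handled {messages_handled} messages while encoding ran in the background")
```

The point is responsiveness, not speed: the loop keeps running on a single CPU just as well, because the worker thread is there for behaviour rather than performance.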
Nevertheless, marketing likes to suggest that HT suddenly allows your computer
to multitask. The processor without HT is perfectly capable of switching in
and out between processes many times a second, but there is only one worker
employed inside the CPU. With HT there is still only one worker employed!
But this worker might sometimes have to wait for parts or tools, and his time
is better used by fitting other jobs in between.

The task switching overhead is negligible in practice: if we lose even
1,000,000 clock cycles per second to that activity, that is still only
1/3000th of the total computing power lost in your case (at 3 GHz) and not
really worth pursuing. Would you notice the difference if you had a 3.06 or a
3.07 GHz machine? I think you wouldn't.. what we want a second processor for,
logical or otherwise, is so that more actual work can be processed.
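
The arithmetic behind that figure, as a quick Python check. It assumes a round 3 GHz clock (the P4 from the original question) and takes the 1,000,000 lost cycles per second from the post as given; the variable names are just for illustration.

```python
CLOCK_HZ = 3_000_000_000   # assumed: a round 3 GHz clock
LOST_CYCLES = 1_000_000    # cycles per second spent on task switching (from the post)

fraction_lost = LOST_CYCLES / CLOCK_HZ             # 1/3000 of the machine
effective_ghz = (CLOCK_HZ - LOST_CYCLES) / 1e9     # what is left for real work

print(f"Lost fraction: {fraction_lost:.6f} (about {fraction_lost * 100:.3f}%)")
print(f"Effective clock: {effective_ghz:.3f} GHz")
```

So the loss amounts to about 0.001 GHz on a 3 GHz part, which is smaller than the 0.01 GHz gap between the two clock speeds mentioned above.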
 
Yousuf said:
Look at it this way, the Pentium 4 is hardly ever totally busy, when it's
running one instruction stream. There's a lot of idle points and empty holes
inside its instruction pipeline, due to stalls. A single stream of
instructions aren't able to keep the P4 completely busy and productive all
of the time. So along comes Hyperthreading, and now you have two streams of
instructions, and hopefully if one stream can't keep it busy, then the other
stream will. It's simply using the inefficiency of the P4 architecture and
filling its idleness in with more work.

Interesting way of putting it. In theory it should be great, but in reality
performance is sometimes decreased with hyperthreading, or even worse there
may be instances of instability. Do a Google search on "disable hyperthreading".
Enabling hyperthreading adds extra overhead to the system. In some cases
the gain more than covers the extra overhead and hyperthreading boosts
performance, while in other cases it doesn't and decreased performance is the
result. Even worse, though, are the many instances when people have issues
with hyperthreading enabled and feel a need to disable it.
Do that Google search. Go to some of the links and read what they say.
If hyperthreading is so great, why do so many people want to disable it?
Read this article: http://www.pcworld.com/news/article/0,aid,107492,00.asp
It is old, but it makes a number of interesting points.
 
assaarpa said:
[...]
Logical Processor is a functional unit inside the CPU that looks like unique
processor to the OS. Again you are looking at this from wrong direction: One
Real CPU is split into number of Logical Processors, in case of P4 HT the
number of logical processors "inside it" is two.

IMO, a logical processor is a software concept, hence 'logical' and not
necessarily 'real'. The software may be written to assume several independent
processors if they were available. If they're not, then a single processor
will split its time between the different jobs - processes or tasks or threads
or whatever. Similar to a person who has two jobs, but it's still one and the
same person.
 
JK said:
Interesting way of putting it. In theory it should be great, but in reality
performance is sometimes decreased with hyperthreading, or even worse there
may be instances of instability. Do a Google search on "disable
hyperthreading".

Yes, we all know about your personal crusade against all things
Hyperthreading.
Enabling hyperthreading adds extra overhead to the system. In some cases
more than the extra overhead is achieved and hyperthreading boosts
performance, while in other cases it doesn't and decreased performance is
the result. Even worse though are the many instances when people have issues
when hyperthreading is enabled, and they feel a need to disable
hyperthreading.

Basically, all instruction streams will achieve their own characteristic
level of efficiency on the P4 architecture. Some programs might achieve 10%
efficiency, some might do 40%, while others might have done 60% on their
own, etc. You put together two streams with 30% efficiency each, and you'll
probably achieve 60% on the P4. You put together two streams with 40%
efficiency, and together they might achieve 80% efficiency on the P4. You
put together two streams with 50% efficiency, and then you might achieve
100% efficiency on the P4. But if you put together two streams with 60%
efficiency, then you're never going to get 120% efficiency, so the overall
effect is a slowdown on the P4. The idea is to put together streams that
won't go over 100% efficiency on the P4, if you want to avoid slowdowns. So
far it looks like most instruction streams are safely under 100% efficiency
when put together, with a few oddball types of instructions that achieve too
much efficiency by themselves under P4.
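
The rule of thumb above can be written down as a tiny Python sketch. To be clear, this is only the simplified model from this post (efficiencies add until they hit 100%), not a measurement or a real property of the P4; pair_streams and its capacity parameter are hypothetical names.

```python
def pair_streams(eff_a, eff_b, capacity=100):
    """Simplified model: two streams' efficiencies (in %) add up,
    but the core can never deliver more than its full capacity.

    Returns (achieved efficiency, whether the pair is oversubscribed).
    """
    demand = eff_a + eff_b
    achieved = min(demand, capacity)
    return achieved, demand > capacity


# The pairings used in the post.
for a, b in [(30, 30), (40, 40), (50, 50), (60, 60)]:
    achieved, oversubscribed = pair_streams(a, b)
    note = "slowdown expected" if oversubscribed else "fits"
    print(f"{a}% + {b}% -> {achieved}% ({note})")
```

Only the 60% + 60% pairing asks for more than the core has, which is the case where each stream ends up running slower than it would alone.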

The interesting thing is that the Hyperthreading specs allow for up to 256
virtual processors running on a single processor. Now with the Prescott core
which has 50% more pipeline stages than the older Northwood core, there
might be more pipeline stalls available with the new core. All of the
efficiencies of the various program streams under Northwood core would be
totally different in the Prescott. So conceivably, Intel could've increased
the number of virtual processors in later revisions, to take advantage of
the additional pipeline slots. It's too bad that the Prescott is the last P4
core that Intel is going to launch.

Yousuf Khan
 
I'd like to thank you guys who responded to my question. I now have a better
idea of what Hyperthreading is ...I didn't realize the CPU had all this time
on its hands doing nothing. I think I'll start dreaming up something to
keep at least one of them fully employed...cause I did spend a good chunk
of change on the two or more of them (the registers) or one, whichever.

So now it makes sense...I'm running FS2004 and for giggles, I thought I'd
fire up the in-flight entertainment...none but WinTV :-) ...yes, I've got
some time this Labor Day holiday to play.. I notice one of the processors
is pegged at 100%, which I assume is running FS2004, and the other is just
bouncing at 20-30%...so I thought I'd do some benchmarks on the side...and
sure enough...the other went up to 100%...really neat...

Thanks guys...

now my next question...dual channel memory...to follow...

Redbrick...who Loves his CLK
 
Have you read this:
http://www.intel.com/technology/itj/2002/volume06issue01/art01_hyper/vol6iss1_art01.pdf

It may take a bit of re-reading and searching/looking up references to
other materials but IMO it's worth the effort. This doc describes how the
CPU resources are partitioned and shared to allow two threads or tasks to
share the CPU and make it appear like two logically separate processors.

There are a couple of things you need to know about how task and thread
switching is done by an OS and the mechanisms which must be invoked in the
CPU to achieve them. E.g. a task switch involves a full context switch and
without HT requires certain housekeeping operations to be performed by the
OS such as flushing the TLB (Translation Lookaside Buffer) entries
associated with the task being switched out.

The TLB is important in the efficiency of virtual-to-real page address
translation, and flushing a task's portion of it means considerable overhead
in switching back to that task again: all addresses then have to go
through the page table translation mechanisms to refresh the TLB. IMO the
tagged (by task/logical processor) TLB entries of HT are the important part
of HT to understand as far as seeing how response can be improved on task
switches.

Of course there *are* pros and cons to partitioning and sharing caches
between tasks: you get efficiency from not having to reload data from main
memory; OTOH if sharing reduces the effective cache size that either task
really needs to perform efficiently, you lose. That's why some task mixes
perform better and others perform worse with HT turned on. IOW you can
cook up benchmarks which make HT look wonderful and other people can do the
opposite and make it look like a waste of time.

Rgds, George Macdonald

"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??
 