Is the philosophy of hyperthreading (I'm not a computer scientist but
rather a scientist, so I'm not familiar with it) a way to get around
the performance limit of a single CPU by putting two on one chip, so
that a single task cannot use the whole system's power unless the work
of the different CPUs is coordinated by a separate thread in
parallel-style code, as I did long ago on Cray hardware (I worked on
an X-MP at the end of the 80s)?
Sorry for my ignorance, and many thanks for the answers.
If I understand correctly, hyperthreading came about as a result of studies
performed by Intel (and others, no doubt) that found that typical code mixes
only utilized about 30% of the CPU core capacity and were largely bound by
memory bandwidth. A hyperthreading CPU still has only one set of functional
units (adders, multipliers, etc.), but has a double set of registers. With
that extra set of registers (and a second instruction stream) it's possible
to simulate a second CPU in the same chip. The two separate virtual CPUs
compete for the functional units in the core, so having both virtual CPUs
compute-bound frequently results in less throughput than having a single
virtual CPU compute-bound due to internal contention for resources. The big
advantage of hyperthreading in my experience is that a single compute-bound
thread can't completely monopolize the machine - all the other (mostly
blocked) threads can use the second virtual CPU to get work done while the
compute-bound thread grinds away. For today's hyperthreaded CPUs, you're
actually getting better throughput at 50% utilization than you would at 100%
in most cases.
Multicore CPUs are the next generation. In a multicore, internal resources
are not shared between cores, so contention is reduced. A multicore design
might have (for example) 4 cores, each with L1 cache, each supporting 2-way
hyperthreading, with a large, shared L2 cache that all cores access, yielding
8 virtual CPUs in a single chip.
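The arithmetic in that example can be checked with a small sketch. The
function name logical_cpus is made up for illustration; os.cpu_count() is
the Python standard-library call that reports the number of logical CPUs
the operating system sees, which on a hyperthreaded machine normally counts
hardware threads rather than physical cores.

```python
import os


def logical_cpus(cores: int, threads_per_core: int) -> int:
    """Virtual CPUs exposed by a chip: cores times hardware threads per core.

    Hypothetical helper, written only to illustrate the arithmetic.
    """
    if cores < 1 or threads_per_core < 1:
        raise ValueError("cores and threads_per_core must be positive")
    return cores * threads_per_core


# The example design from the text: 4 cores, each with 2-way hyperthreading.
example = logical_cpus(cores=4, threads_per_core=2)
print("example chip exposes", example, "virtual CPUs")

# What the OS reports on this machine (logical CPUs; None if undetermined).
print("this machine reports", os.cpu_count(), "logical CPUs")
```

Note that the count alone doesn't say which virtual CPUs share a core, so
two compute-bound threads may still end up competing for the same
functional units.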
As multicore and hyperthreaded CPUs become more common, advances are needed
in inter-thread coordination and communication. The current lock-based
strategies are simply too inefficient and don't scale well - code that runs
efficiently on a 100-CPU machine needs to be structured very differently
from code that's suitable for 2 processors.
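To make the structural difference concrete, here is a minimal sketch of my
own (not anything from a particular library) that counts items two ways.
In the first, every increment takes one shared lock, so all workers queue
up on it. In the second, each worker counts privately and writes a single
result to its own slot, and the results are merged after the workers
finish. The function names are hypothetical. Python threads are used only
to show the shape of the code: standard CPython's global interpreter lock
means this won't demonstrate a real multi-CPU speedup.

```python
import threading


def count_with_shared_lock(n_workers: int, n_items: int) -> int:
    """Every increment takes the same lock, so all workers contend on it."""
    total = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal total
        for _ in range(n_items):
            with lock:  # contention point, taken n_workers * n_items times
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def count_with_local_partials(n_workers: int, n_items: int) -> int:
    """Each worker counts privately; no shared lock in the hot loop."""
    partials = [0] * n_workers

    def worker(idx: int) -> None:
        local = 0
        for _ in range(n_items):
            local += 1
        partials[idx] = local  # one write per worker, to its own slot

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)  # merged once, after all workers have finished


print(count_with_shared_lock(4, 1000), count_with_local_partials(4, 1000))
```

Both versions give the same answer, but the number of synchronization
points differs: the first grows with the total amount of work, the second
only with the number of workers.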
-cd