Nick Maclaren said:
|> > |> A very good point. SMT is a fairly simple thing.
|> > |> Orthogonal to other efforts to improve performance.
|> >
|> > Boggle. If it were either, let alone both, it would be
|> > vastly more effective.
|>
|> SMT is simple in that "all" that needs be done is create
|> duplicate state machines (register sets) to create "virtual
|> CPUs". Add some (not too much) fairness to the hardware
|> scheduler and thread through the retirement unit. The main
|> execution pipeline (ROB, ports, exec units) remains unchanged.
That is wrong, completely so.
You DON'T just create duplicate register sets, but have to "dual
port" every execution unit - possibly by creating a single set
of double the length, and create some new scheduling to manage it.
You have to move some privileged registers and state from out of
(logically) the execution units to the register sets.
I think that Nick is muddled on this one. If the base implementation is
already OoO then there will normally be many more physical registers than
architected ones. Going two-way SMT may not involve adding any physical
registers, but rather changes to renaming. "Dual porting" every
execution unit doesn't make much sense to me. Access to execution units from
either virtual processor is essentially free - they are, after all, virtual
processors, not real ones. What is required is that every bit of *architected*
processor state be renamed or duplicated; perhaps that's what Nick is
getting at?
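
To make the renaming point concrete, here is a toy sketch in Python. It
models no real processor; the class and method names (SharedRenamer,
rename_dest, lookup, release) and the register counts are invented for
illustration. It only shows the idea that two virtual processors can share
one physical register file, with a separate rename map per thread holding
the architected state.

```python
class SharedRenamer:
    """Toy model: one physical register file shared by several hardware threads.

    Hypothetical illustration only; not a description of any real design.
    """

    def __init__(self, num_physical, num_architected, num_threads):
        # Every thread needs one physical register per architected register
        # just to hold its committed state.
        if num_physical < num_architected * num_threads:
            raise ValueError("not enough physical registers for architected state")
        self.free = list(range(num_physical))
        # One rename map per virtual processor: architected -> physical.
        self.maps = [
            {arch: self.free.pop(0) for arch in range(num_architected)}
            for _ in range(num_threads)
        ]

    def lookup(self, thread, arch_reg):
        """Return the physical register currently holding arch_reg for thread."""
        return self.maps[thread][arch_reg]

    def rename_dest(self, thread, arch_reg):
        """Map arch_reg of thread to a fresh physical register.

        Returns (new, old), where old is the previous mapping that can be
        released once the writing instruction retires, or None if the free
        list is empty and rename would have to stall.
        """
        if not self.free:
            return None
        old = self.maps[thread][arch_reg]
        new = self.free.pop(0)
        self.maps[thread][arch_reg] = new
        return new, old

    def release(self, phys_reg):
        """Return a physical register to the shared free list."""
        self.free.append(phys_reg)


# Two virtual processors, 4 architected registers each, 16 physical in total.
renamer = SharedRenamer(num_physical=16, num_architected=4, num_threads=2)
new, old = renamer.rename_dest(thread=0, arch_reg=1)
```

A write by thread 0 changes only thread 0's map; thread 1's view of the
same architected register number is untouched, and nothing about the
execution units needs to know which thread an operand came from.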
You have to mangle any performance counters and many privileged
registers fairly horribly, because their meanings and constraints
change. Similarly, you have to add logic for CPU state change
synchronisation, because some changes must affect only the current
thread and some must affect both. And you have to handle the case
of the two threads attempting incompatible operations simultaneously.
What operations are incompatible? SMT as implemented in the Pentium 4, say,
allows either virtual processor to do what it likes. One can transition from
user to kernel and back while the other services interrupts or exceptions or
whatever. The only coordination needed for proper operation is what is
needed for two processors, though of course performance may suffer.
Oh, of course, none of this affects the main flow of control,
but all forms of real engineering (as distinct from academic
demonstrations and marketing) are as much or more about the problem
cases as the normal ones.
|> "Vastly more effective" is a comparative term. What do you
|> expect? SMT won't match SMP under most circumstances. ...
My suspicion is that it wouldn't match CMP, with the same amount
of real estate, under most circumstances. But that is pure
speculation AS IS THE CLAIM OF THE CONVERSE until and unless
someone does some proper analysis.
Yes, lots of speculation. The difference here is that two CMP processors take
about twice the silicon of one, while with SMT you have the option to use
1.5 cores' worth of silicon. Perhaps once more than dual cores is cheap and
easy, SMT will die because it's more effort than it's worth, but my bet is
that chips will go both routes, with SMT and CMP. Just one more little
problem for the OS developers to deal with.
Peter