An SSE unit doesn't issue anything, nor does an FPU. That's part of
Duh! <sheesh!> There is no point in more execution units (of a
type) than the issue width.
That's not entirely true. It might be worth having more execution
resources than peak steady state issue/retire can support so you can
efficiently clear backlogs.
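To put a rough number on the backlog argument, here is a toy model (my own sketch, not any real scheduler). It assumes a pile of ready instructions is already waiting, new ones keep arriving at the peak issue width every cycle, and the execution units drain whatever they can. The function name and parameters are hypothetical.

```python
def cycles_to_drain(backlog, issue_width, exec_units, max_cycles=10_000):
    """Cycles until the ready queue empties, or None if it never does.

    Toy model: every cycle `issue_width` new instructions arrive and up to
    `exec_units` instructions execute. No dependencies, no unit types.
    """
    queue = backlog
    for cycle in range(1, max_cycles + 1):
        queue += issue_width               # arrivals at peak issue rate
        queue -= min(queue, exec_units)    # execution drains what it can
        if queue == 0:
            return cycle
    return None                            # backlog never cleared


if __name__ == "__main__":
    for units in (3, 4, 5):
        print(units, cycles_to_drain(backlog=12, issue_width=3, exec_units=units))
```

With execution width equal to issue width the backlog just sits there for as long as issue runs at peak; one extra unit clears 12 waiting instructions in 12 cycles, two extra in 6.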
I don't think it is, but there are a lot of designs where fetch !=
issue != execute != retire width. The POWER5 issues up to 8
instructions, but can only execute and retire 5 per cycle. I think the
K7/8 also have some asymmetries in the pipelines.
Generally, I agree with you though.
...or perhaps add more architected registers so you're not
thrashing the LD/ST unit with unneeded activity.
You still need to issue loads and stores. Either way, x86 has about 14
GPRs. 4 FPUs would consume 8 operands and produce 4, so you're
basically flushing your reg file each cycle. That's assuming you don't
have any instructions that use 3 regs as input.
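The arithmetic behind that, as a quick sketch. It takes the worst case where every operand is a distinct register, and uses the 14-register figure from above; the function name is made up for illustration.

```python
def operands_per_cycle(units, sources_per_op, results_per_op=1):
    """Return (reads, writes, total) register references per cycle,
    assuming every unit is busy and no two operands share a register."""
    reads = units * sources_per_op
    writes = units * results_per_op
    return reads, writes, reads + writes


if __name__ == "__main__":
    REGS = 14  # rough register count quoted in the post
    for srcs in (2, 3):  # 2-input ops, then 3-input ops
        reads, writes, total = operands_per_cycle(4, srcs)
        print(f"{srcs}-input ops: {reads} reads + {writes} writes = {total} of {REGS} regs")
```

Two-input ops touch 12 registers per cycle, which is most of a 14-entry file; three-input ops would touch 16, more than the file holds.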
More ports don't mean a microarchitectural change. That's an
implementation detail.
Changing the number of reg ports by one or two is minor. Changing your
L1D cache porting is a pretty major undertaking, especially if you only
had a single port before. Ask Mitch Alsup or someone who does this for
a living.
That wasn't at issue. If there is a justification it'll be done.
You're good at throwing in red herrings, eh?
That's not a red herring if it relates to reality, which it does. AMD
cannot design 3 new architectures. They have said that they have a new
mobile and a new server uarch in the pipeline...combining those
statements results in a particular conclusion.
Now *you* are changing the architecture.
That's right. I'm pointing out that having an FMA is a huge
performance boost. Since x86 doesn't have one, I have to use other
things to show this. Ask anyone who designs chips if FMAs are a good
idea...
It doubles your peak FLOPs, and if you have the memory bandwidth to
feed it, it is a huge boost.
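Where the doubling comes from, as a back-of-the-envelope sketch: a fused multiply-add counts as two floating-point operations per instruction, so peak throughput per unit doubles at the same clock. The unit count and clock below are made-up numbers, not any real chip.

```python
def peak_gflops(units, clock_ghz, fma=False):
    """Peak GFLOP/s assuming each unit retires one FP instruction per cycle.

    An FMA performs a multiply and an add, so it counts as 2 FLOPs.
    """
    flops_per_op = 2 if fma else 1
    return units * flops_per_op * clock_ghz


if __name__ == "__main__":
    # Hypothetical machine: 2 FP units at 2.0 GHz
    print(peak_gflops(2, 2.0))            # separate multiply and add ops
    print(peak_gflops(2, 2.0, fma=True))  # same units issuing FMAs
```

This is peak only; it says nothing about whether loads and stores can keep the units fed.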
DK