I wonder if this AMD trickery would be applicable to HPC clusters.
There you typically have all cores going 100% on some parallel crunching.
On a desktop, you tend to get different loads on each core, so you can
do the turbo/overdrive on some and low-power on others.
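Just to make the per-core point concrete, here's a rough sketch (assuming a Linux box that exposes the cpufreq sysfs files; file names and availability vary by kernel and driver) that prints each core's current clock, so you can actually see some cores boosted while others idle down:

    import glob

    # Walk the per-core cpufreq entries and report each core's current clock.
    for path in sorted(glob.glob(
            "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_cur_freq")):
        core = path.split("/")[5]              # e.g. "cpu3"
        with open(path) as f:
            khz = int(f.read().strip())        # kernel reports kHz
        print(f"{core}: {khz / 1000:.0f} MHz")

On a desktop under mixed load you'd expect a spread; on an HPC node mid-crunch they should all sit near the same (high) number.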
Low power is still low power regardless of intent, and it only gets lower with each step down in die process geometry. It's all getting greener and greener every day in this "green world" of ours, dontcha know...
I've used frequency stepping in a desktop environment, dropping the multiplier via software load detection on early (sub-2 GHz) AMD Athlons. (Then came my first Intel - well, second; I'd rather not say what I had before a NEC V30 - which was an unstoppable, overclockable Celeron, Revision D. One hell of a one-trick pony.)
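For flavor, here's roughly what those old userspace tools were doing, translated into a Linux/Python sketch (the 70% threshold is just a number I picked, it needs root to write the governor file, and modern kernels' ondemand/schedutil governors already do this natively):

    import time

    GOVERNOR = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor"

    def busy_fraction(interval=1.0):
        # Sample /proc/stat twice and return the busy fraction in between.
        def snapshot():
            with open("/proc/stat") as f:
                fields = [int(x) for x in f.readline().split()[1:]]
            return fields[3] + fields[4], sum(fields)   # idle+iowait, total
        idle1, total1 = snapshot()
        time.sleep(interval)
        idle2, total2 = snapshot()
        dt = total2 - total1
        return 1.0 - (idle2 - idle1) / dt if dt else 0.0

    def set_governor(name):
        # Needs root and a cpufreq-capable kernel/driver.
        with open(GOVERNOR, "w") as f:
            f.write(name)

    while True:
        set_governor("performance" if busy_fraction() > 0.7 else "powersave")

Back then it was raw multiplier changes rather than a governor file, but the loop - watch the load, step the clock - was the same idea.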
In theory that sounds like exactly where it should be, just as you say: the intent is efficiency through shared resources, distributing the load to cores at the code level. But so far that's mostly what it is - an intent. Really. Like running smack into a brick wall. All, or the large majority, of the advancements being touted come from the chipmakers, Intel and AMD. (Predictive branching routines are notoriously difficult to write, or so they say.)
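To pin down what I mean by code-level load-to-cores distribution, this is the part the software side has to get right (toy Python sketch, hypothetical workload): spread the crunching evenly and every core is equally busy, which is precisely the all-cores-at-100% case where per-core turbo has nothing left to trade.

    import multiprocessing as mp

    def crunch(chunk):
        # Stand-in for the real number crunching.
        return sum(x * x for x in chunk)

    if __name__ == "__main__":
        n = mp.cpu_count()
        data = list(range(1_000_000))
        chunks = [data[i::n] for i in range(n)]   # one slice of the load per worker/core
        with mp.Pool(n) as pool:
            print(sum(pool.map(crunch, chunks)))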