A New Way to Talk about GPUs (ExtremeTech on ATI's R520 and Nvidia G70)

Guest

http://www.extremetech.com/article2/0,1697,1841940,00.asp

____________________________________
A New Way to Talk about GPUs
By Jason Cross


"I was perusing the forums over at Beyond 3D, marveling at the absolute
guesswork and rumor-mongering going on about ATI's upcoming R520 graphics
chip. If the forum's prognosticators are to be believed, it will run at
around 700MHz, or nowhere close to that. It'll have 32 pipelines, or 24, or
20, or 16. It'll have some sort of unified shader architecture, though
nothing like the GPU in the Xbox 360, or it will have a completely
traditional architecture. It will enable AA together with HDR (a current
sore spot for Nvidia's cards), or that's just totally unfeasible.

The random guesswork isn't really a surprise. ATI has been incredibly quiet
and secretive about its next major GPU architecture: All we really know for
sure is that it will be Shader Model 3.0-compliant, several revisions have
taped out by now, and some version of it was running the impressive Alan
Wake demo at E3 this past May. Some sites post new rumors every week,
usually contradicting the rumors from the week before."



"What really struck me is how the fans of 3D graphics are sticking hard and
fast to a certain way of looking at GPUs. They discuss everything in terms
of "pipelines," with some even going so far as to say that the GeForce 7800
GTX isn't a "true" 24-pipeline chip because it only has 16 raster operation
units (ROPs), and can therefore only really draw 16 pixels per clock, max.
I've spoken with both ATI and Nvidia on the subject, and they both say 16
ROPs is plenty. The truth is, the more-advanced 3D games are so limited by
shader operation speed and texture fetching that the GPU is drawing nowhere
near 16 pixels or samples per clock, they say. I was told by one engineer
that the performance benefit of moving from 16 to 24 ROPs would be less than
5%, but it would come at a considerable cost in transistor count.
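
The ROP argument above can be put into a toy model. This is only a rough sketch, not anything from ATI or Nvidia: the function name and the cycles-per-pixel figures are hypothetical, and the model ignores texture fetch latency, memory bandwidth, and antialiasing samples. It assumes each shader pipeline finishes one pixel every `shader_cycles_per_pixel` clocks and that the ROPs cap how many finished pixels can be written per clock.

```python
def pixels_per_clock(shader_pipes, rops, shader_cycles_per_pixel):
    """Toy estimate of pixels written per clock (hypothetical model).

    shader_pipes: number of pixel shader pipelines
    rops: number of raster operation units (hard cap on pixels per clock)
    shader_cycles_per_pixel: assumed average shader cost of one pixel
    """
    # Rate at which the shader pipelines can hand finished pixels to the ROPs.
    shader_rate = shader_pipes / shader_cycles_per_pixel
    # The ROPs cannot write more pixels per clock than there are ROPs.
    return min(shader_rate, rops)


if __name__ == "__main__":
    # Assumed shader costs: 1 cycle (old-style texture blend) up to 8 cycles.
    for cycles in (1, 2, 4, 8):
        with_16 = pixels_per_clock(24, 16, cycles)
        with_24 = pixels_per_clock(24, 24, cycles)
        print(f"{cycles} cycles/pixel: 16 ROPs -> {with_16}, 24 ROPs -> {with_24}")
```

In this model the extra 8 ROPs only matter when a pixel costs fewer than 1.5 shader cycles. At an assumed 4 cycles per pixel, both configurations come out at 6 pixels per clock. A real frame mixes cheap and expensive pixels, which would be consistent with a small but nonzero gain like the "less than 5%" quoted above.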

In the grand old days of just three or four years ago, even the most
advanced 3D engines basically just layered a few textures on top of each
other with simple blending modes. Every now and then a pixel shader would be
used to make the water look like bumpy Mylar, but beyond that, shaders were
mostly used to perform more of these texture blends at once. It was
appropriate to talk about GPU performance by counting pipelines and how many
pixels or samples could be drawn per second. You had your fill rate, your
clock speed, your memory bandwidth, and that was enough.
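
For reference, the old shorthand amounts to two multiplications. A minimal sketch, with hypothetical function names and made-up example numbers rather than the specs of any real card:

```python
def peak_fill_rate_mpixels(pipelines, core_clock_mhz):
    """Theoretical peak fill rate in megapixels per second.

    Assumes every pipeline outputs one pixel on every clock, which is
    the best case the old pipeline-counting shorthand takes for granted.
    """
    return pipelines * core_clock_mhz


def memory_bandwidth_gb_s(bus_width_bits, effective_mem_clock_mhz):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_mem_clock_mhz / 1000


if __name__ == "__main__":
    # Hypothetical card: 16 pipelines at 400 MHz, 256-bit bus at 1000 MHz effective.
    print(peak_fill_rate_mpixels(16, 400), "Mpixels/s")
    print(memory_bandwidth_gb_s(256, 1000), "GB/s")
```

Both figures are upper bounds. They say nothing about how many clocks the shader work for each pixel takes.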

The world is changing rapidly. Games that use DirectX 9 level shaders,
either Shader Model 2.0 or 3.0, are tricky. Some shaders use floating-point
math, some integer math. The math required to draw a single pixel is
increasing, not just on spot areas like bumpy and shiny water, but on
virtually every pixel in the game. And it's not just blending together some
textures, either. "Data textures" like normal maps or gloss maps are used to
feed comparatively complex calculations to determine the final color of a
pixel. Compared with the number of pixels a GPU will effectively output per
clock cycle, a whole lot of math is going on, and the number of texture
fetches is going up, too.


We need a new way to talk about GPUs. Pipelines, clock rates, and fill rate
were a useful shorthand a couple of years ago, but that's no longer the
case. What do we do when the same shader units that perform pixel shading
operations are used for vertex shading operations? What do we do when the
arithmetic logic units (ALUs) aren't organized into neat little "pipelines"
or even quads anymore? How do we account for the fact that not all ALUs are
created equal, when some can perform more operations per cycle than others and
different GPUs may have ALUs that perform operations of different types? How
do we account for the increasing value of on-chip caches?
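
One possible way to count unequal ALUs, offered only as a sketch: weight each group of ALUs by how many operations it can issue per cycle. The `AluGroup` type and both example chips below are hypothetical and do not describe any real GPU.

```python
from dataclasses import dataclass


@dataclass
class AluGroup:
    count: int          # number of identical ALUs in this group
    ops_per_cycle: int  # operations each ALU can issue per clock (assumed)


def peak_ops_per_clock(groups):
    """Sum of count * ops_per_cycle over all ALU groups (theoretical peak)."""
    return sum(g.count * g.ops_per_cycle for g in groups)


if __name__ == "__main__":
    # Hypothetical chip A: 24 identical ALUs, each issuing 2 ops per clock.
    chip_a = [AluGroup(count=24, ops_per_cycle=2)]
    # Hypothetical chip B: 16 "full" ALUs (2 ops) plus 16 "mini" ALUs (1 op).
    chip_b = [AluGroup(count=16, ops_per_cycle=2), AluGroup(count=16, ops_per_cycle=1)]
    print("chip A:", peak_ops_per_clock(chip_a))
    print("chip B:", peak_ops_per_clock(chip_b))
```

Both hypothetical chips peak at 48 operations per clock despite very different ALU counts, so a simple "pipeline" tally would rank them differently. Even this number leaves out operation types, caches, and how often the ALUs actually stay busy.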

Before long, the performance of GPUs may hinge on some of the same features
that make for a good desktop CPU, things like out-of-order instruction
processing, translation lookaside buffers, or data prefetching logic.

What do you think the most important metric of next-generation GPUs will be?
And what simple, understandable terms should we use to compare them?"

_______________________________________________________________
 
I heard that in two generations of cards down the road, they're going to add
the P.O.F.F. technology. Of course it'll mean an extra fan in your case, but
hey, it's worth it! It sure will be nice to smell burning rubber in games
like Need for Speed! P.O.F.F. = potential odor fragrance fan. *snickers*

/\/\UF/-\S/-\
 