Skybuck said:
Well
I just spent some time reading up on the differences and similarities between
CPUs and GPUs and how they work together =D
Performing graphics requires assloads of bandwidth! Like many GB/sec.
you have NO idea.
Current CPUs can only do 2 GB/sec, which is too slow.
in most cases, though, this is more than enough. there are
memory-intensive applications that need faster memory to keep from
stalling the CPU, but this is mostly a limitation of the FrontSide Bus
and memory modules. the faster and bigger memory gets, the more
expensive it is.
GPUs are becoming more generic. GPUs could store data inside their own
memory. GPUs in the future can also have conditional jumps/program flow
control, etc.
So GPUs are starting to look more and more like CPUs.
inasmuch as GPUs now perform pixel-specific calculations on texture
and video information. GPUs now also perform a lot of vector processing
work, which helps immensely in such tasks as model animation and the like.
It seems like Intel and AMD are falling a little behind when it comes to
bandwidth with the main memory.
Maybe that's a result of memory wars
Like VRAM, DRAM, DDR, DDR-II,
Rambus(?) and god knows what =D
no. those are different memory technologies that evolved to keep up
with the growing demand for faster memory. keep in mind CPUs are only
10x as fast as memory now, and that gap is shrinking FAST. (due to
increasing popularity of DDR and the introduction of DDR-II, which, by
the way, is very fast.)
Though Intel and AMD probably have a little bit more experience with making
generic CPUs, or maybe lots of people have left and joined the GPU makers lol.
Or maybe AMD and Intel people are getting old and are going to retire soon
<- brain drain
i haven't the slightest clue how you got this inane idea. AMD and intel
aren't going anywhere except UP. CPUs are where the operating system
runs, and they are where programs run. everything in a computer,
including its peripherals, depends on the CPU and its support hardware.
software, which depends on the CPU, has the capability to offload work to
specialized hardware - which ATi, nVidia, Creative, 3D Labs, et al.
manufacture. work can even be sent to the sound card for
post-processing and special effects! (this is audio processing that can
NOT be done on the CPU no matter how badly you want it to.)
However AMD and INTEL have always done their best to keep things
COMPATIBLE... and that is where ATI and NVidia fail horribly it seems.
ATi and nVidia are competing to set standards. what they are trying to
do is not create hardware that will work like other hardware. they are
trying to make hardware that is not only better in performance, but
better architecturally with more features and capabilities. all thanks
to the demands of the gaming industry.
you're not making an "apples-to-apples" comparison. you're comparing
two unrelated industries. things don't work that way.
nVidia and ATi write driver software to control their cards and pass
information from software to hardware. their *DRIVERS* provide two
interfaces to the hardware: DirectX and OpenGL. that is all the
standardisation nVidia, ATi, et. al. need. you can write an OpenGL game
and expect it to work on both brands of cards, for example.
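to illustrate: a bare-bones GLUT program like this (just a sketch - it
assumes you have GLUT and an OpenGL driver installed) should run on either
brand of card, because each vendor's driver supplies the same OpenGL
interface:

/* minimal OpenGL sketch: the same code runs on nVidia or ATi because
   the driver implements the OpenGL interface. requires GLUT. */
#include <GL/glut.h>

static void display(void)
{
    glClear(GL_COLOR_BUFFER_BIT);

    glBegin(GL_TRIANGLES);              /* one colored triangle */
        glColor3f(1.0f, 0.0f, 0.0f); glVertex3f(-0.5f, -0.5f, 0.0f);
        glColor3f(0.0f, 1.0f, 0.0f); glVertex3f( 0.5f, -0.5f, 0.0f);
        glColor3f(0.0f, 0.0f, 1.0f); glVertex3f( 0.0f,  0.5f, 0.0f);
    glEnd();

    glutSwapBuffers();                  /* flip the back buffer to the screen */
}

int main(int argc, char **argv)
{
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB);
    glutCreateWindow("same code, any conforming driver");
    glutDisplayFunc(display);
    glutMainLoop();
    return 0;
}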
My TNT2 is only 5 years old and now it can't play some games lol =D
that's because it doesn't have the capabilities that games expect these
days. games expect the hardware to handle all of the texturing,
lighting, and transformations, and rarely do it in software.
There is something called Riva Tuner... it has NVxx emulation... maybe that
works with my TNT2... I haven't tried it yet.
if memory serves, it activates a driver option. that emulation is in
nVidia's drivers. it's hidden and disabled by default but was made
available for developer testing.
The greatest asset of GPUs is probably that they deliver a whole graphical
architecture with them... though OpenGL and DirectX have that as well... this
GPU stuff explains how to do things like vertex and pixel shading and all the
other stuff around that level.
again you've got it backwards. the GPU *PROVIDES* the graphical
services *THROUGH* OpenGL and DirectX. but in addition to this, OpenGL
and DirectX themselves provide a /separate/ software-only
implementation. that is a requirement of both standards. anything that
is not done in hardware must be done in software. *BY THE DRIVER.*
in addition, vertex shading is a convenience. nothing more. but it
happens to be a *FAST* convenience.
pixel shading can *only* be done in hardware. it is a technology that
requires so much computational power that ONLY a GPU can provide it. it
would take entirely too much CPU work to do all that.
Though games still have to make sure to reduce the number of triangles that
need to be drawn... with BSPs, view frustum clipping, backface culling,
portal engines, and other things. Those can still be done fastest with
CPUs, since GPUs don't support/have them.
you again have it wrong.
game engines use those techniques to reduce the amount of data that they
send to the GPU. vertices still have to be sent to the GPU before they
are drawn - and sent again for each frame. to speed THAT process up,
game engines use culling techniques so they have less data to send. not
so the GPU has less work to do.
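to give a rough idea of that kind of CPU-side culling, here's a sketch in
C (the type names are mine, not any real engine's code): a triangle whose
normal points away from the camera never has to be sent at all.

/* sketch of CPU-side backface culling: skip triangles whose normal
   faces away from the viewer, so less data goes to the GPU. */
typedef struct { float x, y, z; } Vec3;
typedef struct { Vec3 v0, v1, v2; Vec3 normal; } Triangle;

static float dot3(Vec3 a, Vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* returns 1 if the triangle faces the camera and should be sent */
int is_front_facing(const Triangle *t, Vec3 camera_pos)
{
    Vec3 to_camera = { camera_pos.x - t->v0.x,
                       camera_pos.y - t->v0.y,
                       camera_pos.z - t->v0.z };
    return dot3(t->normal, to_camera) > 0.0f;
}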
So my estimate would be:
1024 x 768 x 4 bytes * 70 Hz = 220,200,960 bytes per second = exactly 210
MB/sec
So as long as a programmer can simply draw to a frame buffer and have it
flipped to the graphics card this will work out just nicely...
that's the way it was done before GPUs had many capabilities - like an
old TNT or the like.
but in those days, it was ALL done one pixel at a time.
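to put numbers on it, here's a back-of-the-envelope check of that 210
MB/sec figure, plus the one-pixel-at-a-time style of fill loop (a sketch
only, assuming a 32-bit frame buffer):

/* verify the 210 MB/sec figure and show the old per-pixel fill style */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const int w = 1024, h = 768, bytes_per_pixel = 4, hz = 70;
    long long per_frame  = (long long)w * h * bytes_per_pixel;
    long long per_second = per_frame * hz;

    printf("%lld bytes/frame, %lld bytes/sec (~%.0f MB/sec)\n",
           per_frame, per_second, per_second / (1024.0 * 1024.0));

    /* the old way: the CPU touches every pixel itself, every frame */
    unsigned int *frame = malloc((size_t)per_frame);   /* 4 bytes per pixel */
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            frame[y * w + x] = 0x00336699;             /* fill with one color */
    free(frame);
    return 0;
}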
So far no need for XX GB/sec.
Of course the triangles still have to be drawn....
for a triangle to be drawn, first the vertices must be transformed.
this involves a good deal of trigonometry to map a 3D point to a dot on
the screen. second, the vertices are connected and the area is filled
with a color. or:
1. vertices transformed (remember "T&L"?)
2. area bounded among the vertices
3. texture processing is performed (if necessary. this is done by pixel
shaders now, so it's all on the GPU. without pixel shaders, this is
done entirely on the CPU.) this includes lighting, coloring (if
applicable), or rendering (in the case of environment maps).
4. the texture map is transformed to be mapped to the polygon.
5. the texture map is then drawn on the polygon.
in an even more extreme case, we deal with the Z-buffer (or the DirectX
W-buffer), vertex transformations (with vertex shaders), screen
clipping, and screen post-processing (like glow, fades, blurring,
anti-aliasing, etc.)
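to make step 1 a little more concrete, here's roughly what projecting one
vertex to a screen dot looks like (a much-simplified sketch - real
pipelines use 4x4 matrices and handle rotation and clipping too):

/* simplified sketch of step 1: project a 3D vertex to a screen pixel,
   camera at the origin looking down +z. only the perspective divide. */
#include <stdio.h>

typedef struct { float x, y, z; } Vertex;

void project(Vertex v, float focal_length, int screen_w, int screen_h,
             int *out_x, int *out_y)
{
    /* perspective divide: farther points land closer to the center */
    float sx = (v.x * focal_length) / v.z;
    float sy = (v.y * focal_length) / v.z;

    /* move the origin from the screen center to the top-left corner */
    *out_x = (int)(sx + screen_w / 2.0f);
    *out_y = (int)(screen_h / 2.0f - sy);
}

int main(void)
{
    Vertex v = { 1.0f, 2.0f, 10.0f };
    int px, py;
    project(v, 256.0f, 1024, 768, &px, &py);
    printf("vertex lands at pixel (%d, %d)\n", px, py);
    return 0;
}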
Take a beast like Doom III.
How many triangles does it have at any given time...
Thanks to BSPs, (possibly portal engines), view frustum clipping, etc...
Doom III will only need to draw maybe 4000 to maybe 10000 triangles at any
given time. ( It could be more... I'll find out how many triangles later
)
Maybe even more than that...
as you found out very quickly, it draws a LOT of polygons. how many did
Doom use? there were about 7 or 8 on the screen, on average. more for
complex maps. the monsters? single-polygon "imposters" (to use current
terminology).
But I am beginning to see where the problem is.
Suppose a player is 'zoomed' in or standing close to a wall...
Then it doesn't really matter how many triangles have to be drawn....
Even if only 2 triangles have to be drawn... the problem is as follows:
All the pixels inside the triangles have to be interpolated...
And apparently even interpolated pixels have to be shaded etc...
Which makes me wonder if these shading calculations can be interpolated...
maybe that would be faster.
But that's probably not possible, otherwise it would already exist?!
Or somebody has to come up with a smart way to interpolate the shading etc.
for the pixels
during the texture shading step, that's irrelevant. most textures are
square - 256x256, 512x512, 1024x1024, etc. (there are even 3D textures,
but i won't go into that.) when those textures are shaded for lighting,
all of those pixels must be processed before the texture can be used.
pixel shaders change that and enable us to do those calculations at
run-time.
i won't explain how, since i'm not entirely sure. i haven't had the
chance to use them yet. they may be screen-based or texture-based. i
don't know. maybe both. i'll find out one of these days.
So now the problem is that:
1024x768 pixels have to be shaded... = 786,432 pixels !
That's a lot of pixels to shade !
really. you figured this out. how nice. imagine doing that on the CPU.
on second thought, go back up and re-read what i said about pixel
shaders being done in software.
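multiply it out and you'll see why (rough numbers only - the
instructions-per-pixel figure here is just an illustrative guess):

/* rough arithmetic: per-pixel shading work done on the CPU.
   200 instructions/pixel is only an illustrative guess. */
#include <stdio.h>

int main(void)
{
    long long pixels  = 1024LL * 768;   /* 786,432 */
    long long per_px  = 200;            /* instructions per pixel (guess) */
    long long fps     = 70;
    long long per_sec = pixels * per_px * fps;

    printf("%lld pixels, ~%lld instructions/sec just for shading\n",
           pixels, per_sec);            /* roughly 11 billion */
    return 0;
}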
There are only 2 normals needed I think... for each triangle... and maybe
with some smart code... each pixel can now have its own normal... or maybe
each pixel needs its own normal... how does bump mapping work at this point?
normals are only really required for texture orientation and lighting.
with software lighting, the texture is usually processed and then sent
to the GPU before the level begins. some games still do that. some use
vertex lighting which gives a polygon lighting based on the strength of
light at each vertex - rather than at each pixel within the polygon. as
you can imagine, that's quite ugly.
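for reference, per-vertex diffuse lighting boils down to something like
this (a sketch with my own names, not any particular engine's code):

/* sketch of per-vertex diffuse lighting: brightness is the dot product
   of the vertex normal and the direction to the light. the three vertex
   colors are then interpolated across the face, which is why low-polygon
   models look blotchy with this technique. */
#include <math.h>

typedef struct { float x, y, z; } Vec3;

static float dot3(Vec3 a, Vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

static Vec3 normalize3(Vec3 v)
{
    float len = sqrtf(dot3(v, v));
    Vec3 n = { v.x / len, v.y / len, v.z / len };
    return n;
}

/* returns a brightness factor in [0, 1] for one vertex */
float vertex_light(Vec3 vertex_pos, Vec3 vertex_normal, Vec3 light_pos)
{
    Vec3 to_light = { light_pos.x - vertex_pos.x,
                      light_pos.y - vertex_pos.y,
                      light_pos.z - vertex_pos.z };
    float d = dot3(normalize3(vertex_normal), normalize3(to_light));
    return d > 0.0f ? d : 0.0f;   /* facing away from the light => unlit */
}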
hardware lighting (implied by "T&L") gives that job to the GPU. it
allows the GPU to perform lighting calculations accurately across a
polygon while the CPU focuses on more important things. (you might see
this called "dynamic lighting".)
pixel shaders allow for even more amazing dynamic effects with lights in
real-time.
now, about bump-mapping. well, it is what its name implies. it is a
single-color texture map that represents height data. it is processed
based on the position of lights relative to the polygon it's mapped to.
it adds a lot to the realism of a game.
there are numerous algorithms for this, each with its advantages and
disadvantages. nVidia introduced something cool with one of the TNTs
(or maybe GeForce. my history is rusty.) called "register combiners".
this allows developers to do lots of fancy texture tricks like bump
mapping on the GPU.
the basic idea is that light levels are calculated based on whether the
bump map is rising or falling in the direction of the light. if you
want to know more, there are a lot of tutorials out there.
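here's the principle in plain C, just as a rough sketch (this is NOT the
register-combiner implementation, only the basic idea):

/* rough sketch of the bump-mapping idea: compare the height map's slope
   at each texel with the light direction. the light direction is assumed
   to be normalized and given in the surface's own (tangent) space. */
#include <math.h>

/* brightness factor for texel (x, y) of a w-by-h height map */
float bump_light(const unsigned char *height, int w, int h,
                 int x, int y, float lx, float ly, float lz)
{
    /* slope of the height map in x and y (finite differences) */
    float dx = (float)(height[y * w + (x + 1) % w] - height[y * w + x]);
    float dy = (float)(height[((y + 1) % h) * w + x] - height[y * w + x]);

    /* treat (-dx, -dy, 1) as the perturbed surface normal */
    float nx = -dx, ny = -dy, nz = 1.0f;
    float len = sqrtf(nx * nx + ny * ny + nz * nz);

    float d = (nx * lx + ny * ly + nz * lz) / len;
    return d > 0.0f ? d : 0.0f;   /* light from behind => dark */
}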
In any case let's assume the code has to work with 24 bytes for a normal.
(x,y,z in 64 bit floating point ).
it's not. 32-bit floats are more commonly used because of the speed
advantage. CPUs are quite slow when it comes to floating-point math.
(compared to GPUs or plain integer math.)
The color is also in r,g,b,a in 64-bit floating point, another 32 bytes for
color.
Maybe some other color has to be mixed in, I'll give it another 32
bytes...
Well, maybe some other things, so let's round it to 100 bytes per pixel
now you're way off. the actual data involves 16 bytes per vertex (the
fourth float is usually 0.0), with usually 3 vertices per polygon, and a
plane normal (another 4-part vertex, sometimes not stored at all), with
texture coordinates. that's sent to the GPU in a display list.
the GPU performs all remaining calculations itself and /creates/ pixel
data that it places in the frame buffer. the frame buffer is then sent
to the monitor by way of a RAMDAC.
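for illustration only, a layout along those lines might look like this
(real formats vary by engine and API):

/* illustrative vertex layout matching the sizes described above.
   real formats vary; this is only an example. */
#include <stdio.h>

typedef struct {
    float position[4];   /* x, y, z, w - the fourth float usually 0.0 */
    float texcoord[2];   /* u, v into the texture map */
} Vertex;                /* 24 bytes, nowhere near 100 */

typedef struct {
    Vertex v[3];         /* three vertices per polygon */
    float  normal[4];    /* plane normal, sometimes not stored at all */
} Triangle;

int main(void)
{
    printf("per vertex:   %zu bytes\n", sizeof(Vertex));    /* 24 */
    printf("per triangle: %zu bytes\n", sizeof(Triangle));  /* 88 */
    return 0;
}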
So that's roughly 5.1 GB/sec that has to move through any processor just to
do my insane lighting per pixel
assuming a situation that doesn't exist in game development on current CPUs.
Of course Doom III or my insane game... uses a million fricking vertices (3D
points) plus some more stuff.
vertex x,y,z,
vertex normal x,y,z
vertex color r,g,b,a
So let's say another insane 100 bytes per vertex.
let's not.
1 Million vertices * 100 bytes * 70 Hz = 7,000,000,000
Which is roughly another 7 GB/sec for rotating, translating, storing the
vertices etc.
no. you're still assuming the video card stores vertices with 64-bit
precision internally. it doesn't. 32-bit is more common. 16-bit and
24-bit are also used on the GPU itself to varying degrees.
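redo the arithmetic with 32-bit floats instead of the 100-byte guess and
the number shrinks a lot (still only a ballpark, assuming position +
normal + color per vertex):

/* redo of the 7 GB/sec estimate with 32-bit floats instead of the
   100-byte-per-vertex guess. still only a ballpark figure. */
#include <stdio.h>

int main(void)
{
    long long vertices = 1000000;
    long long bytes    = 4 * (4 + 4 + 4);  /* position, normal, color: 4 floats each */
    long long hz       = 70;
    long long per_sec  = vertices * bytes * hz;

    printf("%lld bytes/vertex -> %.2f GB/sec\n",
           bytes, per_sec / (1024.0 * 1024.0 * 1024.0));   /* about 3.1 */
    return 0;
}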
So that's a lot of data moving through any processor/memory !
it would be, but it's not.
I still think that if AMD or Intel is smart... they will increase the
bandwidth with main memory... so it reaches the Terabyte age
you're misleading yourself now. it's not Intel or AMD's responsibility.
And I think these graphic cards will stop existing
just like windows
graphic accelerator cards stopped existing...
they still exist. what do you think your TNT2 does? it accelerates
graphics. on windows. imagine that.
And then things will be back to normal =D
Just do everything via software on a generic processor <- much easier I hope
=D
you still seem terribly confused about what even is stored in GPU memory.
the vertex data is actually quite small and isn't stored in video memory
for very long.
the reason video cards usually come with so much memory is for TEXTURE,
FRAME BUFFER, and AUXILIARY BUFFER storage.
the frame buffer is what you see on the screen.
the Z buffer keeps track of how far each drawn pixel is from the screen.
this is so nearer surfaces correctly hide the ones behind them.
auxiliary buffers can be a lot of things: stencil buffers are used for
shadow volume techniques, for example.
textures take up the majority of that storage space. a single 256x256
texture takes up 262,144 bytes. a 512x512 texture takes up 1MB.
1024x1024 is 4MB. textures as large as 4096x4096 are possible (though
not common) - that's 64MB.
and what of 3D textures? let's take a 64x64x64 texture. small, right?
that's 1MB all on its own.
so how big is the frame buffer? well, if there's only one, that's just
fine. but DirectX supports triple-buffering and OpenGL supports
double-buffering. that means 2 or 3 frames are stored at once. they
are flipped to the screen through the RAMDAC.
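putting rough numbers on all of the above, assuming 4 bytes per
texel/pixel (these match the figures quoted above):

/* tally of the storage figures above, 4 bytes per texel/pixel assumed */
#include <stdio.h>

static long long tex2d(int w, int h)        { return (long long)w * h * 4; }
static long long tex3d(int w, int h, int d) { return (long long)w * h * d * 4; }

int main(void)
{
    printf("256x256 texture:   %lld bytes\n", tex2d(256, 256));                  /* 262,144 */
    printf("512x512 texture:   %lld MB\n", tex2d(512, 512) / (1024 * 1024));     /* 1 */
    printf("1024x1024 texture: %lld MB\n", tex2d(1024, 1024) / (1024 * 1024));   /* 4 */
    printf("4096x4096 texture: %lld MB\n", tex2d(4096, 4096) / (1024 * 1024));   /* 64 */
    printf("64x64x64 3D tex:   %lld MB\n", tex3d(64, 64, 64) / (1024 * 1024));   /* 1 */

    /* frame buffers at 1024x768, 32-bit color */
    long long frame = tex2d(1024, 768);
    printf("double-buffered frame buffer: %lld MB\n", 2 * frame / (1024 * 1024));  /* 6 */
    printf("triple-buffered frame buffer: %lld MB\n", 3 * frame / (1024 * 1024));  /* 9 */
    return 0;
}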
and not only must the GPU store and keep track of all that data, but it
PROCESSES IT in real-time with each frame.
your proposal requires we go back to the 200+ instructions per pixel
that games once required. do you expect us to go back to mode 13h where
that kind of computation is still feasible with the same kind of graphic
quality we have now?
for us as developers, the GPU is a Godsend. it has saved us from doing
a lot of work ourselves. it has allowed us to expand our engines to the
point where we can do in real-time what once required vast
super-computer clusters to do over the course of MONTHS. the GeForce 4
(as i recall) rendered Final Fantasy: The Spirits Within at 0.4 frames
per second. it took several MONTHS to render the final movie on CPUs.
one final time: CPUs are general purpose. they are meant to do a lot
of things equally well. GPUs are specialized. they are meant to do one
thing and do it damned well. drivers for those GPUs are written to make
developers' lives easier, and let developers do what is otherwise impossible.
now, i'll close because lightning may strike the power grid at any
moment after the rain passes.