Curious...I wonder why they decided to split the rendering horizontally
as opposed to vertically...seems if they split the rendering vertically
they wouldn't have to bother w/ a separate algorithm to balance the rendering
load in realtime.
The only difference from a physical point of view is the order in which the
framebuffer is stored in memory, and it seems less hassle to keep the memory
contiguous for a rectangular block of the display rather than splitting the
memory as well as the display area. This way each GPU can keep its own
separate framebuffer, upper and lower half; thinking about it quickly, that
would be much less hassle.
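To make the layout point concrete, here's a minimal C sketch (dimensions are
made up, and I'm assuming the usual scanline-major framebuffer layout, not
claiming any particular card does exactly this):

    #include <stdio.h>

    /* Scanline-major layout: pixel (x, y) lives at offset y * WIDTH + x.
       With a horizontal split each half is one contiguous block; with a
       vertical split every scanline is cut in two, so each GPU would own
       HEIGHT separate fragments of memory. */
    #define WIDTH   1024
    #define HEIGHT   768
    #define SPLIT_Y (HEIGHT / 2)

    int main(void)
    {
        /* Horizontal split: two solid ranges, one per GPU. */
        printf("GPU0 owns pixels      0 .. %7d (one block)\n",
               SPLIT_Y * WIDTH - 1);
        printf("GPU1 owns pixels %7d .. %7d (one block)\n",
               SPLIT_Y * WIDTH, HEIGHT * WIDTH - 1);

        /* Vertical split: each GPU owns half of *every* scanline. */
        printf("Vertical split: %d non-contiguous runs per GPU\n", HEIGHT);
        return 0;
    }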
If the display is rotated 90 or 270 degrees, I wonder which way the split is
done... does the split rotate with the display or not? My educated guess is
that it does.
If I understand the process correctly, the rendering is
not split 50/50 but based on the rendering load of a scene/screen...
Possibly, but doing it efficiently requires some thinking. If we adaptively
move the scanline we split at, it means we must either:
- dynamically reserve more memory for scanlines added to the current
half-screen, or
- have the memory preallocated.
If preallocated, which option?
- some fixed threshold such as 2/3 is allocated for each screen half,
totaling 33% memory waste, or:
- allocate a full buffer for both screen halves, totaling 100% memory waste.
If the memory were split 50/50 the memory waste would be 0% with no need for
dynamic allocation (which I doubt is done), so my (again educated) guess is
that they either split 50/50 (no dynamic balancing) or 100/100 (dynamic
balancing). Dynamic balancing has a slight problem: when one GPU reclaims
scanlines from the other, the current buffer contents must be copied to the
reclaiming GPU's framebuffer so that the buffers' contents remain in sync.
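Putting rough numbers on those options (a back-of-the-envelope C sketch; the
2/3 threshold and the dimensions are just the figures from above, nothing
from a real driver):

    #include <stdio.h>

    #define WIDTH  1024
    #define HEIGHT  768
    #define BPP       4   /* bytes per pixel */

    int main(void)
    {
        size_t full = (size_t)WIDTH * HEIGHT * BPP;  /* one full screen */

        /* Option 1: fixed 2/3 threshold per half -> 2 * 2/3 = 4/3 total */
        size_t opt1 = 2 * (full * 2 / 3);
        /* Option 2: full buffer on both GPUs -> 2x total */
        size_t opt2 = 2 * full;

        printf("2/3 threshold: %zu bytes, %.0f%% waste\n",
               opt1, 100.0 * (opt1 - full) / full);
        printf("full buffers:  %zu bytes, %.0f%% waste\n",
               opt2, 100.0 * (opt2 - full) / full);

        /* And the sync problem: if GPU0 reclaims scanlines
           [old_split, new_split) from GPU1, their current contents
           have to be copied over first, something like:
           memcpy(fb0 + old_split * WIDTH * BPP,
                  fb1 + old_split * WIDTH * BPP,
                  (new_split - old_split) * (size_t)WIDTH * BPP);  */
        return 0;
    }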
The decision is probably better made before the frame is rendered. They
could keep a copy of all rendering commands for the whole frame, which I
doubt: it requires memory and introduces one frame of latency. The most
viable way to do this is to look at the previous frame, track the amount of
work done by each GPU, and decide the split based on that information.
Simple, and it would work pretty nicely; that's what I'd do.
All things considered: dynamic balancing is probably not done intra-frame.
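A minimal sketch of that previous-frame heuristic (the function and the
halfway damping step are my own invention, purely to show the idea):

    #include <stdio.h>

    #define HEIGHT    768
    #define MIN_LINES  32   /* never starve either GPU completely */

    /* Given last frame's split line and per-GPU render times, pick the
       next split: find the line that would equalize the predicted per-GPU
       time, then step halfway toward it to damp oscillation. */
    static int rebalance(int split_y, double time0, double time1)
    {
        double c0 = time0 / split_y;             /* GPU0 cost/scanline */
        double c1 = time1 / (HEIGHT - split_y);  /* GPU1 cost/scanline */
        double ideal = HEIGHT * c1 / (c0 + c1);  /* s*c0 == (H-s)*c1   */
        int next = (int)((split_y + ideal) / 2.0);
        if (next < MIN_LINES)          next = MIN_LINES;
        if (next > HEIGHT - MIN_LINES) next = HEIGHT - MIN_LINES;
        return next;
    }

    int main(void)
    {
        int split = HEIGHT / 2;
        /* Pretend the top half of the scene is twice as expensive. */
        for (int frame = 0; frame < 5; frame++) {
            double t0 = split * 2.0;             /* fake timings */
            double t1 = (HEIGHT - split) * 1.0;
            split = rebalance(split, t0, t1);
            printf("frame %d: split at scanline %d\n", frame, split);
        }
        return 0;
    }

With the fake timings above the split walks from 384 down toward scanline
256, where both halves would take equally long.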
Furthermore...would this balancing act suck up GPU processing power??
No, because the parts of the chip doing the work wouldn't know about it. It
would eat transistors to implement the balancing, which means a (not
necessarily significantly) larger chip. Doing this on the CPU would be
unfeasible, as it would require write-back from the vertex programs to know
the coverage of each screen half (since this is done in the GPU, the load
balancing doesn't leave the CPU any extra work). I'm assuming that load
balancing is done at all, of course.
...that could be used to render the scene perhaps???
No, because GPUs are not CPUs that execute general-purpose programs. In a
GPU things are implemented in functional blocks, which means transistors are
used to implement fixed functionality for a lot of the things that are done:
sampling, scan conversion, clipping and so on. The fragment and vertex
programs are an exception to this because they are in practice programs
which the "shader state machine" runs, but that is a red herring when it
comes to the principles involved here.
Just seems to make more sense...perhaps someone can shed some light on
my ignorance here???
I get the impression that you have the programmer's outlook on the issue at
hand. While it gives the basic tools to understand the algorithms and how
binary logic works, you need a perspective shift to think in terms of "how
many gates would that take?" and to think of the problem in functional
blocks, because that's how the chip designers do it. It's not like a von
Neumann program or anything, it is more like an N-dimensional array of
gates. The two main "camps" of design are synchronous and asynchronous
logic; I got the impression that NV would be in favour of synchronous logic,
but I could be wrong. But if you approach the problem from this angle it
might clear up things...