Why do processors continue to increase in bit width, especially GPUs,
which are already 128-bit?
Is it just pure marketing? Why else does the march of progress seem to imply
larger bit widths?
To answer that question you first need to pin down just how you are
defining bit width. For example, people often ask why PCs are only
just now getting 64-bit chips when gaming consoles used 64-bit chips
WAY back with the Nintendo 64 (and perhaps earlier). The answer is
actually quite simple: the definition of 64-bit was VERY different for
those two systems.
There are a few ways that you can define how wide your chip is:
1. Width of the integer registers
2. Number of (memory address) bits in a standard pointer
3. Width of your data bus
4. Width of your vector registers
5. ???
The first two options tend to be how PC chips are defined these days.
AMD and Intel's new "64-bit" chips mainly differ from their "32-bit"
counterparts in that they can use 64-bit pointers for addressing large
quantities of memory (32-bit pointers limit you to a 4GB virtual
memory space). They also can handle a full 64-bit integer in a single
register, allowing them to easily deal with integer values larger than
4 billion (32-bit chips can do this as well, it's just slower and
takes more than one register). Of those two reasons, memory
addressability was the real driving force. 4GB of memory just isn't
that much these days, and once you figure on the OS taking 1 or 2GB of
it you are left with a rather tight address space. Fragment that
address space and it gets really hard to work with large amounts of
data.
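
To make the pointer-size point concrete, here's a minimal C sketch
(just an illustration, assuming the typical data models where pointers
are 4 bytes on a 32-bit target and 8 bytes on a 64-bit one):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* On a typical 32-bit target a pointer is 4 bytes, which is
           exactly where the 4GB virtual address space ceiling comes
           from; on a 64-bit target it is 8 bytes. */
        printf("pointer size: %zu bytes\n", sizeof(void *));

        /* A 64-bit integer type exists either way, but a 32-bit CPU
           has to split it across two registers, while a 64-bit CPU
           handles it in a single register. */
        uint64_t big = 5000000000ULL;  /* too large for any 32-bit integer */
        printf("uint64_t: %zu bytes, value: %llu\n",
               sizeof(big), (unsigned long long)big);
        return 0;
    }

Build the same file as 32-bit and as 64-bit (e.g. gcc -m32 vs.
gcc -m64) and you'll see the pointer size change while the uint64_t
stays 8 bytes either way.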
Now, the width of the data bus is a somewhat older definition of the
bitness of a CPU; it was quite common back in the 8 and 16-bit days.
It is also the definition that made the Nintendo 64 a "64-bit" gaming
console. By this definition the Intel Pentium was the first "64-bit"
x86 CPU. This really isn't too relevant anymore, especially with the
move to more serialized data buses (fast and narrow, versus the
wide-but-slow parallel bus design), though you do sometimes see it
with video cards, particularly as a way to differentiate the low-end
cards from their more expensive brethren.
Of course, that isn't the only way bitness gets defined for a video
card; sometimes the number refers to the width of the vector
registers, our 4th option above. Note that these two bitness
definitions are independent of each other. For example, one card could
have a 64-bit data bus and 128-bit wide registers while another card
could have a 256-bit wide data bus but the same 128-bit wide
registers.
Video cards tend to execute the same instruction on MANY pieces of
data in succession. As such they are much better served by a
vector-processor design than standard (scalar) CPUs, which tend to
execute an instruction on only one or two pieces of data at a time.
Here a 128-bit vector register can, for example, let you pack 4 chunks
of 32-bit data into one register and run a single instruction on all 4
chunks. Note that this sort of thing DOES exist in the PC world as
well, in the form of SSE2. SSE2 has 128-bit vector registers that can
handle either 4 32-bit chunks of data or 2 64-bit chunks. CPUs aren't
always as efficient at this sort of thing as a GPU, but that's mainly
because a GPU is a pretty application-specific design while a CPU is
much more of a general-purpose design.
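
If you want to see what that packing looks like in practice, here's a
minimal sketch using the standard SSE2 intrinsics from <emmintrin.h>
(compile with something like gcc -msse2; the values are made up purely
for illustration). It loads four 32-bit integers into each 128-bit
register and adds all four lanes with one instruction:

    #include <stdio.h>
    #include <stdint.h>
    #include <emmintrin.h>   /* SSE2 intrinsics */

    int main(void)
    {
        /* Pack four 32-bit integers into each 128-bit XMM register.
           _mm_set_epi32 takes its arguments from the high lane down. */
        __m128i a = _mm_set_epi32(40, 30, 20, 10);
        __m128i b = _mm_set_epi32( 4,  3,  2,  1);

        /* One instruction (paddd) adds all four 32-bit lanes at once. */
        __m128i sum = _mm_add_epi32(a, b);

        int32_t out[4];
        _mm_storeu_si128((__m128i *)out, sum);

        printf("%d %d %d %d\n", out[0], out[1], out[2], out[3]);
        /* prints: 11 22 33 44 */
        return 0;
    }

The same 128-bit register could instead be treated as 2 chunks of
64-bit data by using _mm_add_epi64 in place of _mm_add_epi32.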
Anyway, all that being said, it doesn't exactly answer your question
of "why do we move to higher levels of bitness". The real answer is
simply that it's one way of improving the performance of a computer
chip. It's not the only way, but sometimes it's the right choice to
make.