Why 32-bit arithmetic?

PLS

I'm very puzzled by the code that VC++ 2005 generated for a simple
statement:

unsigned short *a, b;

b += *a;

generated this
0062CD6B mov eax,dword ptr [ebp-64h]
0062CD6E movzx ecx,byte ptr [eax]
0062CD71 movzx edx,word ptr [ebp-1Ch]
0062CD75 add ecx,edx
0062CD77 call @ILT+11960(@_RTC_Check_4_to_2@4) (560EBDh)
0062CD7C mov word ptr [ebp-1Ch],ax

What I don't understand is why the compiler widened both variables to 32
bits and did the arithmetic in 32 bits. Why not do the arithmetic in 16
bits?

This code needs to reproduce a checksum from another machine, and it is
essential that the sum be 16 bits long.
 
But the sum is 16 bits long. Look at the last line.

Brian
 
Indeed it is. But if you have run-time checking on (the call to
@_RTC_Check...), you get an error message when the sum exceeds 16 bits.
This is an entirely artificial error, because I never asked for 32-bit
arithmetic.

So I'm back to my original question. Why were 32 bit registers used?

++PLS
 
PLS said:
So I'm back to my original question. Why were 32 bit registers used?

Compilers do that because modern CPUs generally work faster with
full-width registers. I would just use unsigned instead of unsigned
short. It doesn't matter whether the sum overflows and gets cropped
inside a 16-bit register, or doesn't overflow and you trim it yourself.
All you care about is that after the 32-bit sum is calculated, you
discard the upper 16 bits:

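unsigned sum = 0;
while (...)
{
    sum += *p++; // accumulate in a full-width register
}
// Mask before narrowing: the run-time check fires on any conversion that
// loses set bits, even with an explicit cast.
unsigned short result = static_cast<unsigned short>(sum & 0xFFFF); // trim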

This will typically run faster than the unsigned short arithmetic, and
you'll get the same result anyway.
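To make the "same result" claim concrete, here is a minimal sketch that computes the sum both ways. The function names checksum_wide and checksum_narrow are made up for illustration, and the real checksum on the other machine may do more than a plain sum; this only shows that the low 16 bits of a 32-bit sum match a sum that wraps at 16 bits after every step.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: accumulate in 32 bits, truncate once at the end.
std::uint16_t checksum_wide(const std::uint16_t* p, std::size_t n) {
    assert(p != nullptr || n == 0);
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += p[i];  // the upper 16 bits only collect carries we discard later
    }
    return static_cast<std::uint16_t>(sum & 0xFFFFu);  // keep the low 16 bits
}

// Hypothetical helper: truncate after every addition (16-bit wrap-around).
std::uint16_t checksum_narrow(const std::uint16_t* p, std::size_t n) {
    assert(p != nullptr || n == 0);
    std::uint16_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum = static_cast<std::uint16_t>((sum + p[i]) & 0xFFFFu);
    }
    return sum;
}
```

Both versions agree because addition modulo 2^32 followed by reduction modulo 2^16 gives the same residue as reducing modulo 2^16 at every step.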

You can do the static_cast trimming every time you need to ensure that
the value is strictly 16-bit, which in your case is only needed after
summing. Who cares what's in the upper bits when it doesn't affect the
lower ones? Just be careful: before you shift right (>>), you need to
clear the upper bits, because a right shift moves them down into the
low 16 bits.
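The shift caveat is easy to see in a small sketch. The helper names below are hypothetical, and extracting the high byte is just a stand-in for whatever shift the real checksum uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: high byte of the 16-bit value held in a wider sum.
// The mask clears bits 16..31 so the shift cannot pull them into the result.
std::uint16_t high_byte_masked(std::uint32_t wide_sum) {
    std::uint32_t low16 = wide_sum & 0xFFFFu;
    return static_cast<std::uint16_t>(low16 >> 8);
}

// Same shift without the mask: carries accumulated above bit 15 leak in.
std::uint16_t high_byte_unmasked(std::uint32_t wide_sum) {
    return static_cast<std::uint16_t>((wide_sum >> 8) & 0xFFFFu);
}

// Usage sketch: 0x00031234 is the 16-bit value 0x1234 plus stale carries.
void shift_demo() {
    assert(high_byte_masked(0x00031234u) == 0x0012);
    assert(high_byte_unmasked(0x00031234u) == 0x0312);  // wrong for a 16-bit sum
}
```

Addition and subtraction never let the upper bits influence the lower ones, which is why the mask can wait until the end of the sum; a right shift is the case where that stops being true.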

You would be surprised what a struggle it is for the CPU to work with
16-bit registers. In 32-bit mode, every 16-bit instruction needs an
operand-size prefix, and a write to ax must leave the upper 16 bits of
eax intact, so the processor has to merge the result back into the full
register. Depending on the CPU, that merge can cost extra micro
instructions or a partial-register stall. The operands also have to be
zero- or sign-extended to 32 bits whenever they are combined with
full-width values. That adds up to more work than the plain 32-bit
version, and it can cause pipeline penalties on top of the wasted
clocks.
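Apart from speed, the language itself asks for the wider arithmetic: in C and C++, operands narrower than int are promoted before any arithmetic, so b += *a is evaluated as an int addition and then converted back to unsigned short. As far as I understand the /RTCc run-time check, it reports any narrowing conversion that loses set bits, even one with an explicit cast, and the usual way around it is to mask first. A minimal sketch, assuming int is wider than unsigned short (as on 32-bit VC++); add16 is a made-up helper name:

```cpp
#include <cassert>
#include <type_traits>

// Assumption: int is wider than unsigned short, so both operands are
// promoted to int and the sum has type int.
static_assert(std::is_same<decltype(static_cast<unsigned short>(1) +
                                    static_cast<unsigned short>(1)),
                           int>::value,
              "unsigned short + unsigned short is computed as int");

// Hypothetical helper: 16-bit wrap-around add. The mask drops the carry
// before the narrowing conversion, so no set bit is lost by the cast.
unsigned short add16(unsigned short b, unsigned short a) {
    return static_cast<unsigned short>((b + a) & 0xFFFF);
}

// Usage sketch: the carry out of bit 15 is discarded.
void add16_demo() {
    assert(add16(0xFFFF, 0x0001) == 0x0000);
    assert(add16(0x8000, 0x8001) == 0x0001);
}
```

In the original statement that would read b = add16(b, *a); the stored result is the same 16 bits as before, only the check no longer sees a lossy conversion.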

Tom
 