Egbert Nierop (MVP for IIS)
Hi,
Out of curiosity, I sometimes look at the assembly the compiler produces
in release mode.
What you often see is that the C++ compiler always addresses registers
explicitly to copy values from a to b.
Yet stosb, stosw, stosd and the movs[x] family are single instructions:
movs[x] implicitly uses ESI and EDI (source, destination) to copy data, and
stos[x] stores AL/AX/EAX through EDI.
This seems (imho) more efficient, yet the C++ compiler never seems to use
this construct; it always emits a lot more instructions.
Imagine this loop (I simplified the idea; of course, memcpy would normally
be used):
DWORD anArray[10000];
DWORD somesource[10000];
// copy array while skipping odd element positions
int element = 0;
for (int mycounter = 5000; mycounter != 0; mycounter--, element += 2)
    anArray[element] = somesource[element];
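could be optimized to
; set up source and destination
LEA EDI, [anArray]     ; EDI = address of destination
LEA ESI, [somesource]  ; ESI = address of source
MOV ECX, 5000          ; initial value of mycounter
CLD                    ; forward copy
mylabel:
MOVSD                  ; actual copy instruction; ESI and EDI each advance by 4
ADD ESI, 4             ; skip the odd element
ADD EDI, 4
LOOP mylabel           ; decrement ECX, repeat until ECX == 0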
Q: Is this construct simply not that efficient, or is there a reason the
C++ compiler team decided not to optimize to this level?
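For anyone who wants to reproduce the comparison, here is a minimal sketch of the two copy patterns as standalone functions. The names Dword, copy_even_elements and copy_all_elements are made up for illustration (Dword stands in for the Windows DWORD type). Only the second, contiguous pattern corresponds to what MOVSD does on its own; whether a given compiler emits string instructions for either one depends on the compiler and its optimization settings, so inspect the generated listing rather than take it for granted.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the Windows DWORD type (32-bit unsigned).
using Dword = std::uint32_t;

// Strided copy, like the loop in the post: only even element
// positions are copied, odd positions in dst are left untouched.
void copy_even_elements(Dword* dst, const Dword* src, std::size_t count) {
    for (std::size_t i = 0; i < count; i += 2) {
        dst[i] = src[i];
    }
}

// Contiguous copy: every element is copied in order, which is the
// access pattern a bare MOVSD loop implements.
void copy_all_elements(Dword* dst, const Dword* src, std::size_t count) {
    std::memcpy(dst, src, count * sizeof(Dword));
}
```

Compiling both functions in release mode with an assembly listing enabled makes it easy to compare what the compiler generates for each pattern.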