G
Guest
I am using MSVC++ 2005, and compiling with /02, Maximum Speed (/0x Full
Optimizations doesn't change anything).
Never mind what my code does, it's what the compiler does with my code that
I'm posting about.
This will generate a forward jump, basically skipping over a move (increment)
if (BinaryHeap[current2].key > BinaryHeap[current2+1].key) {
++current2;
}
This code here is much faster:
current2 += (BinaryHeap[current2].key > BinaryHeap[current2+1].key);
Disassembler shows it generates an xor, setg, and add, but no jmp.
Assuming the branches are roughly 50/50, which in my case they are, the code
without the jump is much faster. In my benchmark, the cycles dropped from
about 14k to 12k, because the jump is inside a tight loop.
I don't know what the ratio of branch taken vs. branch not taken would have
to be for the default code to be faster. I can only guess that this
optimization wasn't used because it wasn't a sure optimization. But IMO it's
pretty close to a sure thing. I'm sure there is a way for a programmer who
knows a jump will be taken less often to write his code in such a way that it
generates a single jump rather than multiple ops, and is therefore faster.
So I think the code without the jump should be generated by default by the
compiler. This allows me to have my cake (readability) and eat it too
(performance).
Thoughts? Code available upon request.
Optimizations doesn't change anything).
Never mind what my code does, it's what the compiler does with my code that
I'm posting about.
This will generate a forward jump, basically skipping over a move (increment)
if (BinaryHeap[current2].key > BinaryHeap[current2+1].key) {
++current2;
}
This code here is much faster:
current2 += (BinaryHeap[current2].key > BinaryHeap[current2+1].key);
Disassembler shows it generates an xor, setg, and add, but no jmp.
Assuming the branches are roughly 50/50, which in my case they are, the code
without the jump is much faster. In my benchmark, the cycles dropped from
about 14k to 12k, because the jump is inside a tight loop.
I don't know what the ratio of branch taken vs. branch not taken would have
to be for the default code to be faster. I can only guess that this
optimization wasn't used because it wasn't a sure optimization. But IMO it's
pretty close to a sure thing. I'm sure there is a way for a programmer who
knows a jump will be taken less often to write his code in such a way that it
generates a single jump rather than multiple ops, and is therefore faster.
So I think the code without the jump should be generated by default by the
compiler. This allows me to have my cake (readability) and eat it too
(performance).
Thoughts? Code available upon request.