Missed jmp Optimization?

Guest · May 10, 2006

I am using MSVC++ 2005, and compiling with /O2, Maximum Speed (/Ox Full
Optimizations doesn't change anything).

Never mind what my code does, it's what the compiler does with my code that
I'm posting about.

This will generate a forward jump, basically skipping over a move (increment):
if (BinaryHeap[current2].key > BinaryHeap[current2+1].key) {
    ++current2;
}

This code here is much faster:
current2 += (BinaryHeap[current2].key > BinaryHeap[current2+1].key);
The disassembler shows it generates an xor, a setg, and an add, but no jmp.
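
For anyone who wants to try this outside my code, here is a minimal sketch of the two forms as standalone functions. The `Node` struct, the `std::vector` container, and the function names are assumptions made up for illustration (the real BinaryHeap type isn't shown here). Both functions return the same index; whether the first one compiles to a jmp depends on the compiler and flags, so check the disassembly rather than trusting the source.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical element type: all we know is that elements have a `.key`.
struct Node {
    int key;
};

// Form 1: conditional increment, as in the first snippet.
std::size_t pick_branchy(const std::vector<Node>& heap, std::size_t i) {
    if (heap[i].key > heap[i + 1].key) {
        ++i;
    }
    return i;
}

// Form 2: add the comparison result (false converts to 0, true to 1),
// as in the second snippet.
std::size_t pick_arithmetic(const std::vector<Node>& heap, std::size_t i) {
    i += (heap[i].key > heap[i + 1].key);
    return i;
}
```

The caller has to make sure `i + 1` is a valid index, same as in the original snippets.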

Assuming the branches are roughly 50/50, which in my case they are, the code
without the jump is noticeably faster. The jump is inside a tight loop, and in
my benchmark the cycle count dropped from about 14k to about 12k (roughly 14%).
I don't know what the ratio of branch taken vs. branch not taken would have
to be for the default code to be faster. I can only guess that this
optimization wasn't used because it wasn't a sure optimization. But IMO it's
pretty close to a sure thing. I'm sure there is a way for a programmer who
knows a jump will be taken less often to write his code in such a way that it
generates a single jump rather than multiple ops, and is therefore faster.
So I think the code without the jump should be generated by default by the
compiler. This allows me to have my cake (readability) and eat it too
(performance).
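
A rough way to compare the two forms on roughly 50/50 data is sketched below. This is not my benchmark: the `Node` struct, the function names, the random key range, and the use of `std::chrono` wall-clock time instead of cycle counts are all assumptions for illustration. It also just counts comparisons over adjacent pairs, which is simpler than a real heap walk where the next index depends on the previous result, and an optimizer may well emit the same code for both loops, so look at the disassembly before reading anything into the timings.

```cpp
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical element type with a `.key`, standing in for the heap entries.
struct Node {
    int key;
};

// Pseudo-random keys, so key[i] > key[i+1] is true about half the time.
std::vector<Node> make_keys(std::size_t n, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> dist(0, 1000000);
    std::vector<Node> v(n);
    for (auto& e : v) {
        e.key = dist(rng);
    }
    return v;
}

// Conditional increment inside the loop.
std::size_t count_branchy(const std::vector<Node>& v) {
    std::size_t hits = 0;
    for (std::size_t i = 0; i + 1 < v.size(); ++i) {
        if (v[i].key > v[i + 1].key) {
            ++hits;
        }
    }
    return hits;
}

// Add the comparison result (0 or 1) instead of branching in the source.
std::size_t count_arithmetic(const std::vector<Node>& v) {
    std::size_t hits = 0;
    for (std::size_t i = 0; i + 1 < v.size(); ++i) {
        hits += (v[i].key > v[i + 1].key);
    }
    return hits;
}

// Runs f(v) once, stores its result, and returns elapsed milliseconds.
template <typename F>
double time_ms(F f, const std::vector<Node>& v, std::size_t& result) {
    const auto t0 = std::chrono::steady_clock::now();
    result = f(v);
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Typical use would be to build one large array with `make_keys`, call `time_ms` several times for each function, and compare the best times; using the returned counts afterwards keeps the compiler from discarding the loops.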

Thoughts? Code available upon request.

Carl Daniel [VC++ MVP] · May 10, 2006

Interesting. My thought is that it's too specialized an optimization to be
worth the trouble of modifying the compiler. It's likely that on a typical
code base it'll make an immeasurably small performance improvement. It
sounds like your case is special.

-cd
