must go faster!


bill

I am trying to figure out if I can use SSE to help execute arithmetic
operations faster. I have 900 values that must each be scaled with a divide
and a multiply. This happens repeatedly. Any examples I can be pointed to
would be greatly appreciated. I realize I could do just one multiply
(instead of a multiply and a divide), but I still want to do 900 (or as many
as I can) at once.

Any ideas would be appreciated.

Bill
 
Hi,

If you post the algorithm, people may be able to help optimise it. We do a
lot of intensive maths and use vector libraries (from Apple and Intel) to
take care of the low-level stuff, like multiplication and division. We use
the Intel Integrated Performance Primitives library and the performance is
incredible (especially on Intel processors) compared with hand-coded loops.
That said, it may be possible to squeeze some performance out simply by
optimising the algorithm (as you say, combining the multiplication and
division).

Steve
 
OK, here we go. A little mix of pseudocode and real code.

for each of 900 blocks
    read signed integer values - there are 4
    scale values: scaled value = read value / 32768 * 360
    store value as double
next block



The read value is a 16-bit signed int.
The stored scaled value is of type double.

I realize I could just do: (double)round((double)read_value / 91.02222)

But if I could do a vector, I could go fast, maybe 900 at a time. I'm
just not up on single-instruction, multiple-data (SIMD) stuff.

Just an example, please.


Thanks,
Bill
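To make the request concrete: below is a minimal sketch of the scaling loop using SSE2 intrinsics. The function and variable names are illustrative, and it assumes an even element count (900 blocks of 4 values gives 3600, which is even):

```c
#include <emmintrin.h>   /* SSE2 intrinsics */

/* Scale n 16-bit samples to degrees: out[i] = in[i] * (360.0 / 32768.0).
   Assumes n is even. */
static void scale_to_degrees(const short *in, double *out, int n)
{
    const __m128d scale = _mm_set1_pd(360.0 / 32768.0);
    for (int i = 0; i < n; i += 2) {
        /* place two sign-extended samples in the low two 32-bit lanes */
        __m128i pair = _mm_set_epi32(0, 0, in[i + 1], in[i]);
        __m128d v = _mm_cvtepi32_pd(pair);   /* low two int32 -> two doubles */
        _mm_storeu_pd(out + i, _mm_mul_pd(v, scale));
    }
}
```

A production version would load eight samples at once with `_mm_load_si128` and sign-extend by unpacking, but the core idea is visible here: no division inside the loop, just one packed multiply per pair of elements.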
 
Multiplication should be faster than division. Thus, instead of dividing by
91.02222, you can multiply by 0.010986328125 (= 360/32768).

/Fredrik
 
Hi,

Like Fredrik said, you can make that a multiplication. Vectorised code is
useful for doing (like we do here) multiple iterations of exponentials,
logs, Fourier transforms, etc. over 65000-element chunks. For 900 blocks I'd
expect this kind of thing to be practically instantaneous on a modern
processor.

Steve
 
Did you try compiling with SSE and SSE2 enabled (with Visual C++ 6.0 plus
the Processor Pack, or Visual Studio .NET)?
Perhaps this can help. This is just an idea...
 
I am new to MSC and .NET, coming from a Unix/Linux & Borland background. I
am evaluating using MS tools instead of Borland tools for our PC/Windows
apps.

Anyhow, are you guys telling me that the expression "32768 * 360" is not
optimized by the compiler? I thought all compilers would optimize all
constant mathematical expressions. For documentation purposes I often use
things like "60 * 60 * 24" to represent the seconds in a day rather than
just putting 86400. I have looked at the actual code generated, and the
constants have been evaluated to a single value, at least on other
compilers...
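Fred is right about constant folding, and it can even be checked without reading the generated code, because C only accepts constant expressions in certain places. A tiny illustration (C11 for the static assertion):

```c
/* Both of these require compile-time constants, so if this compiles,
   the compiler folded 60 * 60 * 24 before generating any code. */
enum { SECONDS_PER_DAY = 60 * 60 * 24 };
_Static_assert(SECONDS_PER_DAY == 86400, "folded at compile time");
```

Note, however, that in the thread's expression `read value / 32768 * 360` the two constants are not adjacent operands: evaluation proceeds left to right, so the compiler cannot fold them into one there. That is exactly why the suggestion to multiply by a single combined constant matters.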

Back to the original question, I had some ideas about your original
discussion:

1. Array libraries are great. It is often much quicker to look up the
result of complex math than to calculate it.

2. You didn't specify what you are doing with these numbers, or the degree
of accuracy needed. Can you use binary math instead of real math? I mean
there are several published routines that use binary approximations for
things like sin(), cos() and others that are really fast, and the accuracy
is good enough for calculating rotations of things being displayed on the
screen.

3. Again, I was not sure if your example was exactly what you were trying to
do or just a short example, but there are a lot of shortcuts you can take if
high precision is not necessary. Take a look at some of the math that game
programmers use to do quick calculations for drawing and scaling objects on
the screen. They are not high precision, but they are FAST.
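For the 16-bit-to-degrees scaling in this thread, suggestion 1 could look like the sketch below (names invented). Fair warning: the table is 512 KB, so for a single multiply it will likely lose to straight arithmetic; tables really pay off for expensive functions like sin() and cos():

```c
/* One entry per possible 16-bit input: 65536 doubles (512 KB). */
static double degrees_lut[65536];

/* Fill the table once, up front. */
static void init_degrees_lut(void)
{
    for (int v = -32768; v <= 32767; ++v)
        degrees_lut[(unsigned short)v] = v * (360.0 / 32768.0);
}

static double to_degrees(short v)
{
    return degrees_lut[(unsigned short)v];   /* one load, no arithmetic */
}
```

The `(unsigned short)` cast maps the signed range -32768..32767 onto table indices 0..65535 without branching.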
 
Fred Hebert said:
I am new to MSC and .NET, coming from a Unix/Linux & Borland background. I
am evaluating using MS tools instead of Borland tools for our PC/Windows
apps.

Anyhow, are you guys telling me that the expression "32768 * 360" is not
optimized by the compiler? I thought all compilers would optimize all
constant mathematical expressions. For documentation purposes I often use
things like "60 * 60 * 24" to represent seconds in a day rather than just
putting 86400. I have looked at the actual code generated and the
constants have been evaluated to a single value, at least on other
compilers...

Of course they are - I'm not sure how you reached that conclusion based on
this thread, but...

-cd
 
Of course they are - I'm not sure how you reached that conclusion
based on this thread, but...

-cd

From the statement "I realize I could do just one multiply (instead of
multiply and divide) but" in the first message, and "Multiplication should
be faster than division. Thus, instead of division by 91.0222, you can
multiply by 0.010986328125" in the fourth message. I would not have thought
that made any difference.

I haven't looked at the machine code generated by the MS compiler yet, and
probably won't. I just thought it sounded odd to be worrying about things
like that, or about the order of operations. My experience is that modern
compilers generally optimize those things pretty well.

The guys sounded like they were fairly familiar with math routines; I was
just wondering why they were worrying about something that I thought was
insignificant. Twenty years ago programmers had to be concerned with the
small details. Good programmers often took time to "optimize" their code for
the compiler, manually align structures on word boundaries, etc...

Aren't modern compilers wonderful?
 
it has been a long time since i looked at the details, but division is
generally slower than multiplication. the reason is that the algorithms
involved in division are more complex than those for multiplication.

the sort of optimization you talk about is done only with integer operations
(at least, that is my experience with the TI compiler for DSPs). the reason
is that for most floating point math it is impossible to produce a 100%
equivalent result when fractional values are involved.

for example, in my naive programming newbie years i had a piece of code like
this:

float current = 0;           /* accumulates in float, compared to a double */
while (current != 1.2)       /* exact equality test: never true here */
{
    /* do stuff */
    current += 0.3;
}

can you guess what happened: right, the loop never finished. the reason for
this is that 4 times 0.3 does not come out exactly equal to 1.2 in floating
point (check the floating point representation if you don't believe me).

floating point math is not simple, and there are a number of rules you have
to follow if you want to have accurate results.
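One of those rules is the one the loop above breaks: never test accumulated floating point values for exact equality. A small sketch of the usual fix, stopping within a tolerance of the target (the function name, tolerance, and iteration cap are illustrative):

```c
#include <math.h>    /* fabs */

/* Accumulate `step` until we are within `tol` of `target`, instead of
   testing for exact equality.  The iteration cap is a safety net. */
static int steps_to_reach(double target, double step, double tol)
{
    double current = 0.0;
    int iterations = 0;
    while (fabs(current - target) > tol && iterations < 1000) {
        current += step;
        ++iterations;
    }
    return iterations;
}
```

With target 1.2, step 0.3, and a tolerance of 1e-9, this stops after 4 iterations whether or not the accumulated sum is bit-for-bit equal to 1.2.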

multiplying by the reciprocal of a number is not 100% the same as doing a
division. it would be inexcusable if a compiler changed the functionality of
a program without guaranteeing the same results.

dividing an integer by 2 however is often replaced by shifting the bits 1
position to the right, because it is faster and it has the exact same
result.

kind regards,
Bruno.
 
On Fri, 18 Mar 2005 09:31:11 +0100, "Bruno van Dooren" wrote:

dividing an integer by 2 however is often replaced by shifting the bits 1
position to the right, because it is faster and it has the exact same
result.

In standard C and C++, this is necessarily true only for unsigned
values!

The result of right-shifting a signed quantity is
implementation-defined.
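A short demonstration of the distinction (the value of `-3 >> 1` shown in the comment is what typical two's-complement implementations produce, not a guarantee):

```c
#include <stdio.h>

/* Show where / and >> agree and where they legally may not. */
static void divide_vs_shift(void)
{
    printf("-3 / 2  = %d\n", -3 / 2);    /* -1: truncation toward zero (C99) */
    printf("-3 >> 1 = %d\n", -3 >> 1);   /* implementation-defined; usually -2
                                            (arithmetic shift, rounds toward
                                            minus infinity) */
    printf("6u >> 1 = %u\n", 6u >> 1);   /* 3: unsigned shifts fully defined */
}
```

So a compiler may substitute a shift for a division by 2 only when it can prove the value is non-negative, which is why the substitution is reliable only for unsigned types.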
 