Hello,
Is there any work being done on using specific features of a processor
to increase performance? For example, on AMD Athlon XPs, there are 4
integer execution pipelines. I get roughly a 5x reduction in run time if I
write a loop like this:
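int sums0=0, sums1=0, sums2=0, sums3=0, sums=0;
int x;
// Four independent accumulators, so the adds do not depend on each other.
for(x=0;x+3<nums.Length;x+=4)
{
    sums0+=nums[x];
    sums1+=nums[x+1];
    sums2+=nums[x+2];
    sums3+=nums[x+3];
}
sums=(sums0+sums1)+(sums2+sums3);
// Add the 0-3 elements left over when nums.Length is not a multiple of 4.
for(;x<nums.Length;x++)
    sums+=nums[x];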
where nums[] is an array of integers. I know this would be hard to
implement in the JIT, but isn't one of the (main) ideas behind the JIT
the ability to do run-time optimizations for whatever platform the
code is running on?
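For anyone who wants to try the comparison themselves, here is a minimal sketch of a timing harness. The class and method names (SumDemo, SumOne, SumFour) are made up for illustration, and Stopwatch assumes .NET 2.0 or later. The loop bound is written as x + 3 < nums.Length so the whole array is covered, with a short tail loop for the leftovers. The numbers it prints will depend heavily on the CPU and the JIT version, so treat them as a rough indication only.

```csharp
using System;
using System.Diagnostics;

static class SumDemo
{
    // Baseline: a single accumulator, so each add depends on the previous one.
    public static int SumOne(int[] nums)
    {
        int sum = 0;
        for (int x = 0; x < nums.Length; x++)
            sum += nums[x];
        return sum;
    }

    // Four independent accumulators, giving the CPU adds it can overlap.
    public static int SumFour(int[] nums)
    {
        int s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        int x = 0;
        for (; x + 3 < nums.Length; x += 4)
        {
            s0 += nums[x];
            s1 += nums[x + 1];
            s2 += nums[x + 2];
            s3 += nums[x + 3];
        }
        int sum = (s0 + s1) + (s2 + s3);
        // Tail: the 0-3 elements that did not fill a group of four.
        for (; x < nums.Length; x++)
            sum += nums[x];
        return sum;
    }

    static void Main()
    {
        // Length is deliberately not a multiple of 4 to exercise the tail loop.
        int[] nums = new int[10000003];
        for (int i = 0; i < nums.Length; i++)
            nums[i] = i & 0xFF;

        Stopwatch sw = Stopwatch.StartNew();
        int a = SumOne(nums);
        sw.Stop();
        Console.WriteLine("one accumulator:   {0} ms", sw.ElapsedMilliseconds);

        sw = Stopwatch.StartNew();
        int b = SumFour(nums);
        sw.Stop();
        Console.WriteLine("four accumulators: {0} ms", sw.ElapsedMilliseconds);

        // Both versions must agree, otherwise the timing comparison is meaningless.
        Console.WriteLine("sums match: {0}", a == b);
    }
}
```

Checking that the two sums match matters here: a loop that accidentally skips part of the array will also look much faster.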
Thanks,
Austin Ehlers
We've done more perf work in the JIT for our next version than for our
previous version, but we still won't be generating SSE2 or MMX code in our
codegen.
The rationale behind not doing SSE2 was that we didn't have the time to do
vectorizing optimizations. If you use SSE2 for scalar operations, it's not
always faster than the equivalent x87 code in 'normal' code: adds and muls
have different latencies in SSE2 vs x87 (mul has lower latency in SSE2, but
add is higher, IIRC), and some operations (casting from doubles to floats
or floats to doubles) are quite slow in SSE2 compared to x87. We also have
to support processors without SSE2.
So, with all these arguments against it, we decided to focus our work on
improving our x87 codegen and leaving the door open for an SSE2
implementation, instead of putting all our eggs in the SSE2 basket.
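To make "vectorizing" concrete, here is a minimal sketch of what a hand-vectorized version of the summation loop could look like. It assumes the System.Numerics.Vector<T> API that appeared in much later .NET releases; it is not something the JIT discussed in this thread generates on its own, and the class name VectorSumDemo is made up for illustration. Integers are used to match the loop in the question. The Vector.IsHardwareAccelerated check plays the role of the "processors without SSE2" concern: without SIMD support the code simply takes the scalar path.

```csharp
using System;
using System.Numerics;

static class VectorSumDemo
{
    public static int Sum(int[] nums)
    {
        int sum = 0;
        int x = 0;

        // Take the SIMD path only when Vector<T> maps to real hardware
        // instructions; otherwise everything is handled by the scalar loop.
        if (Vector.IsHardwareAccelerated)
        {
            Vector<int> acc = Vector<int>.Zero;
            int width = Vector<int>.Count; // lanes per vector, hardware dependent

            // Add 'width' elements per iteration, one lane per element.
            for (; x + width <= nums.Length; x += width)
                acc += new Vector<int>(nums, x);

            // Fold the per-lane partial sums into a single scalar.
            for (int lane = 0; lane < width; lane++)
                sum += acc[lane];
        }

        // Scalar loop: the tail after the vector loop, or the whole array
        // when no hardware acceleration is available.
        for (; x < nums.Length; x++)
            sum += nums[x];
        return sum;
    }

    static void Main()
    {
        int[] nums = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        Console.WriteLine(Sum(nums)); // prints 55
    }
}
```

The per-lane accumulator is the same idea as the four separate sums in the question, except that the lanes live in one register and a single instruction updates all of them.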