Performance Breakdown Writing to Memory Location


Guest

Hi Folks!

For some reason, this code:

float tmp = 0.;
for (int i = 0; i < 2000; i++) {
    for (int j = i; j < 2000; j++) {
        for (int k = 0; k < 9000; k++) {
            tmp += tmp_mat(k,i) * tmp_mat(k,j);
        }
        tmp = 0.;
    }
    cout << "J: " << i << endl;
}

is 10-100 times faster than this

float tmp = 0.;
for (int i = 0; i < 2000; i++) {
    for (int j = i; j < 2000; j++) {
        for (int k = 0; k < 9000; k++) {
            tmp += tmp_mat(k,i) * tmp_mat(k,j);
        }
        data2d[j] = tmp; // the problem
        tmp = 0.;
    }
    cout << "J: " << i << endl;
}


Why is assigning a value that slow?

Can anybody help me?

Thanks in advance for your efforts

-Chucker
 
Chucker said:
Why is assigning a value that slow?



What is tmp_mat? What's data2d?

First off, I hope you're measuring a release build, as debug build timings
are pretty close to meaningless.
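
As a rough sketch of how such a measurement might be taken, one could time the loop with std::chrono::steady_clock and keep the fastest of several runs. The helper name best_time_seconds is made up here for illustration and is not part of the posted code. This assumes an optimized build, e.g. /O2 with MSVC or -O2 with GCC/Clang:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>

// Hypothetical helper: runs `work` `repeats` times and returns the fastest
// wall-clock time in seconds. Taking the minimum reduces noise from other
// processes and from cold caches on the first run.
template <typename F>
double best_time_seconds(F&& work, int repeats) {
    assert(repeats > 0);
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < repeats; ++r) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        const std::chrono::duration<double> elapsed = stop - start;
        best = std::min(best, elapsed.count());
    }
    return best;
}
```

Calling best_time_seconds with a lambda that wraps the loop under test and a repeat count of, say, 5 would give the fastest of five runs. The measurement is only meaningful if the timed code produces a result that is actually used afterwards.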

Secondly, in the first loop, tmp is probably stored in a floating point
register for the entire operation, while the second form has to make an
additional 2,000,000 or so memory writes (one per i/j pair, since j starts
at i). That's got to take a bit of time.
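
To put the number of writes in perspective, here is a small sketch that counts how often each statement in the posted loops runs. The function names are made up for illustration; the bounds 2000 and 9000 are the ones from the original post. Since j starts at i, there is one store per (i, j) pair with j >= i:

```cpp
#include <cassert>
#include <cstdint>

// Number of (i, j) pairs visited by "for i in [0, n), for j in [i, n)".
// Each pair corresponds to one store in the second version of the loop.
std::int64_t count_pairs(std::int64_t n) {
    assert(n >= 0);
    std::int64_t pairs = 0;
    for (std::int64_t i = 0; i < n; ++i) {
        pairs += n - i;  // j runs from i to n - 1
    }
    return pairs;
}

// Number of multiply-adds: every pair runs the k loop `rows` times.
std::int64_t count_multiply_adds(std::int64_t n, std::int64_t rows) {
    assert(rows >= 0);
    return count_pairs(n) * rows;
}
```

With the posted bounds, count_pairs(2000) gives 2,001,000 stores and count_multiply_adds(2000, 9000) gives 18,009,000,000 multiply-adds, so each store is preceded by 9000 multiply-adds. On those numbers alone the stores seem unlikely to account for a 10-100x difference.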

-cd
 
Sorry, maybe I did not make myself clear.

1.) All the "unknown" variables are float types

2.) I know that loop 1 does not write to a memory location. I am looking for
the most performant way to do this.

Thanks

Chucker

 
Chucker said:
Sorry, maybe I did not make myself clear.

1.) All the "unknown" variables are float types

OK.

Chucker said:
2.) I know that loop 1 does not write to a memory location. I am
looking for the most performant way to do this.

You may have already found it. Have you looked at a disassembly of the
first loop (without the writes)? In an optimized build, the compiler may
have simply omitted much (or all) of the loop if it can prove that the only
side effect of the whole thing is to assign 0.0 to tmp.
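
One way to check this without reading disassembly is to make the result of every inner loop observable, so the optimizer has no grounds to discard the work. The following is only a sketch under assumptions: Matrix is a hypothetical stand-in for tmp_mat (a row-major flat array, which may not match the real type), and column_dot_checksum is an invented name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for tmp_mat: a rows x cols matrix stored row-major
// in a flat vector, so element (k, i) lives at data[k * cols + i].
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> data;

    float operator()(std::size_t k, std::size_t i) const {
        return data[k * cols + i];
    }
};

// Computes the dot product of every column pair (i, j) with j >= i and
// returns the sum of all of them. The return value depends on every dot
// product, so as long as the caller uses it (e.g. prints it), the inner
// loops are not dead code.
double column_dot_checksum(const Matrix& m) {
    assert(m.data.size() == m.rows * m.cols);
    double checksum = 0.0;
    for (std::size_t i = 0; i < m.cols; ++i) {
        for (std::size_t j = i; j < m.cols; ++j) {
            float tmp = 0.0f;  // reset per pair, as in the original post
            for (std::size_t k = 0; k < m.rows; ++k) {
                tmp += m(k, i) * m(k, j);
            }
            checksum += tmp;  // consume the result instead of discarding it
        }
    }
    return checksum;
}
```

If a version like this, with its checksum printed, takes about as long as the version that writes to data2d, that would suggest the fast timing came from the compiler removing the loop rather than from the store being expensive.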

-cd

 