For-loop expression?

  • Thread starter Thread starter Dave Veeneman
  • Start date Start date
D

Dave Veeneman

In a for-loop, is a calculated expression re-calculated on each pass through
the loop, or only once, when the loop is initialized? For example, assume
the following loop:

for (int i = 0; i < myArray.Length - 1; i++)
{
// code here
}

Is "myArray.Length - 1" calculated once, or on each pass through the loop?
Or, to put it another way, is there any advantage to:

int n = myArray.Length - 1;
for (int i = 0; i < n; i++)
{
// code here
}

Thanks in advance.
 
Dave Veeneman said:
In a for-loop, is a calculated expression re-calculated on each pass through
the loop, or only once, when the loop is initialized? For example, assume
the following loop:

for (int i = 0; i < myArray.Length - 1; i++)
{
// code here
}

Is "myArray.Length - 1" calculated once, or on each pass through the loop?
Or, to put it another way, is there any advantage to:

int n = myArray.Length - 1;
for (int i = 0; i < n; i++)
{
// code here
}

Thanks in advance.

Well, in C++ it would calculate on each loop, unless this value stays
unchanged and compiler has code optimisation.
I am a newbie for C#, so i can't guaranty accuracy of my answer, but i would
rather do:

int len = Array.Lenght() - 1;
for (int i = 1 ; i < n ; i++ )

Regards,
Andrey aka MuZZy
 
Dave Veeneman said:
In a for-loop, is a calculated expression re-calculated on each pass through
the loop, or only once, when the loop is initialized?

The conditional to break out of the for-loop is executed every pass through
the loop. So to answer your question,
for (int i = 0; i < myArray.Length - 1; i++)

Performs a subtraction myArray.Length - 1 times.
int n = myArray.Length - 1;
for (int i = 0; i < n; i++)

Performs a subtraction once.

In general, its advisable to move invariant logic out of the loop conditional.

Normally, it makes little difference when you're looping over the .Length of
an array, because the IL is optimized for this situation:

for ( int i = 0; i < myArray.Length; ++i)

If you look at the difference between this statement, and what you might
think would be a faster statement in,

for ( int i = 0, iMax = myArray.Length; i < iMax; ++i)

You'll find that the IL of the former executes the IL instruction, ldlen, once
each time through the loop, whereas the latter executes ldlen once before
entering the loop and stores it on the stack. Instead of calling ldlen, the
second loop looks-up the length in the stack on each iteration.

On the other hand, as soon as you start doing arithmetic operations on
the length, the iterations of the loop multiply the number of operations
being performed. Thus, whenever operations on the length must be
performed, and the length if fixed, it's a good idea to move it outside
of the loop.

What's more, take a quantity that many developers might think of as being
equivalent to an array's Length -- the Count property of a Collection like
ArrayList. What is the cost of having Count in the conditional of a for-loop
you may ask?

Count is a virtual property, so there is very little in the way of optimization of
IL here. A callvirt operation has to walk the vtable of (potentially) several
accessor methods. Thus, the innocuous-looking access of the Count on
a collection is easily one of the more expensive choices a developer can
undertake in their for-loop conditionals.


Derek Harmon
 
In general, its advisable to move invariant logic out of the loop conditional.

Well, unless it harms readability...
Normally, it makes little difference when you're looping over the .Length of
an array, because the IL is optimized for this situation:

for ( int i = 0; i < myArray.Length; ++i)

I don't believe it's the IL which is optimised here - it's the JIT
compiler which can optimise things appropriately.

Indeed, I believe that when you're using simple code such as the above,
you can actually get a performance *hit* by moving the invariant out of
the loop. I believe the JIT compiler can (in certain situations)
recognise the IL generated from a construct such as the above and do
clever things like removing bounds checks if it knows that the variable
used to access the array won't change value. I believe such
optimisations aren't performed when the length is hoisted out of the
loop.

Here's a sample program:

using System;

public class Test
{
static void Main()
{
int[] x = new int[100000];

Hoisted(x);
NotHoisted(x);
}

static void Hoisted(int[] foo)
{
int n = foo.Length;
for (int i=0; i < n; i++)
{
foo = foo*2;
}
}

static void NotHoisted(int[] foo)
{
for (int i=0; i < foo.Length; i++)
{
foo = foo*2;
}
}
}

Using cordbg with JIT optimisations on, here's the code for Hoisted and
NotHoisted:


Hoisted:
[0000] push esi
[0001] mov esi,dword ptr [ecx+4]
[0004] xor edx,edx
[0006] cmp esi,0
[0009] jle 00000016
[000b] cmp edx,dword ptr [ecx+4]
[000e] jae 00000013
[0010] mov eax,dword ptr [ecx+edx*4+8]
[0014] add eax,eax
[0016] mov dword ptr [ecx+edx*4+8],eax
[001a] inc edx
[001b] cmp edx,esi
[001d] jl FFFFFFEE
[001f] pop esi
[0020] ret
[0021] xor ecx,ecx
[0023] call 724FF980
[0028] int 3

NotHoisted:
[0000] xor edx,edx
[0002] cmp dword ptr [ecx+4],0
[0006] jle 00000012
[0008] mov eax,dword ptr [ecx+edx*4+8]
[000c] add eax,eax
[000e] mov dword ptr [ecx+edx*4+8],eax
[0012] inc edx
[0013] cmp edx,dword ptr [ecx+4]
[0016] jl FFFFFFF2
[0018] ret

Here, it looks like the JIT compiler hasn't done the hoisting for us,
but it's removed the array bounds checking. In both cases, you still
end up with a single length check on each iteration, but (to my mind)
the NotHoisted C# code is more readable. Basically, you can't gauge
performance based on IL alone. (It's hard to do it from assembly, too,
but there we go...)
What's more, take a quantity that many developers might think of as being
equivalent to an array's Length -- the Count property of a Collection like
ArrayList. What is the cost of having Count in the conditional of a for-loop
you may ask?

Count is a virtual property, so there is very little in the way of optimization of
IL here. A callvirt operation has to walk the vtable of (potentially) several
accessor methods. Thus, the innocuous-looking access of the Count on
a collection is easily one of the more expensive choices a developer can
undertake in their for-loop conditionals.

It's still unlikely to end up being a bottleneck, and I view
readability as being *much* more important. To me, it's more
immediately clear what

for (int i=0; i < collection.Count; i++)
{
....
}

does than

int n = collection.Count;
for (int i=0; i < n; i++)
{
....
}
 
Derek Harmon said:
Derek Harmon

Great reading Derek.

I would like to continue on this subject by taking it from another angle.

The overhead of having calculations in the for statement or calculating an
invariant result only once, may be tiny. But there is also a matter of
understandability and maintainability.

If you calculate the invarant result before the loop people maintaining your
code wiill immediately understand that the value is invariant. If you do the
calculation inside the for-statement you are signaling that the value may
very well vary. The coder must check whether the result is invariant or not.

This is not really an issue on something as typical like taking Length - 1
on a list. It's quite obvious. But I always try to write code that is
instantly obvious at first glance. Hence it is my recommendation to never
evaluate invariant expressions inside for- or while-loops.

This is also why I try to use foreach whenever a can parse a vector without
using an index.

happy coding

- Michael S
 
Michael S said:
I would like to continue on this subject by taking it from another angle.

The overhead of having calculations in the for statement or calculating an
invariant result only once, may be tiny. But there is also a matter of
understandability and maintainability.

Absolutely, but...
If you calculate the invarant result before the loop people maintaining your
code wiill immediately understand that the value is invariant. If you do the
calculation inside the for-statement you are signaling that the value may
very well vary. The coder must check whether the result is invariant or not.

I don't think that's actually very likely - it's certainly not
something I've come across often. The length of an actual array never
changes, so you'd have to change which array you were using in order
for the length to change. A collection's size may change, but in that
case you almost always have to use a different construct anyway to
avoid problems.

If the upper bound of the loop *could* change, I'd probably highlight
that with a comment instead.
This is not really an issue on something as typical like taking Length - 1
on a list. It's quite obvious. But I always try to write code that is
instantly obvious at first glance. Hence it is my recommendation to never
evaluate invariant expressions inside for- or while-loops.

However, I find it's more instantly obvious what

for (int i=0; i < array.Length; i++)
{
....
}

means than

int n = array.Length;
for (int i=0; i < n; i++)
{
....
}

So while we agree that readability is very important, we certainly
*disagree* on which is the more readable form :)
This is also why I try to use foreach whenever a can parse a vector without
using an index.

Agreed on that front.
 
Jon Skeet said:
Absolutely, but...
not.

I don't think that's actually very likely - it's certainly not
something I've come across often. The length of an actual array never
changes, so you'd have to change which array you were using in order
for the length to change. A collection's size may change, but in that
case you almost always have to use a different construct anyway to
avoid problems.

If the upper bound of the loop *could* change, I'd probably highlight
that with a comment instead.


However, I find it's more instantly obvious what

for (int i=0; i < array.Length; i++)
{
...
}

means than

int n = array.Length;
for (int i=0; i < n; i++)
{
...
}

So while we agree that readability is very important, we certainly
*disagree* on which is the more readable form :)


Agreed on that front.

For simple statements I also agree. Especially for calculating typical
things like i < arr.Lengthor i < myColumn.Count. But for more complex
calculations I move them outside the loop.

In the end, performance should be the least factor as the overhead is so
slight. It's more important to choose one coding style and stick to it for
all code in on project, one development-team or company.

It's when you interchange coding-styles you confuse programmers.

- Michael S
 
Michael S said:
For simple statements I also agree. Especially for calculating typical
things like i < arr.Lengthor i < myColumn.Count. But for more complex
calculations I move them outside the loop.

It would depend on the calculation, but yes - if the calculation wasn't
immediately obvious in intent, it would help to hoist it just so that
there was a variable name associated with it, so that the name could
end up being the thing which increased readability.
In the end, performance should be the least factor as the overhead is so
slight. It's more important to choose one coding style and stick to it for
all code in on project, one development-team or company.

It's when you interchange coding-styles you confuse programmers.

Agreed - although of course some style changes are more confusing than
others. (Colleagues of mine put member variables in different places to
me. That's not a problem. If they started using different naming
conventions, that would be much more of an issue.)
 
I ran some tests. I'm interested in the same information for some projects
being worked on. It appears that the difference in performance between the
two is so negligable that if I run the test numerous times, intermittenly,
one or the other will be better in performance but never just one of them.
Here's the source to my test:


Just place a multiline text field on the form called "txtResults".

----------------------------------
Code in Form1
----------------------------------
private void button4_Click(object sender, System.EventArgs e) {
IntervalTimer timer = new IntervalTimer();

int[] x = new int[1000000];
int j;
float average1 = 0;
float average2 = 0;
int count = 100;

txtResults.Text = "";

for (j=0; j<count; j++) {
timer.Start();
for (int i=0; i<x.Length-1; i++) {
x = i*2;
}
timer.Stop();

average1 += timer.GetSeconds();
txtResults.Text += "Inlined: " + timer.GetSeconds().ToString() +
"\r\n";
}

txtResults.Text += "Inlined Average: " + Convert.ToString(average1
/ j) + "\r\n";

for (j=0; j<count; j++) {
timer.Start();

int k = x.Length-1;

for (int i=0; i<k; i++) {
x = i*2;
}
timer.Stop();

average2 += timer.GetSeconds();
txtResults.Text += "Outlined: " + timer.GetSeconds().ToString()
+ "\r\n";
}


txtResults.Text += "Outlined Average: " + Convert.ToString(average2
/ j) + "\r\n";

float delta = 0;
if (average1 > average2) {
delta = 1 - (average2 / average1);
txtResults.Text = "Outlining is " + delta.ToString() + "%
faster" + "\r\n" + txtResults.Text;
}
else {
delta = 1 - (average1 / average2);
txtResults.Text = "Inlining is " + delta.ToString() + "% faster"
+ "\r\n" + txtResults.Text;
}
}

-------------------------------------------
Code in Timer.cs
-------------------------------------------

using System;
using System.Runtime.InteropServices;


namespace Processor
{

/* Code originally from "Advanced .NET" (WROX). Modified according to my
needs.
*
* Description:
*
* Using the Windows high-performance query counter API's, we can measure
the length
* of time an operation executed. Simply call Start() and Stop() then
get the time
* with Seconds(), Ticks(), or ToString()...
*
*/
public class IntervalTimer {
[DllImport("kernel32.dll")]
static extern private int QueryPerformanceCounter(out long count);

[DllImport("kernel32.dll")]
static extern private int QueryPerformanceFrequency(out long count);

public enum TimerState {
NotStarted,
Stopped,
Started
}

private TimerState state;
private long ticksAtStart;
private long intervalTicks;
private static long frequency;
private static int decimalPlaces;
private static string formatString;
private static bool initialized = false;



public IntervalTimer() {
if (!initialized) {
QueryPerformanceFrequency(out frequency);
decimalPlaces = (int)Math.Log10(frequency);

formatString = String.Format("Interval: {{0:F{0}}} seconds
({{1}} MiliSeconds)", decimalPlaces);
initialized = true;
}
state = TimerState.NotStarted;
}

public void Start() {
state = TimerState.Started;
ticksAtStart = CurrentTicks;
}

public void Stop() {
intervalTicks = CurrentTicks - ticksAtStart;
state = TimerState.Stopped;
}

public float GetSeconds() {
if (state != TimerState.Stopped)
throw new TimerNotStoppedException();

return (float)intervalTicks/(float)frequency;
}

public float GetRealTimeSeconds() {
if (state == TimerState.Started) {
intervalTicks = CurrentTicks - ticksAtStart;
return (float)intervalTicks/(float)frequency;
}

return GetSeconds();
}

public void Reset() {
ticksAtStart = 0;
intervalTicks = 0;
}

public int GetMilliSeconds() {
if (state != TimerState.Stopped)
throw new TimerNotStoppedException();

return (int)Math.Round((decimal)(GetSeconds()*1000), 0);
}


public long GetTicks() {
if (state != TimerState.Stopped)
throw new TimerNotStoppedException();

return intervalTicks;
}

private long CurrentTicks {
get {
long ticks;
QueryPerformanceCounter(out ticks);
return ticks;
}
}

public override string ToString() {
if (state != TimerState.Stopped)
return "Interval timer, state: " + state.ToString();

return String.Format(formatString, GetSeconds(),
GetMilliSeconds());
}
}



public class TimerNotStoppedException : ApplicationException {

public TimerNotStoppedException()
: base("Timer is either still running or has not been started") {
}
}
}
 
Back
Top