std::for_each

  • Thread starter Thread starter sashan
  • Start date Start date
S

sashan

Sometimes I write code using std::for_each. Sometimes I use it on
several container classes that are the same size.

for_each(v.begin(), v.end(), f1());
for_each(u.begin(), u.end(), f2());

This seems inefficient so I rewrite it as:

int n = v.size();
for (int i = 0; i < n; ++i)
{
f1(v);
f2(u);
}

Is there some mechanism that lets me say the following:

for_each((v.begin(), u.begin()), (v.end(), u.end()), (f1(), f2()));

Thanks
 
sashan said:
Sometimes I write code using std::for_each. Sometimes I use it on
several container classes that are the same size.

for_each(v.begin(), v.end(), f1());
for_each(u.begin(), u.end(), f2());

This seems inefficient so I rewrite it as:

int n = v.size();
for (int i = 0; i < n; ++i)
{
f1(v);
f2(u);
}


What makes you think this is any more efficient? If I had to guess, I would
bet that your second version runs *slower* due to the fact you are reducing
memory locality jumping back and forth between the different arrays and
functions. Also, by indexing into the array each time instead of using the
iterators which can simply be incremented, you are making it more difficult
for the compiler to optimize your sequential access (if u and v are less
trivial than a vector, like perhaps a deque, you are making it significantly
more difficult). Current compilers should be able to optimize away the
indexing, but you certainly aren't doing the compiler any favors.

Ken
 
"seems inefficient"...do you have proof that it actually *is* inefficient
for your particular case?

Not only have you turned two lines of code into 4+, but you've replaced the
well-known std::for_each idiom with a custom for() loop that everybody
looking at this code will have to take the time to inspect.

Dan
 
Ken said:
Sometimes I write code using std::for_each. Sometimes I use it on
several container classes that are the same size.

for_each(v.begin(), v.end(), f1());
for_each(u.begin(), u.end(), f2());

This seems inefficient so I rewrite it as:

int n = v.size();
for (int i = 0; i < n; ++i)
{
f1(v);
f2(u);
}


What makes you think this is any more efficient?

Less comparisons and less increments, I guess.

I've attached a program that times the operations using the above
methods. In the program method 1 uses the for_each algorithm, method 2
is a for loop like above and method 3 is a for loop using iterators.
Method 3 is the sort of thing I was aiming for with my initial question.
The program tests method 1 and 2 and 3 on std::vector and method 1 and 3
on std::list (std::list doesn't support random access so method 2
doesn't work).

Please check it and tell me if anything's wrong (maybe I'm using
QueryPerformance counter wrong or something like that)

In summary, when in release build, method 1 is the slowest, method 2 and
3 are the fastest for std::vector. For std::list method 3 is the
fastest. Compiled using vc7.1.


#include <iostream>
//#include <iomanip.h>
#include <math.h>
#include <windows.h>
#include <vector>
#include <functional>
#include <algorithm>

using namespace std;

struct Sqr :
unary_function<int,int>
{
void operator()(argument_type& i){i *= i;}
};

int vector_timing()
{
cout << "vector timing" << endl;
LARGE_INTEGER Freq, Start, End;
ZeroMemory(&Freq, sizeof(LARGE_INTEGER));
ZeroMemory(&Freq, sizeof(LARGE_INTEGER));
ZeroMemory(&Freq, sizeof(LARGE_INTEGER));
BOOL Valid = QueryPerformanceFrequency(&Freq);
if (!Valid)
return -1;

//method 1
vector<int> u0(1000,3),v0(1000,3),w0(1000,3),x0(1000,3);
QueryPerformanceCounter(&Start);
for_each(u0.begin(), u0.end(), Sqr());
for_each(v0.begin(), v0.end(), Sqr());
for_each(w0.begin(), w0.end(), Sqr());
for_each(x0.begin(), x0.end(), Sqr());
QueryPerformanceCounter(&End);
std::cout << "Method 1 " << End.QuadPart - Start.QuadPart << endl;

//method 2
vector<int> u1(1000,3),v1(1000,3),w1(1000,3),x1(1000,3);
QueryPerformanceCounter(&Start);
vector<int>::size_type n = u1.size();
for (int i = 0; i < n; ++i)
{
Sqr()(u1);
Sqr()(v1);
Sqr()(w1);
Sqr()(x1);
}
QueryPerformanceCounter(&End);
std::cout << "Method 2 " << End.QuadPart - Start.QuadPart << endl;

//method 3
vector<int> u2(1000,3),v2(1000,3),w2(1000,3),x2(1000,3);
vector<int>::iterator itu2, itv2, itw2, itx2;
QueryPerformanceCounter(&Start);
for (itu2 = u2.begin(), itv2 = v2.begin(), itw2 = w2.begin(), itx2 = x2.begin();
itu2 != u2.end();
++itu2, ++itv2, ++itw2, ++itx2)
{
Sqr()(*itu2);
Sqr()(*itv2);
Sqr()(*itw2);
Sqr()(*itx2);
}
QueryPerformanceCounter(&End);
std::cout << "Method 3 " << End.QuadPart - Start.QuadPart << endl;
}

int list_timing()
{
LARGE_INTEGER Freq, Start, End;
ZeroMemory(&Freq, sizeof(LARGE_INTEGER));
ZeroMemory(&Freq, sizeof(LARGE_INTEGER));
ZeroMemory(&Freq, sizeof(LARGE_INTEGER));
BOOL Valid = QueryPerformanceFrequency(&Freq);
if (!Valid)
return -1;

cout << "list timing" << endl;
//method 1
vector<int> u0(1000,3),v0(1000,3),w0(1000,3),x0(1000,3);
QueryPerformanceCounter(&Start);
for_each(u0.begin(), u0.end(), Sqr());
for_each(v0.begin(), v0.end(), Sqr());
for_each(w0.begin(), w0.end(), Sqr());
for_each(x0.begin(), x0.end(), Sqr());
QueryPerformanceCounter(&End);
std::cout << "Method 1 " << End.QuadPart - Start.QuadPart << endl;

//method 3
vector<int> u2(1000,3),v2(1000,3),w2(1000,3),x2(1000,3);
vector<int>::iterator itu2, itv2, itw2, itx2;
QueryPerformanceCounter(&Start);
for (itu2 = u2.begin(), itv2 = v2.begin(), itw2 = w2.begin(), itx2 = x2.begin();
itu2 != u2.end();
++itu2, ++itv2, ++itw2, ++itx2)
{
Sqr()(*itu2);
Sqr()(*itv2);
Sqr()(*itw2);
Sqr()(*itx2);
}
QueryPerformanceCounter(&End);
std::cout << "Method 3 " << End.QuadPart - Start.QuadPart << endl;
}

int main()
{
vector_timing();
list_timing();
return 0;
}
 
sashan said:
Ken said:
Sometimes I write code using std::for_each. Sometimes I use it on
several container classes that are the same size.

for_each(v.begin(), v.end(), f1());
for_each(u.begin(), u.end(), f2());

This seems inefficient so I rewrite it as:

int n = v.size();
for (int i = 0; i < n; ++i)
{
f1(v);
f2(u);
}


What makes you think this is any more efficient?

Less comparisons and less increments, I guess.

I've attached a program that times the operations using the above
methods. In the program method 1 uses the for_each algorithm, method 2
is a for loop like above and method 3 is a for loop using iterators.
Method 3 is the sort of thing I was aiming for with my initial question.
The program tests method 1 and 2 and 3 on std::vector and method 1 and 3
on std::list (std::list doesn't support random access so method 2
doesn't work).

Please check it and tell me if anything's wrong (maybe I'm using
QueryPerformance counter wrong or something like that)


What's wrong is that with only 1000 iterations nothing really matters,
because just about anything is fast enough.
In summary, when in release build, method 1 is the slowest, method 2 and
3 are the fastest for std::vector. For std::list method 3 is the
fastest. Compiled using vc7.1.

When I increase it to a million iterations, I get this result:
vector timing
Method 1 70041
Method 2 382012
Method 3 450109
list timing
Method 1 69126
Method 3 399203



Here for_each is *much* faster.

Bo Persson
 
There was a silly mistake in the previous post - the list_timing function used vector as it's container class. I've posted an ammended version that corrects the mistake. It also allows you to change the size of the container and the number of times the various methods are performed. I also change the function object to add one to the given argument instead of squaring.

Here are some results (these are all based on release builds):
For a container size of 1000000 repeating each method 1 time:
vector timing
Method 1 166452
Method 2 431866
Method 3 372398
list timing
Method 1 756946
Method 3 674451

For a container size of 1000000 repeating each method 100 time:
vector timing
Method 1 12679666
Method 2 36049916
Method 3 36056914
list timing
Method 1 81043026
Method 3 70404243

For a container size of 1000 repeating each method 100 time:
vector timing
Method 1 2284
Method 2 1095
Method 3 1308
list timing
Method 1 9081
Method 3 7118

For a container size of 1000 repeating each method 1 time:
vector timing
Method 1 25
Method 2 17
Method 3 18
list timing
Method 1 96
Method 3 62

For the case of the std::list Method 3 is fastest. In the case of std::vector method 1 (the for_each method) is faster when the container size is sufficiently large (in this case 1 000 000). How does one know when a container is sufficiently large? Dunno.




#include <iostream>
//#include <iomanip.h>
#include <math.h>
#include <windows.h>
#include <vector>
#include <functional>
#include <algorithm>
#include <list>

using namespace std;

struct AddOne :
unary_function<int,void>
{
void operator()(argument_type& i);
};

void AddOne::operator() (argument_type& i)
{
i += 1;
}

int vector_timing(long Size, long RepCount)
{
cout << "vector timing" << endl;
LARGE_INTEGER Freq, Start, End;
ZeroMemory(&Freq, sizeof(LARGE_INTEGER));
ZeroMemory(&Freq, sizeof(LARGE_INTEGER));
ZeroMemory(&Freq, sizeof(LARGE_INTEGER));
BOOL Valid = QueryPerformanceFrequency(&Freq);
if (!Valid)
return -1;

//method 1
vector<int> u0(Size,3),v0(Size,3),w0(Size,3),x0(Size,3);
QueryPerformanceCounter(&Start);
for (int i = 0; i < RepCount; ++i)
{
for_each(u0.begin(), u0.end(), AddOne());
for_each(v0.begin(), v0.end(), AddOne());
for_each(w0.begin(), w0.end(), AddOne());
for_each(x0.begin(), x0.end(), AddOne());
}
QueryPerformanceCounter(&End);
std::cout << "Method 1 " << End.QuadPart - Start.QuadPart << endl;

//method 2
vector<int> u1(Size,3),v1(Size,3),w1(Size,3),x1(Size,3);
QueryPerformanceCounter(&Start);
vector<int>::size_type n = u1.size();
for (int i = 0; i < RepCount; ++i)
{
for (int i = 0; i < n; ++i)
{
AddOne()(u1);
AddOne()(v1);
AddOne()(w1);
AddOne()(x1);
}
}
QueryPerformanceCounter(&End);
std::cout << "Method 2 " << End.QuadPart - Start.QuadPart << endl;

//method 3
vector<int> u2(Size,3),v2(Size,3),w2(Size,3),x2(Size,3);
vector<int>::iterator itu2, itv2, itw2, itx2;
QueryPerformanceCounter(&Start);
for (int i = 0; i < RepCount; ++i)
{
for (itu2 = u2.begin(), itv2 = v2.begin(), itw2 = w2.begin(), itx2 = x2.begin();
itu2 != u2.end();
++itu2, ++itv2, ++itw2, ++itx2)
{
AddOne()(*itu2);
AddOne()(*itv2);
AddOne()(*itw2);
AddOne()(*itx2);
}
}
QueryPerformanceCounter(&End);
std::cout << "Method 3 " << End.QuadPart - Start.QuadPart << endl;
return 0;
}

int list_timing(long Size, long RepCount)
{
LARGE_INTEGER Freq, Start, End;
ZeroMemory(&Freq, sizeof(LARGE_INTEGER));
ZeroMemory(&Freq, sizeof(LARGE_INTEGER));
ZeroMemory(&Freq, sizeof(LARGE_INTEGER));
BOOL Valid = QueryPerformanceFrequency(&Freq);
if (!Valid)
return -1;

cout << "list timing" << endl;
//method 1
list<int> u0(Size,3),v0(Size,3),w0(Size,3),x0(Size,3);
QueryPerformanceCounter(&Start);
for (int i = 0; i < RepCount; ++i)
{
for_each(u0.begin(), u0.end(), AddOne());
for_each(v0.begin(), v0.end(), AddOne());
for_each(w0.begin(), w0.end(), AddOne());
for_each(x0.begin(), x0.end(), AddOne());
}
QueryPerformanceCounter(&End);
std::cout << "Method 1 " << End.QuadPart - Start.QuadPart << endl;

//method 3
list<int> u2(Size,3),v2(Size,3),w2(Size,3),x2(Size,3);
list<int>::iterator itu2, itv2, itw2, itx2;
QueryPerformanceCounter(&Start);
for (int i = 0; i < RepCount; ++i)
{
for (itu2 = u2.begin(), itv2 = v2.begin(), itw2 = w2.begin(), itx2 = x2.begin();
itu2 != u2.end();
++itu2, ++itv2, ++itw2, ++itx2)
{
AddOne()(*itu2);
AddOne()(*itv2);
AddOne()(*itw2);
AddOne()(*itx2);
}
}
QueryPerformanceCounter(&End);
std::cout << "Method 3 " << End.QuadPart - Start.QuadPart << endl;
return 0;
}

int main()
{
vector_timing(1000,1);
list_timing(1000,1);
return 0;
}
 
When I increase it to a million iterations, I get this result:
vector timing
Method 1 70041
Method 2 382012
Method 3 450109
list timing
Method 1 69126
Method 3 399203



Here for_each is *much* faster.
Sorry there was a mistake in the code I originally posted. The function
list_timing used std::vector as its container when it was meant to use
std::list. I've corrected the mistake and posted it as a reply to my
original code post.
 
Dan said:
"seems inefficient"...do you have proof that it actually *is* inefficient
for your particular case?

It seemed inefficient because more comparisons and increments happen in
this case

for_each(v.begin(), v.end(), f1());
for_each(u.begin(), u.end(), f2());
 
sashan said:
For the case of the std::list Method 3 is fastest. In
the case of std::vector method 1 (the for_each method)
is faster when the container size is sufficiently large (in
this case 1 000 000). How does one know when a
container is sufficiently large? Dunno.

I suspect that the reason method 3 is faster than method 1 for list containers
is because you are only checking for the end of one container, not all four.
This is a valid optimization if you know that the lists are all the same
length. The difference is pretty minimal though, especially if the operation
is less trivial than adding one, so unless it was highly speed critical I
would still favor the for_each method.

I suspect the reason method 1 is slower with small containers is because it is
method 1. That is, method 1 is suffering all the cache misses, while the
other two methods benefit from the data being already cached from method 1's
operation. With larger data sets, the early part of the data set is pushed
out of the cache by the later data before the second method runs, so it is has
to deal with the same cache misses that the first one did.

I'll bet if you run "method 1" second, it will perform better with the small
containers.

Ken
 
I suspect that the reason method 3 is faster than method 1 for list containers
is because you are only checking for the end of one container, not all four.
This is a valid optimization if you know that the lists are all the same
length. The difference is pretty minimal though, especially if the operation
is less trivial than adding one, so unless it was highly speed critical I
would still favor the for_each method.

Would you use the following if it were available?
for_each((v.begin(), u.begin()), (v.end(), u.end()), (f1(), f2()));
I suspect the reason method 1 is slower with small containers is because it is
method 1. That is, method 1 is suffering all the cache misses, while the
other two methods benefit from the data being already cached from method 1's
operation. With larger data sets, the early part of the data set is pushed
out of the cache by the later data before the second method runs, so it is has
to deal with the same cache misses that the first one did.

I'll bet if you run "method 1" second, it will perform better with the small
containers.

That sounded like a reasonable sort of argument so I tried it and got
these results for 1000 vectors

vector timing
Method 2 16
Method 3 17
Method 1 28
list timing
Method 3 67
Method 1 96

Method 1 (the for_each method) is still slower.
 
sashan said:
Would you use the following if it were available?
for_each((v.begin(), u.begin()), (v.end(), u.end()), (f1(), f2()));

I wouldn't be averse to it, though the syntax as you write it couldn't work
without changes to the language. Perhaps if there was a form of for_each that
took pair<Iterator,Iterator>'s.

You can sort of fake it out with:

mismatch(v.begin(),v.end(),u.begin(),function_runner());

where function_runner does:

bool operator()(LHS_t lhs, RHS_t rhs) {
f1(lhs);
f2(rhs);
return true;
}

Of course, I'm not sure that's any clearer than writing it out or creating
your own for_each2 function.
That sounded like a reasonable sort of argument so I tried it and got
these results for 1000 vectors

vector timing
Method 2 16
Method 3 17
Method 1 28

Very odd. If it wasn't cache misses, I'd be very curious to know why the
smaller arrays perform so differently than the larger ones.

Ken
 
sashan said:
Would you use the following if it were available?
for_each((v.begin(), u.begin()), (v.end(), u.end()), (f1(), f2()));

BTW, if you want to suggest language/library features like this, a good place
to bring it up would be comp.std.c++.

Ken
 
I think might be possible to get pretty close to this already, say something along the lines of:

typedef std::vector<int> IntVector_t;
#include <iterator>
struct TwoIntVectorsAsOne
{
struct function_wrapper
{
function_wrapper(Sqr& uf) : m_uf(uf) {}
void operator()(const TwoIntVectorsAsOne& tao);
private:
Sqr& m_uf;
};

struct iterator : public std::iterator<std::forward_iterator_tag, TwoIntVectorsAsOne>
{
iterator() {}

TwoIntVectorsAsOne& operator*() const;
iterator& operator++();
friend bool operator==(const iterator& x, const iterator& y);
friend bool operator!=(const iterator& x, const iterator& y) {
return (!(x == y));
}
};

iterator begin();
iterator end();

TwoIntVectorsAsOne(const IntVector_t& iv1, const IntVector_t& iv2) : m_iv1(iv1), m_iv2(iv2) {}

private:
const IntVector_t& m_iv1;
const IntVector_t& m_iv2;
};

vector<int> u3(10,3),v3(10,3);
TwoIntVectorsAsOne two_as_one(u3, v3);
Sqr sqr;
TwoIntVectorsAsOne::function_wrapper fw(sqr);
for_each(two_as_one.begin(), two_as_one.end(), fw);

Obviously, this is FAR from complete, but it does illustrate one possible approach using the existing language.

Dan
 
Back
Top