Interesting std::replace_if problem

  • Thread starter Thread starter Andrew Maclean
  • Start date Start date
A

Andrew Maclean

I guess this problem can be distilled down to: How do I search through a
string, find the first matching substring, replace it, and continue through
the string doing this. Can replace_if() be used to do this?

Here is a concrete example:
If I have a string with sequences of CRLF and possibly just CR's, is there a
simple way of replacing the CRLF characters with a LF?

If you look at the function below is it possible to use a replace_if to do
this?


void CGeneralMessageDlg::ToLF( std::string & s)
{
char CR = '\r';
char LF = '\n';
std::string CRLF = "\r\n";
// Problem: how can I replace multiple CRLF's in a string with just one LF?
// This is my naive attempt, which doesn't work:
//
std::replace_if(s.begin(),s.end(),std::bind2nd(std::equal_to<std::string>(),
CRLF),LF);

// Replace any CR's with a LF.

std::replace_if(s.begin(),s.end(),std::bind2nd(std::equal_to<char>(),CR),LF)
;
}


Any comments/help would be appreciated.

Thankyou
Andrew



[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
 
I guess this problem can be distilled down to: How do I search through a
string, find the first matching substring, replace it, and continue through
the string doing this. Can replace_if() be used to do this?

Here is a concrete example:
If I have a string with sequences of CRLF and possibly just CR's, is there a
simple way of replacing the CRLF characters with a LF?

I guess all examples using the STL would result in an O(N^2) algorithm, so
you would be better using an algorthm that can do it for you in linear
time. Something like this:


#include <iostream>
#include <string>
#include <vector>

using namespace std;


int main ()
{
string s = "I want to replace all occurancess oof a doubble cc with a sinngle cc x ccharacter. Is "
"that ook witcch you???";

cout<<s<<endl;

//Simple 2 pass procedure that does that in O(N) time.
std::vector<int> pos; //Record the position of the double c's to remove.

string::iterator i = s.begin();
//Assume that the string is non-empty.

//Recording loop. We are sure that 3 c's can never be together?
while (!s.empty() && i != s.end()-1)
{
if (*i == 'c' and *(i+1) == 'c')
pos.push_back (i-s.begin()), *i = 'x';
++i;
}

//Removing loop.
std::vector<int>::iterator iter = pos.begin();
while (iter != pos.end())
{
//Move all characters between *iter and *(iter+1) left by
//(iter-pos.begin()+1) positions.
int Offset = (iter-pos.begin()+1);
int Start = *iter + 2;
int End = (iter+1 == pos.end()?s.size():*(iter+1) + 2);

while (Start != End)
s[Start-Offset] = s[Start], ++Start;
++iter;
}

s.erase (s.end()-pos.size(), s.end());
cout<<s<<endl;
}


It's fun writing programs like these! They are quite interesting for the
perceived simplicity at first sight!


Regards,
-Dhruv.




[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
 
I guess this problem can be distilled down to: How do I search through
a string, find the first matching substring, replace it, and continue
through the string doing this. Can replace_if() be used to do this?

No. The replace_if function can only replace single elements.
Here is a concrete example:
If I have a string with sequences of CRLF and possibly just CR's, is
there a simple way of replacing the CRLF characters with a LF?
If you look at the function below is it possible to use a replace_if
to do this?

No.

I don't think that there is anything in the standard library which will
really help. I'd just write a simple loop, and do it by hand:

void
GeneralMessageDlg::toLF( std::string& s )
{
std::string result ;
size_t end = s.size() ;
size_t current = 0 ;
while ( current < end ) {
switch ( s[ current ] ) {
case '\n' :
case '\r' :
result += '\n' ;
current = s.find_first_not_of( "\r\n", current ) ;
break ;

default :
result += s[ current ] ;
++ current ;
break ;
}
}
s.swap( result ) ;
}

IMHO, it is a lot simpler to generate to a new string, rather than
modifying in place.

I think that the Boost regex classes have something that will do exactly
what you want. I'd verify, but I'm stuck with a compiler too old to
handle Boost. But something like:

boost::regex_replace( std::back_inserter( result ),
s.begin(),
s.end(),
boost::regex( "[\\n\\r]+" ),
"\n" ) ;

should do the trick, I think. Maybe someone who can actually use the
library can verify that I've got it right. And if you are doing this a
lot, you might want to declare the regular expression a static object,
so that it will only be constructed once. (It's probably premature
optimization and just guessing on my part, but I presume that
construction of a regular expression is a fairly expensive operation.)

--
James Kanze GABI Software mailto:[email protected]
Conseils en informatique orientée objet/ http://www.gabi-soft.fr
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
 
Andrew Maclean said:
I guess this problem can be distilled down to: How do I search through a
string, find the first matching substring, replace it, and continue through
the string doing this. Can replace_if() be used to do this?

No it can not, since it only replace a single element.
Here is a concrete example:
If I have a string with sequences of CRLF and possibly just CR's, is there a
simple way of replacing the CRLF characters with a LF?

If you know the file contains only CRLF you can simply do
last = std::remove(first, last, '\r');
That will leave the \n's.

Alternatively use this function:

template <typename _ForwardIter>
_ForwardIter replace_crlf (_ForwardIter first, _ForwardIter last)
{
_ForwardIter res = first;
while(first != last)
{
static const char crlf[] = { '\r', '\n' };
_ForwardIter eol = std::search(first, last, crlf + 0, crlf + 2);
res = std::copy(first, eol, res);
if(eol == last)
break;
*res++ = '\n';
first = eol;
std::advance(first, 2);
}
return res;
}

A similar result could be achieved with std::unique with a custom
binary predicate function, but degenerate cases can be constructed
which will make it do the "wrong" thing, i.e. collapsing a "bad"
sequence like "\r\n\n\n" into only one character.

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
 
I guess this problem can be distilled down to: How do I search through a
string, find the first matching substring, replace it, and continue through
the string doing this. Can replace_if() be used to do this?

Not in general. In this case, your substring can be reduced to a single
char and bending the rules allows a solution. A predicate object is not
allowed to have state because the original may be copied more than once.
It is also not allowed to have side effects for no good general reason.
It must not change the sequence being processed for obvious reasons, but
other side effects are harmless.

The objective is to replace "\r\n" and '\r' with '\n'. What we need is
a predicate with memory of previous character. We can remove the '\n'
from the "\r\n" on pass one and replace all '\r' with '\n' on pass two.
To make the predicate usable, we contain a pointer to an external char
for the memory.

struct Pred {
char* pred;
Pred (char& pred) : pred(&pred) {
}
bool operator () (char cur) const {
bool result(cur == '\n' && *pred == '\r');
*pred = cur;
return result;
}
};

Now we can use that to remove_if some newlines and then use replace to
change the returns to newlines.

int main () {
string s("This is\r\na two line thing\rwith a third line\n");
char pred('\n'); // Anything other than '\r'
s.erase(remove_if(s.begin(), s.end(), Pred(pred)), s.end());
assert(s == "This is\ra two line thing\rwith a third line\n");
replace(s.begin(), s.end(), '\r', '\n');
assert(s == "This is\na two line thing\nwith a third line\n");
}

John

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
 
Andrew Maclean said:
I guess this problem can be distilled down to: How do I search through a
string, find the first matching substring, replace it, and continue through
the string doing this. Can replace_if() be used to do this?

Given iterators, first and last, a predicate pred and a replace value,
new_value std::replace_if() performs the following:

For each iterator i in the range [first,last) if pred(*i) is true then
the assignment *i = new_value is performed.

Now the type of *i, and new_value is 'char' in your case. So you can
only replace a single characther by another in the string. Thus you
cannot replace a substring by another using replace_if(). To do this
it would be better to use the std::string member functions as follows:

const char* crlf = "\r\n";
std::string s("asb\r\nasdf\r\n\r\nasd");
int index = s.find( crlf )
while (index != std::string::npos) {
s.replace( index, 2, "\r" );
index = s.find( crlf, index );
}
Here is a concrete example:
If I have a string with sequences of CRLF and possibly just CR's, is there a
simple way of replacing the CRLF characters with a LF?
In this specific case, you can use std::remove_if() to remove all CR's

std::string crlf("\r\n");
std::remove_if( crlf.begin(), crlf.end(),
std::bind2nd( std::equal_to<char>(), '\r' ) );


HTH
~ aniket

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
 
Andrew Maclean said:
I guess this problem can be distilled down to: How do I search through a
string, find the first matching substring, replace it, and continue through
the string doing this. Can replace_if() be used to do this?
<snip>

I am of the opinion that a regular expression library is the most
correct way to do this. foreach with a lookahead and a functor-stored
result can also get you there in O(n) time if memory isn't a prob.

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
 
I don't think that there is anything in the standard library which will
really help. I'd just write a simple loop, and do it by hand:

/* Make it a free function for ease of testing
GeneralMessageDlg:: */

toLF( std::string& s )
{
std::string result ;
size_t end = s.size() ;
size_t current = 0 ;
while ( current < end ) {
switch ( s[ current ] ) {
case '\n' :
case '\r' :
result += '\n' ;
current = s.find_first_not_of( "\r\n", current ) ;
break ;

default :
result += s[ current ] ;
++ current ;
break ;
}
}
s.swap( result ) ;
}

int main () {
std::string s("\n\n\n\n\n\n\n\n\n"
"Where did my top margin go?\n\n\n");
toLF(s);
std::cout << s;
}

John

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
 
No. The replace_if function can only replace single elements.



No.

I don't think that there is anything in the standard library which will
really help. I'd just write a simple loop, and do it by hand:

How about this horribly inefficient algorithm that uses the STL. I can not
see anything majorly wrong with this?

#include <iostream>
#include <string>
#include <vector>

using namespace std;


int main ()
{
string s = "I want to replace all occurancess oof a doubble cc with a sinngle cc x ccharacter. Is "
"that ook witcch you???";

cout<<s<<endl;

string::size_type pos = 0;

pos = s.find ("cc", pos);

while (pos != string::npos)
//I don't know why s::npos does not work? Any ideas anyone? Maybe there is
//something wrong with the way I have understood what npos is?
{
s.replace (pos, 2, 1, 'x');
pos = s.find ("cc", pos+1);
}
cout<<s<<endl;
}



--
Regards,
-Dhruv.

Proud to be a Vegetarian.
http://www.vegetarianstarterkit.com/
http://www.vegkids.com/vegkids/index.html


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
 
while (pos != string::npos)
//I don't know why s::npos does not work? Any ideas anyone? Maybe there is
//something wrong with the way I have understood what npos is?

It is a static size_type const data member of string.

assert(string::npos == s.npos);

John

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
 
while (pos != string::npos)
//I don't know why s::npos does not work? Any ideas anyone? Maybe there is
//something wrong with the way I have understood what npos is?

Oops, now I know why it does not work. Because I got to write s.npos, not
s::npos. Silly me!
Sorry for the stupid question ;-)



--
Regards,
-Dhruv.

Proud to be a Vegetarian.
http://www.vegetarianstarterkit.com/
http://www.vegkids.com/vegkids/index.html


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
 
John Potter said:
On 18 Feb 2004 16:21:01 -0500, (e-mail address removed) wrote:
/* Make it a free function for ease of testing
GeneralMessageDlg:: */
toLF( std::string& s )
{
std::string result ;
size_t end = s.size() ;
size_t current = 0 ;
while ( current < end ) {
switch ( s[ current ] ) {
case '\n' :
case '\r' :
result += '\n' ;
current = s.find_first_not_of( "\r\n", current ) ;
break ;
default :
result += s[ current ] ;
++ current ;
break ;
}
}
s.swap( result ) ;
}
int main () {
std::string s("\n\n\n\n\n\n\n\n\n"
"Where did my top margin go?\n\n\n");
toLF(s);
std::cout << s;
}

That's what he asked for at one point. (At another point, he asked for
the classical dos2unix. It's very difficult to write a function which
is correct for two conflicted requirements.)

--
James Kanze GABI Software mailto:[email protected]
Conseils en informatique orientée objet/ http://www.gabi-soft.fr
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
 
John Potter said:
On 18 Feb 2004 16:21:01 -0500, (e-mail address removed) wrote:
toLF( std::string& s )
{
std::string result ;
size_t end = s.size() ;
size_t current = 0 ;
while ( current < end ) {
switch ( s[ current ] ) {
case '\n' :
case '\r' :
result += '\n' ;
current = s.find_first_not_of( "\r\n", current ) ;

This is the point in question.
break ;
default :
result += s[ current ] ;
++ current ;
break ;
}
}
s.swap( result ) ;
}
int main () {
std::string s("\n\n\n\n\n\n\n\n\n"
"Where did my top margin go?\n\n\n");
toLF(s);
std::cout << s;
}
That's what he asked for at one point. (At another point, he asked for
the classical dos2unix. It's very difficult to write a function which
is correct for two conflicted requirements.)

Yes, the set of responses indicates that the problem statement was less
than clear. I doubt that what you wrote is what you intended. If you
replace each of my /n in the above with /r/n, there is no change in the
result. Your code is equivalent to s/[\r\n]+/\n/. It reduces any
sequence of CR and NL characters to a single NL.

John

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
 
John Potter said:
On 23 Feb 2004 13:56:28 -0500, (e-mail address removed) wrote:
John Potter said:
On 18 Feb 2004 16:21:01 -0500, (e-mail address removed) wrote:
toLF( std::string& s )
{
std::string result ;
size_t end = s.size() ;
size_t current = 0 ;
while ( current < end ) {
switch ( s[ current ] ) {
case '\n' :
case '\r' :
result += '\n' ;
current = s.find_first_not_of( "\r\n", current ) ;
This is the point in question.

That line is intentional. I know what it does.
break ;
default :
result += s[ current ] ;
++ current ;
break ;
}
}
s.swap( result ) ;
}
int main () {
std::string s("\n\n\n\n\n\n\n\n\n"
"Where did my top margin go?\n\n\n");
toLF(s);
std::cout << s;
}
That's what he asked for at one point. (At another point, he asked
for the classical dos2unix. It's very difficult to write a function
which is correct for two conflicted requirements.)
Yes, the set of responses indicates that the problem statement was
less than clear. I doubt that what you wrote is what you intended.

What I wrote was intended to meet the statement I had before my eyes in
the original code:

// Problem: how can I replace multiple CRLF's in a string with just one LF?

I interpret replacing multiple CRLF's with just one LF as meaning that a
sequence of CR's and LF's, regardless of its length, would be replaced
by a single LF. Having reread the original posting, and seen the other
responses, I think he probably meant each sequence CRLF would be
replaced by a single LF, multiple times. Or something like that. But
that's not the problem I tried to solve.
If you replace each of my /n in the above with /r/n, there is no
change in the result. Your code is equivalent to s/[\r\n]+/\n/. It
reduces any sequence of CR and NL characters to a single NL.

That's a simplification of his request. It replaces a multiple sequence
of CRLF's with a single LF. It also replaces a single LF with an LF;
while he didn't ask for it, that shouldn't cause a problem. It also
replaces multiple sequences of LFCR's with a single LF, or multiple
sequences of LF's with a single LF -- he didn't ask for either, but it
is hard to imagine where the asked for replacement would make sense, and
these replacements not.

I suspect that I did solve the wrong problem. In real life, of course,
I'd start be defining the problem more exactly: is it only CRLF which
needs replacing? What about LFCR? CR alone (from a Mac)? Perhaps I
want options to control it. As for the implementation, I tend to use
simple state machines for such jobs. Such as boost::regex (which isn't
really very simple, but it is a state machine that is already written,
so it saves me some work).

--
James Kanze GABI Software mailto:[email protected]
Conseils en informatique orientée objet/ http://www.gabi-soft.fr
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
 
What I wrote was intended to meet the statement I had before my eyes in
the original code:
// Problem: how can I replace multiple CRLF's in a string with just one LF?
I interpret replacing multiple CRLF's with just one LF as meaning that a
sequence of CR's and LF's, regardless of its length, would be replaced
by a single LF.

Now I see. Thanks for the details. I read his code and interpreted his
prose accordingly.

A line of code is worth 1000 words.

John

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
 
Back
Top