How to replace multi-spaces within a string with single-space

José Joye

Hello,

I was wondering whether there is an existing method that replaces each run of
multiple spaces within a string with a single space.
e.g.:
"12   3  4    56" --> "12 3 4 56"

I think this could be done by looking at each char within a loop and copying
the char to a StringBuilder instance
unless both the current and the previous char are spaces...
But as always, I would prefer to use an existing method ;-)

Thanks,
José
 
Jon Skeet

José Joye said:
Yes, in my case, efficiency matters. However, I agree that my strings are
quite small and the number of multi-spaces should not be too many.

I was wondering if the Regex solution is slower than the other solutions. If
yes, do you know how much slower?

I really don't know. Could you post a sample selection of strings
(including whatever proportion would have no multi-spaces at all)? If
so, I can benchmark a few ways of doing it...
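For reference, the Regex solution being asked about is not shown in this part of the thread. A minimal sketch of what it might look like follows; the class name RegexSketch, the helper name CollapseSpaces and the pattern " {2,}" are assumptions for illustration, not code from the original posts.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexSketch
{
    // Hypothetical helper: replaces every run of two or more
    // spaces with a single space. Single spaces are left alone.
    public static string CollapseSpaces(string input)
    {
        return Regex.Replace(input, " {2,}", " ");
    }

    public static void Main()
    {
        // prints "12 3 4 56"
        Console.WriteLine(CollapseSpaces("12   3  4    56"));
    }
}
```

The static Regex.Replace call is the shortest form; whether it is fast enough is exactly what the benchmark is meant to find out.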
 
Jon Skeet

José Joye said:
In fact, my strings are OCR-B lines read from Bank/Post Slips.
Each line should contain at most 80 chars. I have removed the leading and
trailing spaces with the Trim()
method.

So these could be some samples:
"0100 000187004>221 74101 02080003 95200208060+ 010184 473>"
"01 00000179008>0 00050175 7500100007054 24008+ 0103 97904>"
"01 0000006630 3>104922 351100079647820 000008+ 010194507>"
"0 00000000000 000111033122108+ 077782103 >"
"5 00002700>"
"0100000241504>113730619000003472360720026+ 010231043>"

Righto.

Running the code at the bottom, here are the results I got:

Benchmarking type MultiSpace
Run #1
RegexReplace 00:00:51.1034832
RegexReplaceWithTest 00:00:45.9443136
CompiledRegexReplaceWithTest 00:00:15.3420608
StringReplace 00:00:06.0687264
StringBuilderSingleChar 00:00:03.6252128
StringBuilderBlock 00:00:02.1831392
Run #2
RegexReplace 00:00:51.0333824
RegexReplaceWithTest 00:00:45.9260384
CompiledRegexReplaceWithTest 00:00:15.0316144
StringReplace 00:00:06.0386832
StringBuilderSingleChar 00:00:03.6652704
StringBuilderBlock 00:00:02.1330672

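It looks like the StringBuilderBlock method is the fastest by a clear margin:
roughly 1.7 times faster than StringBuilderSingleChar and over 20 times faster
than the uncompiled Regex.Replace. The code for that on its own would be: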

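// Requires: using System.Text;
public static string FlattenSpaces (string x)
{
    // Fast path: nothing to do if there is no double space at all.
    if (x.IndexOf ("  ")==-1)
        return x;

    StringBuilder builder = new StringBuilder(x.Length);

    int start=0;
    while (true)
    {
        int nextDoubleSpace = x.IndexOf ("  ", start);
        if (nextDoubleSpace==-1)
            break;
        // Copy the block up to and including the first space of the run.
        builder.Append (x, start, nextDoubleSpace+1-start);
        // Skip the second space and any further spaces in the run.
        start = nextDoubleSpace+2;
        while (start < x.Length && x[start]==' ')
            start++;
    }
    // Copy whatever is left after the last run.
    builder.Append (x, start, x.Length-start);
    return builder.ToString();
}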
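
A quick way to try the method above is a small console program. This is only a sketch: the class name FlattenDemo and the sample inputs are made up for illustration, and the method body is repeated (with a two-space search string) so that the program compiles on its own.

```csharp
using System;
using System.Text;

public static class FlattenDemo
{
    // Copy of the block-based approach: search for two adjacent
    // spaces, copy everything up to the first of them, skip the rest.
    public static string FlattenSpaces(string x)
    {
        if (x.IndexOf("  ") == -1)
            return x;

        StringBuilder builder = new StringBuilder(x.Length);
        int start = 0;
        while (true)
        {
            int nextDoubleSpace = x.IndexOf("  ", start);
            if (nextDoubleSpace == -1)
                break;
            builder.Append(x, start, nextDoubleSpace + 1 - start);
            start = nextDoubleSpace + 2;
            while (start < x.Length && x[start] == ' ')
                start++;
        }
        builder.Append(x, start, x.Length - start);
        return builder.ToString();
    }

    public static void Main()
    {
        // Illustrative inputs only, not taken from the OCR-B samples.
        string[] samples = { "12   3  4    56", "a b", "  lead" };
        foreach (string s in samples)
            Console.WriteLine("[" + FlattenSpaces(s) + "]");
    }
}
```

Strings without any double space are returned unchanged, so the common case costs no allocation.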


Benchmark code (run with -runtwice on my box):
// See http://www.pobox.com/~skeet/csharp/benchmark.html
// for how to run this code.

using System;
using System.Text;
using System.Text.RegularExpressions;

public class MultiSpace
{
static readonly string[] TestCases =
{
"0100 000187004>221 74101 02080003 95200208060+ "+
" 010184 473>",
"01 00000179008>0 00050175 7500100007054 24008+ "+
"0103 97904>",
"01 0000006630 3>104922 351100079647820 000008+ 010194507>",
"0 00000000000 000111033122108+ 077782103 >",
"5 00002700>",
"0100000241504>113730619000003472360720026+ 010231043>",
};

static long check;
static int iterations = 100000;

public static void Init(string[] args)
{
if (args.Length != 0)
iterations = Int32.Parse(args[0]);
}

public static void Reset()
{
check=0;
}

public static void Check()
{
if (check != 279*iterations)
throw new Exception ("Invalid check total: "+check);
}

[Benchmark]
public static void RegexReplace()
{
long total=0;

for (int i = iterations; i>0; i--)
{
foreach (string s in TestCases)
{
string x=s;
x = Regex.Replace (x, " +", " ");
total+=x.Length;
}
}
check=total;
}

[Benchmark]
public static void RegexReplaceWithTest()
{
long total=0;

for (int i = iterations; i>0; i--)
{
foreach (string s in TestCases)
{
string x=s;
if (x.IndexOf("  ")!=-1)
x = Regex.Replace (x, " +", " ");
total+=x.Length;
}
}
check=total;
}

static Regex compiledRegex = new Regex (" +",
RegexOptions.Compiled);
[Benchmark]
public static void CompiledRegexReplaceWithTest()
{
long total=0;

for (int i = iterations; i>0; i--)
{
foreach (string s in TestCases)
{
string x=s;
if (x.IndexOf("  ")!=-1)
x = compiledRegex.Replace (x, " ");
total+=x.Length;
}
}
check=total;
}

[Benchmark]
public static void StringReplace()
{
long total=0;

for (int i = iterations; i>0; i--)
{
foreach (string s in TestCases)
{
string x=s;
while (x.IndexOf("  ")!=-1)
x=x.Replace("  ", " ");
total+=x.Length;
}
}
check=total;
}

[Benchmark]
public static void StringBuilderSingleChar()
{
long total=0;

for (int i = iterations; i>0; i--)
{
foreach (string s in TestCases)
{
if (s.IndexOf ("  ")==-1)
{
total+=s.Length;
continue;
}

StringBuilder builder = new StringBuilder(s.Length);
bool inSpace=false;
foreach (char c in s)
{
if (c==' ')
{
if (!inSpace)
builder.Append(c);
inSpace=true;
}
else
{
builder.Append(c);
inSpace=false;
}
}
total+=builder.ToString().Length;
}
}
check=total;
}

[Benchmark]
public static void StringBuilderBlock()
{
long total=0;

for (int i = iterations; i>0; i--)
{
foreach (string x in TestCases)
{
if (x.IndexOf ("  ")==-1)
{
total+=x.Length;
continue;
}

StringBuilder builder = new StringBuilder(x.Length);

int start=0;
while (true)
{
int nextDoubleSpace = x.IndexOf ("  ", start);
if (nextDoubleSpace==-1)
break;
builder.Append (x, start, nextDoubleSpace+1-start);
start = nextDoubleSpace+2;
while (start < x.Length && x[start]==' ')
start++;
}
builder.Append (x, start, x.Length-start);
total+=builder.ToString().Length;
}
}
check=total;
}
}
 
