Regular expressions faster in Java?

pawel

I have compared regular expression performance in C# and Java. The task was
to check whether a pattern matches some text.
Matching was done with precompiled regular expressions in a loop of 100000
iterations. The loop was executed 11 times; the first run is discarded and the
average elapsed time of the remaining 10 is calculated. The code for both
classes is below.
I found that the Java implementation is 2 to 5 times faster than C# (it
depends on the complexity of the expression).
Maybe my tests were too simple? Or maybe Java applied some optimisation so
that the code doesn't actually run (because it really does nothing useful)?
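
One way to reduce the risk that the JIT discards the loop is to give every match an observable effect, for example by counting successful matches and printing the count. The following is only a minimal sketch under that assumption, not a rigorous benchmark harness; the class name `ConsumedResultBench`, the method `countMatches`, and the sample input string are made up for illustration and stand in for the contents of text.txt.

```java
import java.util.regex.Pattern;

public class ConsumedResultBench {

    // Runs `iterations` match attempts and returns how many succeeded.
    // Returning (and later printing) the count gives each iteration an
    // observable effect, so the JIT has less room to drop the loop.
    static long countMatches(Pattern[] patterns, String input, int iterations) {
        long hits = 0;
        for (int n = 0; n < iterations; n++) {
            if (patterns[n % patterns.length].matcher(input).matches()) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Pattern[] patterns = {
            Pattern.compile("a*c?(d|f+)"),
            Pattern.compile("g.*c?(r|d+)")
        };
        String input = "aaacfff"; // hypothetical stand-in for text.txt
        int iterations = 100000;

        countMatches(patterns, input, iterations); // warm-up run, not timed

        long start = System.nanoTime();
        long hits = countMatches(patterns, input, iterations);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("Matches: " + hits + ", elapsed: " + elapsedMs + " ms");
    }
}
```

If the timings stay roughly the same once the result is consumed like this, dead-code elimination is probably not what explains the difference.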

--
Paweł

<<RegMatchTest.java>>
import java.io.File;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegMatchTest
{
    public static void main(String[] args) throws Exception
    {
        String pat[] = {"a*c?(d|f+)", "g.*c?(r|d+)"};
        String word;
        long num = 100000;

        // Read the whole test file into a string.
        File f = new File("text.txt");
        char buff[] = new char[(int) f.length()];
        FileReader fr = new FileReader(f);
        int read = fr.read(buff);
        fr.close();
        // Use only the characters actually read, so no trailing NULs remain.
        word = new String(buff, 0, Math.max(read, 0));

        System.out.println("Testing for " + num + " loops.");
        long avgSum = 0;
        for (int n = 0; n <= 10; ++n)
        {
            long t1 = System.currentTimeMillis();
            new RegMatchTest().Test1(pat, word, num);
            long t2 = System.currentTimeMillis();
            System.out.println("Elapsed time : " + (t2 - t1) + " ms");
            // Skip the first run (n == 0) when averaging.
            if (n > 0)
                avgSum += t2 - t1;
        }
        System.out.println("\nAverage time : " + (avgSum / 10) + " ms");
    }

    boolean Test1(String[] pat, String word, long len)
    {
        // Compile both patterns once, before the matching loop.
        Pattern p[] = {Pattern.compile(pat[0]),
                       Pattern.compile(pat[1])};

        boolean b = false;
        for (int n = 0; n < len; ++n)
        {
            // matches() succeeds only if the whole input matches the pattern.
            Matcher m = p[n % 2].matcher(word);
            b = m.matches();
        }
        return b;
    }
}


<<class1.cs>>
using System;
using System.IO;
using System.Text.RegularExpressions;

class Class1
{
    [STAThread]
    static void Main(string[] args)
    {
        string[] pat = {@"a*c?(d|f+)", @"g.*c?(r|d+)"};
        string word;
        long num = 100000;

        // Read the whole test file into a string.
        using (StreamReader tr = new StreamReader("text.txt"))
        {
            word = tr.ReadToEnd();
        }

        Console.WriteLine("Testing for " + num + " loops.");
        long avgSum = 0;
        for (int n = 0; n <= 10; ++n)
        {
            DateTime t1 = DateTime.Now;
            new Class1().Test1(pat, word, num);
            DateTime t2 = DateTime.Now;
            TimeSpan ts = t2 - t1;
            Console.WriteLine("Elapsed time : " + (ts.TotalMilliseconds) + " ms");
            // Skip the first run (n == 0) when averaging.
            if (n > 0)
                avgSum += (long)ts.TotalMilliseconds;
        }
        Console.WriteLine("\nAverage time : " + (avgSum / 10) + " ms");
    }

    bool Test1(string[] pat, string word, long len)
    {
        // Build both regexes once, before the matching loop.
        Regex[] p = {new Regex(pat[0], RegexOptions.Compiled),
                     new Regex(pat[1], RegexOptions.Compiled)};
        bool b = false;
        for (int n = 0; n < len; ++n)
        {
            // Match() searches for the pattern anywhere in the input,
            // unlike Java's matches(), which requires the whole input to match.
            Match m = p[n % 2].Match(word);
            b = m.Success;
        }
        return b;
    }
}
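
The two listings do not ask the regex engines the same question: `Matcher.matches()` in Java succeeds only when the entire input matches, while `Regex.Match()` in .NET looks for a match starting at any position. A whole-input match can fail after inspecting very little text, whereas a search may have to try every start position, so this could account for part of the measured gap. The sketch below shows the difference using Java's own `matches()` and `find()`; the class name `MatchesVsFind`, its helper methods, and the sample input are hypothetical.

```java
import java.util.regex.Pattern;

public class MatchesVsFind {

    // True only if the entire input matches the pattern
    // (this is what the Java test above measures).
    static boolean wholeInput(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True if the pattern matches anywhere in the input
    // (closer to what Regex.Match does in the C# test).
    static boolean anywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String regex = "a*c?(d|f+)";
        String input = "xx aacd xx"; // hypothetical sample text
        System.out.println("matches(): " + wholeInput(regex, input));
        System.out.println("find():    " + anywhere(regex, input));
    }
}
```

For a like-for-like comparison, either the Java test could call `find()` instead of `matches()`, or the C# patterns could be anchored with `^` and `$`.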
 
I noticed the same in the past. Regular expressions seem to be poorly
supported in C#: they are really slow even when compiled. I have heard that
some people are porting the Boost package to C#, but I haven't found the port
yet.

Yves