A test about searching in a String I promised Jay B.

  • Thread starter: Cor

Cor

Hello,

I promised Jay B yesterday that I would run some tests.

The subject was a string evaluation routine that Jon had sent in. Jay B was
in doubt about which approach was better, because there had been a discussion
in the C# newsgroup on 25 September. Regular expressions were also involved
in that discussion.

Last night I told Jay that I would test all 4 methods, plus the stupid method
that first came to mind that night when I saw Jon's code. Herfried said that
this method would become slow with a string of 10 MB. So I told Herfried that
I would write the test program if he would type in a real mixed string for
me. So far he has not sent it. (Or maybe he has problems with the newsgroup
server again.)

I also told Jay B that I would additionally test the difference between the
Microsoft.VisualBasic function InStr and the System method IndexOf.

The methods I tested were:
- a Do Until loop with IndexOf (Jay B)
- a Do loop with InStr (Jon)
- the same loop, which I changed to use IndexOf (Jon)
Those all work with strings as the search item.
The others work only with a single character as the search item:
- a stupid method that needs only 2 lines of code, which came to mind
last night
- IndexOf with a Char (Jay B)
- For Each with a Char (Jay B)
I made a test program, which I include below in this message.
Because everything depends on the computer you use, I am not giving absolute
figures; if you wish, you can try it yourself. On my computer, with a string
shorter than 5000 characters, no method shows a difference of more than
1/1000 of a second.
With more characters than that, the stupid method I thought of yesterday
becomes visibly slower.

With really long strings, the fastest character search in my test was the
"Do Until" loop with "IndexOf", and second was the "For Each" over the
characters, both from Jay B.

But for me the most surprising result was that the Microsoft.VisualBasic
function "InStr" from Jon was almost 2 times faster with a 3-character string
than the System method "IndexOf".
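A possible explanation, offered as an assumption rather than something measured in this thread: String.IndexOf(String) does a culture-sensitive comparison, while InStr with the default Option Compare Binary does an ordinal one. On .NET 2.0 or later, IndexOf has an overload that takes StringComparison.Ordinal, so the two can be compared like for like. A minimal sketch (CountOrdinal is a hypothetical name, not part of the test program):

```vb
Imports System

Module OrdinalSearchSketch
    ' Hypothetical helper: counts occurrences of delimiter using an
    ' ordinal comparison (assumes .NET 2.0 or later for StringComparison).
    Function CountOrdinal(ByVal input As String, ByVal delimiter As String) As Integer
        ' An empty search string would match at every position; treat it as "no hits".
        If delimiter.Length = 0 Then Return 0
        Dim count As Integer = 0
        Dim index As Integer = input.IndexOf(delimiter, 0, StringComparison.Ordinal)
        Do Until index < 0
            count += 1
            ' Continue one position further, like the Do loops in the test program.
            index = input.IndexOf(delimiter, index + 1, StringComparison.Ordinal)
        Loop
        Return count
    End Function

    Sub Main()
        Console.WriteLine(CountOrdinal("abcXabcXabc", "abc"))  ' prints 3
    End Sub
End Module
```

Timing this next to the InStr method would show whether the comparison type accounts for the difference on a given machine.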

The test program follows below. It is written as a Windows Forms project and
needs 2 text boxes (TextBox1 must have Multiline set to True), a Button1, and
a large Label1.

TextBox1 holds the text from which the test string is built (generated
automatically until Herfried sends his in).
TextBox2 holds the length of the test string to build (yes, you understand,
until....)

I hope this gives some ideas (also about some people who said the
Microsoft.VisualBasic functions are slow and outdated. This you can of
course not test with C#.)
Cor
\\\
' Paste this code into the Form1 class of a Windows Forms project.
Private Sub Button1_Click(ByVal sender _
  As Object, ByVal e As System.EventArgs) Handles Button1.Click
    If Me.TextBox1.Text = "" Then
        Me.label1.Text = "Enter strings in upper textbox"
        Exit Sub
    End If
    If Not IsNumeric(Me.TextBox2.Text) Then
        Me.label1.Text = "Enter value in lower textbox"
        Exit Sub
    End If
    TestString.Build(Me.TextBox1.Text, CInt(Me.TextBox2.Text))
    ' The first 3 characters of the test string serve as the search string.
    Dim delimiter As String
    If TestString.StringToTest.Length >= 3 Then
        delimiter = TestString.StringToTest.Substring(0, 3)
    Else
        Me.label1.Text = "Test string must have at least 3 characters"
        Exit Sub
    End If
    Dim i As Integer
    Dim labeltext As New System.Text.StringBuilder
    Dim count As Integer
    Dim testname As String = ""
    ' Part 1: search for the 3-character string.
    For i = 1 To 3
        Dim StartTick As Integer = Environment.TickCount
        Select Case i
            Case 1
                testname = "Jay B, string "
                count = Test1(TestString.StringToTest, delimiter)
            Case 2
                testname = "Jon, string "
                count = Test2(TestString.StringToTest, delimiter)
            Case 3
                testname = "Jon, but with indexof "
                count = Test3(TestString.StringToTest, delimiter)
        End Select
        Dim Elapsed As Integer = Environment.TickCount - StartTick
        labeltext.Append(testname & "count: " & _
          count & " Elapsed : " & Elapsed.ToString & vbCrLf)
    Next
    ' Part 2: search for a single character (the first one of the search string).
    delimiter = delimiter.Substring(0, 1)
    labeltext.Append("characters" & vbCrLf)
    For i = 1 To 6
        Dim StartTick As Integer = Environment.TickCount
        Select Case i
            Case 1
                testname = "Jay B, string "
                count = Test1(TestString.StringToTest, delimiter)
            Case 2
                testname = "Jon, string "
                count = Test2(TestString.StringToTest, delimiter)
            Case 3
                testname = "Jon, but with indexof "
                count = Test3(TestString.StringToTest, delimiter)
            Case 4
                ' The Split method is skipped for very long strings.
                If TestString.StringToTest.Length < 599999 Then
                    testname = "Cor stupid "
                    count = Test4(TestString.StringToTest, delimiter)
                Else
                    testname = "Skipped test "
                    count = 0
                End If
            Case 5
                ' Chars(0) passes a real Char instead of relying on an
                ' implicit String-to-Char conversion.
                testname = "Jay B char do until "
                count = test5(TestString.StringToTest, delimiter.Chars(0))
            Case 6
                testname = "Jay B char for each "
                count = test6(TestString.StringToTest, delimiter.Chars(0))
        End Select
        Dim Elapsed As Integer = Environment.TickCount - StartTick
        labeltext.Append(testname & "count: " & _
          count & " Elapsed : " & Elapsed.ToString & vbCrLf)
    Next
    Me.label1.Text = labeltext.ToString

End Sub
Private Sub Form1_Load(ByVal sender As Object, ByVal _
  e As System.EventArgs) Handles MyBase.Load
    Me.Button1.Text = "start test"
End Sub
Public Function Test1(ByVal input As String, ByVal delimiter _
  As String) As Integer 'Jay B 1(string)
    Dim count, index As Integer
    index = input.IndexOf(delimiter)
    Do Until index < 0
        count += 1
        index = input.IndexOf(delimiter, index + 1)
    Loop
    Return count
End Function
Public Function Test2(ByVal strInput As String, ByVal strDelimiter _
  As String) As Int32 'Jon (string)
    Dim iStart As Int32, iCount As Int32, iResult As Int32
    iStart = 1
    iCount = 0
    Do
        iResult = InStr(iStart, strInput, strDelimiter)
        If iResult = 0 Then Exit Do
        iCount += 1
        iStart = iResult + 1
    Loop
    Return iCount
End Function
Public Function Test3(ByVal input As String, ByVal delimiter _
  As String) As Integer 'Jon with indexof(x,x,x)
    Dim iStart As Int32, iCount As Int32, iResult As Int32
    iStart = 0
    iCount = 0
    Do
        iResult = input.IndexOf(delimiter, iStart)
        If iResult = -1 Then Exit Do
        iCount += 1
        iStart = iResult + 1
    Loop
    Return iCount
End Function
Public Function Test4(ByVal input As String, ByVal delimiter _
  As String) As Integer 'Cor stupid
    Dim teststring As String() = Split(input, delimiter)
    Return teststring.Length - 1
End Function
Public Function test5(ByVal input As String, ByVal _
  delimiter As Char) As Integer 'Jay 1(char)
    Dim count, index As Integer
    index = input.IndexOf(delimiter)
    Do Until index < 0
        count += 1
        index = input.IndexOf(delimiter, index + 1)
    Loop
    Return count
End Function
Public Shared Function test6(ByVal input As String, _
  ByVal delimiter As Char) As Integer 'JayB 2(char)
    Dim count As Integer
    For Each ch As Char In input
        If ch = delimiter Then
            count += 1
        End If
    Next ch
    Return count
End Function
End Class
Public Class TestString
    Private Shared mStringTest As String
    Public Shared ReadOnly Property StringToTest() As String
        Get
            Return mStringTest
        End Get
    End Property
    ' Builds a test string of exactly x characters from the lines of strToTest.
    Public Shared Sub Build(ByVal strToTest As String, ByVal x As Integer)
        Dim strTest As New System.Text.StringBuilder
        Dim strTextbox As String() = Split(strToTest, vbCrLf)
        Do While strTest.Length < x
            Dim i As Integer
            For i = 0 To strTextbox.Length - 1
                Dim y As Integer
                For y = 0 To i
                    ' x \ 2 is integer division; x / 2 would yield a Double.
                    If strTest.Length < x \ 2 Then
                        strTest.Append(strTest.ToString & strTextbox(i))
                    Else
                        strTest.Append(strTest.ToString.Substring(0, x \ 2) _
                          & strTextbox(i))
                    End If

                Next
            Next
        Loop
        mStringTest = strTest.ToString.Substring(0, x)
    End Sub
End Class
///
 
Hi Cor,

|| Herfried said from that, that it would become slow
|| with a string from 10Mb. So I told Herfried that I
|| would make the test program if he would type in a
|| real mixed string for me. But till now he did not send
|| that. (Or maybe he has again problems with the
|| newsgroup server)

I've just been speaking to Herfried at the Hospital - it's not looking
good. He got as far as character 432764 before the pain was too much to bear.
The doctors have diagnosed the most severe case of RSI that they have seen in
years. Not surprising when Herfried was typing as fast as he could to get you
the file in time. Unfortunately amputation may be the most reasonable outcome.
:-((

Regards,
Fergus
 
I said that it would consume a lot of memory, but I never said that the
performance would be bad.

|| I've just been speaking to Herfried at the Hospital
|| - it's not looking good. He got as far as character
|| 432764 before the pain was too much to bear. The doctors
|| have diagnosed the most severe case of RSI that

What's RSI?

|| they have seen in years. Not surprising when Herfried was
|| typing as fast as he could to get you the file in time.
|| Unfortunately amputation may be the most reasonable outcome.
|| :-((

:-(
 
Hi Herfried,

Repetitive Strain Injury - what you get from typing too much for too long.
How's the wrist now - do the doctors still think the hand will have to come
off? ;-)

Regards,
Fergus
 
Cor,

|| Because everything depends on the computer you use, I am not giving absolute
|| figures; if you wish, you can try it yourself. On my computer, with a string
|| shorter than 5000 characters, no method shows a difference of more than
|| 1/1000 of a second.

When timing really fast functions like this, you can put the function in a
loop and call it 1000 times. As long as you call all the functions 1000
times, you are safe to compare the amount of time they took. You can
increase the number of iterations depending on how quick the function is.
The danger is that you start timing the loop itself and lose sight of the
fact that you are timing a function.
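
A minimal sketch of that idea, under stated assumptions: CountChar mirrors the Do Until method from Cor's program, while the module name, the sample input, and the Iterations constant are hypothetical. Every method being compared would have to run the same number of iterations on the same input.

```vb
Imports System

Module TimingSketch
    ' Counts occurrences of delimiter in input (same approach as test5 above).
    Function CountChar(ByVal input As String, ByVal delimiter As Char) As Integer
        Dim count As Integer = 0
        Dim index As Integer = input.IndexOf(delimiter)
        Do Until index < 0
            count += 1
            index = input.IndexOf(delimiter, index + 1)
        Loop
        Return count
    End Function

    Sub Main()
        ' Assumed repeat count; raise it when the function is very fast.
        Const Iterations As Integer = 1000
        ' Hypothetical sample input: 5000 characters with one delimiter.
        Dim input As String = New String("a"c, 4000) & ";" & New String("b"c, 999)
        Dim total As Integer = 0

        Dim startTick As Integer = Environment.TickCount
        For i As Integer = 1 To Iterations
            ' Accumulating the result keeps the call from being pointless work.
            total += CountChar(input, ";"c)
        Next
        Dim elapsed As Integer = Environment.TickCount - startTick

        Console.WriteLine("Total count: " & total.ToString() & _
          ", elapsed ms: " & elapsed.ToString())
    End Sub
End Module
```

Environment.TickCount has a coarse resolution, which is one more reason to time many iterations together rather than a single call.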

Jon Skeet, a C# MVP, wrote up the following on benchmarking. The samples are
in C#; however, the concepts should apply to VB.NET benchmarks.

http://www.yoda.arachsys.com/csharp/benchmark.html

BTW: Thank you for the code!

Hope this helps
Jay
 
Jay,
Thanks for the tip, but I am not interested in differences between functions
that take less than 1/1000 of a second. If you are, we can try it.
Cor
 
Herfried,
I am very sorry, I did not know that it would take so much effort
for you to make that string.
Don't bother, I think that string of 422764 characters is as good as 10 MB;
otherwise I can just do:
stringA = StringA + StringA + StringA and then it has enough characters to
do the test.

But let's not talk about those unimportant things; more important is
your health.
I am worried about your hamster too. Is somebody feeding it?

I hope the RSI is soon over.

Good luck

Cor
 
Cor said:
|| I am very sorry, I did not know that it would take so much effort
|| for you to make that string.

:-)

|| Don't bother, I think that string of 422764 characters is as good as 10 MB;
|| otherwise I can just do:
|| stringA = StringA + StringA + StringA and then it has enough characters to
|| do the test.

Use "&" to concatenate a string. If you want it to be faster, use a 'StringBuilder'.

|| But let's not talk about those unimportant things; more important is
|| your health.
|| I am worried about your hamster too. Is somebody feeding it?

I fed it in the morning.

|| I hope the RSI is soon over.

;-)
 
Fergus Cooney said:
|| Repetitive Strain Injury - what you get from typing too much for too long.
|| How's the wrist now - do the doctors still think the hand will have to come
|| off? ;-)

ROFLM*O
 
Hi Herfried,

|| Use "&" to concatenate a string.

What goes wrong if you don't ?

Regards,
Fergus
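
For reference, a small sketch of where the two operators differ. With two String operands, as in the stringA line quoted earlier, "+" and "&" give the same result. The difference appears only with mixed operand types under Option Strict Off, where "+" may perform numeric addition instead of concatenation. The module and variable names here are hypothetical.

```vb
Option Strict Off

Imports System

Module ConcatSketch
    Sub Main()
        Dim n As Integer = 2

        ' "&" always concatenates: the Integer is converted to a String.
        Dim viaAmpersand As Object = "1" & n

        ' "+" with a String and a numeric operand converts the String
        ' to a number and adds (only allowed with Option Strict Off).
        Dim viaPlus As Object = "1" + n

        Console.WriteLine(viaAmpersand)  ' prints 12
        Console.WriteLine(viaPlus)       ' prints 3
    End Sub
End Module
```

With Option Strict On, the "+" line does not compile, which makes the ambiguity visible at build time.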
 
Herfried,
|| Use "&" to concatenate a string. If you want it to be faster, use a
|| 'StringBuilder'.

Thank you for the advice. I will use that when you have sent the string.
:-)
Cor
 
Fergus,
|| || Use "&" to concatenate a string.
||
|| What goes wrong if you don't ?

I was sure that Herfried would answer with this when I wrote that sentence.

And as a backup I did not use StringBuilder, so I was absolutely sure.

I had my next answer ready already.
:-)))
Cor
 
Cor said:
|| Thank you for the advice. I will use that when you have sent the string.
|| :-)

I'll send you a 200 MB string by mail if you give me your mail address.
 
Cor said:
|| I was sure that Herfried would answer with this when I wrote
|| that sentence.

ROFL

|| And as a backup I did not use StringBuilder, so I was absolutely sure.

I hope you know that the 'StringBuilder' would be the better choice.
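
A minimal sketch of building a long test string with a StringBuilder instead of repeated concatenation. BuildTestString and its parameters are hypothetical names, not part of Cor's program.

```vb
Imports System
Imports System.Text

Module StringBuilderSketch
    ' Hypothetical helper: repeats seed until the result has exactly
    ' length characters.
    Function BuildTestString(ByVal seed As String, ByVal length As Integer) As String
        If seed Is Nothing OrElse seed.Length = 0 Then
            Throw New ArgumentException("seed must not be empty", "seed")
        End If
        If length < 0 Then
            Throw New ArgumentOutOfRangeException("length")
        End If
        ' Reserving the capacity up front avoids repeated buffer growth.
        Dim sb As New StringBuilder(length + seed.Length)
        Do While sb.Length < length
            sb.Append(seed)
        Loop
        ' Cut the result down to the requested length.
        Return sb.ToString(0, length)
    End Function

    Sub Main()
        Dim s As String = BuildTestString("abc;def;", 20)
        Console.WriteLine(s)         ' prints abc;def;abc;def;abc;
        Console.WriteLine(s.Length)  ' prints 20
    End Sub
End Module
```

Each "&" or "+" on strings allocates a new string and copies both operands, so appending into one growing buffer does less copying when the string is built in many steps.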
 