Efficiency of Chr() (or &=)?

  • Thread starter: Paulo Becker

Paulo Becker

Hi!

I'm coding an RC4 simulator, and in order to manipulate some data I've
created functions that transform strings into byte arrays (where each
array position contains the ASCII code of the character in the
string), and from byte arrays into strings.

Here they are:

Public Shared Function StringToByteArray(ByVal strString As String) As Array
    Dim arrByteArray(strString.Length - 1) As Byte
    For intI As Integer = 0 To strString.Length - 1
        arrByteArray(intI) = CByte(Microsoft.VisualBasic.Asc(strString(intI)))
    Next
    Return arrByteArray
End Function

Public Shared Function ByteArrayToString(ByVal arrByteArray As Array) As String
    Dim strString As String = ""
    For intI As Integer = 0 To UBound(arrByteArray)
        strString &= Chr(arrByteArray(intI))
    Next
    Return strString
End Function

The "StringToByteArray" function works great, and the performance
seems acceptable. However, when I run "ByteArrayToString", the
performance is orders of magnitude inferior to that of its
counterpart. With an array of about 400,000 positions, it took around
10 minutes to create the string.

I'm not sure if the bottleneck is in the "&=" operator or in Chr().
Does anyone have any idea, or any suggestions as to what I could do to
make this more efficient?

Thank you very much for your attention.
 
Have you looked at System.Text.Encoding?

Dim s As String = Encoding.ASCII.GetString(bytearray)
Dim b() As Byte = Encoding.ASCII.GetBytes(s)
 
Paulo Becker said:
Hi!

I'm coding an RC4 simulator, and in order to manipulate some data
I've created functions that transform strings into byte arrays
(where each array position contains the ASCII code of the character
in the string), and from byte arrays into strings.

You know that .NET Strings are Unicode encoded and each character takes two
bytes?

- Use the StringBuilder class to build strings.
- Use System.Text.Encoding.Unicode.GetBytes to get the original Unicode
encoded Bytes from a String.
- Use System.Text.Encoding.Unicode.GetString to convert a Unicode encoded
byte array into a String.
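
As a minimal sketch of the last two points (the sample text and module name are only illustrative), a string can be turned into its UTF-16 bytes and back without loss. Note that this direction always works; arbitrary bytes are not guaranteed to decode into a valid string.

```vbnet
Imports System
Imports System.Text

Module UnicodeRoundTripSketch
    Sub Main()
        ' Illustrative sample text, not taken from the original program.
        Dim original As String = "RC4 key"

        ' Encoding.Unicode is UTF-16: two bytes for each of these characters.
        Dim bytes As Byte() = Encoding.Unicode.GetBytes(original)
        Console.WriteLine(bytes.Length)          ' 14 bytes for 7 characters

        ' Decoding the same bytes gives back the original string.
        Dim restored As String = Encoding.Unicode.GetString(bytes)
        Console.WriteLine(restored = original)   ' True
    End Sub
End Module
```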


Armin
 
String concatenation (&=) is incredibly inefficient when done a large
number of times. Each concatenation causes the creation of a new
String object, a copy of the old contents into the new, and
destruction of the old String object.

Use a StringBuilder instead:

' Requires Imports System.Text at the top of the file.
Dim str As New StringBuilder(UBound(arrByteArray) + 1)
For intI As Integer = 0 To UBound(arrByteArray)
    str.Append(Chr(arrByteArray(intI)))
Next
Return str.ToString()
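
To see the difference on your own machine, a rough timing sketch along these lines may help. The module name, function names, and the 50,000-byte test data are made up for illustration, and the timings will vary from machine to machine.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Text
Imports Microsoft.VisualBasic

Module ConcatTimingSketch
    ' Builds the string with repeated &=, which copies the whole string on every pass.
    Function BuildWithConcat(ByVal bytes As Byte()) As String
        Dim result As String = ""
        For i As Integer = 0 To bytes.Length - 1
            result &= Chr(bytes(i))
        Next
        Return result
    End Function

    ' Builds the string in a StringBuilder buffer sized up front.
    Function BuildWithStringBuilder(ByVal bytes As Byte()) As String
        Dim sb As New StringBuilder(bytes.Length)
        For i As Integer = 0 To bytes.Length - 1
            sb.Append(Chr(bytes(i)))
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' Hypothetical test data: 50,000 bytes cycling through "A".."Z".
        Dim data(49999) As Byte
        For i As Integer = 0 To data.Length - 1
            data(i) = CByte(65 + (i Mod 26))
        Next

        Dim watch As Stopwatch = Stopwatch.StartNew()
        Dim slow As String = BuildWithConcat(data)
        Console.WriteLine("&= concatenation: {0} ms", watch.ElapsedMilliseconds)

        watch = Stopwatch.StartNew()
        Dim fast As String = BuildWithStringBuilder(data)
        Console.WriteLine("StringBuilder:    {0} ms", watch.ElapsedMilliseconds)

        Console.WriteLine("Same result: {0}", slow = fast)
    End Sub
End Module
```

Since each &= copies everything built so far, the total work grows roughly with the square of the array length, so the gap widens quickly as the array gets bigger.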
 
The problem is that each time you update a string you actually create a new
string (with the corresponding memory allocation and copy, which gets
worse as the string grows longer).

When doing a high number of updates to a string you may want to use a
StringBuilder object instead. Basically it works on a "string buffer" for
which you can specify the target capacity, that you repeatedly update
without allocation/copy, and from which you get the final string once all
changes are done.

As a side note, those functions are AFAIK already built into .NET: you have a
String constructor that accepts a Char array, and you have a ToCharArray
method that returns a Char array. If you need to perform some encoding,
System.Text.Encoding should also have related functions to create a string
from an array using a particular encoding, or to create an array from a
string using a particular encoding.
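
A rough sketch of the Char array route could look like this. The module name is made up, and it assumes each byte should map straight to the character with the same code point (ChrW/AscW), which is not the same as Chr/Asc for values above 127, since those go through the system code page.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module CharArraySketch
    ' Widens each byte to a Char, then builds the string in a single step.
    Public Function ByteArrayToString(ByVal bytes As Byte()) As String
        Dim chars(bytes.Length - 1) As Char
        For i As Integer = 0 To bytes.Length - 1
            chars(i) = ChrW(bytes(i))
        Next
        Return New String(chars)
    End Function

    ' Reverses the mapping; assumes every character is in the range 0..255
    ' (anything higher is truncated to its low byte).
    Public Function StringToByteArray(ByVal text As String) As Byte()
        Dim chars As Char() = text.ToCharArray()
        Dim bytes(chars.Length - 1) As Byte
        For i As Integer = 0 To chars.Length - 1
            bytes(i) = CByte(AscW(chars(i)) And &HFF)
        Next
        Return bytes
    End Function
End Module
```

Because the string is created once from a filled array, there is no repeated copying as with &=.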
 
Thanks, that did the trick perfectly.

And regarding the other suggestions to use the built-in GetBytes and
GetString encoding functions, I did look into that and tried using
them in my program, but funny stuff happened if I used
Encoding.Unicode or Encoding.ASCII - like weird characters in the
plaintext after a cypher-decypher cycle. However, when I used
Encoding.Default.GetString and Encoding.Default.GetBytes, I ended up
with the same results that my own conversion functions gave.

Since I didn't feel any performance difference between my functions
and the built-in ones, I'm gonna stick with mine for now. Unless of
course someone has a good reason for me not to do that :)

Thanks a lot for all your answers guys!
Cheers!
 
Paulo Becker said:
Thanks, that did the trick perfectly.

And regarding the other suggestions to use the built-in GetBytes and
GetString encoding functions, I did look into that and tried using
them in my program, but funny stuff happened if I used
Encoding.Unicode or Encoding.ASCII - like weird characters in the
plaintext after a cypher-decypher cycle.

It depends on how you use them. In your first post I read "some data". Is it
really necessary to store it in a string?
If you use Unicode, be aware that the array must contain an even number
of bytes. ASCII is of no use because it's a 7-bit character code.

However, when I used
Encoding.Default.GetString and Encoding.Default.GetBytes, I ended up
with the same results that my own conversion functions gave.

Since I didn't feel any performance difference between my functions
and the built-in ones, I'm gonna stick with mine for now. Unless of
course someone has a good reason for me not to do that :)

You say that the GetBytes/GetString functions are as slow(/fast) as if you
are concatenating strings? That's surprising. Also be aware that the
conversion to the Default encoding costs additional performance in both
directions.
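
To illustrate the point about ASCII and Unicode above, here is a small sketch (the module name and the sample bytes are hypothetical) that pushes arbitrary bytes through a decode/encode cycle, which is what happens when ciphertext is stored in a string:

```vbnet
Imports System
Imports System.Text

Module LossyEncodingSketch
    ' Decodes bytes to a string and encodes that string straight back.
    Function RoundTrip(ByVal enc As Encoding, ByVal data As Byte()) As Byte()
        Return enc.GetBytes(enc.GetString(data))
    End Function

    Sub Main()
        ' Hypothetical ciphertext: three bytes, two of them above 127.
        Dim cipher() As Byte = {65, 200, 255}

        Console.WriteLine("Original: " & BitConverter.ToString(cipher))

        ' ASCII is 7-bit: bytes above 127 come back as "?" (3F), giving 41-3F-3F.
        Console.WriteLine("ASCII:    " & BitConverter.ToString(RoundTrip(Encoding.ASCII, cipher)))

        ' Unicode expects byte pairs; the odd trailing byte cannot be decoded,
        ' so the result no longer matches the original.
        Console.WriteLine("Unicode:  " & BitConverter.ToString(RoundTrip(Encoding.Unicode, cipher)))
    End Sub
End Module
```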


Armin
 
Paulo said:
And regarding the other suggestions to use the built-in GetBytes and
GetString encoding functions, I did look into that and tried using
them in my program, but funny stuff happened if I used
Encoding.Unicode or Encoding.ASCII - like weird characters in the
plaintext after a cypher-decypher cycle. However, when I used
Encoding.Default.GetString and Encoding.Default.GetBytes, I ended up
with the same results that my own conversion functions gave.

That makes sense, as the Asc method uses the Encoding.Default
encoding to encode the character. You should get a bit better
performance if you use the encoding directly instead of calling Asc, which
uses the encoding to encode each character separately.
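
A minimal sketch of that suggestion, reusing the function names from the first post inside a made-up module. It assumes the .NET Framework, where Encoding.Default is the system ANSI code page; on .NET Core and later, Encoding.Default is UTF-8, and bytes above 127 would not survive the round trip.

```vbnet
Imports System
Imports System.Text

Module DefaultEncodingSketch
    ' Encodes the whole string in one call instead of calling Asc per character.
    Public Function StringToByteArray(ByVal text As String) As Byte()
        Return Encoding.Default.GetBytes(text)
    End Function

    ' Decodes the whole array in one call instead of calling Chr per byte.
    Public Function ByteArrayToString(ByVal bytes As Byte()) As String
        Return Encoding.Default.GetString(bytes)
    End Function
End Module
```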
 