Bizarre: string.split() method

  • Thread starter: JV
Well, here's some VB as an example. They have a string which is doubly
delimited. In other words, each token in the string is itself another
delimited string.

Public Function ExampleFunction(ByVal stringToSplit As String, _
        Optional ByVal innerDelimiter As String = "~", _
        Optional ByVal outerDelimiter As String = "#~#")

    Dim bigArray() As String
    Dim smallArray() As String

    bigArray = stringToSplit.Split(outerDelimiter)

    ' (etc.)

Now, to translate that to C# you would need a String.Split() function which
accepts a String for the delimiter parameter and treats the entire string as
the delimiter, not as a list of delimiter characters. It's a function that
probably makes sense to have in the framework string class (along with a few
others that I always thought they should have added). To accomplish this,
you would have to do one of the following:

1) Use Regex.Split() -- probably the best choice, but a bit ugly
2) Use Microsoft.VisualBasic.Strings.Split() -- requires linking to the VB
library, but is an easy solution
3) Roll your own Split() function and pass it the string you want to split
(not very object-oriented, unfortunately, but workable)
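
For what it's worth, options 1 and 3 might look roughly like the sketch below. The class and method names (WholeStringSplit, SplitWithRegex, SplitByString) are made up for illustration, and this is only one way to do it. Regex.Escape is used so that characters such as '#' in the delimiter are matched literally.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, named here only for illustration.
public static class WholeStringSplit
{
    // Option 1: Regex.Split, with the delimiter escaped so that it is
    // matched as literal text rather than interpreted as a pattern.
    public static string[] SplitWithRegex(string input, string delimiter)
    {
        return Regex.Split(input, Regex.Escape(delimiter));
    }

    // Option 3: a hand-rolled split that walks the input with IndexOf
    // and treats the whole delimiter string as one separator.
    public static string[] SplitByString(string input, string delimiter)
    {
        // An empty delimiter cannot separate anything; return the input as-is.
        if (string.IsNullOrEmpty(delimiter))
            return new[] { input };

        var parts = new List<string>();
        int start = 0;
        int hit;
        while ((hit = input.IndexOf(delimiter, start, StringComparison.Ordinal)) >= 0)
        {
            parts.Add(input.Substring(start, hit - start));
            start = hit + delimiter.Length;
        }
        parts.Add(input.Substring(start)); // trailing token (may be empty)
        return parts.ToArray();
    }
}
```

With the input "a~b#~#c~d" and the delimiter "#~#", both methods should return the two tokens "a~b" and "c~d".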
 
JV said:
The point is that we are led to believe that both VB and C# use the same
framework classes, but here we find that string class behavior in VB is not
necessarily equal to string class behavior in C#. This seems very
inappropriate at best. I am fine with VB having additional library
functions available to handle strings, but both languages should be using
the identical framework string class.

I don't think that's what's happening. I don't think the Split method is
being passed what you think it's being passed. This is why Implicit Type
Conversion Is Evil And Must Die.

Sub Main()

    Dim toSplit As String
    Dim sections As String()
    Dim section As String

    toSplit = "AbrBbrCbrDbrEbrF"
    sections = toSplit.Split("br")
    For Each section In sections
        Console.WriteLine(section)
    Next
    Console.ReadLine()

End Sub

gives me:

A
rB
rC
rD
rE
rF

Can you provide an example where it uses the entire string as a
delimiter?

This suggests that 'b' is being used as a delimiter. "br" is being
implicitly converted to 'b'. ParamArray allows you to pass a single
char[] parameter or a set of char parameters separated by commas. VB is
saying "that is a single string parameter to be converted to a char",
not "that is a string to be converted to an array of chars".

Try passing "AB" to:

Sub Foo(ByVal ParamArray chars As Char())
    Console.WriteLine(chars.Length)
End Sub

and see what you get.

System.String is fine. VB is broken.
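
To make the comparison concrete, here is a rough C# sketch of the same thing. C# will not compile Foo("AB") against a params char[] parameter, so the narrowing that VB performs silently has to be written out by hand. The names ParamArrayDemo and SplitLikeVb are my own, and the sketch assumes a non-empty delimiter.

```csharp
using System;

// Hypothetical demo class, named here only for illustration.
public static class ParamArrayDemo
{
    // C# counterpart of the VB Foo above: returns how many chars arrived.
    public static int Foo(params char[] chars)
    {
        return chars.Length;
    }

    // Spells out what the implicit String-to-Char conversion amounts to:
    // only the first character of the delimiter string is used.
    // Assumes delimiter is non-empty.
    public static string[] SplitLikeVb(string input, string delimiter)
    {
        return input.Split(delimiter[0]);
    }
}
```

Foo("AB"[0]) returns 1 and Foo('A', 'B') returns 2, while SplitLikeVb("AbrBbrCbrDbrEbrF", "br") yields A, rB, rC, rD, rE, rF, matching the VB output above.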
 
JV,

I see some code that makes absolutely no sense, because Option Strict is not
involved here. Only true strings are used, so there is no late binding in
this case.

This means that the System.String.Split method should work the same here in
C# as in VB.NET.

However, I would be glad if you could show us a case where it does not.

You really should be able to give a better sample to show what you mean.

Cor
 
William,

Just try it: create a simple basic VB.NET Windows Forms project and put
something simple (a VB function) in the form's Load event, such as
Dim i As Integer = CInt("1")

Do the same with a C# project and put the equivalent conversion function
from the .NET framework in it:
int i = Convert.ToInt32("1");

Build both and look at the size of the exe.

I expressly wrote "sometimes" because normally the differences are not that
extreme, and I have seen that in other circumstances C# programs can be
smaller than VB.NET ones.

However, the point here is to show that nothing extra is added to the
CLI code.

Cor
 
JV,
With Option Strict Off, the following:
| bigArray = stringToSplit.Split(outerDelimiter)

Gets converted to:
| bigArray = stringToSplit.Split(CChar(outerDelimiter))

So VB does not (cannot) split on the String delimiter when calling
String.Split, it is splitting on the first character of the string
delimiter!

Here's the IL from VB 2003 for the above 2 statements:

//000244: bigArray = stringToSplit.Split(outerDelimiter)
IL_0001: ldarg.1
IL_0002: ldc.i4.1
IL_0003: newarr [mscorlib]System.Char
IL_0008: stloc.3
IL_0009: ldloc.3
IL_000a: ldc.i4.0
IL_000b: ldarg.3
IL_000c: call char [Microsoft.VisualBasic]Microsoft.VisualBasic.CompilerServices.CharType::FromString(string)
IL_0011: stelem.i2
IL_0012: ldloc.3
IL_0013: callvirt instance string[] [mscorlib]System.String::Split(char[])
IL_0018: stloc.0
//000245:
//000246: bigArray = stringToSplit.Split(CChar(outerDelimiter))
IL_0019: ldarg.1
IL_001a: ldc.i4.1
IL_001b: newarr [mscorlib]System.Char
IL_0020: stloc.3
IL_0021: ldloc.3
IL_0022: ldc.i4.0
IL_0023: ldarg.3
IL_0024: call char [Microsoft.VisualBasic]Microsoft.VisualBasic.CompilerServices.CharType::FromString(string)
IL_0029: stelem.i2
IL_002a: ldloc.3
IL_002b: callvirt instance string[] [mscorlib]System.String::Split(char[])
IL_0030: stloc.0

| Now, to translate that to C# you would need a String.Split() function which
| accepts a String for delimiter parameter and treats the entire string as the
No, to translate that to C# you need to take the first Char of
outerDelimiter; I would probably simply use the Chars indexer.

| bigArray = stringToSplit.Split(outerDelimiter[0])

For another example see Steve Walker's example.
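
As a rough illustration of what that faithful translation does with the "#~#" default, the sketch below compares splitting on the first character with a whole-string split. The class and method names are invented for the example, and the string[] overload of String.Split used in the second method is only available from .NET 2.0 onward, so it does not apply to a VB 2003 / .NET 1.1 project.

```csharp
using System;

// Hypothetical demo class, named here only for illustration.
public static class TranslationDemo
{
    // Faithful translation of the VB call: split on the first character only.
    // Assumes delimiter is non-empty.
    public static string[] SplitOnFirstChar(string input, string delimiter)
    {
        return input.Split(delimiter[0]);
    }

    // Whole-string split using the string[] overload (.NET 2.0 and later).
    public static string[] SplitOnWholeString(string input, string delimiter)
    {
        return input.Split(new[] { delimiter }, StringSplitOptions.None);
    }
}
```

For the input "a~b#~#c~d", SplitOnFirstChar returns three tokens ("a~b", "~", "c~d"), because each '#' is treated as a separator on its own, whereas SplitOnWholeString returns two ("a~b", "c~d"). So code that depended on the VB behaviour may have been quietly coping with those stray "~" tokens.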

Hope this helps
Jay


| > JV,
| >
| > Can you show the sample code that you have used in both languages?
| >
| > Cor
 
JV said:
I'm in favor of THAT!

Quite. Motherhood & apple pie question.

The only people who need it don't actually understand what it is, and if
they did understand, they wouldn't want it...
 
I STRONGLY agree with JV!!!

--
Carlitos


JV said:
My friend, I have been coding in C# since the 2nd Beta of Visual Studio.
And I have about 10 years of C++ before that. The problem I have is with a
conversion of an application originally written in VB.NET. It does not make
sense to me that strings would behave differently in two DotNet languages as
the string class is supposed to be a framework class, not a
language-specific class.
 