How to parse various types without a switch?

Hi,
I need to read a big CSV file, where different fields should be converted to
different types,
such as int, double, datetime, SqlMoney, etc.

I have an array, which describes the fields and their types. I would like
to somehow store a reference to parsing operations in this array
(such as Int32.Parse, Double.Parse, SqlMoney.Parse, etc),
so I can invoke the appropriate one without writing a long switch.

Using reflection is not an option for performance reasons.

I tried to create a delegate, but since Int32.Parse, Double.Parse, etc.
all have different return types, creating a common delegate type
appears to be impossible.

For now I ended up writing wrappers around Parse methods for each type,
such as
static object ParseDouble(string str) {return double.Parse(str);}
then inserting delegates to these methods into the array.
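
The wrapper-plus-delegate arrangement described above might look roughly like the sketch below. The FieldParser delegate name, the FieldParsers class, and the four-column layout are assumptions made for illustration; only the idea of one thin wrapper per type comes from the post. Invariant culture is used so the sample does not depend on regional settings.

```csharp
using System;
using System.Data.SqlTypes;
using System.Globalization;

// Common delegate type: every wrapper takes a string and returns a boxed value.
public delegate object FieldParser(string text);

public static class FieldParsers
{
    // One thin wrapper per target type; each boxes the typed result.
    public static object ParseInt32(string s) { return int.Parse(s, CultureInfo.InvariantCulture); }
    public static object ParseDouble(string s) { return double.Parse(s, CultureInfo.InvariantCulture); }
    public static object ParseDateTime(string s) { return DateTime.Parse(s, CultureInfo.InvariantCulture); }
    public static object ParseSqlMoney(string s) { return SqlMoney.Parse(s); }

    // Hypothetical column layout: int, double, DateTime, SqlMoney.
    public static readonly FieldParser[] Columns =
    {
        ParseInt32, ParseDouble, ParseDateTime, ParseSqlMoney
    };

    public static object[] ParseRow(string[] fields)
    {
        object[] values = new object[fields.Length];
        for (int i = 0; i < fields.Length; i++)
            values[i] = Columns[i](fields[i]);   // no switch: the array picks the parser
        return values;
    }

    public static void Main()
    {
        object[] row = ParseRow(new string[] { "42", "1.5", "2006-02-01", "19.99" });
        foreach (object value in row)
            Console.WriteLine("{0} ({1})", value, value.GetType().Name);
    }
}
```

Each value comes back boxed, so the caller casts it to the column's real type when needed.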

This seems to work, but it looks pretty ugly. I still hope that the .NET
Framework has an official way to do this that I overlooked. For example,
it has the IConvertible interface, which can be used to achieve the opposite
(converting an object to various types), but I cannot find an official way
to do parsing.

Thank you
John
 
John,

You have crossposted (nothing wrong with that); however, I think you would be
better off crossposting this to a language newsgroup too, because we don't
know which programming language you are using.

Besides Excel.developer and ASPNET, the two largest Microsoft developer
newsgroups are

microsoft.public.dotnet.csharp
and
microsoft.public.dotnet.languages.vb

At the least, when you post to the two newsgroups you are posting to now,
tell us which programming language you use.

I hope this helps,

Cor
 
Since Parse() is not declared in an interface, you have to create
wrappers.
Each one would look like object Parse(string csvstrword);

(In C we have sscanf, it can read into many datatypes.)
 
Using reflection is not an option for performance reasons.

Reflection doesn't have to be slow. You can't get rid of the overhead, but
if you code it correctly it can be quite fast.
I tried to create a delegate, but since Int32.Parse, Double.Parse, etc.
all have different return types, creating a common delegate type
appears to be impossible.

Return an object and unbox it (if applicable) after the delegate returns.
This will have less overhead than reflection.

Or, if you can resolve the actual type of the value being parsed you can
create a sort of generic converter function using Convert.ChangeType:

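public object TryParse(object /* string */ val, System.Type type) {
    try {
        // Convert.ChangeType handles source values that implement IConvertible.
        return Convert.ChangeType(val, type);
    }
    catch {
        return null;   // conversion failed
    }
}

And call it like:

string s = "3.41";
object o = TryParse(s, typeof(double));      // pass a Type: typeof(double), not System.Double
double d = (o != null) ? (double) o : 0.0;   // null means the conversion failed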
 
I have a better solution, if I understand your problem correctly :P

Assumption: since a CSV file holds all of its data as strings, like
1, "name" , "rank", "1.2.2006"
.....

Example:
[STAThread]
static void Main(string[] args)
{
    IFF[] i = new IFF[2];
    i[0] = new BFF(new Int32());
    i[1] = new BFF(new Double());

    object o = i[0].parse("1"); // assumes the CSV fields are well formed
    o = i[1].parse("1.1");
}

public interface IFF
{
    object parse(string s);
}

public class BFF : IFF
{
    private object myobj;   // sample instance; only its runtime type is used
    public BFF(object ob)
    {
        myobj = ob;
    }

    public object parse(string s)
    {
        try
        {
            // Find the static Parse(string) method on the sample's type.
            Type tp = myobj.GetType();
            System.Reflection.MethodInfo mi = tp.GetMethod("Parse",
                new Type[]{typeof(System.String)});
            object[] param = new object[1];
            param[0] = s;
            // Parse is static, so the target instance is ignored.
            object o = mi.Invoke(myobj, param);
            return o;
        }
        catch
        {
            return null; // a parse failure (or a missing Parse method) yields null
        }
    }
}
 
John,

I thought that Paul had been more active in some other newsgroups lately.

However, I see he is here as well.

See what he wrote about your problem in this newsgroup.
http://groups-beta.google.com/group/microsoft.public.dotnet.general/msg/a7c295497ae67bf4?hl=en&

I am not so familiar with those Ini files, so maybe you can search for that
if Paul's message is not sufficient, or wait until he sees this. However,
your subject line is not one that, in my opinion, will directly catch
Paul's eye.

I hope this helps,

Cor
 
I think I have provided John a pretty easy way.
Note that the reflection lookup needs to happen only once: extract the
MethodInfo once and then keep invoking it in the parse() function.
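
A minimal sketch of that caching idea follows, assuming a hypothetical CachedReflectionParser class that is not part of the earlier post. The GetMethod lookup runs once in the constructor; every call still goes through MethodInfo.Invoke, so the per-call reflection cost remains.

```csharp
using System;
using System.Reflection;

public sealed class CachedReflectionParser
{
    private readonly MethodInfo parseMethod;

    public CachedReflectionParser(Type targetType)
    {
        // Look up the public static Parse(string) overload once, at construction time.
        parseMethod = targetType.GetMethod("Parse", new Type[] { typeof(string) });
        if (parseMethod == null)
            throw new ArgumentException(targetType + " has no Parse(string) method.");
    }

    public object Parse(string text)
    {
        // Parse is static, so the target instance argument is null.
        return parseMethod.Invoke(null, new object[] { text });
    }

    public static void Main()
    {
        CachedReflectionParser intParser = new CachedReflectionParser(typeof(int));
        Console.WriteLine(intParser.Parse("42"));
    }
}
```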
 
Thank you for the answer.
You have crossposted

It is very hard to figure out the difference between
microsoft.public.dotnet.framework and microsoft.public.dotnet.general
What is the proper place to post questions about the library?
I think that you be better of with crossposting this to a language
newsgroup too, because we don't know what program language you are using
now.
At least when you post to the two newsgroups that you are posting now,
tell us than what program language you use.

I don't understand; what's the difference? My question is about the library
(Framework),
not the language. I guess all (or most) of the solutions, such as
interfaces, delegates, reflection, etc., are available to both C# and VB.
Currently I write in C#, but this should not matter.

Thank you
John
 
Thank you for the answer.
since Parse() is not declared in interface, you have to create wrappers.

Yes, this is what I ended up doing (see my original post):

static object ParseDouble(string str) {return double.Parse(str);}
.....

I just wanted to double check whether I am missing some standard solution,
and, actually, I was missing Convert.ChangeType, as Klaus H. Probst showed.

John
 
Thank you for the answer.
Reflection doesn't have to be slow. You can't get rid of the overhead, but
if you code it correctly it can be quite fast.

Maybe it does not have to be, but it is.

Unfortunately, in my own benchmarking, calling an empty static method
without arguments using reflection (MethodInfo.Invoke) is 300 times
slower than a direct call;
calling the same method through a delegate is 10% slower.
Calling Int32.Parse using reflection is 15 times slower than a direct call;
through a delegate it is 0.5% slower.
So, I will not use reflection in a loop (unless you show that my results are
wrong).

can create a sort of generic converter function
Convert.ChangeType(val, type);

Thank you, I completely missed this one in my original search.
However, I am not going to use it.
Convert.ChangeType is implemented as a kind of switch,
which I was trying to avoid in the first place. As a result:
- it is 30% slower than the delegate
- it can only handle standard types and not the Sql* types.
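
For anyone who wants to check timings like these, a rough harness along the following lines could be used. It is only a sketch: the ParseBenchmark class, the iteration count, and the test string are assumptions, not the benchmark actually run for the figures above, and the results will vary with runtime version and hardware.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

public static class ParseBenchmark
{
    public delegate object FieldParser(string text);

    private static object ParseInt32(string s) { return int.Parse(s); }

    // Direct call to Int32.Parse.
    public static long SumDirect(int n)
    {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += int.Parse("12345");
        return sum;
    }

    // Call through a delegate to a boxing wrapper.
    public static long SumDelegate(int n)
    {
        FieldParser parse = ParseInt32;
        long sum = 0;
        for (int i = 0; i < n; i++) sum += (int)parse("12345");
        return sum;
    }

    // Call through a MethodInfo that is looked up once, outside the loop.
    public static long SumReflection(int n)
    {
        MethodInfo parse = typeof(int).GetMethod("Parse", new Type[] { typeof(string) });
        object[] args = { "12345" };
        long sum = 0;
        for (int i = 0; i < n; i++) sum += (int)parse.Invoke(null, args);
        return sum;
    }

    private static void Time(string label, Func<int, long> body, int n)
    {
        body(1000);   // warm-up so JIT compilation is not included in the timing
        Stopwatch watch = Stopwatch.StartNew();
        body(n);
        watch.Stop();
        Console.WriteLine("{0,-12} {1} ms", label, watch.ElapsedMilliseconds);
    }

    public static void Main()
    {
        const int n = 1000000;
        Time("direct", SumDirect, n);
        Time("delegate", SumDelegate, n);
        Time("reflection", SumReflection, n);
    }
}
```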

Thank you
John
 
Your problem is easily handled using the decorator pattern with a builder
pattern to construct the parsing. Observer pattern can be used, but is not
terribly efficient. Switch statements are not needed in the parsing, but
may be needed in the builder.

Take a look at the builder pattern and the decorator pattern by googling
these names. They are standard OO patterns from the Gang of Four (GoF).

Let me know if you want help implementing them.

--
--- Nick Malik [Microsoft]
MCSD, CFPS, Certified Scrummaster
http://blogs.msdn.com/nickmalik

Disclaimer: Opinions expressed in this forum are my own, and not
representative of my employer.
I do not answer questions on behalf of my employer. I'm just a
programmer helping programmers.
 
Nick,

I have the idea that I am missing something here: why not try to convert the
CSV using OleDB with an Ini file?
I don't know whether setting the data types works completely, because without
that almost everything stays a string.

Can you enlighten me about what I am missing?

(seriously meant)

Cor
 
Hi Cor,

I looked up the post that you refer to. For some reason, I hadn't seen that
reply from Paul, but it is an excellent reply. Honestly, if the format of
the CSV file is rarely changing or changes only with advance notice, his
answer is far-and-away the best answer to use. The TEXT OleDb driver is
debugged and easy to configure.

My suggestion would only be valid if the application needs to adapt itself
to the data on the fly. In other words, if the app needs to allow the user
to provide a format, or the format can be deduced, but it cannot be
configured in advance.

In that case, you should create a simple decorator pattern. The endpoint
object would discard the remainder of the line. You decorate the object
with a class for each data type, reading right to left, to create the object
structure in memory. The builder does this work. Then, it is a matter of
sending each line through the data structure. The tokens are pulled off
from left to right (reverse of the order in which it was built). It is fast
and dynamic. No need for reflection.
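
One possible reading of that description, as a sketch only: every type name below (ILineReader, EndOfLine, FieldDecorator, LineReaderBuilder and the field classes) is made up for illustration, and the string-keyed switch in the builder is just one way to choose decorators.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public interface ILineReader
{
    void Read(string[] tokens, int index, List<object> values);
}

// Endpoint of the chain: discards whatever is left of the line.
public sealed class EndOfLine : ILineReader
{
    public void Read(string[] tokens, int index, List<object> values) { }
}

// Each decorator converts one token, then hands the rest of the line inward.
public abstract class FieldDecorator : ILineReader
{
    private readonly ILineReader inner;
    protected FieldDecorator(ILineReader inner) { this.inner = inner; }
    protected abstract object ConvertToken(string token);

    public void Read(string[] tokens, int index, List<object> values)
    {
        if (index >= tokens.Length) return;        // short line: nothing left to read
        values.Add(ConvertToken(tokens[index]));
        inner.Read(tokens, index + 1, values);
    }
}

public sealed class Int32Field : FieldDecorator
{
    public Int32Field(ILineReader inner) : base(inner) { }
    protected override object ConvertToken(string token)
    { return int.Parse(token, CultureInfo.InvariantCulture); }
}

public sealed class DoubleField : FieldDecorator
{
    public DoubleField(ILineReader inner) : base(inner) { }
    protected override object ConvertToken(string token)
    { return double.Parse(token, CultureInfo.InvariantCulture); }
}

public sealed class StringField : FieldDecorator
{
    public StringField(ILineReader inner) : base(inner) { }
    protected override object ConvertToken(string token) { return token; }
}

public static class LineReaderBuilder
{
    // Builds right to left, so the outermost decorator handles the leftmost column.
    public static ILineReader Build(params string[] typeNames)
    {
        ILineReader reader = new EndOfLine();
        for (int i = typeNames.Length - 1; i >= 0; i--)
        {
            switch (typeNames[i])   // the only switch lives in the builder
            {
                case "int": reader = new Int32Field(reader); break;
                case "double": reader = new DoubleField(reader); break;
                case "string": reader = new StringField(reader); break;
                default: throw new ArgumentException("Unknown type: " + typeNames[i]);
            }
        }
        return reader;
    }

    public static void Main()
    {
        ILineReader reader = Build("int", "double", "string");
        List<object> values = new List<object>();
        reader.Read("7,2.5,abc,ignored".Split(','), 0, values);
        Console.WriteLine(values.Count);
    }
}
```

The chain is built once per file layout and reused for every line, so the switch runs once per column rather than once per field value.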

--
--- Nick Malik [Microsoft]
MCSD, CFPS, Certified Scrummaster
http://blogs.msdn.com/nickmalik
 
John,
In addition to the other comments.

My first choice would be Convert.ChangeType as Klaus shows.


My second choice would be an Adapter pattern, similar to your array of
Wrapper Delegates. In addition to using Delegates as you did, I would
consider using a series of classes that implemented an interface or shared a
common base class.

Something like:

Public Interface IConverter

    Function Parse(ByVal s As String) As Object

End Interface

Public Class DoubleConverter
    Implements IConverter

    Public Function Parse(ByVal s As String) As Object _
        Implements IConverter.Parse
        Return Double.Parse(s)
    End Function

End Class

Public Class ConverterCollection
    Inherits DictionaryBase

    Public Sub Add(ByVal type As Type, ByVal converter As IConverter)
        MyBase.InnerHashtable.Add(type, converter)
    End Sub

    Default Public ReadOnly Property Item(ByVal type As Type) As IConverter
        Get
            Return DirectCast(MyBase.InnerHashtable.Item(type), IConverter)
        End Get
    End Property

End Class

Public Shared Sub Main()
    Dim converters As New ConverterCollection
    converters.Add(GetType(Double), New DoubleConverter)

End Sub

The disadvantage of the interface/class method is the proliferation of
classes. The advantage of the delegate method is the elimination of all the
classes...

As to performance: Remember the 80/20 rule. That is 80% of the execution
time of your program is spent in 20% of your code. I will optimize (worry
about performance, memory consumption) the 20% once that 20% has been
identified & proven to be a performance problem via profiling (CLR Profiler
is one profiling tool).

For info on the 80/20 rule & optimizing only the 20% see Martin Fowler's
article "Yet Another Optimization Article" at
http://martinfowler.com/ieeeSoftware/yetOptimization.pdf

Hope this helps
Jay
 
Doh!

I should add that System.ComponentModel.TypeConverter might be a class that
you could leverage instead of creating your own IConverter class.

You can use TypeDescriptor.GetConverter to get the TypeConverter for a Type
or Object. If performance were a consideration, I would consider caching the
TypeConverters.
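
A small sketch of such a cache, written in C# since that is the language John is using. The ConverterCache class name is hypothetical, and the dictionary is not thread-safe as written. Whether a given type (the Sql* types, for instance) has a usable converter would need to be checked separately.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;

public static class ConverterCache
{
    private static readonly Dictionary<Type, TypeConverter> cache =
        new Dictionary<Type, TypeConverter>();

    // Returns the cached TypeConverter for a type, looking it up on first use.
    public static TypeConverter For(Type type)
    {
        TypeConverter converter;
        if (!cache.TryGetValue(type, out converter))
        {
            converter = TypeDescriptor.GetConverter(type);
            cache[type] = converter;
        }
        return converter;
    }

    // Parses text using invariant-culture rules, so "1.5" means one and a half.
    public static object Parse(Type type, string text)
    {
        return For(type).ConvertFromInvariantString(text);
    }

    public static void Main()
    {
        Console.WriteLine(Parse(typeof(int), "42"));
        Console.WriteLine(Parse(typeof(double), "1.5"));
    }
}
```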

Of course, instead of storing the converters in their own hash table as I
showed earlier, you could store them in the type describing each field...

Something like:

Public Class FieldDescription

    Public Name As String

    Public Type As Type

    Public Converter As IConverter

    ' Alternative to IConverter or your delegate. Use one or the other,
    ' since two fields cannot share the name Converter:
    ' Public Converter As TypeConverter

End Class

Hope this helps
Jay
 
Hi,
Thank you for your answer.
I looked up the post that you refer to. For some reason, I hadn't seen
that reply from Paul, but it is an excellent reply.

As I already answered last Sunday,
this is an interesting solution, but I am not sure I want to redesign
my program to read CSV to a dataset, then extract data from the dataset
instead of parsing them directly.
But I will remember this as an option for the future.
My suggestion would only be valid if the application needs to adapt itself
to the data on the fly.

Yes, currently I store the field names in the first row of the CSV.
This allows easy viewing of the CSV in Excel and pretty flexible parsing.
You decorate the object with a class for each data type.

If this requires writing a new class for each data type,
then it will take at least twice as many lines of code as
the wrapper/delegate approach. And in general the code looks more complex.
No need for reflection.

I never considered reflection, for performance reasons,
despite what some other people suggested.

Thank you
John
 
Hi,
Thank you for your answer.
The disadvantage of the interface/class method is the proliferation of
classes.

Exactly. I asked this question in the first place because I didn't like
the proliferation of wrappers. I am just lazy. After typing 2 or 3 wrappers
I was bored and posted this message last Saturday.
But the proliferation of classes is much worse,
because it requires 2 (or 3) times as many lines of code as the wrappers.
I am certainly not going to type them.

By the way, in C++ I might have used templates to generate the wrappers
automatically. I had big hopes for the generics feature in .NET,
but I was disappointed to find that generics cannot be used
in this case (and in many others) because there is no
base class or interface common to all the types that exposes a Parse
method. Without such a common interface the compiler refuses
to compile the generic class. Stupid!
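
Generics can still remove most of the wrapper typing if the typed Parse method is passed in as a delegate instead of being required by a constraint. The sketch below assumes the Func delegate types and lambda syntax of later C# versions (C# 3.0 and .NET 3.5 onward); the Boxing class and Box method are hypothetical names.

```csharp
using System;
using System.Data.SqlTypes;
using System.Globalization;

public static class Boxing
{
    // Wraps any typed parser so that it returns object; the compiler inserts
    // the boxing, so no hand-written wrapper per type is needed.
    public static Func<string, object> Box<T>(Func<string, T> parse)
    {
        return delegate(string s) { return parse(s); };
    }

    public static void Main()
    {
        Func<string, object>[] parsers =
        {
            Box<int>(s => int.Parse(s, CultureInfo.InvariantCulture)),
            Box<double>(s => double.Parse(s, CultureInfo.InvariantCulture)),
            Box<SqlMoney>(s => SqlMoney.Parse(s))
        };

        string[] fields = { "42", "1.5", "19.99" };
        for (int i = 0; i < fields.Length; i++)
            Console.WriteLine(parsers[i](fields[i]).GetType().Name);
    }
}
```

No common Parse interface is needed here because each typed Parse call is resolved where Box is called, not inside the generic method.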
Remember the 80/20 rule.

Yes, I am reminded of this rule every time I, as a user, have to suffer
through running a slow program. I know what the developers were thinking.
(CLR Profiler is one profiling tool).

My CSV may have a few million records. Remembering that
reflection is 15 times (1500%) slower than delegates in this case,
I don't want to waste my time on the profiler
(in this case a wristwatch is sufficient).

So, I am sticking with the wrappers/delegates,
but I appreciate all the answers. I learned about the TEXT OleDb driver
and Convert.ChangeType and may use them in the future.

Thank you
John
 
Hello John,

In your original post, you stated:
I was simply pointing out that there is a way that you had overlooked... the
decorator pattern.
Note that you can do this with the visitor pattern as well. You chose the
observer pattern. Personally, I would not have done so, but that is your
choice.
If this will require writing a new class for each data type,
then it will require at least twice more lines of code than
the wrapper/delegate approach. And in general the code looks more complex.

I would agree that there are more classes. I would also state that you have
a much more OO approach this way.
I would disagree that it looks more complex. On the contrary, it looks much
simpler.
I never considered the reflection for performance reasons
despite what some other people suggested.

I am aware of your other responses dealing with reflection. I pointed this
out to let you know that, unlike other suggestions, this pattern does not
require reflection.

--
--- Nick Malik [Microsoft]
MCSD, CFPS, Certified Scrummaster
http://blogs.msdn.com/nickmalik
 