Behind the scenes -- how is a DataTable implemented?


Mountain Bikn' Guy

When one gets a row from a database (i.e., from a DataTable), the row contains a
typed value in each column. How is this typically implemented behind the scenes?
I want to build this functionality myself. The reason I want to do this is
because I need an in-memory table without any of the overhead of a DataSet
or DataTable.

Thanks!
 
Bill Vaughn posted an explanation of this a while back, but I think it's
been archived, since they are archiving more frequently now.

If I remember correctly, he said it was done via a linked list but I don't
remember the specifics (my apologies).

I'm not sure what your ultimate objective is, so it's hard to make a
suggestion, but would using a DataReader to populate a
collection/ArrayList/IEnumerable object etc. get you what you need? Do you
need to walk this object in both directions?
 
Hi William,
Thanks for your reply. I will try to find Bill Vaughn's post on Google
Groups.

My objective is to work with a "table" in memory. I do not need a DataReader
at all. All I need is an in-memory table that has typed columns and allows
me to get a row just as I would from a DataTable. I just don't want the
overhead of a DataTable. I do not need any DBMS functionality. Essentially,
I just need raw memory bytes that are type specific.

I could use a collection of typed array columns -- I just need to figure out
a fast way to work with row objects that would consist of one "slot" from
each of the typed array columns.
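
One way the "typed array columns plus row object" idea could be sketched in C# is shown below. This is only an illustration under assumptions: the table name, the three columns (Id, Mass, Active) and the Row type are all hypothetical, and nothing here reflects how DataTable itself is written. The row is a small struct holding a table reference and an index, so reading or writing a cell touches exactly one slot of one typed array.

```csharp
using System;

// Hypothetical fixed-schema table: one typed array per column.
public sealed class SampleTable
{
    public readonly int[] Id;
    public readonly double[] Mass;
    public readonly bool[] Active;

    public SampleTable(int rowCount)
    {
        Id = new int[rowCount];
        Mass = new double[rowCount];
        Active = new bool[rowCount];
    }

    public int RowCount { get { return Id.Length; } }

    // Indexing returns a lightweight view; no cell values are copied.
    public Row this[int index]
    {
        get
        {
            if (index < 0 || index >= RowCount)
                throw new ArgumentOutOfRangeException(nameof(index));
            return new Row(this, index);
        }
    }

    // A row is just (table, index). Each property reads or writes one
    // slot of the matching typed array, so primitives are never boxed.
    public struct Row
    {
        private readonly SampleTable _table;
        private readonly int _index;

        internal Row(SampleTable table, int index)
        {
            _table = table;
            _index = index;
        }

        public int Id
        {
            get { return _table.Id[_index]; }
            set { _table.Id[_index] = value; }
        }

        public double Mass
        {
            get { return _table.Mass[_index]; }
            set { _table.Mass[_index] = value; }
        }

        public bool Active
        {
            get { return _table.Active[_index]; }
            set { _table.Active[_index] = value; }
        }
    }
}
```

Usage would look like `var row = table[42]; row.Mass = 9.81;`. The row is copied into a local first, since the compiler may reject an assignment made directly through a struct returned by value.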

Let me know if you have any other ideas. Thanks.
Dave
 
The DataTable is (more or less) a set of linked arrays, one for each column.
It is really pretty efficient. I think you may be reinventing the wheel here.

--
____________________________________
William (Bill) Vaughn
Author, Mentor, Consultant
MVP, hRD
www.betav.com
Please reply only to the newsgroup so that others can benefit.
This posting is provided "AS IS" with no warranties, and confers no rights.
__________________________________
 
Have you tried creating a DataTable from code, columns and all? Behind the
scenes, all a DataTable is, is an 'in memory' data type. I've played with
this quite a bit, and while not professing to be an expert, at least have a
good case to make. A DataTable isn't a huge heavy object by any means. If
you are only using it to hold data, it's not notably different from using a
strongly typed collection. I created a myDataTable object that was simply a
class deriving from CollectionBase. I had my own .AddRow() method which
simply added a blank object to the collection.
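
For reference, creating a DataTable "from code, columns and all" might look like the sketch below. The table name, column names and values are made up for illustration. It also shows one thing relevant to the overhead question: the DataRow indexer is typed as object, so a value read from a cell has to be cast back to the column's type.

```csharp
using System;
using System.Data;

public static class DataTableDemo
{
    // Builds a two-column, in-memory DataTable with no database involved.
    public static DataTable Build()
    {
        var table = new DataTable("Samples");   // "Samples" is an arbitrary name
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Mass", typeof(double));

        // Rows.Add takes the cell values in column order.
        table.Rows.Add(1, 2.5);
        table.Rows.Add(2, 7.25);
        return table;
    }

    // The DataRow indexer returns object, so the caller casts the
    // cell back to the column's declared type.
    public static double MassAt(DataTable table, int rowIndex)
    {
        return (double)table.Rows[rowIndex]["Mass"];
    }
}
```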

If I may ask, why is it that a DataTable has overhead that makes it a deal
breaker? I can tell you from personal experience that a lot of 'overhead'
is often perceived (not saying that's the case here, just speaking from
personal experience). Remember that it's a reference type, so this can have
a HUGE performance impact depending on how you are using it. I can't
overstate this point. Pass by ref or val all you want, it's a reference
type and unless you are careful, it can really cause you some problems
performance wise.

If you
 
I'm basing my performance opinion on a series of different tests I've run
over time. Each time I have tried to use DataTables, there has been a
negative performance impact. I've experimented with lots of different
things. Comments inline below.

William Ryan said:
> Have you tried creating a DataTable from code, columns and all?

Yes, that's how I always do it.

> Behind the scenes, all a DataTable is, is an 'in memory' data type.

I understand. This is how I have used them. I was shocked to find that I
could create an ASCII file on disk much faster than I could add data to an
in-memory DataTable. Also, deleting a column from an in-memory DataTable is
extremely slow.

> I've played with this quite a bit, and while not professing to be an
> expert, at least have a good case to make. A DataTable isn't a huge heavy
> object by any means. If you are only using it to hold data, it's not
> notably different from using a strongly typed collection.

In my experience it is much slower. I don't know why.

> I created a myDataTable object that was simply a class deriving from
> CollectionBase. I had my own .AddRow() method which simply added a blank
> object to the collection.

So how did you allow for differently typed columns? It sounds like all your
columns were either untyped or of the same type.

> If I may ask, why is it that a DataTable has overhead that makes it a deal
> breaker? I can tell you from personal experience that a lot of 'overhead'
> is often perceived (not saying that's the case here, just speaking from
> personal experience).

We're doing computationally intensive work. The original design (using the
standard .NET approach) took about 48 hrs of continuous running to finish a
results set. The current design, still in C#, does the same thing in 2 hrs
45 min. We've had to pay attention to performance details that others don't
need to be concerned about.

> Remember that it's a reference type, so this can have a HUGE performance
> impact depending on how you are using it. I can't overstate this point.
> Pass by ref or val all you want, it's a reference type and unless you are
> careful, it can really cause you some problems performance wise.

Care to give an example?
 
I'm open to suggestions, but we have not been able to get the performance
required out of a DataTable yet. (see my reply to William Ryan)

I would certainly like to know more about the DataTable internal structure.
For example:

1. How is a row object implemented? If the DataTable is a set of linked
columns, what is the row and how can a row be made type-specific for each
cell/column?

2. How is a row inserted into a table between other rows if the DataTable is
a set of linked arrays? (Actually, in my solution I don't need to insert
between other rows, but I'm just curious how this is implemented.)

What I most need to know is how to implement a row object that is type
specific at each column and that doesn't use boxing/unboxing for primitives.
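
A sketch of one possible answer, assuming the schema is only known at run time: keep each column as a generic `Column<T>` backed by a `T[]`, and let the row be a bare index that reads through a typed column handle. Every name here (ColumnBase, Column<T>, ColumnTable, RowRef) is hypothetical, and this is not a description of DataTable's internals. Because the accessors are generic over T, an int or double cell is never converted to object.

```csharp
using System;
using System.Collections.Generic;

// Untyped base so a table can hold columns of different element types.
public abstract class ColumnBase
{
    public string Name { get; }
    protected ColumnBase(string name) { Name = name; }
    public abstract void Resize(int rowCount);
}

// One typed array per column; the element type is fixed at creation.
public sealed class Column<T> : ColumnBase
{
    private T[] _cells = new T[0];

    public Column(string name) : base(name) { }

    public override void Resize(int rowCount)
    {
        Array.Resize(ref _cells, rowCount);
    }

    public T this[int row]
    {
        get { return _cells[row]; }
        set { _cells[row] = value; }
    }
}

public sealed class ColumnTable
{
    private readonly List<ColumnBase> _columns = new List<ColumnBase>();

    public int RowCount { get; private set; }

    // Returns a typed handle the caller keeps for later cell access.
    public Column<T> AddColumn<T>(string name)
    {
        var column = new Column<T>(name);
        column.Resize(RowCount);
        _columns.Add(column);
        return column;
    }

    // Grows one row at a time for brevity; a real table would
    // presumably grow its arrays geometrically.
    public RowRef AddRow()
    {
        RowCount++;
        foreach (var column in _columns) column.Resize(RowCount);
        return new RowRef(RowCount - 1);
    }
}

// A row is only an index; Get/Set are generic, so no boxing occurs.
public struct RowRef
{
    public readonly int Index;

    public RowRef(int index) { Index = index; }

    public T Get<T>(Column<T> column) { return column[Index]; }

    public void Set<T>(Column<T> column, T value) { column[Index] = value; }
}
```

With this shape, `row.Set(mass, 2.5)` only compiles if `mass` is a `Column<double>`, so the type check happens at compile time rather than through a cast at run time.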

Regards,
Dave
 