DataReader vs DataSet Performance.

  • Thread starter Thread starter Ashish
  • Start date Start date
A

Ashish

hi Folks,

I am about to start development on a new framework, and contemplating
whether to use datareader to create objects or to use DataSets to create
object.

For a test scenario i created two classes , with each having 8
properties. Then i created two Data creation logics, one that fetched
dataSets from DAL and creates the objects, and second one creates
objects from DataReader.

the dataaccess is done through the microsoft dataaccess application block

these are my performance results.


Objects created 4 220 22000

DataReader 1.40 2.006 50.59
(Avg Time in secs)

DataSets 1.42 2.01 50.2
(Avg Time in secs)



the code is exactly the same (except the actual creation logic) , the
database is SQL Server 2000

so the question is , are datareaders really fast (as microsoft says) or
am i missing something. ?

Please comment
 
I have not noticed any difference in performance (they should be faster in
theory, but the don't appear to be in implementation). I'm not using
DataReader any more because of the issues I'm having when using them over a
slower broadband connection. From now on all my code will be
DataSet/DataTable until I can find the real reason to the problems of
DataReader.
 
Ashish- here's another performance metric that you may find useful:

http://www.devx.com/vb2themax/Article/19887

Anyway, to your question. 1- DataAdapters use DataReaders internally to
fill datatables so they can at best, be as fast as a reader -x . However,
and this is a big however, I can easily use a datareader which would be
slower. I could refer to the columns by StringName instead of index. I
could not check dbnull values, throw exceptions and try/catch them. I could
not get the real type and have to convert the type for each individual
value. My point is that datareaders, all else being equal are faster, but
that doesn't mean much b/c there's a lot of room in All else being equal.
Also, if you stored the datareader info in an array or other object, you
would also use resources so it's not free by any means. But, you would have
to build pretty sophisticated business objects, functionality which
datatables /datasets/dataviews already have so it's easy to come up with a
case where using an Adapter is faster overall (remember that performance
gains in one area don't matter if you have bottlenecks which offset them as
a condition of your methodology).

use Output params where appropriate. use ExecuteScalar where appropriate.
Use Readers when you dno't need to cache the data and use adapters for the
rest- just my two cents ;-)
 
Are you measuring data retrieval only? Or you include object creation?

If you are including object creation, that is not a good test, since you are
not looking at data access time alone.

Also, you should test with more then 4 rows. A lot more. Like thousands of
rows. And make sure you time the data retrieval time only.

I suspect what you will find is that the datareader is faster. However,
since most applications don't need thousands of rows, and they spend most of
their time processing the data, the data retrieval time is going to get lost
in the overall time required to complete a particular process. So it won't
end up mattering anyway, and you should just use whatever object is better
suited for the job.
 
hi guys,

thanks for your comments. Some of you have mentioned additional thing
that might skew results so i want to mention somethings here ..

1. Iam indexing columns using column index when using DataReaders, and
indexing columns using column names using DataSets.

2. Iam checking for nulls in both scenarios, so conditionals are same.

3. the results mentioned here are over creations of 4 objects, 220
objects, and 22,000 objects (In reference to Marina's comments)

4. The code for setting properties using DataReaders is

mUid = CInt(reader.GetInt32(0))
mName = CStr(reader.GetString(1))
mSite = CStr(reader.GetString(2))

If Not IsDBNull(reader.GetValue(4)) Then
mCurrentBuildVersion = reader.GetString(4)
End If


If Not IsDBNull(reader.GetValue(5)) Then
mCurrentBuildDateTime = reader.GetDateTime(5)
End If


If Not IsDBNull(reader.GetValue(6)) Then
mSourceControlName = reader.GetString(6)
End If

If Not IsDBNull(reader.GetValue(7)) Then
mComments = reader.GetString(7)
End If

5. The code for setting properties using DataSets is

mUid = CInt(dr("uid"))
mName = CStr(dr("Name"))
mSite = CStr(dr("Site"))

If Not IsDBNull(dr(("CurrentVersion"))) Then
mCurrentBuildVersion = CStr(dr("CurrentVersion"))
End If


If Not IsDBNull(dr("BuildDateTime")) Then
mCurrentBuildDateTime = CDate(dr("BuildDateTime"))
End If


If Not IsDBNull(dr("SourceControlName")) Then
mSourceControlName = CStr(dr("SourceControlName"))
End If

If Not IsDBNull(dr("Comments")) Then
mComments = CStr(dr("Comments"))
End If

6. Iam planning to run webservices on this framework, so speed is very
crucial for me...

I just want to confirm that this notion of DataReaders being faster than
datasets is incorrect
 
Here is what I was trying to say.

It takes 1 second to get the data using a datareader, and 2 seconds using a
dataset. So that datareader is twice as fast. However, it takes 99 seconds
to create your objects, and for them to initialize themselves.

So, now total time using datareader is 100 seconds, and 101 using the
dataset. So now, they look like they are almost the same, instead of
datareader being twice as fast.

Now, I made up all the numbers, but I was trying to illustrate a point.

In general, you should be more worried about having a SQL Connection open
longer then it needs to be. With datareaders, you are processing 1 row at a
time, and it takes some amt of time to initialize your object.

Because of this, I would recommend you use a dataset. You want to get your
data, close the connection, and then process the data.
 
Well then it means that in the bigger scheme of things, cost of fetching
data is about around 1% ( as calculated from results) as compared to
cost of accessing data..

well then i guess all the people who tell you to profile application for
Data access or the one who complain about chatty applications are wrong ?
Iam not implying one way or other, but want to ascertain what operations
are costly from an application prespective ...
 
I'm not saying that it is always 1% of the time spent. I said that I was
making up the data to illustrate a point.

I have no idea what it is in your case because you just have the total time
it takes for you to run your process. Maybe for you it's 1%, maybe it's
90%. I have no idea. And because we don't know, it isn't fair to compare
just the total time it takes, because that time could be dominated by code
that has nothing to do with the actual fetching of data from the database.
 
Ashish,

This is a moot question, because head to head performance they will be
nearly the same. But that may not be so in a production application, where
you are doing more than just "reading" data.

The real choice between a dataset and datareader is actually based on a
number of reasons - performance being one of them.

1. Performance will be better in datareaders in single user scenarios,
because datareaders need to process upon a row fully before moving to the
next - so you add the processing time to the time you keep the connection
open. Datasets - which are filled using datareaders, keep the connection
open and non-pooled for a very short time, you actually work on disconnected
data - so they are much faster in multi user scenarios.

2. Traversal direction

3. Memory concerns

4. Locking/architecture concerns.

5. etc.

I have a discussion on datasets vs. datareaders in Chapter 14 of my book.

- Sahil Malik [MVP]
ADO.NET 2.0 book -
http://codebetter.com/blogs/sahil.malik/archive/2005/05/13/63199.aspx
 
Marina,

the calculation of 1% is based on the hypothesis that cost of object
creation is constant in datareaders and datasets,

hence for 4 object creation scenario the different is

(time for dataset - time for datareader) / time for dataset

(1.42- 1.4) /1.42 * 100 = 1.4 %
 
I am not donig anything *different* between the two tests ...,

one object creation is having 1 update, 2 selects, and 1 inserts, but
they are same for both the tests...,

the testing environment is exactly the same...

iam not complaining that they are slow or something, what iam saying is
that why there is not a lot of differece between them (as people say)
 
mUid = CInt(reader.GetInt32(0))
mName = CStr(reader.GetString(1))
mSite = CStr(reader.GetString(2))

Maybe I don't understand how VB.NET works, but why are you casting the results
of the gets in the above code?

Doesn't CInt() convert a string to an integer? Why would you need to convert an
integer value to an integer? You are doing the same thing with the strings
returned by the reader.GetString() methods.

In C# we would do it like this:
int mUid = reader.GetInt32(0);
string mName = reader.GetString(1);
string mSite = reader.GetString(2);

If you are losing speed maybe it is because you are calling an extra and
unnecessary casting function for each column returned from the reader.

You only need to cast when using the reader like this:
int mUID = Convert.ToInt32(reader.GetValue());

Try it without the casting and see if the reader is faster. I'll bet it is.
Otis
 
iam not complaining that they are slow or something, what iam saying is
that why there is not a lot of differece between them (as people say)

Because, per your test case - there isn't really much of a difference.
Again, datasets = disconnected data. To fill a dataset you keep the
connection open for very little time.

So if your operation is "Fill Data" "Work with Data" "Update Data", and the
time required to do these is T1, T2, and T3,

Using a dataset, your total time the connex. was open is TDs = T1 + T2
Using a datareader, your time was TDr = T1 + T2 + T3

TDr > TDs

Therefore, Datareader connection pooling sucks, therefore datareader is
slower in multi user scenarios.

Similarly, because a datareader doesn't have the overhead of creating a
dataset, it is faster in single user scenarios.

So you CANNOT definitively say DR is faster than DS in every single
situation out there. People who claim that - are bullshitting. It truly
depends on your situation and architecture what may actually be faster.

- Sahil Malik [MVP]
ADO.NET 2.0 book -
http://codebetter.com/blogs/sahil.malik/archive/2005/05/13/63199.aspx
 
Ashish,

I think that there are enough answers by Marina, which I would not have done
with the same intent.

I am always interested what Microsoft says.

Do you have a link to where this is written?
so the question is , are datareaders really fast (as microsoft says)

That than in relation to dataadapters?

Thanks in advance

Cor
 
iam not complaining that they are slow or something, what iam saying is
that why there is not a lot of differece between them (as people say)

Because, per your test case - there isn't really much of a difference.
Again, datasets = disconnected data. To fill a dataset you keep the
connection open for very little time.

So if your operation is "Fill Data" "Work with Data" "Update Data", and the
time required to do these is T1, T2, and T3,

Using a dataset, your total time the connex. was open is TDs = T1 + T2
Using a datareader, your time was TDr = T1 + T2 + T3

TDr > TDs

Therefore, Datareader connection pooling sucks, therefore datareader is
slower in multi user scenarios.

Similarly, because a datareader doesn't have the overhead of creating a
dataset, it is faster in single user scenarios.

So you CANNOT definitively say DR is faster than DS in every single
situation out there. It truly depends on your situation and architecture
what may actually be faster.

- Sahil Malik [MVP]
ADO.NET 2.0 book -
http://codebetter.com/blogs/sahil.malik/archive/2005/05/13/63199.aspx
 
Back
Top