Dataset VS DataReader - Performance

  • Thread starter Thread starter Guest
  • Start date Start date
G

Guest

Hi All

We have had a HOT discussion on the subject ... to resolve this I need a authentic answer (preferably with some supportings) to my question follows...

What and How much difference (in terms of end-user user response time in seconds) it will make to an end-user of a web site if we use Typed DataSet as a dataobject OR custom defined dataobject..

1) with @ 100 - 200 concurrent users in given circumstances...
2) web page needs to process about 500 Rows (@ 50 columns) (MAX 1000 rows in few circumstances.
3) once the processing is done dataset/datareader will be cleaned-up.
4) if we dont use dataset we will be using some custom defined dataobject, populated programmatically using by looping thru datareader
5) Inclination towards using typed - dataset is owing to ease of development, and other out-of-the box functionality provided by microsoft

Any lights on this will be appriciated.
 
Forgot to mention that..

We have Three layers like .. code Behind - Business Logic Layer - Data Access Layer (Where data access layer MUST return either TypedDataSet OR Custom Defined Data object)
 
This just came up the other day and I as well as some others posted links
with stats on the subject.
"Speed in ADO.NET" by Manu on 5/26

John Papa wrote an article on this in this month's MSDN magazine
http://www.msdn.microsoft.com/data/default.aspx?pull=/msdnmag/issues/04/06/datapoints/default.aspx
as well

Speed isn't the only question here and there are sooo many other variables
in the equation, particularly over the web, that I'd think long and hard
about it....the winner may end up feeling like King Pyrrhus. Anyway, the
datareader is going to be the fastest.
http://www.awprofessional.com/articles/article.asp?p=169485 Typed Datasets
are faster than nontyped. Index based lookups are the fastest way to
reference values in a datarow.

My point is that the difference between the two may be trivial compared to
index tuning strategies, load balancing the server etc. The table structure
and what you join on (hopefully you aren't using any joins you're using
DataRelations if you are use DataTables) etc etc all have a huge bearing on
performance and dataset vs datareader is not the biggest area for
improvement in most instances.


Also, those business objects, how you load them, how you deal with them
could also have a huge bearing. My point is that there are a ton of places
you could more than offset any benefit from one over the other and many
times, the speed differences can be screwed up by improper use.

Walk through 10000 rows of a reader and use the name based lookup ie
rdr.GetString("LastName") vs rdr.GetString[1] and see what happens. using
ordinal based lookups for instance can greatly speed performance so DON'T
Fall for the trap of just looking to one issue.

But if you need an answer to your question, you'l find it with the links I
provided above, and it's very clear 'all else being equal' - a state very
hard to isolate

--

W.G. Ryan, eMVP

http://forums.devbuzz.com/
http://www.knowdotnet.com/williamryan.html
http://www.msmvps.com/WilliamRyan/
http://www.devbuzz.com/content/zinc_personal_media_center_pg1.asp
BKS said:
Hi All,

We have had a HOT discussion on the subject ... to resolve this I need a
authentic answer (preferably with some supportings) to my question
follows...
What and How much difference (in terms of end-user user response time in
seconds) it will make to an end-user of a web site if we use Typed DataSet
as a dataobject OR custom defined dataobject...
1) with @ 100 - 200 concurrent users in given circumstances...
2) web page needs to process about 500 Rows (@ 50 columns) (MAX 1000 rows in few circumstances.)
3) once the processing is done dataset/datareader will be cleaned-up.
4) if we dont use dataset we will be using some custom defined dataobject,
populated programmatically using by looping thru datareader.
5) Inclination towards using typed - dataset is owing to ease of
development, and other out-of-the box functionality provided by microsoft.
 
Thanks for your reply... William,

I should have phrased the question like this... Typed Dataset VS Custom Defined Data Object

Option 1. Using a Typed Dataset - dataadapter to populate the t-dataset
Option 2. Using a Custom defined dataobject - use datareader to populate it

Now in this case... Which option is better in terms of performance and how much is the difference. Option 1 is certainly easy to code. comments please.
 
If you read through the links I posted and the original post I referred to..
If speed is your sole concern than go with the datareader. There's some
performance tests to indicate why I make that claim in the orignal post I
referred to. I think that only taking speed into consideration is extremely
shortsighted. I've heard all sorts of stuff like "well I don't think that
speed is everything but the customer does ..." I really don't buy it except
in rare cirucmstances.. Focusing only one area can have you make some
really bad decisions and few people are really willing to make all the
sacrifices that accompany such tunnel vision.

Speed is absolutely important but maintainability is also important. Where
you draw the line is a subjective thing.

But the most important poitn I can make is that if I wanted to , I could
code a dataReader solution that's slower than a DataTable solution. I could
code a solution with either that's slow as hell. The speed of a car doesn't
necessarily dictate how fast you can get home. Traffic, construction, if
the car is running properly, if it's towing a large boat ,all can have
tremendous impact. The same holds here. Your queries are probably the
biggest influencer in performance. If those are slow the rest is academic.
Also, say that you had a Datareader with a three table join. It's entirely
possible that pulling three seperate queries with datatables and using
DataRelations would be faster b/c they arne't pulling over redundant data.

Also, a 'good' solution may be bad in one situatino and great in a different
one. Record sizes, indexes, bandwidth, index fragmentation, server load and
a whole host of other things have HUGE bearings on performance.

In sum - my two cents:
1) Don't look at speed as the only factor. Everything is interrelated.
After all, would any customer want an app that returned data in .000000002
seconds but was often wrong? Vs something that did it in 1 second that was
always right? We're talking about computer time here and often users can't
tell the difference gained from the faster method
2) Don't think that DataReader vs DataTable is the sole, or even Primary
influencer of speed. All else being equal a reader is faster. But you can
code either one and do things either way to shift the balance here..how you
code, how you query all have a huge impact, and a much bigger one than the
object you use .

HTH,

Bill
--

W.G. Ryan, eMVP

http://forums.devbuzz.com/
http://www.knowdotnet.com/williamryan.html
http://www.msmvps.com/WilliamRyan/
http://www.devbuzz.com/content/zinc_personal_media_center_pg1.asp
BKS said:
Thanks for your reply... William,

I should have phrased the question like this... Typed Dataset VS Custom Defined Data Object.

Option 1. Using a Typed Dataset - dataadapter to populate the t-dataset.
Option 2. Using a Custom defined dataobject - use datareader to populate it.

Now in this case... Which option is better in terms of performance and how
much is the difference. Option 1 is certainly easy to code. comments please.
 
Hi BKS,

I thought Bill did answer this already. However,

Using the dataadapter gives you a common way of using the data in dotNet, so
when there is a change in the crew, it will probably easy to start doing
maintanance for the new ones.

Creating your own defined data object will most certainly not do that, with
all even worse thing when it is not good done.

That does not mean you should not do it, however only performance is in my
opinion a bad pilot.

By instance 1000000000000000 times a nanosecond will be in much cases less
important than 1 time 24 hours completly down.

However just my thought,

Cor
 
Early on, I was using a lot of DataReaders in one of the apps I'm currently
building. I've stopped doing that. For development, I'm connecting to SQL2k
over fast ethernet on an old AMD-450 server. What I've found -- regardless
of stored procedures, paramaterized, ad hoc query, etc., is that if you have
to open 2 DataReaders, it is much more efficient to load up a DataSet with
all your data (even with a join) instead of making the 2nd trip to open and
close the connection. Also, leaving the first connection open with a Dataset
and filling a new adapter seems significantly quicker than 2 datareader
reads. This is undoubtedly due to connection pooling. Your mileage may vary.

BKS said:
Hi All,

We have had a HOT discussion on the subject ... to resolve this I need a
authentic answer (preferably with some supportings) to my question
follows...
What and How much difference (in terms of end-user user response time in
seconds) it will make to an end-user of a web site if we use Typed DataSet
as a dataobject OR custom defined dataobject...
1) with @ 100 - 200 concurrent users in given circumstances...
2) web page needs to process about 500 Rows (@ 50 columns) (MAX 1000 rows in few circumstances.)
3) once the processing is done dataset/datareader will be cleaned-up.
4) if we dont use dataset we will be using some custom defined dataobject,
populated programmatically using by looping thru datareader.
5) Inclination towards using typed - dataset is owing to ease of
development, and other out-of-the box functionality provided by microsoft.
 
Many thanks to William Ryan, Cor and Earl... I have got the answer... I am also thinking very much same as you have thought of... now as you have said its now supported with your email and experiance you bring in. Many thanks again. :)
 
Well said, that discussion helped clear up alot for me as well concerning
the datareader vs. dataset debate.

Ron

BKS said:
Many thanks to William Ryan, Cor and Earl... I have got the answer... I am
also thinking very much same as you have thought of... now as you have said
its now supported with your email and experiance you bring in. Many thanks
again. :)
 
Assuming a spherical cow ...

The DataSet is populated by a DataReader behind the scenes, so the
DataReader is always lighter in weight than a DataSet, strongly typed or
not. The question, however, is whether or not you can do your processing
using Microsoft's code rather than creating your own loop. Depending on the
processing, you may be able to use XML objects, which may be faster than
your DataReader loop. From what you have mentioned, the load is not that
heavy either way, so focusing on what is easiest for you and your fellow
devs to work with may be a consideration, even if it is not the best
perf-wise.

Considering the small amount of time for use of the objects, if I am reading
it correctly, a DataSet seems like a better option. Once again, this is
making a lot of assumptions on the types of data and the types of processing
you might be performing on the data.

The fact that your app is three tier really does not make much difference
here, except that your architecture may favor custom business objects. If
not, the DataSet is much easier to maintain than custom objects. Plus, the
code has already been debugged and tested by someone else, taking some of
the burden off your shoulders.

If it influences your decision at all, MS is committed to the DataSet, as
you will see in the 2.0 Framework.

--
Gregory A. Beamer
MVP; MCP: +I, SE, SD, DBA

************************************************
Think Outside the Box!
************************************************
BKS said:
Hi All,

We have had a HOT discussion on the subject ... to resolve this I need a
authentic answer (preferably with some supportings) to my question
follows...
What and How much difference (in terms of end-user user response time in
seconds) it will make to an end-user of a web site if we use Typed DataSet
as a dataobject OR custom defined dataobject...
1) with @ 100 - 200 concurrent users in given circumstances...
2) web page needs to process about 500 Rows (@ 50 columns) (MAX 1000 rows in few circumstances.)
3) once the processing is done dataset/datareader will be cleaned-up.
4) if we dont use dataset we will be using some custom defined dataobject,
populated programmatically using by looping thru datareader.
5) Inclination towards using typed - dataset is owing to ease of
development, and other out-of-the box functionality provided by microsoft.
 
Because it compares to ADO "classic", one point that I think is most
noteworthy is that opening and closing a connection costs a heavy
performance penalty. When you have to traverse the network and make/break
more than one connection, I'm not sure that the DataReader vs. DataSet issue
is any more complex than paying the connection penalty. Despite it's
"high-speed, off-the-wire, forward-only" capability, I'm not convinced the
standalone DataReader is really that relevant to any application.
 
Yes and no.

While opening and closing a connection is expensive, there are mechanisms in
place in both ADO and ADO.NET to pool connections, so the connection issue
will have a hit the first time, but none with subsequent connections.

I will agree that the DataReader is not a great thing, either way. The
savings are big enough in an application that is being hammered, but
DataSets are so much easier to maintain. Which is less expensive? Another
employee with deep knowledge or procs and memory?

DataReaders are great with forward only data, like reports. Other than that,
I would err on the side of DataSets, esp. considering where MS is going with
data access in .NET 2.0.

--
Gregory A. Beamer
MVP; MCP: +I, SE, SD, DBA

************************************************
Think Outside the Box!
************************************************
 
Earl,
one point that I think is most
noteworthy is that opening and closing a connection costs a heavy
performance penalty.
Agreed, opening & closing a connection is expensive! This is one reason that
the framework supports connection pooling (the framework keeps the physical
connection open for you). Also why if you are filling a DataSet or Domain
objects with multiple queries, you should open the connection, do your work,
and close the connection.
I'm not sure that the DataReader vs. DataSet issue
is any more complex than paying the connection penalty.

Remember that a DataSet populated by a DataAdapter is using a DataReader
under the covers.

Which means you can open a single Connection and use multiple DataReaders
against it!

So neither has a connection penalty over the other. In other words they both
share the exact same connection penalty!
"high-speed, off-the-wire, forward-only" capability, I'm not convinced the
standalone DataReader is really that relevant to any application.
A DataReader is relevant, because using either Domain Objects or DataSets is
relevant!

As for Domain Objects verses DataSets, I would suggest BKS, you & others
review Martin Fowler's book "Patterns of Enterprise Application
Architecture" from Addison Wesley
http://www.martinfowler.com/books.html#eaa. The book covers both Domain
Model verses Table Module (with Record Set) and when to use one over the
other.

http://www.martinfowler.com/eaaCatalog/domainModel.html
http://www.martinfowler.com/eaaCatalog/tableModule.html
http://www.martinfowler.com/eaaCatalog/recordSet.html

Hope this helps
Jay
 
Back
Top