Dataset versus Classes

Guest

One of the people I work with and I were going back and forth about
whether it's better to return data from a persistence layer as a dataset or as
a class array. So, I threw together a quick form that generates a
dataset/datatable with 1001 records and an array of similar classes that has
the same content.

The class array and dataset were generated, and then both were serialized
to memory streams. The lengths of the memory streams were checked to see how
large each serialized form actually was.

I knew that a class array would be a lot more efficient, but I had no idea
that the dataset would be SIX times the size of the class array.

So, I was wondering if anyone else has run across this issue, and how it's
affected development in other groups.



Here's the code I used to come up with these numbers:

Public Class SizeTest
Inherits System.Windows.Forms.Form

#Region " Windows Form Designer generated code "

Public Sub New()
MyBase.New()

'This call is required by the Windows Form Designer.
InitializeComponent()

'Add any initialization after the InitializeComponent() call

End Sub

'Form overrides dispose to clean up the component list.
Protected Overloads Overrides Sub Dispose(ByVal disposing As Boolean)
If disposing Then
If Not (components Is Nothing) Then
components.Dispose()
End If
End If
MyBase.Dispose(disposing)
End Sub

'Required by the Windows Form Designer
Private components As System.ComponentModel.IContainer

'NOTE: The following procedure is required by the Windows Form Designer
'It can be modified using the Windows Form Designer.
'Do not modify it using the code editor.
Friend WithEvents Label1 As System.Windows.Forms.Label
Friend WithEvents Label2 As System.Windows.Forms.Label
Friend WithEvents Label3 As System.Windows.Forms.Label
<System.Diagnostics.DebuggerStepThrough()> Private Sub InitializeComponent()
Me.Label1 = New System.Windows.Forms.Label
Me.Label2 = New System.Windows.Forms.Label
Me.Label3 = New System.Windows.Forms.Label
Me.SuspendLayout()
'
'Label1
'
Me.Label1.Location = New System.Drawing.Point(0, 0)
Me.Label1.Name = "Label1"
Me.Label1.Size = New System.Drawing.Size(256, 23)
Me.Label1.TabIndex = 0
Me.Label1.Text = "Label1"
'
'Label2
'
Me.Label2.Location = New System.Drawing.Point(0, 32)
Me.Label2.Name = "Label2"
Me.Label2.Size = New System.Drawing.Size(264, 23)
Me.Label2.TabIndex = 1
Me.Label2.Text = "Label2"
'
'Label3
'
Me.Label3.Location = New System.Drawing.Point(0, 64)
Me.Label3.Name = "Label3"
Me.Label3.Size = New System.Drawing.Size(248, 23)
Me.Label3.TabIndex = 2
Me.Label3.Text = "Label3"
'
'SizeTest
'
Me.AutoScaleBaseSize = New System.Drawing.Size(5, 13)
Me.ClientSize = New System.Drawing.Size(292, 273)
Me.Controls.Add(Me.Label3)
Me.Controls.Add(Me.Label2)
Me.Controls.Add(Me.Label1)
Me.Name = "SizeTest"
Me.Text = "Form1"
Me.ResumeLayout(False)

End Sub

#End Region

Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
Dim ds As New DataSet
Dim dt As New DataTable
Dim i As Int32
Dim sizeDS As Int64
Dim sizeAr As Int64
Dim comparison As Single

Dim ms As System.IO.MemoryStream
Dim bf As New System.Runtime.Serialization.Formatters.Binary.BinaryFormatter
Dim classArray() As TestClass
Dim al As New ArrayList

'Initialize the Table
ds.Tables.Add(dt)
dt.Columns.Add("Id", GetType(Int32))
dt.Columns.Add("FirstName", GetType(String))
dt.Columns.Add("LastName", GetType(String))
dt.Columns.Add("BirthDate", GetType(DateTime))

'Create the table rows and Class instances
For i = 0 To 1000
dt.Rows.Add(New Object() {i, "Johnathan", "Quagmire", DateTime.Now})
al.Add(New TestClass(i, "Johnathan", "Quagmire", DateTime.Now))
Next

classArray = CType(al.ToArray(GetType(TestClass)), TestClass())

'Serialize
ms = New System.IO.MemoryStream
bf.Serialize(ms, ds)
sizeDS = ms.Length
Label1.Text = "Serialized Dataset: " & ms.Length.ToString("#,###") & " bytes"
ms.Close()

ms = New System.IO.MemoryStream
bf.Serialize(ms, classArray)
sizeAr = ms.Length
Label2.Text = "Serialized Class Array: " & ms.Length.ToString("#,###") & " bytes"
ms.Close()

comparison = Convert.ToSingle(sizeDS) / Convert.ToSingle(sizeAr)
Label3.Text = "The dataset is " & comparison.ToString("0.000") & " times as large!"

End Sub

<Serializable()> Public Class TestClass
Private _id As Int32
Private _firstName As String
Private _lastName As String
Private _birthDate As DateTime

Public ReadOnly Property Id() As Int32
Get
Return _id
End Get
End Property
Public ReadOnly Property FirstName() As String
Get
Return _firstName
End Get
End Property
Public ReadOnly Property LastName() As String
Get
Return _lastName
End Get
End Property
Public ReadOnly Property BirthDate() As DateTime
Get
Return _birthDate
End Get
End Property
Public Sub New(ByVal i As Int32, ByVal f As String, ByVal l As String, ByVal b As DateTime)
_id = i
_firstName = f
_lastName = l
_birthDate = b
End Sub

End Class
End Class
 
David,

I like to look at a dataset as a quick-and-dirty, but not so cheap, business
object. In an ideal world you'd want classes, you'd want a neat 3-layer
design, and you'd want a proper translation between datasets and those
business object classes. But then you'd also need 200 developers on your
team maintaining and testing the tomes of code you're going to write.
In that regard, even though datasets might take more memory, most of us
choose to save our marriages over a few bytes of RAM: in exchange for less
work and less code to maintain, we go with strongly typed datasets instead
of neat and clean business objects. Not always, though, and not in every
situation; where you demand utmost performance, the extra work might be
warranted.
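The translation layer between datasets and business objects mentioned above can start as small as a mapping function. A minimal sketch, assuming .NET 2.0 or later and a hypothetical `Person` class and `PersonMapper` module (neither comes from this thread) that expect the same four columns as the `SizeTest` table:

```vbnet
Imports System
Imports System.Data

' Hypothetical business object with the same shape as TestClass in the original post.
<Serializable()> Public Class Person
    Private ReadOnly _id As Integer
    Private ReadOnly _firstName As String
    Private ReadOnly _lastName As String
    Private ReadOnly _birthDate As DateTime

    Public Sub New(ByVal personId As Integer, ByVal given As String, _
                   ByVal family As String, ByVal born As DateTime)
        _id = personId
        _firstName = given
        _lastName = family
        _birthDate = born
    End Sub

    Public ReadOnly Property Id() As Integer
        Get
            Return _id
        End Get
    End Property
    Public ReadOnly Property FirstName() As String
        Get
            Return _firstName
        End Get
    End Property
    Public ReadOnly Property LastName() As String
        Get
            Return _lastName
        End Get
    End Property
    Public ReadOnly Property BirthDate() As DateTime
        Get
            Return _birthDate
        End Get
    End Property
End Class

' Hypothetical mapper: copies every DataRow into a strongly typed Person.
' Assumes columns named Id, FirstName, LastName and BirthDate with no DBNull values.
Public Module PersonMapper
    Public Function ToPeople(ByVal table As DataTable) As Person()
        Dim people As Person() = New Person(table.Rows.Count - 1) {}
        For i As Integer = 0 To table.Rows.Count - 1
            Dim row As DataRow = table.Rows(i)
            people(i) = New Person(CInt(row("Id")), CStr(row("FirstName")), _
                                   CStr(row("LastName")), CDate(row("BirthDate")))
        Next
        Return people
    End Function
End Module
```

This is exactly the kind of extra code the tradeoff above is about: every column change means touching the class and the mapper, and nullable columns would need explicit `DBNull` handling that this sketch leaves out.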

The biggest problem with datasets in .NET Framework 1.1 is that their
default serialization, EVEN IF you are using BinaryFormatter, is XML, which
is ultra bloated. That's why you are seeing a 6x difference.

With ADO.NET 2.0, datasets have a RemotingFormat property that allows you to
seriously shrink the serialized output and improve serialization performance
by about 10 times (check my blog for a post on RemotingFormat performance
comparisons). Also, a dataset serialized as binary will be much smaller,
maybe even smaller than your class array at times.
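A minimal sketch of the RemotingFormat switch, assuming .NET Framework 2.0 or later (BinaryFormatter is obsolete on .NET 5+ and no longer functional on .NET 9). `RemotingFormatDemo`, `BuildDataSet` and `SerializedSize` are hypothetical names; the table mirrors the one in the original test, and the same DataSet is measured once with `SerializationFormat.Xml` and once with `SerializationFormat.Binary`:

```vbnet
Imports System
Imports System.Data
Imports System.IO
Imports System.Runtime.Serialization.Formatters.Binary

Public Module RemotingFormatDemo
    ' Builds a DataSet shaped like the one in the original size test (1,001 rows).
    Public Function BuildDataSet() As DataSet
        Dim ds As New DataSet()
        Dim dt As DataTable = ds.Tables.Add()
        dt.Columns.Add("Id", GetType(Integer))
        dt.Columns.Add("FirstName", GetType(String))
        dt.Columns.Add("LastName", GetType(String))
        dt.Columns.Add("BirthDate", GetType(DateTime))
        For i As Integer = 0 To 1000
            dt.Rows.Add(New Object() {i, "Johnathan", "Quagmire", DateTime.Now})
        Next
        Return ds
    End Function

    ' Sets the DataSet's RemotingFormat (this mutates ds), serializes it with
    ' BinaryFormatter into memory, and returns the serialized length in bytes.
    Public Function SerializedSize(ByVal ds As DataSet, ByVal fmt As SerializationFormat) As Long
        ds.RemotingFormat = fmt
        Dim bf As New BinaryFormatter()
        Using ms As New MemoryStream()
            bf.Serialize(ms, ds)
            Return ms.Length
        End Using
    End Function
End Module
```

Calling `SerializedSize(BuildDataSet(), SerializationFormat.Xml)` and then `SerializationFormat.Binary` on the same DataSet gives the before/after numbers for your own data; no specific ratio is claimed here.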

In .NET Framework 1.1, you can get almost the same effect by implementing
DataSetSurrogate (search the Microsoft Knowledge Base for DataSetSurrogate),
though it has a few irritating limitations that become especially
pronounced over a remoting connection (ask me if that's what you'll be
using datasets for).
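The Knowledge Base DataSetSurrogate sample isn't reproduced here, but the idea behind it (copy the schema and row values into plain serializable arrays, so BinaryFormatter serializes those arrays instead of the DataSet and its XML) can be sketched for a single table. `TableSurrogate` is a hypothetical class written for illustration, not the Microsoft sample, and it deliberately ignores constraints, relations, and row states:

```vbnet
Imports System
Imports System.Data

' Hypothetical single-table surrogate: captures column names, column types
' and row values in plain arrays that BinaryFormatter can serialize directly.
<Serializable()> Public Class TableSurrogate
    Private _tableName As String
    Private _columnNames() As String
    Private _columnTypes() As String   ' assembly-qualified type names
    Private _rows()() As Object        ' one ItemArray per row

    Public Sub New(ByVal table As DataTable)
        _tableName = table.TableName
        Dim colCount As Integer = table.Columns.Count
        _columnNames = New String(colCount - 1) {}
        _columnTypes = New String(colCount - 1) {}
        For c As Integer = 0 To colCount - 1
            _columnNames(c) = table.Columns(c).ColumnName
            _columnTypes(c) = table.Columns(c).DataType.AssemblyQualifiedName
        Next
        _rows = New Object(table.Rows.Count - 1)() {}
        For r As Integer = 0 To table.Rows.Count - 1
            _rows(r) = table.Rows(r).ItemArray   ' copy of the row's current values
        Next
    End Sub

    ' Rebuilds a DataTable with the same columns and current row values.
    ' Constraints, relations, row states and original values are NOT preserved.
    Public Function ToDataTable() As DataTable
        Dim table As New DataTable(_tableName)
        For c As Integer = 0 To _columnNames.Length - 1
            table.Columns.Add(_columnNames(c), System.Type.GetType(_columnTypes(c)))
        Next
        For r As Integer = 0 To _rows.Length - 1
            table.Rows.Add(_rows(r))
        Next
        Return table
    End Function
End Class
```

On the sending side you would serialize `New TableSurrogate(dt)` instead of the DataSet, and call `ToDataTable()` after deserializing. Losing row states is one reason a surrogate of this kind is awkward for update round trips over remoting.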

Hope this helped.

- Sahil Malik
http://dotnetjunkies.com/weblog/sahilmalik
 
Well, we stopped using datasets a while ago anyway, since you can't
encapsulate business rules inside of them. I was just a little shocked at
the difference in serialized size.

As far as the dataset always serializing to XML... well, internally it's
stored as XML, so it makes sense.

What I was really doing by serializing the two of them was checking what's
required to "rehydrate" an instance, basically determining how large its
state was. That should be roughly proportional to the amount of memory it
takes up.
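The "rehydrate" check amounts to a serialize/deserialize round trip. A minimal sketch, assuming .NET Framework 2.0 or later (where BinaryFormatter still works); `RehydrateCheck.RoundTrip` is a hypothetical helper that reports the serialized byte count and also returns the rebuilt copy, so you can confirm the state actually survives rather than only measuring it:

```vbnet
Imports System
Imports System.IO
Imports System.Runtime.Serialization.Formatters.Binary

Public Module RehydrateCheck
    ' Serializes the object graph to memory, records how many bytes its state took,
    ' then rewinds the stream and deserializes ("rehydrates") a fresh copy.
    Public Function RoundTrip(ByVal graph As Object, ByRef byteCount As Long) As Object
        Dim bf As New BinaryFormatter()
        Using ms As New MemoryStream()
            bf.Serialize(ms, graph)
            byteCount = ms.Length
            ms.Position = 0
            Return bf.Deserialize(ms)
        End Using
    End Function
End Module
```

Inside `Form1_Load`, something like `CType(RehydrateCheck.RoundTrip(classArray, size), TestClass())` would return a copy of the array along with its serialized size. Keep in mind that the byte count includes type and assembly metadata, so it tracks in-memory size only approximately.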
 