When to use new DataSet? When to perform DataSet resident joins?

  • Thread starter Thread starter Siegfried Heintze
  • Start date Start date
S

Siegfried Heintze

Whenever I create a new database query, I have to answer a couple of
questions that are even harder to ignore if you are using visual studio:

(1) Should I create a new table in an existing dataset or a new dataset? Is
there peril in having one table per dataset and having lots of data sets?
(2) Should I use a generic late binding dataset or have visual studio create
a custom early binding dataset class for me?
(3) Should I implement my one to many relationship in the SQL with an INNER
JOIN clause or should I create two datatables and join them in the dataset
using the visual studio dataset GUI designer using drag and drop?
(4) If I have a many to many relationship, is it possible to implement this
in the dataset GUI designer instead of in the SQL with INNER JOIN
statements? What would be the advantage?
(5) Am I going to supply the dataset with add, delete and update objects?
What is the advantage of assigning these Command objects to the Dataset
instead of just calling them directly as needed if they are needed?
(5a) When you are configuring a new data adapter, you get the oppertunity to
let visual studio automatically generate the SQL statements to INSERT,
DELETE and UPDATE and assign their respective command objects to the
dataset. I've done it and it even worked, but I did not feel like I
understood it. How does visual studio know how to automatically generate the
SQL update, add and delete SQL statements when you have a complex join
statement anyway?
(5b) When is visual studio not able to automatically generate the UPDATE,
DELETE, and INSERT SQL statements and their respective command objects? I
received a message from Visual Studio recently complaining that I was not
using enough PK to automatically generate the UPDATE, DELETE and INSERT SQL
statements. This was OK since I could not imagine how Visual Studio was
going to know how to implement these statements anyway. Was it because I was
joining some fields that were secondary indices?

I've successfully used the dataset GUI designer to connect the foreign key
field to the PK of another table. (I'm not talking about the query designer
here -- this is connecting fields in the dataset after they have been read
out of the database tables). It worked. I was impressed but I was not (still
not) clear on the advantage. When I tried to do the same thing with a many
to many I could not figure out how to do it.

So I'd love some help on the perils and merits to each approach!
Thanks,
Siegfried
 
(1) Should I create a new table in an existing dataset or a new dataset?
Is
there peril in having one table per dataset and having lots of data sets?

In fact there is no reason to create a new one, however given the fact that
in Net 1.x removing tables and rows is very slow. Is it in that a good
alternative this is improved in 2.0.
(2) Should I use a generic late binding dataset or have visual studio
create
a custom early binding dataset class for me?

You can use direct the raw dataset class, you can use the by the designer
generated strongly typed dataset (inherited from the raw dataset class) or
you can create your own strongly typed dataset class.

If the generated strongly typed dataset class was easy to update as a class.
I would forever prefer that one forever. That it is not so and can be a
reason to build your own or just to use the raw dataset. (If you have a
table that almost never anymore changes than I advice you to use the
designer build strongly typed dataset).
(3) Should I implement my one to many relationship in the SQL with an
INNER
JOIN clause or should I create two datatables and join them in the dataset
using the visual studio dataset GUI designer using drag and drop?

Joined tables in a database create *one* combined datatable in a dataset. So
it is up to what you want. For the joined table you cannot AFAIK use the
designer to create update, delete or insert commands.
(4) If I have a many to many relationship, is it possible to implement
this
in the dataset GUI designer instead of in the SQL with INNER JOIN
statements? What would be the advantage?

Yes it is than a part of the dataset. A dataset holds beside tables the
relations between them. As I previous already mentioned is the Jon no
opinion in this.
(5) Am I going to supply the dataset with add, delete and update objects?

No the datarows will be duplicated in the datatables where they have there
rowstate and the changes in them. This will be until an "acceptchanges" are
done and the changes are processed inside the original datarows.

What is the advantage of assigning these Command objects to the Dataset
instead of just calling them directly as needed if they are needed?

There is no advantage than that you use an uniform methode provided by Net.
If you have time enough you can build everything yourself.
(5a) When you are configuring a new data adapter, you get the oppertunity
to
let visual studio automatically generate the SQL statements to INSERT,
DELETE and UPDATE and assign their respective command objects to the
dataset. I've done it and it even worked, but I did not feel like I
understood it. How does visual studio know how to automatically generate
the
SQL update, add and delete SQL statements when you have a complex join
statement anyway?

It gets from the SQL statement the data and therefore is it able to build
those.
With a Join it does not build those statements.
(5b) When is visual studio not able to automatically generate the UPDATE,
DELETE, and INSERT SQL statements and their respective command objects? I
received a message from Visual Studio recently complaining that I was not
using enough PK to automatically generate the UPDATE, DELETE and INSERT
SQL
statements. This was OK since I could not imagine how Visual Studio was
going to know how to implement these statements anyway. Was it because I
was
joining some fields that were secondary indices?

No with joining AFAIK does it never build those update delete and inserts.
To do an update, delete and insert you need forever to have the primary key,
even if it was without a dataset.
I've successfully used the dataset GUI designer to connect the foreign key
field to the PK of another table. (I'm not talking about the query
designer
here -- this is connecting fields in the dataset after they have been read
out of the database tables). It worked. I was impressed but I was not
(still
not) clear on the advantage. When I tried to do the same thing with a many
to many I could not figure out how to do it.
If you don't use the designer for the datast than many people probably
create the datarelation at runtime, which is very easy to do.

I hope this helps,

Cor
 
Comments inline:
Siegfried Heintze said:
Whenever I create a new database query, I have to answer a couple of
questions that are even harder to ignore if you are using visual studio:

(1) Should I create a new table in an existing dataset or a new dataset?
Is
there peril in having one table per dataset and having lots of data sets?
--It depends. If you use Typed DataSets exclusively, then you'll often have
multiple datasets instead of just one. While you could use just one, in
many cases this isn't practical. have multiple datasets like this allow you
to use Strong typing which is a peformance booster, provides intellisense
support and will catch column and table name changes at compile time. If
you are using Typed datasets, then you need to know what they'll look like
in advance, so you pretty much need to go the 'one dataset for each type"
route. There are many times that features like WriteXML are only available
to datasets too, so you may need to stick a table in a dataset just to get
the funcionality. It's probably a bit of an oversmiplification to say
this, but the more that you are using your datasets as business objects, the
more you'll probably want to lean toward having one dataset for each entity.
(2) Should I use a generic late binding dataset or have visual studio
create
a custom early binding dataset class for me?
--If you are in a scenario where you don't know what the data is goign to
look like then you have to use untyped datasets. Otherwise you should
probably opt for typed datasets. Typed ones perform better, have
intellisense support and are much easier to work with (for isntance, you
don't have to worry about remembering column names or table names). But,
there may be times when you aren't sure what a query will look like. In
those cases, you need to use untyped ones. Also, if you go the route of
Typed datasets, you probably want to make sure you innclude TableMappings
and ColumnMappings (one nice way to augment the data access application
block is to add overloads for this) so that if your field names in the db
change (or their aliases), you can just change the column mappings and be
done with it. Either way though, if you're using typed datasets and the
column or table names change, you're going to have to do some modification.
So you gain a lot by using typed datasets but you lose flexibility. In my
personal experience, we've done some pretty large state projects using only
typed datasets b/c we knew what our data would look like. We've also done
two where we used mainly untyped ones b/c the user create the query on the
fly. Using typed datasets, this would have been impossible. On the other
hand, if we didn't use typed datasets in the first case, there would have
no doubt been a lot more bugs involved and we wouldn't have caught breaking
changes as quickly. Given the choice, I have a STRONG bias toward typed
datasets unless I absolutely can't use them.
(3) Should I implement my one to many relationship in the SQL with an
INNER
JOIN clause or should I create two datatables and join them in the dataset
using the visual studio dataset GUI designer using drag and drop?
--Those aren't the only two choices per se, you can add a DataRelation
programatically ;-) (I know, it's a distinction without a difference).
However for performance reasons, update ease and data integrity, use
multiple tables attaching them with DataRelations where applicable. If you
do a join on a three colum table with 1000 rows to a 1 column table with say
2 rows, you'll be pulling over 4 columns worth of data, 1000 times. If you
use a datarelation, you'll pull over 3 columns 1000 times and 1 column 2
times. This is going to greatly reduce network conjestion and use a lot
less resources on the client machine (the extent of this depends on the
query but as you can see by this example, it doesn't take much before things
are dramatically different)
(4) If I have a many to many relationship, is it possible to implement
this
in the dataset GUI designer instead of in the SQL with INNER JOIN
statements?
--Yes (Sahil Malik's new Professional ADO.NET 2.0 has a great example of
doing many-to-many relationships
http://www.amazon.com/gp/product/15...3?v=glance&n=283155&n=507846&s=books&v=glance)
What would be the advantage?
-The same as above, just to an even greater extent.
(5) Am I going to supply the dataset with add, delete and update objects?
What is the advantage of assigning these Command objects to the Dataset
instead of just calling them directly as needed if they are needed?
--You assing these to the DataAdapter. You can delete, add, update from a
dataset, but the commands that fire against the db are done via a
DataAdapter. The benefit is calling
DataAdapter.Update(myDataSet.MYdataTable);
vs looping through each one, figuring out what needs done, building the
command each time and firing it. It's a lot cleaner and easier to
maintain - and a lot less error prone to just configure your adapter and
call Update.
(5a) When you are configuring a new data adapter, you get the oppertunity
to
let visual studio automatically generate the SQL statements to INSERT,
DELETE and UPDATE and assign their respective command objects to the
dataset. I've done it and it even worked, but I did not feel like I
understood it. How does visual studio know how to automatically generate
the
SQL update, add and delete SQL statements when you have a complex join
statement anyway?
--If can't generate anything but Select commands on a Joined scenario.
Otherwise for single tables, it just looks to the primary key to build the
statements (you need a PK on each table in order for the CommandBuilder or
the Designer to work).
(5b) When is visual studio not able to automatically generate the UPDATE,
DELETE, and INSERT SQL statements and their respective command objects? I
received a message from Visual Studio recently complaining that I was not
using enough PK to automatically generate the UPDATE, DELETE and INSERT
SQL
statements. This was OK since I could not imagine how Visual Studio was
going to know how to implement these statements anyway. Was it because I
was
joining some fields that were secondary indices?
--You must have PK for this to work and it only works on single table
queries.
 
Back
Top