datasets - working with alot of data

  • Thread starter Thread starter Marcin Wiszowaty
  • Start date Start date
M

Marcin Wiszowaty

Hello,

I am working with databases,datasets, sqlserver 2000, vb.net, vs.net 03

But this is more of a general question.

I like to do my dataacces by reading in the whole db table into a
dataset(ds) and then doing searches inside of that ds. Change the found data
and then at program completion updata that ds to the table in the db. I like
this way because i can use the intelisense by saing row.columnname and
dont have to wory abour sql querry laguage. I drag a data adapter onto the
designer specify a select query and it generates the insert and update for
me.

Is this the way it should be done?
I am working on a project now that will potentialy have millions of records
in that table. (After a while hundreds of millions, but am planing of moving
the oldest million records into a history table, history will probably not
be used ever) Which i will then read into a dataset and haveto search
through. The size will slow this down.
Is the better solution to read in smaller chunks and query the db like 100
times per program run to make the dataset smaller or to move older
information into a history table?
Should i have broken this big table into some 5 smaller tables with same
exact column names about 30 columns in all tables?

As you can see i have very little experience with this.

Another question is that i am now using a reqular int colum as the primary
key. None of the other data is unique. I could have made a primary key out
of 4 other varchar columns. Should i have done that? The idea there was that
the 4 columns would have slowed data access as per article i found online.
But now i am limited to 2^31-1 primary key values and if i move the records
into history table i will have to decrement this column for each record in
this table (which will always stay at about 1million records). Anyone know
of an easy way to do this? Tables without a primary key would make updates
much harder and the querys would not be generated for me automaticaly by
visual studio.

Please give me your experienced input on this matter. Hopefully the lenght
of this post will not scare people off from reading it.

I will try to post the all helpfull inputs on this topic to other sites
because i beleive this is important and i couldnt find the answears
elsewere.

Thank you.
 
Marcin Wiszowaty said:
Hello,

I am working with databases,datasets, sqlserver 2000, vb.net, vs.net 03

But this is more of a general question.

I like to do my dataacces by reading in the whole db table into a
dataset(ds) and then doing searches inside of that ds. Change the found
data and then at program completion updata that ds to the table in the db.
I like this way because i can use the intelisense by saing
row.columnname and dont have to wory abour sql querry laguage. I drag a
data adapter onto the designer specify a select query and it generates the
insert and update for me.

Is this the way it should be done?

I would say no. Why?

1. It will not scale
2. It eventually will not perform (reason for not scaling)
I am working on a project now that will potentialy have millions of
records in that table. (After a while hundreds of millions, but am planing
of moving the oldest million records into a history table, history will
probably not be used ever) Which i will then read into a dataset and
haveto search through. The size will slow this down.
Is the better solution to read in smaller chunks and query the db like 100
times per program run to make the dataset smaller or to move older
information into a history table?

It would be better to only pull the data that fits your criteria and change
it.
Should i have broken this big table into some 5 smaller tables with same
exact column names about 30 columns in all tables?

I generally do not like horizontal splitting of tables in this manner,
unless there is some sensible reason (archive versus live data, for
example).
As you can see i have very little experience with this.

Another question is that i am now using a reqular int colum as the primary
key. None of the other data is unique. I could have made a primary key out
of 4 other varchar columns. Should i have done that?

In my opinion, you did the right thing. Two reasons:

1. Primary Keys are derived. In most cases, they have no real world meaning.
While there are applications where users know the primary key (especially on
type tables), you can have another "user friendly" key. I have found that
"user friendly" keys change, which can be a mess if it is the primary key.

2. Composite keys (over multiple fields) run slower. If you need to ensure a
unique, you can put the four fields into a unique constraint.
The idea there was that the 4 columns would have slowed data access as per
article i found online.

And probably would. It would also cause issues if you had clustered that key
(put into the database in logical order (I would say physical, but that is
not always true at the Server level)), especially on inserts.
But now i am limited to 2^31-1 primary key values and if i move the
records into history table i will have to decrement this column for each
record in this table (which will always stay at about 1million records).
Anyone know of an easy way to do this?

You can also move up to BigInt. It will require a small amount of retooling,
but it might be an easier way to solve your archiving problem.
Tables without a primary key would make updates much harder and the querys
would not be generated for me automaticaly by visual studio.

I have no tables in my current database without a primary key. I can see a
few instances where it might make sense, but it generally does not.
Please give me your experienced input on this matter. Hopefully the lenght
of this post will not scare people off from reading it.

I will try to post the all helpfull inputs on this topic to other sites
because i beleive this is important and i couldnt find the answears
elsewere.

In general, you are better to filter your data in the database. There are
two ways to do this.

1. Reduce the number of columns returned
2. Reduce the number of rows returned

This is especially true with very large data sets that have to have certain
rows updated. I would also consider doing the updates completely in SQL, if
a user does not have to create the change. For example, process yesterday's
records and mark them as processed. Maybe not the best example, but I can
see instances where it works like that.

--
Gregory A. Beamer
MVP, MCP: +I, SE, SD, DBA

*************************************************
| Think outside the box!
|
*************************************************
 
Back
Top