How big can a disconnected dataset be?

  • Thread starter: Rich

Rich

Hello,

I have to read and write around one million records from
an external data source to Sql Server2k every night.
That's a lot of I/O. I am using VB6 for this (takes
hours). I am connecting to the external data source with
APIs from its object library (as opposed to ODBC, in which
case I would just use DTS to pull this data) and just
looping. I was thinking that with VB.Net I could read
this data into a dataset in memory (currently have 4 gigs
of mem - may upgrade to 16 gigs if they let me have win2K
Advanced Server - currently 2 processors 1 gig each). My
idea is that if I could just read the data into memory (a
dataset) it would be much faster than writing each record
(180 fields per record). Then I could just do a
dataAdapter.Update or InsertInto to Sql Server once the
data is all in local memory. Any thoughts?

While I'm at it, the records come from 4 different
sources. I was thinking about using multi-threading to
pull the data simultaneously. I am aware that Sql
Server2k only supports 4 gigs of mem. But if I have more
than 4 gigs of data can one VB.Net app manage datasets in
more than 4 gigs of memory? Once I fill my datasets, I
would just do one InsertInto at a time. How is VB.Net for
multi-threading? Again, in VB6 I invoke 4 separate apps
which simultaneously pull the data each night. They write
to 4 separate tables in Sql Server2k. I would really like
to have all this in one app and read/write directly to
memory. Is VB.Net a way to do this?

Thanks,
Rich
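
A rough way to put numbers on the memory question is a back-of-the-envelope estimate. The sketch below takes the record and field counts from the post; the average field size and the in-memory overhead factor are assumptions chosen only for illustration, not measurements of the actual data.

```python
# Back-of-the-envelope estimate. The per-field size and overhead factor are
# assumptions, not measurements of the actual Notes data.
RECORDS = 1_000_000        # from the post: about one million records per night
FIELDS_PER_RECORD = 180    # from the post
AVG_BYTES_PER_FIELD = 20   # assumed average payload per field
OVERHEAD_FACTOR = 3        # assumed in-memory overhead of a dataset-like structure


def estimate_bytes(records, fields, bytes_per_field, overhead):
    """Return an estimated in-memory size in bytes."""
    return records * fields * bytes_per_field * overhead


def to_gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / 2**30


raw = estimate_bytes(RECORDS, FIELDS_PER_RECORD, AVG_BYTES_PER_FIELD, 1)
loaded = estimate_bytes(RECORDS, FIELDS_PER_RECORD, AVG_BYTES_PER_FIELD, OVERHEAD_FACTOR)
print(f"raw payload:   {to_gib(raw):.2f} GiB")
print(f"with overhead: {to_gib(loaded):.2f} GiB")
```

Under these assumed figures the raw payload alone is already close to the 4 gigs mentioned, so it is worth measuring the real average field size before committing to holding everything in memory at once.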
 
Rich said:
I have to read and write around one million records from
an external data source to Sql Server2k every night.
That's a lot of I/O.

Yes it is... =)
I am using VB6 for this (takes
hours). I am connecting to the external data source with
APIs from its object library (as opposed to ODBC, in which
case I would just use DTS to pull this data) and just
looping.

I'm assuming this is a proprietary data format?
I was thinking that with VB.Net I could read
this data into a dataset in memory (currently have 4 gigs
of mem - may upgrade to 16 gigs if they let me have win2K
Advanced Server - currently 2 processors 1 gig each). My
idea is that if I could just read the data into memory (a
dataset) it would be much faster than writing each record
(180 fields per record). Then I could just do a
dataAdapter.Update or InsertInto to Sql Server once the
data is all in local memory. Any thoughts?

You're still going to take a speed hit on the update/insert. I'm not
sure whether it will cache data on your local disk, depending on the amount
of memory. But you are still going to be subject to the speed of your SQL
Server. So you can have it all queued up, but it's not necessarily going to
make it faster.
While I'm at it, the records come from 4 different
sources. I was thinking about using multi-threading to
pull the data simultaneously. I am aware that Sql
Server2k only supports 4 gigs of mem. But if I have more
than 4 gigs of data can one VB.Net app manage datasets in
more than 4 gigs of memory? Once I fill my datasets, I
would just do one InsertInto at a time. How is VB.Net for
multi-threading? Again, in VB6 I invoke 4 separate apps
which simultaneously pull the data each night. They write
to 4 separate tables in Sql Server2k. I would really like
to have all this in one app and read/write directly to
memory. Is VB.Net a way to do this?

Many ways, I just don't know the benefit of it from what you're giving us.
Yeah, your reads and loading will be fast, but is this program going to run
on the SQL server itself? If you're looking to reduce your amount of network
traffic, yeah, this will solve your problem; if you're looking to speed up
insertion into your SQL server, not gonna do it... =(

You could use a more updated OLEDB driver for SQL Server, which may shave
some time off because of optimized programming, but again, don't know how
much of a gain you will actually get.

-CJ
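
Since the load stays bound by the server's insert speed, one common approach is to send the in-memory rows in batches rather than one call per record. A minimal sketch of that idea follows, using Python's built-in sqlite3 module purely as a stand-in for SQL Server. The table name `records`, its two columns, and the batch size are hypothetical, and this is not the ADO.NET DataAdapter API.

```python
import sqlite3


def load_in_batches(conn, rows, batch_size=1000):
    """Insert rows in fixed-size batches, one commit per batch.

    Returns the number of batches sent.
    """
    batches = 0
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        with conn:  # commits on success, rolls back if an insert raises
            conn.executemany(
                "INSERT INTO records (id, payload) VALUES (?, ?)", chunk
            )
        batches += 1
    return batches


# In-memory database standing in for the real server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT)")

# Rows already read into memory from the (hypothetical) external source.
rows = [(i, f"record-{i}") for i in range(2500)]

batches = load_in_batches(conn, rows)
count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
print(batches, count)
```

Whether batching helps in the real setup depends on the provider and the server, so it would need to be timed against the existing per-record loop.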
 
Thanks. I forgot to mention, yes, reducing network
traffic, or rather, getting all the data to the local server
and closing the connection, would be a good thing to
achieve. The other data source is Lotus Notes R5. I have
to pull data from 4 separate Notes DBs (NSFs). Not my
design. I could take it or leave it.

So you say that it would be possible to fill the datasets
(disconnected datasets) this way? Thinking out loud, I
have 4 tables in Sql Server and would have 4 datasets in
my VB.net app. Then, with multi-threading, I would start
pulling data into each of these datasets simultaneously.
I have 4 gigs of mem right now. The deal is that the old
way I would just write the data directly to the Sql
Tables. Here I load the data to memory as fast as I can,
close my connection to Lotus Notes (using Domino Object
Library here) and then start pushing the data into Sql
Server from the datasets. Wouldn't this be better, more
efficient than having 4 COM-based apps reading and writing
data 250,000+ times apiece?
 
Rich said:
Thanks. I forgot to mention, yes, reducing network
traffic, or rather, getting all the data to the local server
and closing the connection, would be a good thing to
achieve. The other data source is Lotus Notes R5. I have
to pull data from 4 separate Notes DBs (NSFs). Not my
design. I could take it or leave it.

I coded Notes (4.5 and 5) for 2 years... I can honestly say that I have
never looked at a project and said "You know what would be a good
solution... Notes..."

I know it has its advantages, but coming from a developer standpoint it
kinda sucks and is INCREDIBLY slow...
So you say that it would be possible to fill the datasets
(disconnected datasets) this way? Thinking out loud, I
have 4 tables in Sql Server and would have 4 datasets in
my VB.net app.

Yeah, that would be fine. I'm sure the Notes API is slowing you down as it
is... So this will reduce your overall network traffic (well, amount of
time used for network traffic, you'll have a big burst at the beginning and
then nothing).
Then, with multi-threading, I would start
pulling data into each of these datasets simultaneously.
I have 4 gigs of mem right now. The deal is that the old
way I would just write the data directly to the Sql
Tables. Here I load the data to memory as fast as I can,
close my connection to Lotus Notes (using Domino Object
Library here) and then start pushing the data into Sql
Server from the datasets. Wouldn't this be better, more
efficient than having 4 COM-based apps reading and writing
data 250,000+ times apiece?

For network traffic, yes, it would be better. I don't think you will see a
huge performance increase on your SQL server, but I'm sure you will see some
(as you now are not requesting data from a foreign source for each record).

Nonetheless... 180 fields, 1 million records, it's still going to take a little
time to get in there. =)

-CJ
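
The plan discussed here (pull the four sources at the same time into memory, then push each table to the server one at a time) can be sketched roughly as below. The source names and the `pull_source` function are hypothetical placeholders for the real Domino Object Library read loop, and the sketch is in Python rather than VB.Net.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical names standing in for the four Notes databases.
SOURCES = ["source_a", "source_b", "source_c", "source_d"]


def pull_source(name, n_records=5):
    """Placeholder for the slow per-record read loop against one source."""
    return [(name, i) for i in range(n_records)]


def pull_all(sources):
    """Pull every source on its own worker thread; returns {source: rows}."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = {name: pool.submit(pull_source, name) for name in sources}
        # result() blocks until that worker has finished reading its source.
        return {name: fut.result() for name, fut in futures.items()}


tables = pull_all(SOURCES)

# Write phase: one table at a time, after all reads have finished.
for name in SOURCES:
    print(name, len(tables[name]))
```

Keeping each source's rows in its own list means the worker threads never share a data structure, which avoids locking during the read phase.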
 
Hi Rich,

I was writing a reply, but I see that what I have to add is only minor
compared with CJ's text.

(One little thing: if your server goes down in the middle of an update with a
dataset, do you know what that means for you?)
You could use a more updated OLEDB driver for SQL Server, which may shave
some time off because of optimized programming, but again, don't know how
much of a gain you will actually get.

This is the only point where I have a real addition.

Have a look at the SQL provider.

http://msdn.microsoft.com/library/d...y/en-us/cpguide/html/cpconadonetproviders.asp

Cor
 
Cor said:
Hi Rich,

I was writing a reply, but I see that what I have to add is only minor
compared with CJ's text.

(One little thing: if your server goes down in the middle of an update with a
dataset, do you know what that means for you?)

A runtime exception.

=)
This is the only point where I have a real addition.

Have a look at the SQL provider.

http://msdn.microsoft.com/library/d...y/en-us/cpguide/html/cpconadonetproviders.asp

Excellent point.
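
On the question of a failure in the middle of an update: one way to avoid a half-loaded table is to wrap the whole load in a transaction, so that an exception rolls everything back. The sketch below illustrates this with Python's built-in sqlite3 module as a stand-in for SQL Server. The `records` table is hypothetical, and a duplicate key is used to simulate the failure.

```python
import sqlite3


def insert_all_or_nothing(conn, rows):
    """Insert every row inside one transaction; returns True if committed."""
    try:
        with conn:  # commit on success, roll back if any insert raises
            conn.executemany(
                "INSERT INTO records (id, payload) VALUES (?, ?)", rows
            )
        return True
    except sqlite3.Error:
        return False


# In-memory database standing in for the real server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT)")

# The duplicate id 2 simulates a failure partway through the load.
rows = [(1, "a"), (2, "b"), (2, "duplicate"), (3, "c")]

ok = insert_all_or_nothing(conn, rows)
remaining = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
print(ok, remaining)
```

In this sketch the two rows inserted before the failure are rolled back as well, so the nightly job can simply be rerun. With a million records, a single transaction may be too large in practice, and per-batch transactions with a restart marker would be an alternative.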
 
Thank you all for your replies. At least now I can start
focusing on how to get multithreading working. Plus, since I
am still relatively new to VB.Net, I'm still having a
little trouble using external references - like the
proper/correct syntax (like do I still go to References and
select the library object I need for Domino Library - or
do I use an Imports statement). Oh well, that will be for
my next post :).

Thanks all,
Rich
 
I've spent 7 years coding Notes 4.5, 5.x, and 6, and all I can say is that
there are advantages in certain design scenarios. Notes is an easy place to
code up an application quickly, and some requirements lend themselves well to
Notes.

However, having said that, Notes is dying out and will continue to do so,
there are so many better things around to take you away from it.

Regards - OHM
 
I've spent 7 years coding Notes 4.5, 5.x, and 6, and all I can say is that
there are advantages in certain design scenarios. Notes is an easy place to
code up an application quickly, and some requirements lend themselves well to
Notes.

That's why I made my second comment that I know it has its advantages (sweet
irony, I found a use today in a conversation.. I didn't bring it up though.
=))

However, having said that, Notes is dying out and will continue to do so,
there are so many better things around to take you away from it.

Exactly...

Btw,
Where the hell have you been? I don't think I've seen a post from you in
ages!
 
I've been ultra busy in a new job. I've got some work creating Labs for a
major company dealing with some very new stuff: SmartDocuments, SmartTags,
MapPoint, etc. All good stuff.

Regards - OHM
 
I've been ultra busy in a new job. I've got some work creating Labs for a
major company dealing with some very new stuff: SmartDocuments, SmartTags,
MapPoint, etc. All good stuff.

That is fantastic. I was wondering what happened to you, but it sounds like
you get to deal with a lot more exciting stuff than we do sometimes. Good
luck with that and try to stop in every now and then. =)

-CJ
 