Large Datasets

  • Thread starter: Guest

Guest

Hello All,

Does anyone have any suggestions on working with large datasets (10GB -
100GB)?

Best initialization?
Transfer from dataset to database?

Anything would be helpful: articles, code examples, discussions, etc.

Thanks In Advance
 
Hi,

Is there really a need to?
Can't you use just a part of it at a time?
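
The "part of it at a time" idea can be sketched as follows. This is only a minimal illustration, written in Python with the standard sqlite3 module for brevity; the table name `readings`, the helper `iter_chunks`, and the chunk size are assumptions made up for the example and do not come from this thread.

```python
import sqlite3


def iter_chunks(conn, query, chunk_size=1000):
    """Yield lists of rows, holding at most chunk_size rows in memory at once."""
    cursor = conn.execute(query)
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        yield rows


def demo():
    # Hypothetical table and data, created in memory purely for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL)")
    conn.executemany(
        "INSERT INTO readings (value) VALUES (?)",
        [(float(i),) for i in range(2500)],
    )

    chunk_sizes = []
    total = 0.0
    # Aggregate chunk by chunk instead of loading every row up front.
    for rows in iter_chunks(conn, "SELECT id, value FROM readings ORDER BY id", 1000):
        chunk_sizes.append(len(rows))
        total += sum(value for _id, value in rows)

    conn.close()
    return chunk_sizes, total
```

Only one chunk is resident at any moment, so peak memory depends on the chunk size rather than on the size of the whole table.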
 
Let's just say, in theory, that it is necessary. Any recommendations?

Miha Markic said:
Hi,

Is there really a need to?
Can't you use just a part of it at a time?

--
Miha Markic [MVP C#] - RightHand .NET consulting & development
SLODUG - Slovene Developer Users Group
www.rthand.com

Extreme Datasets said:
Hello All,

Does anyone have any suggestions on working with large datasets (10GB -
100GB)?

Best initialization?
Transfer from dataset to database?

Anything would be helpful, articles, code examples, discussions, etc

Thanks In Advance
 
System.Data.DataSet won't do for such a large amount of data; that is just
not what it is meant for. You would have to write your own class.

One approach is discussed here -
http://groups.google.com/[email protected]&rnum=2

- Sahil Malik
http://dotnetjunkies.com/weblog/sahilmalik




 
In .NET 2.0 the DataSet and its related classes have been significantly
extended to scale and perform well with a large number of rows. However,
10GB - 100GB is rather large for hosting in a single in-memory data structure.
Are you considering using 64-bit machines? A 32-bit machine can address at
most 4GB.
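
The 4GB figure follows from pointer width: 32 address bits can distinguish 2^32 bytes. A small sketch of that arithmetic, in Python, is below; the function names are made up for illustration, and it treats "GB" as GiB (2^30 bytes), which is an assumption. It also ignores whatever the operating system and runtime reserve, so the usable space is smaller in practice.

```python
GIB = 2**30  # bytes in one GiB


def max_addressable_bytes(pointer_bits):
    """Upper bound on bytes addressable with pointers of the given width."""
    return 2**pointer_bits


def fits_in_address_space(dataset_gib, pointer_bits):
    """True if a dataset of dataset_gib GiB is below the addressing limit."""
    return dataset_gib * GIB < max_addressable_bytes(pointer_bits)


if __name__ == "__main__":
    for size in (10, 100):
        for bits in (32, 64):
            print(f"{size} GiB with {bits}-bit pointers: "
                  f"{fits_in_address_space(size, bits)}")
```

Even the 10GB lower bound from the question exceeds the 32-bit limit, which is why the choice of 64-bit hardware matters before any DataSet-level tuning.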

It would help if you gave more details, for instance:
1. Are you considering partitioning the data across multiple systems, or does
it have to be constrained to a single system?
2. Does the complete 10GB - 100GB of data need to be cached in main memory, or
can it be paged in from secondary storage?
3. What is the performance requirement? Is it more around insert, update and
delete, or around querying? Or is it a mix of all of these?
4. What kind of querying support are you looking for, in terms of complexity
of expressions and performance?

You may want to take a look at the following article, which describes some of
the DataSet-related enhancements in .NET 2.0:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnadonet/html/datasetenhance.asp

Thanks,
Kawarjit Bedi

Program Manager - ADO.NET Team
Microsoft Corp.

This posting is provided "AS IS" with no warranties, and confers no rights.
 
Hello Kawarjit,

Thanks for the in-depth analysis.

1. Are you considering partitioning the data across multiple systems, or does
it have to be constrained to a single system?

Multiple systems. What would be the best partitioning strategy?

2. Does the complete 10GB - 100GB of data need to be cached in main memory, or
can it be paged in from secondary storage?

Main memory would be nice but is probably not possible. What would you suggest
for secondary storage?

3. What is the performance requirement? Is it more around insert, update and
delete, or around querying? Or is it a mix of all of these?

Query speed would be vital; this would be the only operation done.

4. What kind of querying support are you looking for, in terms of complexity
of expressions and performance?

Mild complexity; again, query performance would be vital.

I really appreciate your feedback Kawarjit.

Thanks
 