Huge data

  • Thread starter Thread starter Guest
  • Start date Start date
G

Guest

Hi,

i'm trying to retreive the data from a table that has lots of columns and
thousands of records. The problem is that I have to display all of it, so I
need to retreive it all in one time. When I tried it using a dataset, I
received the exception "Out of memory". Is there any other way to get the
data from the database knowing that I have to work with it and eventually to
update it?

Thanks
 
jy said:
i'm trying to retreive the data from a table that has lots of columns and
thousands of records. The problem is that I have to display all of it, so I
need to retreive it all in one time. When I tried it using a dataset, I
received the exception "Out of memory". Is there any other way to get the
data from the database knowing that I have to work with it and eventually to
update it?

Just how much data are we talking about here? Could you maybe buffer it
to disk, if there's so much of it? To be honest, if you've got more
data than is fitting into memory, but you need to access all of it, I
think it's a pretty tall order.
 
Yeah, storing the human DNA sequence in a dataset you´re bould to get out of
memory warnings sooner or latter.

Now for thousands of rows, unless those rows are composed by hundreds of
columns with BLOB info, you should be fine.

what exactly are you using, Datatable?
 
JY,

How are you displaying the data, in my opinion will any normal control go
totally out of its posibilities to show data to a user and freeze.

Cor
 
The tables I need to display and deal with are extremely big. It's about
stock mouvements in a company. There are 2 tables, the master and the detail.
I am using a dataset to retreive them because i also need to insert / delete
/ update records. You can imagine how fast such tables can grow in a
supermarket or a grocery store. I don't think I will get lots of troubles in
reporting because it's a select based query, but if I have to display it and
update it, I might get some troubles.

Miha Markic said:
Can't you filter the data or something?

--
Miha Markic [MVP C#] - RightHand .NET consulting & development
miha at rthand com
www.rthand.com

jy said:
Hi,

i'm trying to retreive the data from a table that has lots of columns and
thousands of records. The problem is that I have to display all of it, so
I
need to retreive it all in one time. When I tried it using a dataset, I
received the exception "Out of memory". Is there any other way to get the
data from the database knowing that I have to work with it and eventually
to
update it?

Thanks
 
jy said:
The tables I need to display and deal with are extremely big. It's about
stock mouvements in a company. There are 2 tables, the master and the detail.
I am using a dataset to retreive them because i also need to insert / delete
/ update records. You can imagine how fast such tables can grow in a
supermarket or a grocery store. I don't think I will get lots of troubles in
reporting because it's a select based query, but if I have to display it and
update it, I might get some troubles.

But why do you need to have the *whole* table in memory? That's what
everyone's asking, I think.

If you're asked to display the whole table, display it in pages,
fetching one page at a time.
 
Well, I need to put the whole table in memory, or at least all the columns
that I need (which represents about 80% of all the columns), to be able to
use a DataViewManager and search the records for specific information. In
fact, i will probably display the data page by page but few questions come up:
1- what will i use to fetch the data row by row?
2- won't it be too slow?
3- How will I search the tables for specific rows?
4- Will it be possible to update / delete / insert rows?

Thanks
 
jy said:
Well, I need to put the whole table in memory, or at least all the columns
that I need (which represents about 80% of all the columns), to be able to
use a DataViewManager and search the records for specific information. In
fact, i will probably display the data page by page but few questions come up:
1- what will i use to fetch the data row by row?

Well, you *could* use DataReader, but I wouldn't recommend that. Just
use a select statement that selects a single row, or a page-worth of
rows.
2- won't it be too slow?

Have you tried it? We don't know much about your database - you haven't
told us much information about the situation.
3- How will I search the tables for specific rows?

Using a select statement.
4- Will it be possible to update / delete / insert rows?

Yes. For update/delete, you use a row you've already fetched. For
insert, you don't need any existing data, do you?
 
jy said:
Well, I need to put the whole table in memory, or at least all the columns
that I need (which represents about 80% of all the columns), to be able to
use a DataViewManager and search the records for specific information. In
fact, i will probably display the data page by page but few questions come
up:
1- what will i use to fetch the data row by row?

an adapter.
2- won't it be too slow?

Depends on how you do it and what you call slow.
3- How will I search the tables for specific rows?

You mean tables in database? Sql statement of course.
4- Will it be possible to update / delete / insert rows?

Yes.
 
Actually it looks to me as you need to re-thing your architecture.

If you tell us what you are really trying to do, we might just be able to
offer more precise suggestions
 
Jon Skeet said:
Well, you *could* use DataReader, but I wouldn't recommend that. Just
use a select statement that selects a single row, or a page-worth of
rows.

I haven't much used dataReaders. Can you tell me how to select a group of
rows and sequentially call the next group when needed?
Have you tried it? We don't know much about your database - you haven't
told us much information about the situation.

Well, the database is made of over 1000 Tables. You have around 50 users.
It's an Oracle 9i DataBase. I haven't tried it but I suspect that having to
connect everytime to the database to get a record or a group of records will
take more time than retreiving all the data and work with it offline, no?
Using a select statement.

Yes, but by using select statement, I will lose the feature of
DataViewManager to filter locally the rows and displaying the rows I need at
any time. Plus, I will have to send a query to the database everytime.
Yes. For update/delete, you use a row you've already fetched. For
insert, you don't need any existing data, do you?

Yes but it won't be working like the rest of the application where I could
update many rows before send the commit to the database.

In a way, I know that all these arguements are minor problems, but I need to
know if it's possible in this case to make it work like the rest of the
application (DataViewManager filter, multiple update before commit, ...), or
if it will have to be different.

Thanks
 
jy said:
I haven't much used dataReaders. Can you tell me how to select a group of
rows and sequentially call the next group when needed?

Doing that would be a bad idea, IMO. You really don't want to use
DataReader here.
Well, the database is made of over 1000 Tables. You have around 50 users.
It's an Oracle 9i DataBase. I haven't tried it but I suspect that having to
connect everytime to the database to get a record or a group of records will
take more time than retreiving all the data and work with it offline, no?

Not necessarily - and if working with it offline just isn't an option,
as it seems it isn't, given the exception you've got, surely it's
better to have something that works than something that doesn't.

Note that connecting every time doesn't necessarily mean making a new
underlying database connection each time - connection pooling takes
care of that for you.
Yes, but by using select statement, I will lose the feature of
DataViewManager to filter locally the rows and displaying the rows I need at
any time. Plus, I will have to send a query to the database everytime.

And do you have evidence (rather than just guesses) that that will be
too slow?

Don't forget that you could always fetch, say, 100 records and then
allow further filtering within those.
Yes but it won't be working like the rest of the application where I could
update many rows before send the commit to the database.

So keep a table with all the data you're updating and then send it in
one go - that still doesn't mean you need to have the whole table in
memory at a time, unless you're updating every row in the table.
In a way, I know that all these arguements are minor problems, but I need to
know if it's possible in this case to make it work like the rest of the
application (DataViewManager filter, multiple update before commit, ...), or
if it will have to be different.

If you can't fit the whole of the data into memory, as it looks like
you can't, then it will clearly have to be different.
 
I agree. Even Oracle permits you to write server-side executables that
precludes the need to move the "database" to the client to perform changes.
One of the most important features of a serious DBMS (as in not JET) is
server-side executables. These permit you to execute intelligent updates and
data summaries without having to bring the data to the client. Sure you're
running out of memory. You're asking the server to transport the data
(probably a gig or more) into client memory. It will never work. I also
think you need to reevaluate your approach.

--
____________________________________
William (Bill) Vaughn
Author, Mentor, Consultant
Microsoft MVP
www.betav.com
Please reply only to the newsgroup so that others can benefit.
This posting is provided "AS IS" with no warranties, and confers no rights.
__________________________________
 
A DataReader is your best option, as it is a forward only stream. As
mentioned elsewhere, you need buffering of sorts. Best option is to page the
data. Since the DataSet ends up too large, you need to fake this. Best method
is to output the data in a specific order and use the ordering column for the
next item. If you can use a primary key field, it is easiest, but Date
fields, with a filter for the specific data you need, can work.

Paging forward is releatively easy, by a single page, as you can use the end
record as a pointer. Once forward one or more pages, you can use the start
record. This is a basic "linked list" pattern. The issue comes when someone
wants to move forward to a specific "page" (eg, click on page 22).

If you need to be able to move forward to specific pages, you will have to
create a separate map file and store the start and finish pointers for each
page. Using this map (a cheap form of index), you can pull records from page
X, pointer begin to page X, pointer end with each return trip, based on
knowing the X value.

There is at least one issue here. If a row is added, your map has to reflect
this. You will either have to refresh the map regularly, or use something
like SQL Notification Services to keep the map to date. If a record is ever
inserted in the middle, it could confuse your users. As you are dealing with
stock data, which is time driven, this is unlikely. UPDATEs are not an issue,
as long as the sorted column (the one you are paging on) does not change (ie,
user changes primary key or date value).

As long as the map is set up and kept up, you can page through records. This
will give you back the option of using a DataSet, as you will pull 10, 20, ?
rows at a time, rather than 1000s.

NOTE: For reporting, you will still have to pull mass amounts of data, with
the exception of summary reports, which can be churned on the SQL Server side.

---

Gregory A. Beamer
MVP; MCP: +I, SE, SD, DBA

***************************
Think Outside the Box!
***************************
 
Playing devil's advocate (as I agree with server side processing in many
cases):
Overall, if he can reduce what the user is viewing at any one time, there is
no need for server side executables. If this is a report, a DataReader is
most likely the best method.

---

Gregory A. Beamer
MVP; MCP: +I, SE, SD, DBA

***************************
Think Outside the Box!
***************************
 
I think I have a solution to this problem.

I faced a similar situation in believe it or not a financial system, so I
can understand the pain.

What I did was, I inherited from DataSet and DataTable to create
LargeDataset and LargeDatatable, and I would simply page parts of the
dataset as requested by the callee from the disk to and from temp files -
sort of like writing your own virtual memory paging scheme. Albeit my
implementation was fairly simple compared to windows Vmem handling the idea
being you have to reduce page faults, and my data access was fairly linear
in nature so I could simply cut up the rows into smaller configurable chunks
and page them as needed.

The good news is, since the inherited dataset is nothing but a dataset
(inherited), you can use it to bind, or basically anything a dataset might
do. But it's quite a lot of work overriding every behavior, and well the
class(es) I ended up writing were multiple pages long.

But at the end of the day - it all worked !! :)

- Sahil Malik
You can reach me thru my blog at
http://www.dotnetjunkies.com/weblog/sahilmalik
 
I think I have a solution to this problem.

I faced a similar situation in believe it or not a financial system, so I
can understand the pain.

What I did was, I inherited from DataSet and DataTable to create
LargeDataset and LargeDatatable, and I would simply page parts of the
dataset as requested by the callee from the disk to and from temp files -
sort of like writing your own virtual memory paging scheme. Albeit my
implementation was fairly simple compared to windows Vmem handling the idea
being you have to reduce page faults, and my data access was fairly linear
in nature so I could simply cut up the rows into smaller configurable chunks
and page them as needed.

The good news is, since the inherited dataset is nothing but a dataset
(inherited), you can use it to bind, or basically anything a dataset might
do. But it's quite a lot of work overriding every behavior, and well the
class(es) I ended up writing were multiple pages long.

But at the end of the day - it all worked !! :)

- Sahil Malik
You can reach me thru my blog at
http://www.dotnetjunkies.com/weblog/sahilmalik
 
I must clarify why I went with the approach that I suggested in my previous
post.

It is because, the source of my dataset was not a database, but a remote
server I was synching with. And yes I know it is a bad idea to use ADO.NET
objects to synch databases, but in real life we face things that are not
ideal and I see them everyday :-/ (though I recommended SqlBulkCopy to them
so they might be able to alleviate some pain now (don't work there no
more)).

- Sahil Malik
You can reach me thru my blog at
http://www.dotnetjunkies.com/weblog/sahilmalik
 
I have to add another peice of relevant info - (sorry)

In most financial systems where you need to deal with so much data; you
would need to write your own data structure that is suited to your specific
needs and is more tightly packed than a dataset is. Consider using your own
business object representation as an easier alternative to what I just
suggested.

There's a reason why most financial analysis engines are still writen in C++
LOL !! :)

- Sahil Malik
You can reach me thru my blog at
http://www.dotnetjunkies.com/weblog/sahilmalik



Sahil Malik said:
I think I have a solution to this problem.

I faced a similar situation in believe it or not a financial system, so I
can understand the pain.

What I did was, I inherited from DataSet and DataTable to create
LargeDataset and LargeDatatable, and I would simply page parts of the
dataset as requested by the callee from the disk to and from temp files -
sort of like writing your own virtual memory paging scheme. Albeit my
implementation was fairly simple compared to windows Vmem handling the idea
being you have to reduce page faults, and my data access was fairly linear
in nature so I could simply cut up the rows into smaller configurable chunks
and page them as needed.

The good news is, since the inherited dataset is nothing but a dataset
(inherited), you can use it to bind, or basically anything a dataset might
do. But it's quite a lot of work overriding every behavior, and well the
class(es) I ended up writing were multiple pages long.

But at the end of the day - it all worked !! :)

- Sahil Malik
You can reach me thru my blog at
http://www.dotnetjunkies.com/weblog/sahilmalik



Jon Skeet said:
Doing that would be a bad idea, IMO. You really don't want to use
DataReader here.
having
records
 
Back
Top