How Much Data to Provide?

  • Thread starter Thread starter Kevin Frey
  • Start date Start date
K

Kevin Frey

At our work a bit of a "philosophical" debate is underway as to how much
data (in terms of number of rows of data) should be considered "enough" data
for presenting to a user, via web application. This is not a web-based
report (an entirely different issue) but a UI that allows a user to pick a
record and work on it in a transactional system.

What I mean by this is: if a particular table is extremely large (say 3
MILLION rows) should we consider it "reasonable" to allow them page through
all 3 million rows of data, if that's what they want to do?

In the past I have built desktop applications that would happily let you
page/scroll through millions of records (the concept is relatively similar).
But nowadays I'm of the opinion that there is a fairly small number (let's
say a few thousand, certainly unlikely > 5,000) that if the user needs to
see beyond this amount they are either using the wrong mechanism (should be
using a report) or they should not be doing that due to being so
unproductive. In the second case, the user should instead be using a search
mechanism to increase the selectivity of what they need require and narrow
the number of records displayed.

My technical reason for asking such a question is that when you want to
allow paging through X million records, it is only feasible to do this when
you are following well-defined navigational paths (indexes) in the database.
This allows you to use a "TOP N" approach and grab small chunks of data to
support each "page-full" of data being displayed.

Such a technical "restriction" produces a dichotomy when you also want to
present the data to a user in a grid and have them click a column heading to
sort the data - my way of thinking is if someone sorts the data and then
clicks "Next Page" they *should* see the next chunk of data based on that
sort order. To permit that for every possible column heading of a data table
would either require a bucket-load of indexes. But even then it is often the
case that an index cannot help you because the user wants to sort on a
foreign-key non-primary-key field (for example, Customer connects to
Customer Type, and user wants to sort by Customer Type Description [which is
not the key]).

I know their exist strategies like materialised views and indexable views,
but are these really appropriate solutions to this kind of problem? To make
matters worse in our case, we want users to have some degree of
customisation of what data they see, which means [within the limits of what
makes logical sense in the model] they can "join up" more foreign keys or
remove them. Even if we did use indexed views, simply indexing "everything"
would be wasteful.

Appreciate any comments you wish to make regarding the various strategies
available.
 
I would not allow more than a few hundred in most cases. I would also
provide a way to set up a hierarchy to reduce numbers (search with
controlled constraints), like alphabetically by last name, currently under
treatment, today's visitors, and the like. The means of whittling down
depend on the nature of the business and the application.

I would also have an advanced search type of feature that allows search for
items by some or all fields that can be searched.

I have retooled tons of BAD user interfaces that allowed people to get far
more data than they would ever be able to use that day (sometimes even in a
lifetime). Allowing a user to page through millions of records would fit
that kind of UI.

--
Gregory A. Beamer
MVP, MCP: +I, SE, SD, DBA

*************************************************
| Think outside the box!
|
*************************************************
Kevin Frey said:
At our work a bit of a "philosophical" debate is underway as to how much
data (in terms of number of rows of data) should be considered "enough"
data for presenting to a user, via web application. This is not a
web-based report (an entirely different issue) but a UI that allows a user
to pick a record and work on it in a transactional system.

What I mean by this is: if a particular table is extremely large (say 3
MILLION rows) should we consider it "reasonable" to allow them page
through all 3 million rows of data, if that's what they want to do?

In the past I have built desktop applications that would happily let you
page/scroll through millions of records (the concept is relatively
similar). But nowadays I'm of the opinion that there is a fairly small
number (let's say a few thousand, certainly unlikely > 5,000) that if the
user needs to see beyond this amount they are either using the wrong
mechanism (should be using a report) or they should not be doing that due
to being so unproductive. In the second case, the user should instead be
using a search mechanism to increase the selectivity of what they need
require and narrow the number of records displayed.

My technical reason for asking such a question is that when you want to
allow paging through X million records, it is only feasible to do this
when you are following well-defined navigational paths (indexes) in the
database. This allows you to use a "TOP N" approach and grab small chunks
of data to support each "page-full" of data being displayed.

Such a technical "restriction" produces a dichotomy when you also want to
present the data to a user in a grid and have them click a column heading
to sort the data - my way of thinking is if someone sorts the data and
then clicks "Next Page" they *should* see the next chunk of data based on
that sort order. To permit that for every possible column heading of a
data table would either require a bucket-load of indexes. But even then it
is often the case that an index cannot help you because the user wants to
sort on a foreign-key non-primary-key field (for example, Customer
connects to Customer Type, and user wants to sort by Customer Type
Description [which is not the key]).

I know their exist strategies like materialised views and indexable views,
but are these really appropriate solutions to this kind of problem? To
make matters worse in our case, we want users to have some degree of
customisation of what data they see, which means [within the limits of
what makes logical sense in the model] they can "join up" more foreign
keys or remove them. Even if we did use indexed views, simply indexing
"everything" would be wasteful.

Appreciate any comments you wish to make regarding the various strategies
available.
 
My general strategy is to provide a text box and submit button above the
gridview where the user can type part of the information he's looking
for. The submit then triggers a query against the db using WHERE x LIKE
'%' + data + '%' to retrieve a subset of the data the user is looking
for.

You may need to modify this somewhat for your circumstance, but my
experience is that the user normally has some idea what they are looking
for and allowing them to pre-filter the data and then refine by paging
is a lot more efficient than either displaying it all on one page or
making them page through the data.

As for "How much data?" I generally show 10 row at a time.

I'd also recommend implementing paging at the stored procedure instead
of letting .NET do it for you. If they are looking for something that
returns a lot of records, this will return less data, which will reduce
the amount of time it will take to display the results on the browser.


Dave Bush
http://blog.dmbcllc.com




-----Original Message-----
From: Kevin Frey [mailto:[email protected]]
Posted At: Thursday, November 29, 2007 5:30 PM
Posted To: microsoft.public.dotnet.framework.aspnet
Conversation: How Much Data to Provide?
Subject: How Much Data to Provide?


At our work a bit of a "philosophical" debate is underway as to how much

data (in terms of number of rows of data) should be considered "enough"
data
for presenting to a user, via web application. This is not a web-based
report (an entirely different issue) but a UI that allows a user to pick
a
record and work on it in a transactional system.

What I mean by this is: if a particular table is extremely large (say 3
MILLION rows) should we consider it "reasonable" to allow them page
through
all 3 million rows of data, if that's what they want to do?

In the past I have built desktop applications that would happily let you

page/scroll through millions of records (the concept is relatively
similar).
But nowadays I'm of the opinion that there is a fairly small number
(let's
say a few thousand, certainly unlikely > 5,000) that if the user needs
to
see beyond this amount they are either using the wrong mechanism (should
be
using a report) or they should not be doing that due to being so
unproductive. In the second case, the user should instead be using a
search
mechanism to increase the selectivity of what they need require and
narrow
the number of records displayed.

My technical reason for asking such a question is that when you want to
allow paging through X million records, it is only feasible to do this
when
you are following well-defined navigational paths (indexes) in the
database.
This allows you to use a "TOP N" approach and grab small chunks of data
to
support each "page-full" of data being displayed.

Such a technical "restriction" produces a dichotomy when you also want
to
present the data to a user in a grid and have them click a column
heading to
sort the data - my way of thinking is if someone sorts the data and then

clicks "Next Page" they *should* see the next chunk of data based on
that
sort order. To permit that for every possible column heading of a data
table
would either require a bucket-load of indexes. But even then it is often
the
case that an index cannot help you because the user wants to sort on a
foreign-key non-primary-key field (for example, Customer connects to
Customer Type, and user wants to sort by Customer Type Description
[which is
not the key]).

I know their exist strategies like materialised views and indexable
views,
but are these really appropriate solutions to this kind of problem? To
make
matters worse in our case, we want users to have some degree of
customisation of what data they see, which means [within the limits of
what
makes logical sense in the model] they can "join up" more foreign keys
or
remove them. Even if we did use indexed views, simply indexing
"everything"
would be wasteful.

Appreciate any comments you wish to make regarding the various
strategies
available.
 
I'd also recommend implementing paging at the stored procedure instead
of letting .NET do it for you. If they are looking for something that
returns a lot of records, this will return less data, which will reduce
the amount of time it will take to display the results on the browser.

So do I take from your comments that if there was a substantive amount of
data (let's say 500 records) you would only retrieve 10 or so at a time?

How would you then handle the situation if the user wants a different
sorting order for their data, but still wants to page through it?
 
So do I take from your comments that if there was a substantial amount of
data (let's say 500 records) you would only retrieve 10 or so at a time?

I certainly would - the trick with web performance is many small requests...
How would you then handle the situation if the user wants a different
sorting order for their data, but still wants to page through it?

Sorting changes the sort order, and sets the page of data back to 1.

Paging retains the sort order.
 
Agree

-----Original Message-----
From: Mark Rae [MVP] [mailto:[email protected]]
Posted At: Friday, November 30, 2007 4:22 AM
Posted To: microsoft.public.dotnet.framework.aspnet
Conversation: How Much Data to Provide?
Subject: Re: How Much Data to Provide?

So do I take from your comments that if there was a substantial amount of
data (let's say 500 records) you would only retrieve 10 or so at a
time?

I certainly would - the trick with web performance is many small
requests...
How would you then handle the situation if the user wants a different
sorting order for their data, but still wants to page through it?

Sorting changes the sort order, and sets the page of data back to 1.

Paging retains the sort order.
 
Paging retains the sort order.

So you will only allow sorting where there is a corresponding index on the
database?
 
So you will only allow sorting where there is a corresponding index on the
database?

Precisely the other way round, otherwise you have IT driving the business
requirements which is total lunacy...

If a set of data needs to be sortable on a column, then that column needs to
be indexed otherwise you risk forcing your RDBMS to perform a table scan,
which can seriously cripple your performance...
 
Precisely the other way round, otherwise you have IT driving the business
requirements which is total lunacy...

If a set of data needs to be sortable on a column, then that column needs
to be indexed otherwise you risk forcing your RDBMS to perform a table
scan, which can seriously cripple your performance...

I didn't mean to imply that IT was driving the business requirement, simply
that there needed to be an index in place for any column which was to be
sortable.
 
Back
Top