large amounts of data

  • Thread starter Thread starter Terry
  • Start date Start date
T

Terry

I have a Web application that suddenly needs to scale up,
up, up.

I need some advice about handling large amounts of data
loading into the application's data sets. Please let me
know if my idea is a good one or if there is a better way
to handle the data.

The web application presents employee reports. A sample
report begins with a drop down list of employee names. A
user selects a name and returns a detailed report on the
employee.

We have about 15,000 employees on the list and expect the
list to grow rapidly. An employee might have many
records in his report.

What is the best way to present this application on the
web with a quick load time of the initial list of
employees and a quick turn around of the training records?

I'm considering caching some of the data, or publishing
data to the web server as xml using SQL server's xml
publishing. The xml could be republished from the SQL
server every 4 hours or so.

I fear that caching the list of employees will consume
too much memory.

My thought is to publish list of employees as a xml doc
and to load the xml data into a data set (to be loaded
into the drop down according to the user data from server
variables). The training data could be stored as xml as
well.

So, what do people think? Is this a good plan? Would it
be better to pull the data from the SQL server via
ADO.Net at every request?

Thanks

--Terry

To reply directly replace msdn . com with msn . com.
 
"Populating a dropdown list with far over 15.000 entries
is as far as i'm concerned a UI nightmare."

That would be a little odd, so I'm not loading 15,000,
20,000, 30,000, or 40,000 records into a single drop down
list. As they stand the drop down lists are filtered by
the information I can obtain about the user and by a
series of drop down lists.

That reduces the amount presented to a user but it does
not reduce the amount of data that has to be handled, or
the amount of times that the data has to be rendered.
There are even greater amounts of data on the report
side. Each employee will have many records.

The web application itself was built long ago for a
smaller user group with simpler needs. Really, it was
just an Access 95 ap used by one person. As a
departmental web ap it tried to use inline SQL queries to
pull the data into record sets and manipulate the data
from there. Though there might be a dozen reports in a
series the drop down lists and report data are largely
the same each time.

When I was assigned to care for the ap I took several
steps to improve performance (moving from and Access back
end to SQL server optimizing (or creating) indexes, using
stored procedures, adding "where" clauses with parameters
including more information about the user ...) but the
application grows in popularity and now is being demanded
by a user base greater than anyone imagined.

I do not have experience with scaling web applications so
far out of a departmental level so I am trying to find a
less resource intensive means of handling the data.

--Terry

To reply directly replace msdn . com with msn . com.
 
Kathleen,

Thanks for the advice. Most of the resources I can find
about .Net focus on the ability to cache data. If I
could rely users to request the output of a single
variation of a report several times I might try caching
the output, but the source data seems just to vast to
handle that way.

I will look into this Report Server, but if it requires
buying something then it will be a long range solution
(several months out).

You mentioned using javascript in conjunction with an xml
service. Do you think that xml would be the best way to
go? Better than caching? I am leaning towards that. I
made some documents with xml and Office Web Components
that enable an Excel savvy user to manipulate data. The
thought was that the Web Component would use an xml doc
as its data source, rather than the SQL server. Office
Web Components are not as popular as I hoped they would
be because of challenges with varying user box
configurations. I also have an example of an xml style
sheet that makes possible sorting data on a web page with
reloading and no call back to the server. Just click on
the column to sort and the style sheet changes. I am
considering adapting the style sheet to filter as well as
sort.

--Terry

To reply directly replace msdn . com with msn . com.

-----Original Message-----
Terry,

Is this strictly a "report of employee information". If so, I think you
should look for the schedule SQL Server 2000 Report
Server.
 
Terry,

I was a little too vague.

Report Server is going into public beta at some point. I had a chance to
work with it at TechEd and was quite impressed with the flexiblity. It was
announced as an extension to SQL Server - if you've got SQL Server, you've
got it (and it will work on Oracle, but you may have to buy a SQL Server
license...). Because it is an upcoming tool, you haven't heard about it
creating solutions today. It is a few months out for real world deployment I
suspect.

I think you are going to need a combination of tricks to get this to work
smoothly. If I understand what you are doing, the users select values from
combo boxes, that may or may not have interdependencies. Armed with this set
of information you'll go to the ASP server to get data. If there aren't
interdependencies, and you can do a full page/frame postback at that point,
then forget the Javascript. It was to manage the experience prior to that
point.

Yes, caching may be part of your solution. But its easy to jump to without
looking at alternatives, and to be honest, I don't have enough information
to give you an answer other than a classic "It depends". If my understanding
is correct, I doubt I'd go with a cache of employee names, or an XML
document. The perf hit will be the filtering, not the selection, and SQL
Server is excellent at that (unless you know your SQL Server is already in
serious trouble on perf). If you insure your critieria is indexed, you
should see excellent perf. On the other hand, if the employees segment
themselves into logical subsets for that combo box, even if there are a
couple of hundred subsets, I'd seriously consider a user control cache of
the output.

If you do go with an XML solution, be sure you to load it as an XPath
document and don't reload the doc on every postback. Keep it in memory, and
shove some more memory at the ASP server if needed (20,000 employees * 100
characters of data * 2 xml factor I made up = 4Mb).

I am extremely unclear on why you want to cache the training data. That
seems like a retrieval of a small subset of your data based on a primary
key, and I initially see no benefit to the cache.

Kathleen
 
Back
Top