Fill DataTable Progress

processoriented · Oct 13, 2006

Hi, I'm something of a noob at this, but here it is... I have an app
that fills a dataset from a SQL database, and then writes the dataset
to an xml file. Everything is a SELECT query... I am basically just
replicating data to the local user's machine, and several of the
queries take a long time to run over VPN connection to the database
(the way most of my users will use it), so I sank the whole operation
into a BackgroundWorker so that the user's app wont freeze up while it
is running.

I would love to use the backgroundworker progresschanged event to
control a progress bar and show the user how much longer the background
operation is going to take, but I can't seem to figure out how to make
that happen... I have experimented with the dataadapter rowupdating
event, but even if I can get that working (and so far I can't) it will
only tell me how many rows it's already done, not how many it has to
go... I thought about running two queries on the database (one to
count the number of records I should expect and one to return the
records) but, really, there's got to be a better way. Has anyone out
here had any success in developing a solution that returns progress
indication on filling a dataset from a SELECT query that they can post?
I am basically looking for an example here so that I can figure out
how to apply it to my own circumstance.

Thanks!

V

Stephany Young · Oct 13, 2006

You have already answered your own question.

You MUST know what the target is before you can compare anything to the
target.

The dataadapter has no knowledge of how may rows it is going to 'import'
until it has finished.

The first question you need to address is how much overhead is involved in
retrieving the count first. If the query is simple and efficient the
overhead will be negible but if it is complex and/or inefficient then the
overhead could be considerable.

The next question you have to address is the value of the progress
information compared to the overhead in producing it.

Having progress information is all very nice, so long as it does not impact
on the perfomance of the overall application.

processoriented · Oct 13, 2006

Thanks Stephany... that's what I was afraid of... very complex
queries... lots of overhead, little value to the progress indicator,
but would have been a "nice to have"

Jay B. Harlow · Oct 13, 2006

V,
As Stephany suggests, you unfortunately don't know how many rows until your
done.

What I will do is use a "typical" number for max rows on the progress.

Alternatively you can use ProgressBar.Style = Marquee, coupled with a Timer.

http://msdn2.microsoft.com/en-us/library/system.windows.forms.progressbar.style.aspx

Something like:

Protected Overrides Sub OnLoad(ByVal e As EventArgs)
MyBase.OnLoad(e)
ProgressBar1.Maximum = Integer.MaxValue
End Sub

Private Sub Timer1_Tick(ByVal sender As System.Object, ByVal e As
EventArgs) Handles Timer1.Tick
ProgressBar1.Value += 1
End Sub

NOTE: You need to set ProgressBar1.Maximum value to a sufficiently high
value to ensure you don't exceed it when you set ProgressBar1.Value...

Jay B. Harlow · Oct 13, 2006

Doh!

You don't need the timer with ProgressBar.Style to Marquee as Marquee will
update the progress for you!

Haven't fully tested ProgressBar.Style Continuous...

Cor Ligthert [MVP] · Oct 13, 2006

A sample however with the same conclusions as Stephany and you had already.

http://www.vb-tips.com/dbPages.aspx?ID=49f2cff5-56ad-44fc-a4c6-fc0d5c470f53

I hope this helps,

Cor

Andrew Morton · Oct 13, 2006

Stephany said:
You have already answered your own question.

You MUST know what the target is before you can compare anything to
the target.

The dataadapter has no knowledge of how may rows it is going to
'import' until it has finished.

The first question you need to address is how much overhead is
involved in retrieving the count first. If the query is simple and
efficient the overhead will be negible but if it is complex and/or
inefficient then the overhead could be considerable.

If the select statement was of the form SELECT COUNT(*), [other stuff],
would SQL Server be efficient enough to only do the count once? Then could
that value be retrieved from the dataset before the dataset is filled? Of
course, it would add to the amount of data being transferred and slow it
down a bit :-(

Or even use a stored procedure to select into a temporary table then return
the count of records in that table and finally return the records.

Andrew

Cor Ligthert [MVP] · Oct 13, 2006

Andrew,

Using a stored procedure in any database except DB2 does not give you any
performance advantage above using sql text strings.

Cor

Andrew Morton said:
Stephany said:

You have already answered your own question.

You MUST know what the target is before you can compare anything to
the target.

The dataadapter has no knowledge of how may rows it is going to
'import' until it has finished.

The first question you need to address is how much overhead is
involved in retrieving the count first. If the query is simple and
efficient the overhead will be negible but if it is complex and/or
inefficient then the overhead could be considerable.

Click to expand...

If the select statement was of the form SELECT COUNT(*), [other stuff],
would SQL Server be efficient enough to only do the count once? Then could
that value be retrieved from the dataset before the dataset is filled? Of
course, it would add to the amount of data being transferred and slow it
down a bit :-(

Or even use a stored procedure to select into a temporary table then
return the count of records in that table and finally return the records.

Andrew

processoriented · Oct 14, 2006

Thanks to everyone for the fantastic ideas... I have been able to get
the replication function working mostly to my satisfaction... the only
"hitch" comes on one of the more complex queries (query needs to poll
five different tables, and decide based on some SQL functions which
data to include/exclude - clearly not the most efficient query I've
every written, but given the needs of the application it is really the
only way to grab what is needed and prevent the user from seeing some
information that should remain confidential - because what needs to
remain confidential changes dynamically, there is no other way than the
complex query to get it... but I digress) the query normally takes
about 25 to 30 seconds to run, and when I rewrote the query to give me
a count of the records the "count" query still takes about 10-15
seconds to run... during this time, it appears that the app is
"hanging" because there is nothing happening that can trigger an event
to change the user interface... I am thinking that I can just live
with it, but if my users really start complaining, I might just throw
in an animated gif or something else to give the (correct) impression
that something is happening in the background.

Thanks again!

Andrew Morton said:
Andrew,

Using a stored procedure in any database except DB2 does not give you any
performance advantage above using sql text strings.

Cor

Andrew Morton said:

Stephany said:

You have already answered your own question.

You MUST know what the target is before you can compare anything to
the target.

The dataadapter has no knowledge of how may rows it is going to
'import' until it has finished.

The first question you need to address is how much overhead is
involved in retrieving the count first. If the query is simple and
efficient the overhead will be negible but if it is complex and/or
inefficient then the overhead could be considerable.

Click to expand...

If the select statement was of the form SELECT COUNT(*), [other stuff],
would SQL Server be efficient enough to only do the count once? Then could
that value be retrieved from the dataset before the dataset is filled? Of
course, it would add to the amount of data being transferred and slow it
down a bit :-(

Or even use a stored procedure to select into a temporary table then
return the count of records in that table and finally return the records.

Andrew

Click to expand...

Andrew Morton · Oct 16, 2006

Cor said:
Andrew,

Using a stored procedure in any database except DB2 does not give you
any performance advantage above using sql text strings.

My idea was to use /one/ stored procedure to return /both/ a value and a set
of records, the hope being that SQL Server would be able to cache something
along the way so that the count(*) of the selection and the selection
records themselves would be found more efficiently than doing completely
separate queries.

<pseudo-code>
create procedure foo
@nRecords as int ouput
as
select records into #temporary_table;
select @nRecords=count(*) from #temporary_table;
select * from #temporary_table;

then the OP has the number (@nRecords) to create a progress counter:

<pseudo-code>
setup SqlCommand with a sqlParam.Direction = ParameterDirection.Output
myReader = myCommand.ExecuteReader
retrieve the @nRecords output parameter
dim counter as integer = 0
if nRecords > 0 then
while myReader.Read
get record
counter += 1
update progress counter with (counter / nRecords)
end while
end if

I wasn't really looking for a performance advantage other than trying to
reduce two queries into one; the message from the OP suggests that 10-15
seconds could be saved.

Hope that makes some sort of sense.

Andrew

Jay B. Harlow · Oct 16, 2006

Andrew,

create procedure foo
@nRecords as int ouput

The "problem" is that the output parameter is available *after* the result
set is completed.

In other words the client would need to process the entire result set to get
the number of rows parameter...

You could probably work around this by returning 2 result sets. The first
one containing the number of rows, and the second one containing the rows
themselves.

Andrew Morton · Oct 16, 2006

Jay said:
Andrew,
The "problem" is that the output parameter is available *after* the
result set is completed.

I had a niggling feeling that there was something wrong with the idea.

In other words the client would need to process the entire result set
to get the number of rows parameter...

That isn't going to be very helpful then :-(

You could probably work around this by returning 2 result sets. The
first one containing the number of rows, and the second one
containing the rows themselves.

So all it needed was someone who actually knows how it works instead of me
guessing :-)

Andrew

Fill DataTable Progress

processoriented

Stephany Young

processoriented

Jay B. Harlow

Jay B. Harlow

Cor Ligthert [MVP]

Andrew Morton

Cor Ligthert [MVP]

processoriented

Andrew Morton

Jay B. Harlow

Andrew Morton