Batch of Stored Procedures

  • Thread starter: mj2736
In our .Net 2.0 web service we need to insert/update data records into
SQL Server 2005 that have been uploaded to us in XML documents. There
could be anywhere from one to literally thousands of records in each
uploaded transaction, and there are several dozen destination tables.

Since performance is critical for this particular app, we were
thinking of combining the stored procedure calls into configurable-
sized batches using CommandType.Text and multiple concatenated EXEC
statements, so we minimize database roundtrips as opposed to calling
each SP individually in its own SqlCommand.
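
For concreteness, here is a rough sketch of what one such concatenated batch could look like in C# (the Record type, the procedure name dbo.up_Record_Save, and the parameter names are hypothetical; values are still sent as SqlParameters rather than inlined as literals):

// A rough sketch, not production code: one CommandType.Text batch of EXEC calls,
// sent to the server in a single roundtrip. Requires System.Text, System.Data,
// System.Data.SqlClient and System.Collections.Generic; the Record type and
// dbo.up_Record_Save are hypothetical.
static void SaveBatch(SqlConnection conn, IList<Record> records)
{
    SqlCommand cmd = new SqlCommand();
    cmd.Connection = conn;
    cmd.CommandType = CommandType.Text;

    StringBuilder batch = new StringBuilder();
    int i = 0;
    foreach (Record rec in records)        // records = one configurable-sized batch
    {
        batch.AppendFormat("EXEC dbo.up_Record_Save @Id{0}, @Value{0};", i);
        cmd.Parameters.AddWithValue("@Id" + i, rec.Id);
        cmd.Parameters.AddWithValue("@Value" + i, rec.Value);
        i++;
    }

    cmd.CommandText = batch.ToString();
    cmd.ExecuteNonQuery();                 // one roundtrip for the whole batch
}

Note that SQL Server allows roughly 2,100 parameters per command, so the configurable batch size multiplied by the parameters per call has to stay under that limit.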

Is this a good approach? One concern is that SQL Server would have to
compile each and every batch since they will all be different, and the
overhead involved in this could become significant. Is this really
something to worry about, or not? Wouldn't a few batch compiles per
transaction still be better than potentially thousands of roundtrips?
Are there other problems with this approach?

Of course we are planning to stress test to see how it actually
responds, but I just wanted to get some opinions first in case someone
has done this before and can give advice/recommendations/suggestions
before we get too far into it.

Thanks!
MJ
 
For SQL 2005, I would use SQL Server's built-in XML processing. You can pass
the XML text in as a parameter to a stored procedure.

Here's a snippet from what I have done with something similar:

/*
declare @XmlDoc nvarchar(4000)
set @XmlDoc =
'<patient>
<answer>
<PatientID>13303</PatientID>
<QuestionID>200</QuestionID>
<AppointmentID>105084</AppointmentID>
<AnswerValue>217,218</AnswerValue>
<SaveTypeID>1</SaveTypeID>
</answer>
</patient>'

exec up_PatientAnswer_Save_FromXml @XmlDoc = @XmlDoc
*/
ALTER procedure [dbo].[up_PatientAnswer_Save_FromXml]
(
@XmlDoc ntext
)
as
set nocount on

declare @Answers table
(
AnswerID int identity,
PatientID int,
QuestionID int,
AppointmentID int,
SaveTypeID int,
AnswerValue nvarchar(3900) null,
AnswerValueDate datetime null,
AnswerValueInt int null
)

declare @Pointer int

-- get a pointer to the parsed doc
exec sp_xml_preparedocument @Pointer output, @XmlDoc

-- insert into the table var so we can dispose of the xml doc
-- and work with the data easier (better)
insert into @Answers
(PatientID, QuestionID, AppointmentID, SaveTypeID, AnswerValue,
AnswerValueDate, AnswerValueInt)
select
PatientID, QuestionID, AppointmentID, SaveTypeID, AnswerValue,
AnswerValueDate, AnswerValueInt
from
openxml(@Pointer, '/patient/answer', 2)
with
(
PatientID int,
QuestionID int,
AppointmentID int,
SaveTypeID int,
AnswerValue nvarchar(3900),
AnswerValueDate datetime,
AnswerValueInt int
)

-- release the xml doc
exec sp_xml_removedocument @Pointer

-- processing from here is done with the data in the table variable


More info can be found here:

http://www.perfectxml.com/articles/xml/openxml.asp

If the XML that you receive is usable as is, you can just pass that
along. Or you can create new XML that is in a format that you expect.

You may also be able to use SqlBulkCopy to insert the data into a loading
table and have a stored procedure that works off of that table.
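
For example, a minimal SqlBulkCopy sketch along those lines (the staging table dbo.AnswerLoad and the column list are hypothetical):

// A rough sketch, assuming a hypothetical staging table dbo.AnswerLoad whose
// columns match the DataTable below; conn is an open SqlConnection.
DataTable load = new DataTable();
load.Columns.Add("PatientID", typeof(int));
load.Columns.Add("QuestionID", typeof(int));
load.Columns.Add("AnswerValue", typeof(string));
// ... fill "load" from the uploaded XML on the client ...

using (SqlBulkCopy bulk = new SqlBulkCopy(conn))
{
    bulk.DestinationTableName = "dbo.AnswerLoad";
    bulk.WriteToServer(load);
}
// then EXEC a stored procedure that merges dbo.AnswerLoad into the real tables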

In SQL 2008, you will be able to pass a DataTable as a parameter to a
stored proc.
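
For reference, the client side of that SQL 2008 approach would look roughly like this (it needs ADO.NET 3.5+ and a user-defined table type on the server; the type and procedure names here are hypothetical):

// A rough sketch of the SQL 2008 client side (ADO.NET 3.5+). The table type
// dbo.AnswerTableType and dbo.up_Answers_Save are hypothetical and must exist
// on the server; answersTable is a DataTable shaped like the table type.
SqlCommand cmd = new SqlCommand("dbo.up_Answers_Save", conn);
cmd.CommandType = CommandType.StoredProcedure;
SqlParameter tvp = cmd.Parameters.AddWithValue("@Answers", answersTable);
tvp.SqlDbType = SqlDbType.Structured;
tvp.TypeName = "dbo.AnswerTableType";
cmd.ExecuteNonQuery();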
 
Thanks for the suggestion. We've actually tried this - passing the
data in as XML - but found it to be way too resource intensive on the
database server for our purposes, particularly with large XML
documents. We observed occasional 'out of memory' errors despite
calling sp_xml_removedocument every time when finished. So we have
already ruled this out.

I am interested if anyone has any opinions about the idea of
concatenating multiple SP calls into a batch and sending the whole
batch to the database in one shot as CommandType.Text.

Thanks,
MJ
 

SQL Server MVP Erland Sommarskog wrote an excellent article, "Arrays
and Lists in SQL". We use his approach: we pack up to 100K values into
an image parameter that mimics an array, and we can use several images.
It performs very fast.
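
Roughly, the packing side of that approach might look like this on the client (a sketch only: it assumes 4-byte integers packed into a byte array sent as an image parameter, with a hypothetical procedure that unpacks it per Erland's article):

// A sketch only: pack int values into a byte[] that is sent as an image parameter.
// The stored procedure (hypothetical name below) must unpack with the same byte
// order; BitConverter is little-endian on x86, so server-side code that uses
// convert(int, substring(...)) may need the bytes reversed first.
int[] ids = GetIdsToSend();                         // hypothetical source of values
byte[] packed = new byte[ids.Length * 4];
for (int i = 0; i < ids.Length; i++)
{
    Buffer.BlockCopy(BitConverter.GetBytes(ids[i]), 0, packed, i * 4, 4);
}

SqlCommand cmd = new SqlCommand("dbo.up_Save_FromArray", conn);  // conn = open connection
cmd.CommandType = CommandType.StoredProcedure;
cmd.Parameters.Add("@Ids", SqlDbType.Image).Value = packed;
cmd.ExecuteNonQuery();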
 
I still think XML may be your best option, but preload the XML into a single
table and then run processes on the data there. This works best if you can
create a table to load. You may need multiple load tables if you have
different clients. This will still hammer the server for a bit. It should
not be an issue for thousands of records, however. After you fill one table,
you need to get its primary key values into the load table, especially if
using row guids or IDENTITY.

When you get to millions of records, this can still cause issues with the
server. In these cases, you may want to preprocess outside of SQL Server. If
you opt for this, you essentially create bcp files (or files for
BULK INSERT) outside of SQL. When you get to the step of creating IDENTITY
values, you can use DBCC commands to reserve a "hole" the size of the batch
of records being loaded. The customer's unique key and the new IDENTITY-driven
primary key can be loaded into a hashtable. Load time for a million records
is about 2-3 seconds, with 1 million lookups taking far less than a second.
At least those were the stats we worked with when I did a project that
preprocessed offline.
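
As an illustration of that key mapping (a sketch only; the DBCC reseed that reserves the identity range happens separately on the server, and the record fields below are hypothetical):

// A sketch only: map each customer's unique key to a value from the reserved
// IDENTITY "hole" while building the bcp / BULK INSERT files offline.
// SourceRecord, reservedRangeStart and CustomerKey are hypothetical.
Dictionary<string, int> keyMap = new Dictionary<string, int>();
int nextId = reservedRangeStart;               // first value of the reserved range
foreach (SourceRecord rec in sourceRecords)
{
    keyMap[rec.CustomerKey] = nextId++;
}

// later, when writing child rows to the load files:
int parentId = keyMap[childRec.CustomerKey];   // in-memory lookup, no DB roundtrip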

If you can load into a single table from XML and then run commands, it is
less time consuming to code.

What about batching? It is a possibility, but realize you are just
flattening the performance curve. Rather than a single huge hit, you are
taking many smaller hits. This could well be the solution you are looking
for, but you may find that this approach fails as you grow in size. Of
course, you can always move SQL Server to a big box, the data files to a
SAN, etc. to alleviate some, if not all, issues. At least until you grow to
millions of files being loaded this way.

Another potential option is getting more transactional with the process that
creates the XML so single records are sent. If this is a client, it may not
be an option. Even internal, there may be valid reasons not to go record by
record.

Also, in most of these systems, you have to check for existing records
before loading, as inevitably someone sends you the same info. You may have
some system of checks and balances in place to avoid this. If not, consider
it, as a process failure in the middle of a large batch of records can be
extremely hard to cure.

--
Gregory A. Beamer
MVP, MCP: +I, SE, SD, DBA

*************************************************
| Think outside the box!
|
*************************************************
 
I still think XML may be your best option, but preload the XML into a single
table and then run processes on the data there. This works best if you can
create a table to load. You may need multiple load tables if you have
different clients. This will still hammer the server for a bit. It should
not be an issue for thousands of records, however. After you fill one table,
you need to get its primary key values into the load table, especially if
using row guids or IDENTITY.

So far this is contrary to our benchmarks - in all our experience XML
is inefficient in terms of network bandwidth, storage, and CPU cycles
to parse.
What about batching? It is a possibility, but realize you are just
flattening the performance curve. Rather than a single huge hit, you are
taking many smaller hits. This could well be the solution you are looking
for, but you may find that this approach fails as you grow in size.

I have a different opinion here too - based on years of benchmarking I
think there is a performance price you pay for every round trip to the
database. Dealing with batches may be way more efficient - in some
cases 100 times more performant. I do not see the flattening you are
speaking about, I usually observe a dramatic reduction of overall
execution costs.
 
Mike said:
Thanks for the suggestion. We've actually tried this - passing the
data in as XML - but found it to be way too resource intensive on the
database server for our purposes, particularly with large XML
documents. We observed occasional 'out of memory' errors despite
calling sp_xml_removedocument every time when finished. So we have
already ruled this out.

I am interested if anyone has any opinions about the idea of
concatenating multiple SP calls into a batch and sending the whole
batch to the database in one shot as CommandType.Text.

Thanks,
MJ
I recall problems like the ones you mention in SQL 2000, but I have never seen
this issue in 2005 when you dispose of the XML document early.
 
Alex Kuznetsov said:
So far this is contrary to our benchmarks - in all our experience XML
is inefficient in terms of network bandwidth, storage, and CPU cycles
to parse.

When consumed as XML, I would concur. For apps with heavy load, running
directly out of the in-memory XML data, even in SQL Server, has a perf
penalty associated with it. The larger the XML set, the greater the penalty.

As efficient as Microsoft has made their XML representations, you still end
up with recursion underneath the hood. If you load it into a temp table,
however, the inefficiency is reduced as the main hit is moving the data from
XML structures to the temp table. The question then is whether or not
ripping the data from the temp table into your normalized structure creates
too heavy a hit on your database. But this can take a toll on tempdb if you
have extremely large data sets, so you have to determine where you can
afford to take the hit.
I have a different opinion here too - based on years of benchmarking I
think there is a performance price you pay for every round trip to the
database. Dealing with batches may be way more efficient - in some
cases 100 times more performant. I do not see the flattening you are
speaking about, I usually observe a dramatic reduction of overall
execution costs.

Flattening may not have been the best word to use here, but hindsight is
always 20-20. :-)

Overall, I was speaking of prepping the data outside of the database, which
can be a huge performance boost. Just as a bit of anecdotal evidence
(anecdotal, as it is a single case, not because the evidence is
inconclusive), I helped design and build an "offline" system that worked
completely outside of the database until batch load. It worked with non-XML
data, of course, but with files up to 25GB in size. The prior system was load
and run sprocs, which could take up to 8 days to completely normalize the data
in the database. The new load system originally got the load time down to
about 24 hours, or approximately 12.5% of the original time. Actual time spent
"hammering" the database was reduced to minutes of batch upload.

Before I left the company, we had refactored to where it was about 8 hours.
I believe they are down to about a third of that time now. Most of the
inefficiencies lay in a) concatenation of strings (StringBuilder helped
tremendously here) and b) working with strings instead of bytes (strings are
unavoidable in some situations, but bytes are very doable during the first
churns through the data). Further efficiency was gained by sorting the files
with Windows ports of some UNIX sort utilities. (To anyone ever going this
route, it pays to get trial versions and compare sort times, as one sort
utility does not necessarily equal another; depending on how you use the
sort, you will find that some are more efficient than others in some
respects.)

--
Gregory A. Beamer
MVP, MCP: +I, SE, SD, DBA

*************************************************
| Think outside the box!
|
*************************************************
 