How to delete thousands of SQL records

  • Thread starter Thread starter ChrisH
  • Start date Start date
C

ChrisH

Hello,

I need to delete thousands of records from two SQL tables, but the
rows are not contiguous. There is a unique numeric id for each row
(clustered index), and my input is a file containing at least a
hundred thousand of these ids to be deleted. The numbers, however,
are all over the map. On my first attempt, I wrote a service, using
C#, that just looped through the file and issued DELETE commands using
the SqlCommand object and the ExecuteNonQuery method. It beat the
crap out of the server, nearly pegging it for the duration.

On my second attempt, I created a dataset using a BETWEEN clause with
the lowest and highest numbers in the file. In tests with a small
file with the numbers more contiguous than real life, it performed
quite well, simply spiking the CPU a bit when I issued the update.
With an actual test set, however, the service eat up almost 2GB of ram
fairly quickly, then stayed put, maybe grabbing a few KB here and
there. I write a log file during the operation, so I know if the
deletion process has begun, and the log file was never written. I
suspect that SELECT query containing the above-mentioned BETWEEN
clause generated more than a million rows and this is the problem.

That said, I need another approach. Can someone please point me in
the right direction?

Many thanks,

Chris
cholimanatcoxdotnet
 
Chris,

Did you try this building it using the stringbuilder.

"Delete.......................;Delete.....................;Delete..............................

And than one ExecutenonQuery

I hope this helps,

Cor
 
ChrisH said:
Hello,

I need to delete thousands of records from two SQL tables, but the
rows are not contiguous. There is a unique numeric id for each row
(clustered index), and my input is a file containing at least a
hundred thousand of these ids to be deleted. The numbers, however,
are all over the map. On my first attempt, I wrote a service, using
C#, that just looped through the file and issued DELETE commands using
the SqlCommand object and the ExecuteNonQuery method. It beat the
crap out of the server, nearly pegging it for the duration.

On my second attempt, I created a dataset using a BETWEEN clause with
the lowest and highest numbers in the file. In tests with a small
file with the numbers more contiguous than real life, it performed
quite well, simply spiking the CPU a bit when I issued the update.
With an actual test set, however, the service eat up almost 2GB of ram
fairly quickly, then stayed put, maybe grabbing a few KB here and
there. I write a log file during the operation, so I know if the
deletion process has begun, and the log file was never written. I
suspect that SELECT query containing the above-mentioned BETWEEN
clause generated more than a million rows and this is the problem.

That said, I need another approach. Can someone please point me in
the right direction?


Use SQLBulkCopy, BULK INSERT, or plain-old INSERT to get your ID's into a
server-side table, and delete using a JOIN or subquery.

new SqlCommand("create table #RowsToDelete(ID int primary
key)",con).ExecuteNonQuery();

.. . . insert ID's. SQLServer will sort as you go because of the Primary
Key.

new SqlCommand("delete from MyTable where id in (select id from
#RowsToDelete)",con).ExecuteNonQuery();

David
 
Yuck! Hammer that server. :-)

--
Gregory A. Beamer

************************************************
Think outside the box!
************************************************
 
Upload the numbers into a temp table and use a join to delete.

delete from MyTable
where MyId IN (SELECT MyID FROM #TempTable)

If you find this perf is horrible on the delete, you can, instead, upload
the numbers get a group of numbers not in this list and put that in a temp
table. Then move the non-deletes to another temp table, truncate the
original table and move them back.

SELECT MyID INTO #TempTable2 FROM MyTable m
LEFT JOIN #TempTable t
ON m.MyID = t.MyID
WHERE t.MyID IS NULL

SELECT * INTO #TempTable3 FROM MyTable m
JOIN #TempTable2 t
ON m.MyID = t.MyID

TRUNCATE TABLE MyTable

INSERT INTO MyTable
SELECT * FROM #TempTable3

-- You can get rid of the temp tables explicitly now

If you have constraints on this table (it is referred to by a foreign key
relationship elsewhere) this gets trickier as cascading deletes might or
might not be the best option.

--
Gregory A. Beamer

************************************************
Think outside the box!
************************************************
 
A great part of your problem here is the amount of basic work that must be
performed by SQL-Server (deleting hundred of thousands of records) and has
much less to do with the way you ask it to do so.

So you should see this as a SQL-Server optimisation problem. For example,
if you have a lot of indexes, than all of these indexes must be updated for
each delete operations. Here are a few suggestions:

1- Are they too many indexes (unused indexes) on this table?

2- Are these indexes heavily fragmented? Do you perform rebuilding and/or
defragmenting of theses indexes on a regular basis?

3- Have you considered the possibility of storing this table/clustered index
in its own file and make this file continuous on the physical hard drive (by
allocating in a single shot or defragmenting the hard drive) ?

Finally, you should ask this question in a newsgroup about SQL-Server (as
this is more a problem related to SQL-Server than to .NET) and buy a good
book on optimizing SQL-Server.
 
Chris,

Did you try this building it using the stringbuilder.

"Delete.......................;Delete.....................;Delete..............................

And than one ExecutenonQuery

I hope this helps,

Cor

Thanks, I'll give that a try.

Chris
 
Back
Top