Transaction deadlock on inserting into a table

  • Thread starter: fniles

fniles

I am using SQL Server 2005 with VB.NET 2008.
I have a table myTbl with an identity column ID (this is NOT a primary
key and I do NOT have an index on this column) and a primary key
column called [Order].
I also have a stored procedure INSERT_INTO_myTbl that inserts a record
into myTbl and sets the [Order] column to the value of the Account
column plus '-' plus the value of the identity column ID, like so:
SET [Order] = @ACCOUNT + '-' + convert(varchar(50),@ID)
For example:
@Order = "11111"
@Account = "11111"
After INSERT_INTO_myTbl, [order] = "11111-123456"
In the VB.NET program I have hundreds of different threads that call
the sp INSERT_INTO_myTbl.
If I call INSERT_INTO_myTbl at the same time (say 100 times from one
thread and 100 times from another thread), the insertion is very slow,
and sometimes I still get error 1205 (transaction deadlock) even
after the stored procedure retries 3 times. I also get the
transaction deadlock when I try to query the table during this time.
When I increase the retry count to 10, sometimes I get the error
"The timeout period elapsed prior to completion of the operation or
the server is not responding".
1. Why is it very slow when I call sp INSERT_INTO_myTbl 100 times from
one thread and 100 times from another thread at the same time?
2. Is there any way to avoid the deadlocks?

Thank you

CREATE PROCEDURE INSERT_INTO_myTbl
    @Order varchar(50) = NULL,
    @ACCOUNT varchar(10) = NULL
AS
DECLARE @Success int, @TryNr int, @ErrMsg varchar(500),
        @ErrNumber int, @ID int;
SET @Success = 0;
SET @TryNr = 1;
SET @ErrMsg = '';
WHILE @Success = 0 AND @TryNr <= 3 AND @ErrMsg = ''
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;
        INSERT INTO myTbl ([Order], ACCOUNT)
        VALUES (@Order, @ACCOUNT);
        SELECT @ID = SCOPE_IDENTITY();
        UPDATE HistTradesOrig
        SET    [Order] = @ACCOUNT + '-' + convert(varchar(50), @ID)
        WHERE  ID = @ID;
        SET @Success = 1;
        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        ROLLBACK TRANSACTION;
        SELECT @ErrNumber = ERROR_NUMBER();
        IF @ErrNumber = 1205
        BEGIN
            SET @TryNr = @TryNr + 1;
            IF @TryNr > 3
                RAISERROR ('Giving up after three consecutive deadlocks', 16, 1);
        END
        ELSE
        BEGIN
            SET @ErrMsg = ERROR_MESSAGE();
            RAISERROR (@ErrMsg, 16, 1);
        END
    END CATCH
END
 
fniles said:
insert into myTbl ([Order],ACCOUNT) values
(@Order,@ACCOUNT)
select @ID = SCOPE_IDENTITY()
UPDATE HistTradesOrig
SET [Order] = @ACCOUNT + '-' + convert(varchar(50),@ID)
WHERE ID = @ID
SET @Success = 1

I have difficulty making sense of this, at least as it is written. Are
myTbl and HistTradesOrig the same table? If they are not, it looks very
funny that you select which row to update from the IDENTITY value you
received for the other table.

On the other hand, if they are the same, it looks very strange that you
simply throw the value away.

So I will have to guess that you trimmed things down a little bit too much
for the post.

Anyway, it seems that the solution is to make Order a computed column:

Order AS Account + '-' + convert(varchar(10), ID)

That saves you from updating the column right after the insert, which
certainly can increase the risk of deadlocks.
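Concretely, that change might look like this (a sketch; it assumes the PRIMARY KEY constraint currently sitting on [Order] is dropped or moved first, and the varchar length is arbitrary):

```sql
-- Drop the manually maintained column and re-add it as a computed
-- column; PERSISTED stores the value so it can be indexed.
ALTER TABLE dbo.HistTradesOrig DROP COLUMN [Order];
ALTER TABLE dbo.HistTradesOrig
    ADD [Order] AS Account + '-' + CONVERT(varchar(10), ID) PERSISTED;
```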

--
Erland Sommarskog, SQL Server MVP, (e-mail address removed)

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
 
Thanks for the reply.

Sorry, yes, myTbl is actually HistTradesOrig, my bad.

So, you think the deadlock is because of this UPDATE statement ?
UPDATE HistTradesOrig SET [Order] = @ACCOUNT + '-' +
convert(varchar(50),@ID)

The computed column would be a good idea, but there are some orders for
which I do NOT want to set [Order] = @ACCOUNT + '-' +
convert(varchar(50),@ID).
Most orders are computed that way, but not all of them.
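One way to keep a computed column while allowing exceptions (my sketch, not from the thread; the OrderOverride and OrderComputed names are hypothetical) is an optional override column with an ISNULL fallback:

```sql
-- Hypothetical sketch: OrderOverride is NULL for normal orders,
-- in which case the computed column falls back to the formula.
ALTER TABLE dbo.HistTradesOrig
    ADD OrderOverride varchar(50) NULL;
ALTER TABLE dbo.HistTradesOrig
    ADD OrderComputed AS ISNULL(OrderOverride,
                                Account + '-' + CONVERT(varchar(10), ID));
```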




 
Here is the table
CREATE TABLE [dbo].[HistTradesOrig](
[ID] [int] IDENTITY(1,1) NOT NULL,
[LOG] [smallint] NULL,
[Floor] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[order] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[Date] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[TradeType] [varchar](10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[Quantity] [float] NULL,
[Symbol] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[FillPrice] [float] NULL CONSTRAINT [DF__HistTrade__FillP__4CA06362]
DEFAULT ((0)),
[OrderType] [varchar](10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
CONSTRAINT [DF__HistTrade__Order__4D94879B] DEFAULT ('DAY'),
[OrderSpec] [varchar](12) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
CONSTRAINT [DF__HistTrade__Order__4E88ABD4] DEFAULT ('NA'),
[OrderPrice] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
CONSTRAINT [DF__HistTrade__Order__4F7CD00D] DEFAULT ((0)),
[Put] [smallint] NULL,
[Call] [smallint] NULL,
[open] [smallint] NULL,
[close] [smallint] NULL,
[Account] [varchar](10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[Cancel] [varchar](10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[Day] [datetime] NULL,
[Strike] [float] NULL CONSTRAINT [DF__HistTrade__Strik__5070F446]
DEFAULT ((0)),
 
When I took out the following code from the SP, I do NOT get the
transaction deadlock error:
select @ID = SCOPE_IDENTITY()
UPDATE HistTradesOrig
SET [Order] = @ACCOUNT + '-' + convert(varchar(50),@ID)
WHERE ID = @ID

For most of the inserts I need to SET [Order] = @ACCOUNT + '-' +
convert(varchar(50),@ID).
But for some of the inserts that's not the case, so I can't use the
formula in the [Order] column.

What is the best way to update the column to [Order] = @ACCOUNT + '-' +
convert(varchar(50),@ID) without causing the transaction deadlock?

Shall I do it in 2 different SPs, with 1 of the SPs doing the insert
first?


 
It looks like creating a unique nonclustered ascending index on ID
makes everything run a LOT faster and fixed the transaction deadlock
issue, even if I leave the following UPDATE statement in the stored
procedure:
select @ID = SCOPE_IDENTITY()
UPDATE HistTradesOrig
SET [Order] = @ACCOUNT + '-' + convert(varchar(50),@ID)
WHERE ID = @ID
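For reference, the index described above would be created with something like this (the index name is my assumption):

```sql
-- Lets UPDATE ... WHERE ID = @ID seek a single row instead of
-- scanning the table and blocking on other sessions' open inserts.
CREATE UNIQUE NONCLUSTERED INDEX IX_HistTradesOrig_ID
    ON dbo.HistTradesOrig (ID ASC);
```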

 
fniles said:
It looks like creating a unique nonclustered index on ID makes
everything run a LOT faster and fixed the transaction deadlock issue.

Of course. Without it, SQL Server will have to scan the whole table. It
will then be blocked on any newly inserted row. And that process will
be blocked by...

Proper indexes are important to avoid deadlocks.

But your UPDATE is changing the clustered index key, which means that the
entire row will be moved. This means that when you first insert a row in
one place, you can cause a page split (unless @Order is always increasing),
and then you will cause another page split when you move the row.

I think you should consider finding a different clustered index for your
table, unless you run a lot of range queries on the Order column. (Which I
find somewhat difficult to believe.)

Also, I suggest that you do this prior to the INSERT:

retry:
BEGIN TRY
   SELECT @expected_id = ident_current('HistTradesOrig') + 1
   SELECT @order = @account + '-' + convert(varchar(50), @expected_id)
   INSERT ...
END TRY
BEGIN CATCH
   IF error_number() = 2627 GOTO retry
   -- Rethrow any other error.
END CATCH

SELECT @id = scope_identity()
IF @id <> @expected_id
BEGIN
   UPDATE HistTradesOrig
   SET ....
END

ident_current gives the most recently generated IDENTITY value for the
table *by any session*. There is no guarantee that you will get
ident_current() + 1, but you might. If you do not, you need to do the
second UPDATE. And even if you need it, you may still not have to move
the row, since it might end up in the same place in the index.

The extra TRY block is there in case two processes try to insert rows
for the same account simultaneously.

A much simpler solution is to skip IDENTITY and roll your own with MAX,
but then you get a serialization point and no concurrency.
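That MAX-based alternative might be sketched like this (my illustration, not Erland's code; it assumes ID is no longer an IDENTITY column, and the locking hints are what create the serialization point):

```sql
BEGIN TRANSACTION;
-- UPDLOCK + HOLDLOCK make concurrent callers queue behind this read,
-- so only one session at a time can compute the next ID.
SELECT @ID = ISNULL(MAX(ID), 0) + 1
FROM   dbo.HistTradesOrig WITH (UPDLOCK, HOLDLOCK);
INSERT INTO dbo.HistTradesOrig (ID, [Order], Account)
VALUES (@ID, @ACCOUNT + '-' + CONVERT(varchar(10), @ID), @ACCOUNT);
COMMIT TRANSACTION;
```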



--
Erland Sommarskog, SQL Server MVP, (e-mail address removed)

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
 
Here is the table
(snip)

Hi fniles,

I guess this is the same table we discussed in the other thread,
right?

Based on the table layout and the stored procedure you posted, my
suggestion would be to make the following changes:

==> the ID column should be the primary key (with a clustered index)
==> the order column should have a UNIQUE constraint instead of the
PRIMARY KEY constraint (with a nonclustered index)

Use Erland's suggestion as well; it will save you a load of unneeded
UPDATEs by inserting a value that is correct in 99% of the cases.

There's a lot more wrong with the table, though. The number of
nullable columns is absolutely scary! And the number of string columns
that all have the same maximum length is also suspicious. Some columns
should be combined (filltime and filldate should either be a single
datetime column, or two separate columns using the date and time data
types respectively). And I'm not sure if all columns really belong in
this table - my (admittedly limited) understanding of the stock market
is that open and close are functionally dependent on symbol and trade
day, so they don't belong in a trades table. So you might consider
rethinking the entire design.
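Applied to the posted table, those two changes might look like this (a sketch; the constraint names are my assumptions, and the existing PRIMARY KEY on [order] would have to be dropped first):

```sql
-- Make ID the clustered primary key ...
ALTER TABLE dbo.HistTradesOrig
    ADD CONSTRAINT PK_HistTradesOrig PRIMARY KEY CLUSTERED (ID);
-- ... and keep [order] unique via a nonclustered constraint.
ALTER TABLE dbo.HistTradesOrig
    ADD CONSTRAINT UQ_HistTradesOrig_Order UNIQUE NONCLUSTERED ([order]);
```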
 
Thank you everybody for your suggestions. I really appreciate it.
Yes.

In our application we run many queries like
SELECT ... FROM HistTradesOrig WHERE [Order] = ''
and we never run
SELECT ... FROM HistTradesOrig WHERE [ID] = ''

The only purpose of ID is really to create the order number ([Order] =
@ACCOUNT + '-' + convert(varchar(50),@ID)).

In this case, do you still suggest changing ID to be the primary key
and giving [Order] a UNIQUE constraint instead of the PRIMARY KEY
(with a nonclustered index)?

I will take a look at the nullable columns as well.
What are the negative effects of having these nullable columns?

Thanks again !
 
The extra TRY block is there in case two processes try to insert rows for the same account simultaneously.
Did you mean this TRY block ?
BEGIN CATCH
IF error_number() = 2627 GOTO retry
-- Rethrow any other error.
END CATCH

Is it possible the retry will go on infinitely here?
Do I want to limit the retries to a maximum number of times?


Here is the updated stored procedure. Does it look correct ?

DECLARE @Success int, @TryNr int, @ErrMsg varchar(500),
        @ErrNumber int, @expected_id int, @ID int;
SET @Success = 0;
SET @TryNr = 1;
SET @ErrMsg = '';
WHILE @Success = 0 AND @TryNr <= 10 AND @ErrMsg = ''
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;
retry:
        BEGIN TRY
            SELECT @expected_id = ident_current('HistTradesOrig') + 1;
            SELECT @Order = @ACCOUNT + '-' + convert(varchar(50), @expected_id);
            INSERT INTO HistTradesOrig
                ([Floor], [Order], Date, TradeType, Quantity, Symbol,
                 FillPrice, OrderType, OrderSpec, OrderPrice, Put, Call,
                 [Open], [Close], ACCOUNT, Cancel, [Day], Strike, Email,
                 FillDate, [Time], CustomerTime, Status, TempOrder,
                 [Replace], FloorConf, DeskOK, Location, LName, [Key],
                 [Message], Listname)
            VALUES
                (@Floor, @Order, @Date, @TradeType, @Quantity, @Symbol,
                 @FillPrice, @OrderType, @OrderSpec, @OrderPrice, @Put,
                 @Call, @Open, @Close, @ACCOUNT, @Cancel, @Day, @Strike,
                 @Email, @FillDate, @Time, @CustomerTime, @Status,
                 @TempOrder, @Replace, @FloorConf, @DeskOK, @Location,
                 @LName, @Key, @Message, @Listname);
        END TRY
        BEGIN CATCH
            IF error_number() = 2627 GOTO retry
            -- Rethrow any other error.
        END CATCH

        SELECT @ID = SCOPE_IDENTITY();
        IF @ID <> @expected_id
        BEGIN
            UPDATE HistTradesOrig
            SET    [Order] = @ACCOUNT + '-' + convert(varchar(50), @ID)
            WHERE  ID = @ID;
        END
        SET @Success = 1;
        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        ROLLBACK TRANSACTION;
        SELECT @ErrNumber = ERROR_NUMBER();
        IF @ErrNumber = 1205
        BEGIN
            SET @TryNr = @TryNr + 1;
            IF @TryNr > 10
                RAISERROR ('Giving up after 10 consecutive deadlocks for %s', 16, 1, @ACCOUNT);
        END
        ELSE
        BEGIN
            SET @ErrMsg = ERROR_MESSAGE();
            RAISERROR (@ErrMsg, 16, 1);
        END
    END CATCH
END


Many Thanks !


 
fniles said:
Did you mean this TRY block ?
BEGIN CATCH
IF error_number() = 2627 GOTO retry
-- Rethrow any other error.
END CATCH

Is it possible the retry will go infinitely on this ?

Not if my thinking was right.
BEGIN CATCH
IF error_number() = 2627 GOTO retry
-- Rethrow any other error.
END CATCH


That comment was for you to fill in.




--
Erland Sommarskog, SQL Server MVP, (e-mail address removed)

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
 
fniles said:
The only purpose of ID is really to create the order number ([Order] =
@ACCOUNT + '-' + convert(varchar(50),@ID) )

In this case, do you still suggest to change ID to be the Primary Key
and [Order] to have UNIQUE constraint instead of PRIMARY KEY column
(and use nonclustered index) ?

Yes. If you would create a child table, you would use ID as the foreign key
in that table.
I will take a look at the nullable columns, as well.
What are the negative effect of having these nullable columns ?

It indicates an incomplete design.

For every nullable column you should ask yourself "What does it mean
that this column is NULL?" and "Do I handle that situation?".

For instance, Date is nullable. What does it mean that the date for the
order is NULL? (And why is Date varchar(50)??????)

Account appears to be a key here - but it is nullable. Maybe the account
is nullable for some special order type, but in that case you should
express this in a constraint.

You have both Put and Call and they are smallint. I would think that
Put and Call are mutually exclusive. I would expect them to be binary.
What does Put = 4 mean? Same goes for Open/Close.

OrderPrice is varchar(50)????

varchar(50) may be too short for email.

What is Day? Time?

Bill is varchar(1)? Why not char(1)?

And so on.
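As one illustration of expressing such rules as constraints (my sketch, not from the thread), the Put/Call exclusivity could be enforced like this, assuming the existing values are only 0, 1, or NULL:

```sql
-- Narrow the columns to two-valued flags ...
ALTER TABLE dbo.HistTradesOrig ALTER COLUMN Put  bit;
ALTER TABLE dbo.HistTradesOrig ALTER COLUMN Call bit;
-- ... and forbid a row from being both a put and a call.
ALTER TABLE dbo.HistTradesOrig
    ADD CONSTRAINT CK_HistTradesOrig_PutCall
        CHECK (NOT (Put = 1 AND Call = 1));
```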

--
Erland Sommarskog, SQL Server MVP, (e-mail address removed)

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
 
Thank you everybody for your suggestions. I really appreciate it.

Yes.

Hi fniles,

Too bad that the discussion somehow got spread out over three threads.
Yesterday, I found myself twice revising my earlier (and already
posted) ideas, based on new information given in another thread, that
I happened to read later.
In our application, we do many queries like
SELECT ...FROM HisttradesOrig where [Order] = ''
And, we do never do
SELECT ...FROM HisttradesOrig where [ID] = ''

The only purpose of ID is really to create the order number ([Order] =
@ACCOUNT + '-' + convert(varchar(50),@ID) )

In this case, do you still suggest to change ID to be the Primary Key
and [Order] to have UNIQUE constraint instead of PRIMARY KEY column
(and use nonclustered index) ?

Maybe, maybe not. You have several options. One is to drop the ID
column completely (I posted a way to generate the order numbers
without using IDENTITY in one of my posts yesterday - you'll have to
see how it performs and scales, and whether you like it). The other end
of the spectrum is to keep the ID column, make its index clustered,
and use the ID column in more places, such as for foreign key
references. Or you can keep using your current design, which sits
somewhere in between these extremes.

Which of these options is best really depends on so many factors that
there is no clear-cut answer. If you can, it would be ideal to set up
a test under realistic usage conditions to compare all the
alternatives.
I will take a look at the nullable columns, as well.
What are the negative effect of having these nullable columns ?

If a database allows invalid data, you will end up finding yourself
cleaning invalid data. (Or your successor will, if you manage to get
away from the job before the shit hits the fan.)
Erland posted several other relevant remarks - your task now is to
think about how you will handle all the possible cases of invalid
data.
How will you handle a row with a NULL value in the TradeType or
Account column?
How will you handle a row with the value "Too much" in the OrderPrice
column?
And what about the value "Yesterday" in the Date column?

These are just a few examples. If you want to have a properly designed
database, you need to ask yourself the following questions for each
column:
* What are valid values?
* What data type best enforces these values?
* Do I need an additional CHECK, UNIQUE or FOREIGN KEY constraint to
prevent invalid values?
* Is it possible for this column to not have a value at any given
moment (i.e., be NULL)? If so, what does that mean?
 
Did you mean this TRY block ?
BEGIN CATCH
IF error_number() = 2627 GOTO retry
-- Rethrow any other error.
END CATCH

Is it possible the retry will go infinitely on this ?
Do I want to limit the retry to a maximum number of times ?

Hi fniles,

If some of the data in the other columns violates another UNIQUE
constraint, then it will most certainly go on forever. The check for
error 2627 only tells you that a unique or primary key constraint has
been violated, but not which one.

I don't see any other way for this to loop forever - even under the
most extreme concurrency conditions, you will eventually succeed in
entering a row with the retrieved scope_identity() before another
connection does.
 
Hugo said:
If a database allows invalid data, you will end up finding yourself
cleaning invalid data.

Or the business people may have to deal with an angry customer, because
an invalid order made it to the market, and the customer lost a lot of
money.

Or because the order was accepted but processing failed later because
of the incorrect data, the order never made it to the market, and the
customer lost a lot of money.

In both situations, your organization - or your client - may have to
cough up the money to keep the customer free from damage.

--
Erland Sommarskog, SQL Server MVP, (e-mail address removed)

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
 
Thank you Hugo and Erland for your help.
I appreciate it very much.

I am sorry, I posted it in 3 different threads. I won't do that
anymore.

Erland mentioned this in an earlier posting:
But your UPDATE is changing the clustered index key, which means that the
entire row will be moved. This means that when you first insert a row in
one place, you can cause a page split (unless @Order is always increasing),
and then you will cause another page split when you move the row.
Does it mean that eventually the database will be slow because of all
the page splits?
If I now change [Order] to be a nonclustered index and [ID] to be the
primary key, will the page splits still happen?

Regarding Hugo's suggestion of creating the order numbers without
using IDENTITY, is this what you are referring to ?
CREATE TABLE dbo.HistTradesOrig
(ID int IDENTITY(1,1) NOT NULL,
Account varchar(10) NOT NULL,
OrderNo int NOT NULL,
[order] AS Account + CAST(OrderNo AS char(6)),
-- Other columns,
PRIMARY KEY (ID),
UNIQUE (Account, OrderNo)
);

The problem with that is that although in most cases [order] is
Account + CAST(OrderNo AS char(6)), in some cases it is not.


 
Thank you again for your reply.
If some of the data in the other columns violates another UNIQUE constraint, then it will most certainly go on forever.
If there is only one primary key in the table (which is currently
[Order]), does that mean I don't have to worry about it going on
forever?
So I don't need to limit the retries to a maximum number of times?

Is CAST a better/faster way to use than CONVERT ?

Thanks again.
 
Thank you again for your reply.
If some of the data in the other columns violates another UNIQUE constraint, then it will most certainly go on forever.
If there is only one primary key in the table (which is currently
[Order]), does that mean I don't have to worry about it going on
forever?
So I don't need to limit the retries to a maximum number of times?

Hi fniles,

As far as I see it, this is correct.
Is CAST a better/faster way to use than CONVERT ?

CAST is ANSI-compliant syntax, which is reason for me to use it when
possible. But CONVERT has the style parameter that gives more control
than CAST gives you - so when I need more control, I use CONVERT.
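A small illustration of the difference (my example, not from the original thread):

```sql
-- CAST: ANSI-standard syntax, no formatting control.
SELECT CAST(GETDATE() AS varchar(30));
-- CONVERT: the third (style) argument selects a format;
-- style 112 produces ISO 'yyyymmdd'.
SELECT CONVERT(varchar(8), GETDATE(), 112);
```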
 
Erland mentioned this in an earlier posting:
But your UPDATE is changing the clustered index key, which means that the
entire row will be moved. This means that when you first insert a row in
one place, you can cause a page split (unless @Order is always increasing),
and then you will cause another page split when you move the row.

Does it mean that eventually the database will be slow because of all
the page splits?
If I now change [Order] to be a nonclustered index and [ID] to be the
primary key, will the page splits still happen?

Hi fniles,

Page splits incur a little overhead while inserting or updating the
data, and they cause indexes to become fragmented over time, which
decreases read performance.

Your proposed change will not completely eliminate the page splits;
the nonclustered index on [Order] will still suffer them. But they
will become less frequent: the size of an index is less than the size
of the table, so the number of pages is lower, which automatically
decreases the number of page splits.

In any case you'll have to schedule index maintenance; if you
reorganize or rebuild your index at regular intervals, the amount of
fragmentation will remain controlled.
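Measuring fragmentation and performing that maintenance might look like this (my sketch; the 5-30% thresholds are a common rule of thumb, not something stated in the thread):

```sql
-- Inspect fragmentation of every index on the table.
SELECT i.name, s.avg_fragmentation_in_percent, s.page_count
FROM   sys.dm_db_index_physical_stats(
           DB_ID(), OBJECT_ID('dbo.HistTradesOrig'),
           NULL, NULL, 'LIMITED') AS s
JOIN   sys.indexes AS i
  ON   i.object_id = s.object_id AND i.index_id = s.index_id;

-- Rule of thumb: REORGANIZE between roughly 5% and 30% fragmentation,
-- REBUILD above that.
ALTER INDEX ALL ON dbo.HistTradesOrig REORGANIZE;
-- ALTER INDEX ALL ON dbo.HistTradesOrig REBUILD;
```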
Regarding Hugo's suggestion of creating the order numbers without
using IDENTITY, is this what you are referring to ?
CREATE TABLE dbo.HistTradesOrig
(ID int IDENTITY(1,1) NOT NULL,
Account varchar(10) NOT NULL,
OrderNo int NOT NULL,
    [order] AS Account + CAST(OrderNo AS char(6)),
-- Other columns,
PRIMARY KEY (ID),
UNIQUE (Account, OrderNo)
);

The problem with that is that although in most cases [order] is
Account + CAST(OrderNo AS char(6)), in some cases it is not.

Yes, that was what I was referring to. What are the cases where the
order is deduced in a different way? And how is it deduced in those
cases?
 
But they will become less frequent: the size of an index is less than the size of the table, so the number of pages is lower, which automatically decreases the number of page splits.
So, if I make [ID] the primary key, will the number of page splits be
fewer than if [Order] is the primary key?
In any case you'll have to schedule index maintenance; if you reorganize or rebuild your index at regular intervals, the amount of fragmentation will remain controlled.
Will a database backup reorganize/rebuild indexes?
Which one do you suggest: reorganizing or rebuilding indexes?
If I reorganize, I want to reorganize the HistTradesOrig table,
correct? Do I want to check "Compact large objects"? Do I need to
reorganize the "view" that refers to the HistTradesOrig table?
If I rebuild, do I want to rebuild the HistTradesOrig table, or
also the "view" that refers to the HistTradesOrig table? And shall I
select "Reorganize pages with the default amount of free space"? And
shall I select "Sort results in tempdb" and "Keep index online while
reindexing"?

Can I reorganize or rebuild indexes during weekdays when records are
being inserted? Or shall I do it on the weekend, when no records are
inserted?
How often do you suggest reorganizing or rebuilding indexes?

Is there a way to tell how much fragmentation there is in the
database?

Thanks a lot for your help.


 