M
Michael Lang
I've noticed people recommending int datatypes for sql tables. I always use
uniqueidentifier (GUID). I wanted to start a discussion on the merits of
each in an ADO.NET development environment. Let's please keep it a
professional discussion that includes reasons for each method, and no
slamming.
First I'll start by stating some basic facts.
uniqueidentifier / GUID (from .NET help files on Guid enumeration)
A GUID is a 128-bit integer (16 bytes) that can be used across all computers
and networks wherever a unique identifier is required. Such an identifier
has a very low probability of being duplicated.
bigint (from qsl server help)
Integer (whole number) data from -2^63 (-9223372036854775808) through 2^63-1
(9223372036854775807). Storage size is 8 bytes.
int (from qsl server help)
Integer (whole number) data from -2^31 (-2,147,483,648) through 2^31 - 1
(2,147,483,647). Storage size is 4 bytes. The SQL-92 synonym for int is
integer.
smallint (from qsl server help)
Integer data from -2^15 (-32,768) through 2^15 - 1 (32,767). Storage size is
2 bytes.
tinyint (from qsl server help)
Integer data from 0 through 255. Storage size is 1 byte.
Now for my opinion.
I didn't mention any other types for primary keys, such as strings (nchar,
nvarchar, etc...). I don't think leaving the decision as to the PK for a
record is a good permission to give the user. You can still implement a
string "identifer" field that the user can enter and use for searches later,
but I don't think you should make that the PK. An example of a string based
PK is in the 'pubs' database. You can still "index" other fields besides
the PK if you want to make the search faster.
Many older databases may have been designed before GUID was available. I
don't question that it would not be feasible to convert them to use GUID.
I'm more concerned with new database designs.
So what are the drawbacks to using GUID?
1) 16 bytes instead of the typical int index (int type) of 4 bytes. When
translated to the maximum record count of 4,294,967,295 per table (if use
all from min to max)...
int = 17,179,869,180 bytes
guid = 68,719,476,720 bytes
Bringing it back to reality, what database it going to use EVERY possible
index for int? If you are going to have that many records in the lifetime
of your application you should use GUID just to give room for growth.
For tables that have only a small number of records (say 100-1000), the
total memory used not alot using GUID.
2) performance? I'm ASSuming that it takes longer to search a larger index
field?
So what are the benefits to using GUID?
1) 4 times as many possible records as int, 2 times as many as bigint. Of
coarse this is only a benefit to mega-applications
2) Ease of programming. no dependency on calling @@Identity and basically
waiting for SQL to assign your records ID's. In .NET just call
GUID.NewGuid() at record/object creation. In this case you can allow
creating the index in the business layer of your application. You don't
have to worry about assigning a temporary value in the business layer, and
making sure it is updated later. This is really a problem if the user wants
to create a record in another table that has a relation to the first table.
Now you have 2 or even more objects you need to remember to update the
Identity on.
When you create a new business object, "Company"...
public class Company
{
private GUID _companyID;
private string _name;
public Company()
{
_companyID= Guid.NewGuid();
_name = "New Company"; //whatever...
}
...
}
Now when you update these objects to the datalayer, create a new dataRow,
set each field to the class instances property values, and do an update.
DataTable dt = ds.Tables["Company"];
DataRow dr = dt.NewRow();
dr["CompanyID"] = companyNew.CompanyID;
dr["Name"] = companyNew.Name;
dt.Rows.Add(dr);
da.Update(ds);
General guidelines that I follow:
1) if a table is will not have any records added, use the smallest integer
based index possible. An example would be a "lookup" table, such as a table
of American states. If less than 256 records, use tinyint. less than 64k
records, use smallint, etc...
2) every other table I use a GUID as the PK.
Is there a flaw in my assesment? Or are there drawbacks or benefits I've
missed? Does anyone else that uses GUID's have additional guidelines? Of
coarse there will always be deviations from best practices. But what is or
should be the new best practice?
Michael Lang, MCSD
uniqueidentifier (GUID). I wanted to start a discussion on the merits of
each in an ADO.NET development environment. Let's please keep it a
professional discussion that includes reasons for each method, and no
slamming.
First I'll start by stating some basic facts.
uniqueidentifier / GUID (from .NET help files on Guid enumeration)
A GUID is a 128-bit integer (16 bytes) that can be used across all computers
and networks wherever a unique identifier is required. Such an identifier
has a very low probability of being duplicated.
bigint (from qsl server help)
Integer (whole number) data from -2^63 (-9223372036854775808) through 2^63-1
(9223372036854775807). Storage size is 8 bytes.
int (from qsl server help)
Integer (whole number) data from -2^31 (-2,147,483,648) through 2^31 - 1
(2,147,483,647). Storage size is 4 bytes. The SQL-92 synonym for int is
integer.
smallint (from qsl server help)
Integer data from -2^15 (-32,768) through 2^15 - 1 (32,767). Storage size is
2 bytes.
tinyint (from qsl server help)
Integer data from 0 through 255. Storage size is 1 byte.
Now for my opinion.
I didn't mention any other types for primary keys, such as strings (nchar,
nvarchar, etc...). I don't think leaving the decision as to the PK for a
record is a good permission to give the user. You can still implement a
string "identifer" field that the user can enter and use for searches later,
but I don't think you should make that the PK. An example of a string based
PK is in the 'pubs' database. You can still "index" other fields besides
the PK if you want to make the search faster.
Many older databases may have been designed before GUID was available. I
don't question that it would not be feasible to convert them to use GUID.
I'm more concerned with new database designs.
So what are the drawbacks to using GUID?
1) 16 bytes instead of the typical int index (int type) of 4 bytes. When
translated to the maximum record count of 4,294,967,295 per table (if use
all from min to max)...
int = 17,179,869,180 bytes
guid = 68,719,476,720 bytes
Bringing it back to reality, what database it going to use EVERY possible
index for int? If you are going to have that many records in the lifetime
of your application you should use GUID just to give room for growth.
For tables that have only a small number of records (say 100-1000), the
total memory used not alot using GUID.
2) performance? I'm ASSuming that it takes longer to search a larger index
field?
So what are the benefits to using GUID?
1) 4 times as many possible records as int, 2 times as many as bigint. Of
coarse this is only a benefit to mega-applications
2) Ease of programming. no dependency on calling @@Identity and basically
waiting for SQL to assign your records ID's. In .NET just call
GUID.NewGuid() at record/object creation. In this case you can allow
creating the index in the business layer of your application. You don't
have to worry about assigning a temporary value in the business layer, and
making sure it is updated later. This is really a problem if the user wants
to create a record in another table that has a relation to the first table.
Now you have 2 or even more objects you need to remember to update the
Identity on.
When you create a new business object, "Company"...
public class Company
{
private GUID _companyID;
private string _name;
public Company()
{
_companyID= Guid.NewGuid();
_name = "New Company"; //whatever...
}
...
}
Now when you update these objects to the datalayer, create a new dataRow,
set each field to the class instances property values, and do an update.
DataTable dt = ds.Tables["Company"];
DataRow dr = dt.NewRow();
dr["CompanyID"] = companyNew.CompanyID;
dr["Name"] = companyNew.Name;
dt.Rows.Add(dr);
da.Update(ds);
General guidelines that I follow:
1) if a table is will not have any records added, use the smallest integer
based index possible. An example would be a "lookup" table, such as a table
of American states. If less than 256 records, use tinyint. less than 64k
records, use smallint, etc...
2) every other table I use a GUID as the PK.
Is there a flaw in my assesment? Or are there drawbacks or benefits I've
missed? Does anyone else that uses GUID's have additional guidelines? Of
coarse there will always be deviations from best practices. But what is or
should be the new best practice?
Michael Lang, MCSD