GUID vs other integer types for Primary Keys?

  • Thread starter Thread starter Michael Lang
  • Start date Start date
M

Michael Lang

I've noticed people recommending int datatypes for sql tables. I always use
uniqueidentifier (GUID). I wanted to start a discussion on the merits of
each in an ADO.NET development environment. Let's please keep it a
professional discussion that includes reasons for each method, and no
slamming.

First I'll start by stating some basic facts.

uniqueidentifier / GUID (from .NET help files on Guid enumeration)
A GUID is a 128-bit integer (16 bytes) that can be used across all computers
and networks wherever a unique identifier is required. Such an identifier
has a very low probability of being duplicated.

bigint (from qsl server help)
Integer (whole number) data from -2^63 (-9223372036854775808) through 2^63-1
(9223372036854775807). Storage size is 8 bytes.

int (from qsl server help)
Integer (whole number) data from -2^31 (-2,147,483,648) through 2^31 - 1
(2,147,483,647). Storage size is 4 bytes. The SQL-92 synonym for int is
integer.

smallint (from qsl server help)
Integer data from -2^15 (-32,768) through 2^15 - 1 (32,767). Storage size is
2 bytes.

tinyint (from qsl server help)
Integer data from 0 through 255. Storage size is 1 byte.

Now for my opinion.

I didn't mention any other types for primary keys, such as strings (nchar,
nvarchar, etc...). I don't think leaving the decision as to the PK for a
record is a good permission to give the user. You can still implement a
string "identifer" field that the user can enter and use for searches later,
but I don't think you should make that the PK. An example of a string based
PK is in the 'pubs' database. You can still "index" other fields besides
the PK if you want to make the search faster.

Many older databases may have been designed before GUID was available. I
don't question that it would not be feasible to convert them to use GUID.
I'm more concerned with new database designs.

So what are the drawbacks to using GUID?
1) 16 bytes instead of the typical int index (int type) of 4 bytes. When
translated to the maximum record count of 4,294,967,295 per table (if use
all from min to max)...
int = 17,179,869,180 bytes
guid = 68,719,476,720 bytes

Bringing it back to reality, what database it going to use EVERY possible
index for int? If you are going to have that many records in the lifetime
of your application you should use GUID just to give room for growth.

For tables that have only a small number of records (say 100-1000), the
total memory used not alot using GUID.

2) performance? I'm ASSuming that it takes longer to search a larger index
field?

So what are the benefits to using GUID?
1) 4 times as many possible records as int, 2 times as many as bigint. Of
coarse this is only a benefit to mega-applications

2) Ease of programming. no dependency on calling @@Identity and basically
waiting for SQL to assign your records ID's. In .NET just call
GUID.NewGuid() at record/object creation. In this case you can allow
creating the index in the business layer of your application. You don't
have to worry about assigning a temporary value in the business layer, and
making sure it is updated later. This is really a problem if the user wants
to create a record in another table that has a relation to the first table.
Now you have 2 or even more objects you need to remember to update the
Identity on.

When you create a new business object, "Company"...
public class Company
{
private GUID _companyID;
private string _name;
public Company()
{
_companyID= Guid.NewGuid();
_name = "New Company"; //whatever...
}
...
}

Now when you update these objects to the datalayer, create a new dataRow,
set each field to the class instances property values, and do an update.

DataTable dt = ds.Tables["Company"];
DataRow dr = dt.NewRow();
dr["CompanyID"] = companyNew.CompanyID;
dr["Name"] = companyNew.Name;
dt.Rows.Add(dr);
da.Update(ds);

General guidelines that I follow:
1) if a table is will not have any records added, use the smallest integer
based index possible. An example would be a "lookup" table, such as a table
of American states. If less than 256 records, use tinyint. less than 64k
records, use smallint, etc...

2) every other table I use a GUID as the PK.

Is there a flaw in my assesment? Or are there drawbacks or benefits I've
missed? Does anyone else that uses GUID's have additional guidelines? Of
coarse there will always be deviations from best practices. But what is or
should be the new best practice?

Michael Lang, MCSD
 
Michael,
I have run into the @@Identity issue for my Parent-Child datatables. (Also
the Oracle Sequence issue.)
Luckily I read ADO.Net by David Sceppa!

I got it working, but what a hack compared to using GUIDs!
Assign fake Identity values, increment in the opposite direction.
Then when updating the database, hook into the row updated event to issue
the Select @@Identity command and update the PK in the dataset to the real
value which then cascades to the child table.
This allows me to then post the child table data to the server using the
correct related key.

(All examples assume that Identity values start at 1. But that gives up 1/2
the possible values and mine start at
-2.xx Billion.)

I too would be interested in hearing more about the use of GUIDs.
Are they too good to be true?
What are the drawbacks that have not been pointed out already?
--
Joe Fallon



Michael Lang said:
I've noticed people recommending int datatypes for sql tables. I always use
uniqueidentifier (GUID). I wanted to start a discussion on the merits of
each in an ADO.NET development environment. Let's please keep it a
professional discussion that includes reasons for each method, and no
slamming.

First I'll start by stating some basic facts.

uniqueidentifier / GUID (from .NET help files on Guid enumeration)
A GUID is a 128-bit integer (16 bytes) that can be used across all computers
and networks wherever a unique identifier is required. Such an identifier
has a very low probability of being duplicated.

bigint (from qsl server help)
Integer (whole number) data from -2^63 (-9223372036854775808) through 2^63-1
(9223372036854775807). Storage size is 8 bytes.

int (from qsl server help)
Integer (whole number) data from -2^31 (-2,147,483,648) through 2^31 - 1
(2,147,483,647). Storage size is 4 bytes. The SQL-92 synonym for int is
integer.

smallint (from qsl server help)
Integer data from -2^15 (-32,768) through 2^15 - 1 (32,767). Storage size is
2 bytes.

tinyint (from qsl server help)
Integer data from 0 through 255. Storage size is 1 byte.

Now for my opinion.

I didn't mention any other types for primary keys, such as strings (nchar,
nvarchar, etc...). I don't think leaving the decision as to the PK for a
record is a good permission to give the user. You can still implement a
string "identifer" field that the user can enter and use for searches later,
but I don't think you should make that the PK. An example of a string based
PK is in the 'pubs' database. You can still "index" other fields besides
the PK if you want to make the search faster.

Many older databases may have been designed before GUID was available. I
don't question that it would not be feasible to convert them to use GUID.
I'm more concerned with new database designs.

So what are the drawbacks to using GUID?
1) 16 bytes instead of the typical int index (int type) of 4 bytes. When
translated to the maximum record count of 4,294,967,295 per table (if use
all from min to max)...
int = 17,179,869,180 bytes
guid = 68,719,476,720 bytes

Bringing it back to reality, what database it going to use EVERY possible
index for int? If you are going to have that many records in the lifetime
of your application you should use GUID just to give room for growth.

For tables that have only a small number of records (say 100-1000), the
total memory used not alot using GUID.

2) performance? I'm ASSuming that it takes longer to search a larger index
field?

So what are the benefits to using GUID?
1) 4 times as many possible records as int, 2 times as many as bigint. Of
coarse this is only a benefit to mega-applications

2) Ease of programming. no dependency on calling @@Identity and basically
waiting for SQL to assign your records ID's. In .NET just call
GUID.NewGuid() at record/object creation. In this case you can allow
creating the index in the business layer of your application. You don't
have to worry about assigning a temporary value in the business layer, and
making sure it is updated later. This is really a problem if the user wants
to create a record in another table that has a relation to the first table.
Now you have 2 or even more objects you need to remember to update the
Identity on.

When you create a new business object, "Company"...
public class Company
{
private GUID _companyID;
private string _name;
public Company()
{
_companyID= Guid.NewGuid();
_name = "New Company"; //whatever...
}
...
}

Now when you update these objects to the datalayer, create a new dataRow,
set each field to the class instances property values, and do an update.

DataTable dt = ds.Tables["Company"];
DataRow dr = dt.NewRow();
dr["CompanyID"] = companyNew.CompanyID;
dr["Name"] = companyNew.Name;
dt.Rows.Add(dr);
da.Update(ds);

General guidelines that I follow:
1) if a table is will not have any records added, use the smallest integer
based index possible. An example would be a "lookup" table, such as a table
of American states. If less than 256 records, use tinyint. less than 64k
records, use smallint, etc...

2) every other table I use a GUID as the PK.

Is there a flaw in my assesment? Or are there drawbacks or benefits I've
missed? Does anyone else that uses GUID's have additional guidelines? Of
coarse there will always be deviations from best practices. But what is or
should be the new best practice?

Michael Lang, MCSD
 
Those sound like concerns a DB admin would have rather than a developer.
But good points, we do need to keep DB admins happy too.

I don't use Query Analyzer much. What is hard about it? Using Ent Manager,
I do find it hard to create new records in the database. It won't let you
type a guid string into a Guid field! I get the error: "The value you
entered is not consistent with the datatype or length of the column". I
know I am typing a valid GUID because I exported the data from a real app to
an XML file, and just pasted that directly into the GUID field. Maybe it
expects it in another format? I end up writing my db layer before adding
any records to the database. I use an "admin" version of my application for
testing Sprocs and other query strings. The admin app is a prime target for
code generation. (see tool in my signature)

An int PK value may give an idea of when records were added, but gives no
idea as to when they were last modified. Therefore, if you have the
slightest requirement for auditing edit order/history you need to add a
DateTime field storing the last modified date (and possibly created date).

I would say the best option is one that makes the lifetime of the database
simpler (not just creation time) and helps prevent errors introduced by
applications. As far as applications go, GUID's seem like the simpler
solution. Although from what you are saying it is harder to maintain?
Sounds like this is where MS needs to improve?

Michael Lang, MCSD
Take a look at my open source database project code generator...
http://sourceforge.net/projects/dbobjecter

Kathleen Dollard said:
Michael,

GUIDs have benefits. I think you left out their most important drawback.
They make running stored procs in Query Analyzer to test the return for a
particular record exceedingly difficult.

I think you also left out a benefit of int PKs. They give you a quick
intuitive idea of the order records were added.

They are both valid approaches. Depends on whether you want your hassles
when you add data, or later.

--
Kathleen (MVP-VB)



Michael Lang said:
I've noticed people recommending int datatypes for sql tables. I always use
uniqueidentifier (GUID). I wanted to start a discussion on the merits of
each in an ADO.NET development environment. Let's please keep it a
professional discussion that includes reasons for each method, and no
slamming.

First I'll start by stating some basic facts.

uniqueidentifier / GUID (from .NET help files on Guid enumeration)
A GUID is a 128-bit integer (16 bytes) that can be used across all computers
and networks wherever a unique identifier is required. Such an identifier
has a very low probability of being duplicated.

bigint (from qsl server help)
Integer (whole number) data from -2^63 (-9223372036854775808) through 2^63-1
(9223372036854775807). Storage size is 8 bytes.

int (from qsl server help)
Integer (whole number) data from -2^31 (-2,147,483,648) through 2^31 - 1
(2,147,483,647). Storage size is 4 bytes. The SQL-92 synonym for int is
integer.

smallint (from qsl server help)
Integer data from -2^15 (-32,768) through 2^15 - 1 (32,767). Storage
size
is
2 bytes.

tinyint (from qsl server help)
Integer data from 0 through 255. Storage size is 1 byte.

Now for my opinion.

I didn't mention any other types for primary keys, such as strings (nchar,
nvarchar, etc...). I don't think leaving the decision as to the PK for a
record is a good permission to give the user. You can still implement a
string "identifer" field that the user can enter and use for searches later,
but I don't think you should make that the PK. An example of a string based
PK is in the 'pubs' database. You can still "index" other fields besides
the PK if you want to make the search faster.

Many older databases may have been designed before GUID was available. I
don't question that it would not be feasible to convert them to use GUID.
I'm more concerned with new database designs.

So what are the drawbacks to using GUID?
1) 16 bytes instead of the typical int index (int type) of 4 bytes. When
translated to the maximum record count of 4,294,967,295 per table (if use
all from min to max)...
int = 17,179,869,180 bytes
guid = 68,719,476,720 bytes

Bringing it back to reality, what database it going to use EVERY possible
index for int? If you are going to have that many records in the lifetime
of your application you should use GUID just to give room for growth.

For tables that have only a small number of records (say 100-1000), the
total memory used not alot using GUID.

2) performance? I'm ASSuming that it takes longer to search a larger index
field?

So what are the benefits to using GUID?
1) 4 times as many possible records as int, 2 times as many as bigint. Of
coarse this is only a benefit to mega-applications

2) Ease of programming. no dependency on calling @@Identity and basically
waiting for SQL to assign your records ID's. In .NET just call
GUID.NewGuid() at record/object creation. In this case you can allow
creating the index in the business layer of your application. You don't
have to worry about assigning a temporary value in the business layer, and
making sure it is updated later. This is really a problem if the user wants
to create a record in another table that has a relation to the first table.
Now you have 2 or even more objects you need to remember to update the
Identity on.

When you create a new business object, "Company"...
public class Company
{
private GUID _companyID;
private string _name;
public Company()
{
_companyID= Guid.NewGuid();
_name = "New Company"; //whatever...
}
...
}

Now when you update these objects to the datalayer, create a new dataRow,
set each field to the class instances property values, and do an update.

DataTable dt = ds.Tables["Company"];
DataRow dr = dt.NewRow();
dr["CompanyID"] = companyNew.CompanyID;
dr["Name"] = companyNew.Name;
dt.Rows.Add(dr);
da.Update(ds);

General guidelines that I follow:
1) if a table is will not have any records added, use the smallest integer
based index possible. An example would be a "lookup" table, such as a table
of American states. If less than 256 records, use tinyint. less than 64k
records, use smallint, etc...

2) every other table I use a GUID as the PK.

Is there a flaw in my assesment? Or are there drawbacks or benefits I've
missed? Does anyone else that uses GUID's have additional guidelines? Of
coarse there will always be deviations from best practices. But what is or
should be the new best practice?

Michael Lang, MCSD
 
My DBA's let me play in the test database said:
I don't use Query Analyzer much. What is hard about it?

Select * From 368927

vs.

Select * From FDA5SDFPOJKN239K8G3T

My brain can't handle the second, so it just means I have to do that part of
my job differently.

GUIDs are simpler for some things. But they are more difficult for others.
We frequently have issues with individual records. So, we the users can
report the specific record, via PK. I think we'd be forced to an automatic
email system or something ot copy the GUID. All things that can be done, but
GUIDs do require changes in the way you manage your data from a
communication perspective, as well as the sheer technical aspects of
inserting records. I think these are the reasons that they have not caught
on.

--
Kathleen (MVP-VB)



Michael Lang said:
Those sound like concerns a DB admin would have rather than a developer.
But good points, we do need to keep DB admins happy too.

I don't use Query Analyzer much. What is hard about it? Using Ent Manager,
I do find it hard to create new records in the database. It won't let you
type a guid string into a Guid field! I get the error: "The value you
entered is not consistent with the datatype or length of the column". I
know I am typing a valid GUID because I exported the data from a real app to
an XML file, and just pasted that directly into the GUID field. Maybe it
expects it in another format? I end up writing my db layer before adding
any records to the database. I use an "admin" version of my application for
testing Sprocs and other query strings. The admin app is a prime target for
code generation. (see tool in my signature)

An int PK value may give an idea of when records were added, but gives no
idea as to when they were last modified. Therefore, if you have the
slightest requirement for auditing edit order/history you need to add a
DateTime field storing the last modified date (and possibly created date).

I would say the best option is one that makes the lifetime of the database
simpler (not just creation time) and helps prevent errors introduced by
applications. As far as applications go, GUID's seem like the simpler
solution. Although from what you are saying it is harder to maintain?
Sounds like this is where MS needs to improve?

Michael Lang, MCSD
Take a look at my open source database project code generator...
http://sourceforge.net/projects/dbobjecter

Kathleen Dollard said:
Michael,

GUIDs have benefits. I think you left out their most important drawback.
They make running stored procs in Query Analyzer to test the return for a
particular record exceedingly difficult.

I think you also left out a benefit of int PKs. They give you a quick
intuitive idea of the order records were added.

They are both valid approaches. Depends on whether you want your hassles
when you add data, or later.

--
Kathleen (MVP-VB)



always
use size
for
a
record is a good permission to give the user. You can still implement a
string "identifer" field that the user can enter and use for searches later,
but I don't think you should make that the PK. An example of a string based
PK is in the 'pubs' database. You can still "index" other fields besides
the PK if you want to make the search faster.

Many older databases may have been designed before GUID was available. I
don't question that it would not be feasible to convert them to use GUID.
I'm more concerned with new database designs.

So what are the drawbacks to using GUID?
1) 16 bytes instead of the typical int index (int type) of 4 bytes. When
translated to the maximum record count of 4,294,967,295 per table (if use
all from min to max)...
int = 17,179,869,180 bytes
guid = 68,719,476,720 bytes

Bringing it back to reality, what database it going to use EVERY possible
index for int? If you are going to have that many records in the lifetime
of your application you should use GUID just to give room for growth.

For tables that have only a small number of records (say 100-1000), the
total memory used not alot using GUID.

2) performance? I'm ASSuming that it takes longer to search a larger index
field?

So what are the benefits to using GUID?
1) 4 times as many possible records as int, 2 times as many as bigint. Of
coarse this is only a benefit to mega-applications

2) Ease of programming. no dependency on calling @@Identity and basically
waiting for SQL to assign your records ID's. In .NET just call
GUID.NewGuid() at record/object creation. In this case you can allow
creating the index in the business layer of your application. You don't
have to worry about assigning a temporary value in the business layer, and
making sure it is updated later. This is really a problem if the user wants
to create a record in another table that has a relation to the first table.
Now you have 2 or even more objects you need to remember to update the
Identity on.

When you create a new business object, "Company"...
public class Company
{
private GUID _companyID;
private string _name;
public Company()
{
_companyID= Guid.NewGuid();
_name = "New Company"; //whatever...
}
...
}

Now when you update these objects to the datalayer, create a new dataRow,
set each field to the class instances property values, and do an update.

DataTable dt = ds.Tables["Company"];
DataRow dr = dt.NewRow();
dr["CompanyID"] = companyNew.CompanyID;
dr["Name"] = companyNew.Name;
dt.Rows.Add(dr);
da.Update(ds);

General guidelines that I follow:
1) if a table is will not have any records added, use the smallest integer
based index possible. An example would be a "lookup" table, such as a table
of American states. If less than 256 records, use tinyint. less than 64k
records, use smallint, etc...

2) every other table I use a GUID as the PK.

Is there a flaw in my assesment? Or are there drawbacks or benefits I've
missed? Does anyone else that uses GUID's have additional guidelines? Of
coarse there will always be deviations from best practices. But what
is
or
should be the new best practice?

Michael Lang, MCSD
 
Back
Top