Autonumbering Primary Keys

G

Guest

Hi! I'm doing an orders database. For the primary key, I was going to use the
order numbers since those will always be unique. However, it seems that
Access suggests that I have a separate column for OrderID using Autonumbers.

Is there a reason I should always have a separate ID column as the primary
key? Should I not use Order numbers or part numbers as primary keys?
 
M

Mark

As long as your order number or part number will always be unique (or should
be), go ahead and use it as the primary key. You can even select multiple
fields and make the combination of them a primary key--for instance a
city/state/zip together could be a primary key even though the same city
could be in many states, or the same zip could be assigned to many cities,
the combination of city/state/zip would be unique.
 
O

onedaywhen

CC said:
I'm doing an orders database. For the primary key, I was going to use the
order numbers since those will always be unique. However, it seems that
Access suggests that I have a separate column for OrderID using Autonumbers.

Is there a reason I should always have a separate ID column as the primary
key? Should I not use Order numbers or part numbers as primary keys?

I think it's a lowest common denominator thing i.e. the documentation
doesn't assume the reader can determine for themselves whether a
(natural) key column will *always* be unique so suggests using an
autonumber as a candidate (artificial) key. The choice of the
autonumber as the sole primary key (PK) column is questionable, though
e.g. if it's only purpose is to ensure uniqueness then why not
(order_nbr, ID) as a composite PK rather than just ID?

Some users like to use an autonumber candidate key internally for
foreign key (FK) relationships, joins, etc on the basis that a numeric
(INTEGER is the most popular choice for autonumber) is more 'efficient'
than using e.g. a 10 character text column. The problem is it destroys
data integrity (the effects of which may be mitigated by taking regular
backups). Another common problem is that other constraints are
seemingly omitted e.g. I often see a design like this for a
relationship/junction table:

tblEmployeesSkills
--------------
EmployeesSkillsID (autonumber PK)
EmployeeID (FK references tblEmployees)
SkillID (FK references tblSkills)

The autonumber pretty redundant here: (SkillID, EmployeeID) is usually
what is required for a PK. The autonumber PK could distract the reader
from the fact there are missing UNIQUE constraint(s) to prevent
duplicates of the same (SkillID, EmployeeID) pairing.

Some people here clearly love autonumber PKs, though. In a recent-ish
discussion, one MS Access MVP declared he used them in virtually every
one of his tables and gave the following arguments in support of his
approach:

1) He is an MVP and he uses them in virtually every one of his tables.
2) Er...
3) That's it.

Jamie.

--
 
G

Guest

This is an ongoing debate. I personally use autonumbers in most cases, even
if I don't see an immediate use for them. It's what I learned, and what I am
accustomed to using. People have maintained that by creating an unneeded
index (or something like that), it slows down performance, but I'm not
convinced that a combined PK is an improvement in that regard. I have heard
of instances where a company changed the format of an EmployeeID number or
some other number (by appending a 0, for instance, to allow for more IDs than
the current system allowed). Had EmployeeID been used as the primary key, or
as part of a combined PK, it could have been pretty messy.
Having said that, Order Number can serve as a PK if you like, since it will
be unique no matter if it is 1 digit or 10. You may want to look into an
incremented order number, in which 1 (or whatever you want) is added to the
number from the previous record. Try a groups search for "increment number
primary key" or "increment autonumber" or "simulated autonumber" or something
of the sort.
Use what works best for you. There is no single correct answer for every
situation.
 
O

onedaywhen

BruceM said:
I have heard
of instances where a company changed the format of an EmployeeID number or
some other number (by appending a 0, for instance, to allow for more IDs than
the current system allowed). Had EmployeeID been used as the primary key, or
as part of a combined PK, it could have been pretty messy.

Actually, having the true key value in each of the referencing tables
would make this quite a straightforward mapping exercise. You'd have to
suspend the constraints during the process, of course.

Jamie.

--
 
G

Guest

using an autonumber schema also helps if you ever need to know the "Natural
order" of the data.
 
O

onedaywhen

tw said:
using an autonumber schema also helps if you ever need to know the "Natural
order" of the data.

You point is valid but not really relevant to this discussion. Yes, you
can use an autonumber to give you a relative date/time entered order.
You can also use a DATETIME column that defaults to Date() with a CHECK
constraint to ensure it is always equal to Date(), which would give you
a 'natural' order in terms of insertions *and* amendments, and has the
advantage of retaining the actual date/time.

But using an autonumber in this way would not influence the choice of
primary key column(s). The autonumber does not have to be the PK for
you to be able to ORDER BY <autonumber column> when this 'natural'
order is required.

Choosing the autonumber column as the sole PK column would mean the
physical order on disk (Jet's equivalent of clustered indexes) would be
in date/time entered order, which is rarely desirable. For performance
reasons it's better to have your physical order on column(s) you use
most often in GROUP BY and ORDER BY clauses.

Your phrasing is pertinent: 'if you ever need to know' does not sound
like an operation performed often enough to justify it being the PK.

Jamie.

--
 
B

Brendan Reynolds

I would not recommend depending on an AutoNumber field for date-entered
ordering, as this will fail if the database is replicated, when the New
Values property of all AutoNumber fields is changed to Random. True, most
databases will never be replicated, but as there are simple alternatives,
why take the risk, even if it seems to be a small one?
 
O

onedaywhen

Brendan said:
I would not recommend depending on an AutoNumber field for date-entered
ordering, as this will fail if the database is replicated, when the New
Values property of all AutoNumber fields is changed to Random. True, most
databases will never be replicated, but as there are simple alternatives,
why take the risk, even if it seems to be a small one?

OK, I take it back: autonumber isn't good for date/time entered
ordering either. Another reason for not using autonumber, then.

Out of interest (my quest for knowledge continues <g>), if autonumber
cannot be used to record date/time entered order then what? Consider
the following table:

CREATE TABLE Test (
key_col INTEGER NOT NULL PRIMARY KEY,
data_col CURRENCY NOT NULL,
date_entered DATETIME DEFAULT Now() NOT NULL,
last_updated DATETIME DEFAULT Now() NOT NULL,
CONSTRAINT last_updated_must_be_current
CHECK (last_updated = Now())
);

Create a row:

INSERT INTO Test (key_col, data_col)
VALUES (1,9999);

Try to change the data:

UPDATE Test
SET data_col = 12345
WHERE key_col = 1;

This fails because my CHECK constraint (Validation rule), which ensures
the last_updated is always current, bites. So I must also change the
last_updated column for it to take:

UPDATE Test
SET data_col = 12345,
last_updated = Now()
WHERE key_col = 1;

My question is: is there a similar constraint I can apply to ensure the
date_entered is always the original date_entered?

The only way I can think of is to use permissions e.g. revoke
permissions on the table, create a VIEW to expose the table and
respective procedures to INSERT and UPDATE a row with owner access, the
procedures internally supplying date_entered and last_updated values as
appropriate, then grant read permissions on these objects.

Jamie.

--
 
B

Brendan Reynolds

In all situations that I have so far encountered, simply setting the default
value of the date-entered field to Now() has been sufficient. You are
correct of course that this could be circumvented. I've never encountered a
situation where that has been a practical problem, and therefore have not
given any thought to it. Your proposed solution looks promising, but I have
not tested it.
 
G

Guest

I'm not sure what you just said. What do you mean by "true key value"? What
is a mapping exercise. If I use employeeID as the PK I am not using a PK
that is likely to be included in OrderBy clauses, although I will group by
employeeID in order to have each employee's record on a separate page or
something like that. In that case I group by employee ID, and order by
FullName, which is a concatenated field in the report's underlying query. In
a company of about 70 employees there have been no duplicates in FullName,
but if there are I will order by something additional. My question to you is
what do you see as the advantage of updating the records if the PK value
changes in nature (as in the example of EmployeeID). Why is it not simpler
and cleaner to avoid that possibility? What would you use as PK in an
employee records database?
 
O

onedaywhen

BruceM said:
What do you mean by "true key value"?

I think I meant a 'natural' key. An autonumber is not a key. In your
brand new Employees table with its autonumber PK, insert a row for an
employee and you get ID=1. Then delete the row and insert the exact
same details again. Now you get ID=2. You have two different IDs that
refer to the same employee. Your ID column is not a key.
In
a company of about 70 employees there have been no duplicates in
FullName

You are aware full name is not likely to be unique, that's good.
What would you use as PK in an
employee records database?

In my country, the government likes to keep tabs on its subjects;
something to do with paying tax and qualifying for state benefits, I'm
told. It has a department that issues each individual with a 9
character unique identifier. If a new employee can't tell their ID, the
employer must construct a temporary number using date of birth and a
letter denoting sex and the tax department gets back to them fairly
promptly with the actual ID (AFAIK notifying the tax office of a new
employee is a legal requirement). When an employer takes on two people
within a short period with the same sex and birth date, I guess they
either ask one of them to try to find out their ID (usually found on an
old payslip) or phone to tax office to see if they can tell the actual
IDs. I understand in certain large countries in North America they have
something similar, known as a social security number (SSN).
What
is a mapping exercise

You proposed a theoretical scenario where a column (EmployeeID) needed
to be changed. Copying the data from the old to the new is what I'm
calling a mapping exercise i.e. you have to map (copy) the data from
each row in the old table into the corresponding row in the new table.
what
do you see as the advantage of updating the records if the PK value
changes in nature (as in the example of EmployeeID).

When doing data conversion/mapping exercises, you have to be sure your
actual results correspond to you expected results. I guess there is no
disadvantage to using an integer value to identify each row for these
purposes but I think I would be happier seeing something that was
obviously a key, rather than a simple row identifier.
Why is it not simpler
and cleaner to avoid that possibility?

Let me propose another theoretical scenario: something happens to your
employee table and your autonumber column is lost. You now have an
EarningsHistory table where each employee is identified using an
integer which has no meaning without the original values from your
Employees table. Consider the same EarningsHistory table where each row
had a unique identifier such as the aforementioned SSN, something
verifiable in reality e.g. phone the tax department and ask them which
employee has a certain SSN. Do you now see what I mean when I say I
feel 'happier' using a real key in the referencing tables?

Jamie.

--
 
G

Guest

onedaywhen said:
I think I meant a 'natural' key. An autonumber is not a key. In your
brand new Employees table with its autonumber PK, insert a row for an
employee and you get ID=1. Then delete the row and insert the exact
same details again. Now you get ID=2. You have two different IDs that
refer to the same employee. Your ID column is not a key.

I'm missing your point here. If I delete the record that contains
autonumber ID=1 then there is no record containing ID=1. This would be a
problem only if I had related records in other tables.
You are aware full name is not likely to be unique, that's good.

I was referring to FullName as an Order By field in a report. As a
concatenated field in a query it can't have a PK in any case. Combining
FirstName and LastName as a PK would be risking duplication.
You proposed a theoretical scenario where a column (EmployeeID) needed
to be changed. Copying the data from the old to the new is what I'm
calling a mapping exercise i.e. you have to map (copy) the data from
each row in the old table into the corresponding row in the new table.

Got it. I thought you meant something like that, but wasn't sure.
Let me propose another theoretical scenario: something happens to your
employee table and your autonumber column is lost. You now have an
EarningsHistory table where each employee is identified using an
integer which has no meaning without the original values from your
Employees table. Consider the same EarningsHistory table where each row
had a unique identifier such as the aforementioned SSN, something
verifiable in reality e.g. phone the tax department and ask them which
employee has a certain SSN. Do you now see what I mean when I say I
feel 'happier' using a real key in the referencing tables?

Yes, I see your point about being able to reconstruct the database if the
SSN field is lost. Although I don't quite see how a single field can be lost
(a field involved in a relationship can't be deleted until the relationship
is undone) I think I would prefer reconstructing the PK field if it came to
that. As long as I keep the records ordered by the autonumber field I could
add a new autonumber field, then reorder the related table by the FK field
and replace all occurrences of the lowest number with 1, of the second lowest
with 2, etc. (or something like that). This is assuming my backup also went
bad.

I'm not trying to be contentious here. I really do appreciate your taking
the time to reply and to explain in further detail. I have a feeling that
neither of us is likely to change the other's mind, but you do raise some
points worth considering.
 
O

onedaywhen

BruceM said:
I'm missing your point here. If I delete the record that contains
autonumber ID=1 then there is no record containing ID=1.

The point is SSN always identifies an entity in reality.
I was referring to FullName as an Order By field in a report. As a
concatenated field in a query it can't have a PK in any case. Combining
FirstName and LastName as a PK would be risking duplication.

what the autonumer (IDENTITY) is for, i.e. to eliminate the possibility
of duplicates, so make your compound PK be (LastName, FirstName, ID) in
that specific order.

I think you are placing the wrong significance on the meaning of
primary key. For example, consider this:

CREATE TABLE Workers (
ID INTEGER IDENTITY(1,1) NOT NULL UNIQUE,
last_name VARCHAR(35) NOT NULL,
first_name VARCHAR(35) NOT NULL,
PRIMARY KEY (last_name, first_name, ID)
)
;
CREATE TABLE Wages (
ID INTEGER NOT NULL
REFERENCES Workers (ID)
ON UPDATE CASCADE
ON DELETE CASCADE,
start_date DATETIME NOT NULL,
end_date DATETIME,
wage CURRENCY NOT NULL,
PRIMARY KEY (start_date, ID)
)
;

In other words, there is no rule (in either Jet or relational theory)
that says a FK in one table must use a PK from another; any key (a.k.a.
unique identifier) will do. In relational theory, a primary key has no
special powers. A PK has special meaning for Jet, though i.e. the
aforementioned physical ordering/clustered indexes.

If you are not using your PK in terms of its special meaning to Jet,
what are you using it for?

Jamie.

--
 
O

onedaywhen

BruceM said:
ID already makes it unique. That's good enough for me.

Let me give you a real life example I encountered just last week. I am
converting a database table with FirstName and LastName columns plus
the ubiquitous single column PK incrementing INTEGER autonumber column
named ID. The autonumber takes care of the duplicates so there could
happily be two 'Jean Dupont's in the database. However, a problem
arises when data arrives via an email and a procedure named (something
like) GetIDUsingName kicks in, which queries the database using
LastName = 'Dupont' and FirstName = 'Jean' and takes the first row from
the resulting recordset. No problem for me: the client's instruction is
to do a straight port, bugs and all and the autonumber PK actually
makes this bizarre function highly predictable in practice. But it
makes me wonder if that duplicate-breaking autonumber is actually a
solution for anything ...
What if I want to order the information in some other way. If there is
validity to your statements about physical order on the disk it seems to me
it would only apply to one way of looking at the data (sorted alphabetically
by name). What happens if I want to sort by age or by gender, or
whatever?

You should always use an ORDER BY. My point is, when the physical order
happens to coincide with the requested ORDER BY you will get a
performance gain. Obviously you can have only one physical order so you
should choose wisely e.g. the order you most frequent request in your
ORDER BY statements. By choosing your autonumber for the physical
order, you never get the performance benefit; looked at another way,
you are always taking a performance hit.

ORDER BY is not the prime example, though. GROUP BY and BETWEEN are
more significant. That's why I used that example of, 'grab me all the
phone numbers of people whose last name begins with A'. If the required
data rows are already next to each other, as it is in a paper copy
telephone directory, the performance advantage is clear (remember the
BETWEEN and GROUP BY are applied to the rowset early on, whereas ORDER
BY is only applied at the end of the process).

I know of only one way of choosing the physical order in my Jet table
and that is to use the PRIMARY KEY functionality and ensure the file is
regularly compacted. If I want a column that cannot be null and must
contain unique values, but would not be the best choice for the
physical order of the table, I can use NOT NULL with either a UNQIUE
CONSTRAINT or a UNIQUE INDEX.
I use a PK because it is the established method. I am interested in using
Access to meet my needs, and feel no compulsion to first reinvent the wheel.
PK works, so I use it. I don't care about physical order on the disk or
clustered indexes or anything of that sort. If I saw some objective
benchmark data showing me that my databases are experiencing a performance
hit because of my use of autonumber PKs I would rethink what I am
doing

But you are advising others to use autonumber PKs, thus propagating bad
advice.

I guess my methodology is to try to find the best way of doing
something. I can use logic to determine that a table in physical order
of its 'prime usage' will have a performance advantage so I will go
with this approach unless there is another issue e.g. doing so would
make my schema more difficult to maintain.

With the greatest respect, your approach seems to be more like, 'Well,
it works for me and doesn't run like a dog so I won't change my ways
until someone reputable tells me I should.'

But thank you, as ever, for listening.

Jamie.

--
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Top