James said:
I have been reading many posts by users that claim to have years of
experience with Access that promote the use of Autonumber generated
values as primary keys. Don't get me wrong; I use Autonumber values as
primary keys for standalone tables, such as those that really do not
have a primary key unless you use multiple key fields. But I keep
asking myself why anyone would create an architecture of two related
tables upon which the relationship is based on an Autonumber key
field.
What you have to bear in mind is that there's a difference between a
table and an entity. Entities are in the logical data structure and
tables are in the physical data structure.
Whenever there's a difference between the logical and physical data
structures there is a risk of data corruption.
Joe Celko says that autonumbers don't exist in the logical data
structure, I disagree. I think that autonumbers appear in the LDS as
separate entities with a 1:1 relationship to another entity. The problem
is that that 1:1 relationship isn't inherent in the data, it has to be
enforced by the programmer. If that isn't done then you get situations
where, for instance, a customer's order is entered twice and gets two
different autonumber "Order Numbers."
For that reason it's safest to use a natural key if there is one
available. Adding an autonumber doesn't create a natural key of there
isn't already one in the real data. If there is one already then the
autonumber is probably redundant.
There are times when you need to denormalise the table for performance
reasons, and that may involve adding an autonumber. With current
database systems you are less likely to need this, storage is so cheap
that it's often better to duplicate key fields (even compound keys) than
to add an autonumber.
The most difficult situations arise when you deal with data about
people. There is no simple and easy to use natural key for people. Over
in comp.databases.theory I once suggested that the precise latitude,
longitude, altitude and time of birth would make a good natural key if
we could persuade people to record it.
I'm currently working on one right now where I'm converting an
"Orders" table that uses Autonumber to generate Order numbers. This
Autogenerated value is then stored in the OrderDetails records. If the
scenario that you describe below occurs, guess what? You've just lost
your relationship between the two fields, because you cannot enter
your own values in an Autonumber field, and your OrderDetails table is
useless because the reference to the Orders table is gone.
That's called an update anomaly, and it's expected when you work with
denormalised data. It's up to the programmers to build system where that
type of anomaly can never occur.
There's also an insert anomaly when the same order gets entered twice.
Again, it's up to the designers to build systems where that can't
happen.
So my advice is take a good long look at the logical data structure and
try very hard to find a natural key. Use that if you can, and only use
autonumbers if you are forced to. There are two reasons for doing that.
The first is performance, to avoid having to match long key fields or to
avoid storing long fields both as primary key and as foreign keys.
Modern systems mean that these cause less problems than they once did.
The seconds is that sometimes you don't have a natural key available,
such as when you are dealing with data about people. At these times you
just have to accept that your data is likely to have errors in it, and
there are limits to what you can do about it. You can choose to use
someone else's autonumber field (for instance a social security number)
or create your own. When you create your own then you need to put
procedures in place to make sure that there is always an exact 1:1
relationship between your autonumber field and the people it is supposed
to identify. You can't do that in software.