Rick said:
there are two different philosophies at issue here.
Surrogate numeric keys (AutoNumber or otherwise) do solve a lot of technical
problems for database and database application developers and many seasoned
professionals who know what they are talking about use them and recommend
them. However; they are not part of proper Relational Database Design
Theory.
If you are discussing the construction of "proper" relational databases then
surrogates will be argued against by the experts almost without exception.
If you are discussing the practical pitfalls and solutions of building
working databases then you will find a lot of people who make their living
doing such things arguing for the use of surrogate keys.
These really are two different discussions, but often the boundaries get
blurred during exchanges in these groups. I for one have no problem with
the judicious use of surrogate keys and often use them myself. However; I
avoid making statements like "All Primary Keys should be numeric surrogates"
because their use is simply not warranted in every case and my preferences
are not applicable in every case. I will point out their advantages when
discussing databases with other developers, but I would not fault the
decision of another developer not to use them.
A key uniquely identifies data and the unit of work in SQL is a row
(record). 'Primary keys' are a subset of 'keys'. Disregard the
'primary' distinction for a moment.
I recognise *three* uses for autonumber.
1) As a so-called surrogate, where the natural key is being constrained
and the autonumber is used to 'associate' related tables. The main
advantage is that a single column integer is efficient and easier to
work with. IMO most people take this approach out of habit rather than
to tackle a real life problem. FWIW the 'theorists' are split: Codd
suggests a true surrogate should not be visible, à la an index; Date
seems to be less strict e.g. might be visible to a DBA. I'm not a
theorist and I don't really have a problem with such usage. The problem
really is the physical implementation: from a SQL point of view, there
is no requirement for a referencing (child) table to physically store
the referenced key value (in some SQL products there they are
physically one and the same) but in Jet physically repeated values
between tables are a reality and sometimes physical considerations
(e.g. performance) must be addressed. Natural key + autonumber is a
fair compromise because you have a 'relational' key (the natural key)
plus a physical connector (autonumber).
2) As a so-called artificial key where no natural key exists i.e. the
composite of all the attributes being modelled could be duplicated for
distinct entities and modelling more attributes is not an option. You
will have to accept that you will never be able to associate entities
(e.g. if you want to collect all posts by me - and why wouldn't you
<g>? - then you're our of luck because lots of people post via google,
handles can be duplicated, email addresses can be reassigned, sig lines
can be faked, personal style can be mimicked, etc). Your database
becomes the trusted source. The problem here is that the autonumber
must be exposed to users and logically autonumbers aren't the best for
real word situations: people hate typing GUIDs (if they can remember
one in the first place <g>); random integers are easily mistyped (hint:
a check digit, à la ISBN, helps here); with sequential integers people
get bothered by missing values (e.g. auditors). Autonumbers also have
physical problems: a sequence must be generated on the same machine,
that's why your incrementing autonumber are changed to when using
replication, but even random numbers can produce clashes. And consider
that autonumber values can be assigned explicitly i.e. using INSERT
INTO syntax. On the other hand, designing a key and maintaining its
values takes time and effort and Access is all about RAD so I can see
the appeal (but I maintain my right to level charges of unimaginative
design and contempt for users).
3) As a so-called uniquifier. Designed to ensnare the clueless
(subjective, I know, but I can't think of a better way of putting it).
You know the message: "Although a primary key isn't required, it's
highly recommended. A table must have a primary key for you to define a
relationship between this table and other tables in the database. Do
you want [Access] to create a primary key now?" Lies, all lies: a)
logically, a table *does* require a key, otherwise it's a 'heap'; b) a
table does not need a primary key or other key to define a relationship
(it needs a key to 'enforce referential integrity' but it doesn't need
to be the primary key), rather it merely needs a compatible data type
in anther table; c) if you choose 'Yes', Access doesn't create a key,
it merely creates an autonumber to 'uniquify' you data. A key should
prevent duplicates. Think about it: an autonumber doesn't *prevent*
duplicates, rather it *enables* duplicates. Try it: create a single
column of type 'number' and create several rows with the same value.
say, 1. Now, are those 1s the same or are they all different? The
autonumber is there to tell you they are all different but do you
believe it? In case it's not clear, it's this third flavour of
autonumber I have a problem with.
Back to 'primary key'. The term can be traced back to an early mistake
made by Ted Codd. You have number of 'candidate key' and you pick one
to be 'primary key'. He later realised that all keys are equal but by
then it was too late. Relational theory has moved on but SQL took
PRIMARY KEY (SQL keywords in uppercase) and now has to deal with it as
a legacy issue.
First, you don't *need* PRIMARY KEY. If you make your existing primary
key columns NOT NULL and constrained them with UNIQUE you would suffer
no loss of data integrity because a NOT NULL UNIQUE key is a key and
all keys are equal. The NOT NULL property is required for equivalence
because a PRIMARY KEY cannot comprise a NULL value.
Second, while the idea of PRIMARY KEY is the particular SQL
implementation - each SQL product e.g. Jet - would give special meaning
to the PRIMARY KEY. This is why everyone says that a SQL table should
have a primary key: if it didn't have one at the very worst you would
be missing out on the special benefits that PRIMARY KEY has to offer.
David W Fenton (downthread) is correct: these special benefits vary
from product to product. For Access/Jet, they include:
1) Clustering (physical ordering) on disk on compact;
2) Appearing as bold text in the Relationships diagram;
3) In absence of explicit definition, it will be used when the table is
referenced in a relationship or FOREIGN KEY (the letter behaviour is
defined in the SQL-92 standard);
4) Prevents you from being nagged, "Although a primary key isn't
required..." <g>.
Jamie.
--