Lars:
Its true that a purpose of normalization is to eliminate redundancy, but
that
in itself is not the real objective. Redundancy allows for inconsistent
data
or update anomalies, so by its elimination the integrity of the data is
protected, which is the important thing.
An example of lack of proper normalization can be found in the sample
Northwind database's Customers table. You'll see that this has City,
Region
and Country columns so we are told numerous times that São Paulo is in SP
region (as is Resende) and that SP region is in Brazil. Not only does
this
require repetitive data entry, but more importantly it opens up the risk
of
inconsistent data, e.g. it would be perfectly possible to put São Paulo in
California in one row and California in Ireland! Proper normalization
would
prevent this as the fact that São Paulo is in SP region would be stored
only
once in the database as would the fact that SP region is in Brazil and
that
California is in the USA.
An example of what at first sight might seem to be redundancy, but in fact
is
not, can also be found in Northwind. The Products table and the
OrderDetails
table both have UnitPrice columns. It might be thought that the unit
price
of a product could always be looked up from the Products table, so its
unnecessary in Order Details. However, the unit price of a product will
change over time, but each order needs to retain the price in force at the
time the order was created. Consequently a UnitPrice column is needed in
both tables; that in products holds the current price and is used to get
the
value for that in Order Details (code in the ProductID control's
AfterUpdate
event procedure in the Order Details Subform does this), which then
remains
static when the current price (in products) changes. In each case
UnitPrice
is functionally dependent solely on the whole of the key of the table, so
there is no redundancy.
Its often said that having separate columns to represent different values
of
an attribute, e.g. if we were to have separate columns British and Irish
to
cater for hybrids like me (an absurd example, I know, but it serves to
illustrate the point), is an example of lack of normalization, but its
really more a case of a breach of Codd's Rule 1, The Information Rule,
which
as restated by Date in a slightly expanded form is:
'The entire information content of the database is represented in one and
only one way, namely as explicit values in column positions in rows in
tables'.
C J Date - Introduction to Database Systems; 7th Edition; 2000
Having a separate column for each nationality is what is known as (I think
these are Date's word's also) 'encoding data as column headings', i.e. it's
the names of the column which are the data values, 'British' and 'Irish'
rather than the data being stored as values of type Nationality in
separate
rows as Codd's rule requires.
It could be said that having the multiple columns means that the table is
not
in First Normal Form:
'A relvar is in 1NF if and only if, in every legal value of that relvar,
every tuple contains exactly one value for each attribute.'
But this requires that we regard the attribute as being Nationality, but
no
such column exists of course, so while the row (loosely speaking the
equivalent of a tuple) does contain more than one value of attribute
Nationality, can we say this is the case when no such column exists? This
is
why I feel that its not really a normalization issue, but a breach at the
more fundamental level of the Information Rule. Its an interesting
philosophical point, but however we regard it, I think we can conclude
that
its not a correct approach in terms of the database relational model.
Ken Sheridan
Stafford, England
Lars said:
Thanks Ken.
A 'surrogate' numeric key is only required if
the values in a column in a table are not distinct, e.g. a Cities table
needs
a CityID primary key as city names can be legitimately duplicated
I thought by using a surrogate numeric key in stead of the text itself,
you'd save database space.
Also, I though the second rule of normalization was to eliminate redundant
data, but I just reread and that's only applicable when the ID-number is
part of a multivalued key.
Nationality was a bad example, but you got my point
Lars
[quoted text clipped - 5 lines]
many relationship (one modelling the relationship by resolving it into
two
one-to-many relationships).
A 'surrogate' numeric key is only required if
the values in a column in a table are not distinct, e.g. a Cities table
[quoted text clipped - 67 lines]