normalization concern

John · Apr 19, 2010

Hi

I created a table of place names. Each place name has a standard name and a

number of variants of that name. For exampl

place_id place standard_nam
1 Seecon
2 Seekon
3 Ceekon
4 Barna
5 Bearna

For example, Seecon is the correct form of a place name, but, the variants

Ceekon and Seekon sometimes occur in the literature

I have put all place names, whether standard or variant, in one field, so

that I can search them in one go

To retrieve each place name and its standard name, I run the following

quer

SELECT places_a.place AS place_name, places_b.place AS correct_nam
FROM places AS places_a, places AS places_
WHERE places_a.standard_name=places_b.place_i

I retrieve the following results

place_name correct_nam
Seecon Seeco
Seekon Seeco
Ceekon Seeco
Barna Barn
Bearna Barn

This is what I want

I have since had to add a description

I added a description_id to the places table as a foreign key and added a

description table.

Description_id Descriptio

1 A rocky island in the Atlantic

Place_id place standard_name description_i

1 Seecon 1

2 Seekon 1

3 Ceekon 1

4 Barna 4

5 Bearna 4

This structure doesn't feel normalized. Should I be putting the variants in

their own table? Should I put the standard_name in the description table. I

have an unease about the structure and I am not sure why

Any help would be much appreciated

Daryl S · Apr 19, 2010

John -

You are right to question the structure. You can keep your original
'places' table to look up the correct standard_name. You should have a
StandardName table that contains the standard_name as the unique primary key,
and the description field can go in this table. (You could change the
standard_name field to an autonumber in the StandardName table, but you would
also need to map the correct values back to the places table. Not really
necessary, but it could save you issues with adding new records to the two
tables.)

Jerry Whittle · Apr 19, 2010

I think that this self-join on one table is a good option. The only problem
would be if more than one place has the same name. You probably want to
include the place_id (I'm assuming that it's the primary key) in queries to
show that you are talking about more than one 'Springfield'.

Paul Shapiro · Apr 20, 2010

I have done similar things with two tables, because the Place and it's
multiple names are two separate entities. The data about the Place should
not be repeated for each name.

Place {placeID, standardName, description, ...} This is the actual place,
with any data about that place.

PlaceName {placeID, placeName} This is the list of ALL names for the place,
including the standardName that was included in the Place table. This feels
a little questionable for normalization, but I prefer having the standard
name in the Place table, rather than having to retrieve it from the
placeName table. It is truly a piece of data about the Place, so the only
problem is including it a second time in PlaceName. One could use
PlaceAlternateName as the child table, and then omit the standard name from
that table, but then you have to do a union query every time you want a list
of all names.

If however you wanted the place names in many languages, then you would also
want the standard name in each language. Then PlaceName would be the only
place to include names, unless you renamed Place.standardName to
Place.standardEnglishName or whatever language you chose as primary.
PlaceName would need a languageCode column and also a boolean
isStandardName. Validation would have to ensure that for each place-language
there was one and only one standard name.

david · Apr 21, 2010

It's not normallised if some of the values in the
column are "standard name" and some are not.

You will get an update anomoly if you decide
that one of the "standard name" is really only
a "place name", and the "standard name" is
something different.

To do that kind of update you can change the
place name associated with the the place id, and
add a new place id with the old place name,
so all the pointers to "standard name" remain
correct, but any data associated with the old
"place name" is incorrect,

--or create a new place id for the new standard
name, so standard name relationships are wrong,
but place name relationships are correct.

Of course it's still a judgement call if you want
Normalization or not.

You can increase the normalization by adding a field
which identifies which row is a standard name, but
that is probably not an increase in normalisation which
will help you (or you would have done so already)

If you decide that further normalisation is a good idea,
you can add a new "standard name" table if standard
names are not place names, or a new relationship table
if standard names are place names. Either way, the
new system will have mostly theoretical benefits as
long as your data remains simple.

Taking my earlier example, if you have a relationship
table, you will be able to change the "standard name"
without messing with your "place name" cleanly by
adding a new name to the name table, and changing
a single row in the relationship table, but only at the
cost of maintaining the new table. Which in a simple
case may still be more work than maintaining an unnormalized system

(david)

normalization concern

John

Daryl S

Jerry Whittle

Paul Shapiro

david