Normalisation help

Douglas J. Steele · Jul 16, 2003

What about a table that stores

Document_Number
Word
Word_Offset

(with all 3 as a compound key)?

If you have that, you really don't need the "occurrences" table you
suggested, as you can calculate Word_Frequency from this new table.
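
A minimal sketch of that idea, assuming SQLite through Python's sqlite3 module. The table name word_positions and the sample rows are made up for illustration; only the three column names come from the thread. It shows the frequency being derived with COUNT(*) rather than stored:

```python
import sqlite3

# In-memory database, purely for illustration.
conn = sqlite3.connect(":memory:")

# One row per appearance of a word; all three columns form the compound key.
# The table name "word_positions" is a hypothetical choice.
conn.execute(
    """
    CREATE TABLE word_positions (
        Document_Number INTEGER NOT NULL,
        Word            TEXT    NOT NULL,
        Word_Offset     INTEGER NOT NULL,
        PRIMARY KEY (Document_Number, Word, Word_Offset)
    )
    """
)

# Made-up sample data: 'today' is the 20th, 43rd and 92nd word of document 1.
rows = [
    (1, "today", 20),
    (1, "today", 43),
    (1, "today", 92),
    (1, "rain", 5),
    (2, "today", 7),
]
conn.executemany("INSERT INTO word_positions VALUES (?, ?, ?)", rows)

# Word_Frequency is derived by counting rows per (document, word) pair.
frequencies = conn.execute(
    """
    SELECT Document_Number, Word, COUNT(*) AS Word_Frequency
    FROM word_positions
    GROUP BY Document_Number, Word
    ORDER BY Document_Number, Word
    """
).fetchall()

print(frequencies)  # [(1, 'rain', 1), (1, 'today', 3), (2, 'today', 1)]
```

Because the count is computed on demand, it can't drift out of step with the stored offsets.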

David Hall · Jul 16, 2003

Hi all - not sure if this is the appropriate place to seek help, but a Google
Groups search suggested it might be!

I remember doing a lot of this at university, but that was a good few years
ago now, and I haven't had any practical experience since I left. Reading
through my old course notes hasn't really helped, and I'm at a bit of an
impasse - if anyone can offer any suggestions, I'll be *very* grateful!

Scenario: I have a number of documents. I need to locate, and then record
the positions of particular words in each document. The data I'm working
with looks like this -

- Document_Number (an arbitrary, unique identifier)

- Document_Location (room/cabinet/shelf)

- Word (a given word appearing in the document)

- Word_Length (length of each identified word)

- Word_Frequency (number of times a given word appears in the document)

- Word_Offset (Reading from top-left to bottom-right, one word at a time -
ignoring punctuation - where a given word appears in the document e.g.
'today' might appear 3 times, as the 20th word, the 43rd word, and the 92nd
word).
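
To make the Word_Offset definition concrete, here is a rough sketch of how the offsets could be computed from a document's text. It assumes words are runs of letters, digits and apostrophes, that matching is case-insensitive, and that counting starts at 1; none of that is stated above, and the function name word_offsets is hypothetical:

```python
import re
from collections import defaultdict


def word_offsets(text):
    """Map each lower-cased word to the list of its 1-based positions."""
    offsets = defaultdict(list)
    # Punctuation and whitespace never match, so they are skipped
    # without consuming a position.
    matches = re.finditer(r"[A-Za-z0-9']+", text)
    for position, match in enumerate(matches, start=1):
        offsets[match.group().lower()].append(position)
    return dict(offsets)


sample = "Today is sunny. Tomorrow, unlike today, may bring rain."
offsets = word_offsets(sample)

print(offsets["today"])       # [1, 6]
print(len(offsets["today"]))  # 2, i.e. the Word_Frequency of 'today'
print(len("today"))           # 5, i.e. the Word_Length of 'today'
```

Note that Word_Length depends only on the word, while Word_Frequency falls out of the offset list for a given word in a given document.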

I keep ending up with three tables that look like (have a feeling this will
wrap horribly...):

'documents'
Document_Number <--- Primary Key
Document_Location

'occurrences'
Document_Number <--- Compound Key (referencing 'Document_Number' in 'documents' table)
Word <--- Compound Key (referencing 'Word' in 'words' table)
Word_Frequency

'words'
Word <--- Primary Key
Word_Length

...but I've got no idea how to deal with the 'Word_Offset's. Each word
might appear many times in a document, and therefore have many offsets, but
I just can't work out where to put the offset values, and it's driving me nuts!
I've been trying to work it out mechanically, by following the rules (for
1st, 2nd and 3rd NF) and not really thinking about what the table structures
I've got out of it are implying about the data, so I'm not even sure that
what I've got makes any sense. Just to make the point - each document may
contain many words. Each word may appear in many documents, and many times
in each document.

Any help very much appreciated - sorry for the long post folks

Kind regards

David

Travis Stine · Jul 17, 2003

Hi David,
You essentially have two collections: Documents and
Words. In addition, you want to derive information about
where the two collections join. These you have called
occurrences. Your Word_Offsets are a property of your
occurrences, describing exactly where each occurrence
happened, so they should go in the occurrences table, which
of course is a many-to-many table between Words and
Documents. Hope this helps.
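
One possible reading of this, sketched with Python's sqlite3 module: the table and column names follow David's post, but the sample data and the decision to extend the compound key with Word_Offset are assumptions. Without the offset in the key, a word could only be recorded once per document.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript(
    """
    CREATE TABLE documents (
        Document_Number   INTEGER PRIMARY KEY,
        Document_Location TEXT NOT NULL
    );
    CREATE TABLE words (
        Word        TEXT PRIMARY KEY,
        Word_Length INTEGER NOT NULL
    );
    -- Many-to-many table between documents and words.
    -- Assumption: Word_Offset joins the compound key so that the same
    -- word can be stored several times for one document.
    CREATE TABLE occurrences (
        Document_Number INTEGER NOT NULL REFERENCES documents (Document_Number),
        Word            TEXT    NOT NULL REFERENCES words (Word),
        Word_Offset     INTEGER NOT NULL,
        PRIMARY KEY (Document_Number, Word, Word_Offset)
    );
    """
)

# Made-up sample data.
conn.execute("INSERT INTO documents VALUES (1, 'room 2/cabinet 4/shelf 1')")
conn.execute("INSERT INTO words VALUES ('today', 5)")
conn.executemany(
    "INSERT INTO occurrences VALUES (?, ?, ?)",
    [(1, "today", 20), (1, "today", 43), (1, "today", 92)],
)

# Where does 'today' appear, and where is each document kept?
positions = conn.execute(
    """
    SELECT d.Document_Location, o.Word_Offset
    FROM occurrences AS o
    JOIN documents AS d ON d.Document_Number = o.Document_Number
    WHERE o.Word = 'today'
    ORDER BY o.Word_Offset
    """
).fetchall()

print(positions)
```

With one row per offset, a stored Word_Frequency column would repeat the same value on every row for a given document and word, so in this sketch it is left out and can be obtained by counting rows instead.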

David Hall · Jul 17, 2003

Thank you very much to both of you for your help - I'll go away and have
another go at it in the context of what each of you has said (haven't had a
chance to properly digest the details yet!). Much appreciated

Kind regards

David
