alia:
The principal issue here is that the structure of the .csv file in which the
assay results are presented is much more akin to a spreadsheet than a
database table. Rather than having a separate column for each result
category, these would, in a relational database, be stored as separate rows
in a related table. However, it's not difficult to take the data from the
.csv file and recast it in a format suitable for a database table. Before
coming to that, though, it looks to me like your existing database might be
in need of a little remodelling.
The design of tables in a relational database is governed by a process known
as normalization. I won't go into the details of this (the Wikipedia article
on the subject is pretty good), but essentially normalization is a set of
rules (normal forms) which ensure the elimination of any redundancies which
can leave the database open to inconsistent data. The end result is a set of
related tables, each of which models an entity type and whose columns model
the attributes of that entity type without any redundancy. This is achieved
by a process of 'decomposition', breaking tables down into separate tables
so that the attributes (columns) of each contain no redundant information.
Assuming a location might have more than one sample taken, a table of the
structure you cite:
location ID | northing | easting | region name | sample number |
contains redundancies because for each sample per location we are told the
easting, northing and region of the location. This allows for
inconsistencies, as there is nothing to prevent different values of one or
more of these attributes from being entered in separate rows for the same
location.
Separate Locations, Regions and Samples tables are needed, e.g.
Regions
....Region
Locations
....LocationID
....LocationName
....Region
Samples
....SampleNumber
....SampleDate
....LocationID
Region in Locations is a foreign key referencing the primary key of Regions,
and LocationID in Samples is a foreign key referencing the primary key of
Locations. The tables might well have other non-key columns of course, but
the important thing is that each of these must be a specific attribute of the
entity type which the table models. In the language of the relational model,
each must be 'functionally dependent' solely on the whole of the primary key
of the table.
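To make that concrete, here's a minimal DDL sketch of the three tables in
Access (Jet) SQL. The data types and lengths are assumptions on my part, so
adjust them to suit your data, and note that each CREATE TABLE statement has
to be run as a separate query in Access:

CREATE TABLE Regions
(Region TEXT(50) NOT NULL CONSTRAINT PK_Regions PRIMARY KEY);

CREATE TABLE Locations
(LocationID TEXT(20) NOT NULL CONSTRAINT PK_Locations PRIMARY KEY,
LocationName TEXT(50),
Region TEXT(50) NOT NULL
CONSTRAINT FK_Locations_Regions REFERENCES Regions (Region));

CREATE TABLE Samples
(SampleNumber TEXT(20) NOT NULL CONSTRAINT PK_Samples PRIMARY KEY,
SampleDate DATETIME,
LocationID TEXT(20) NOT NULL
CONSTRAINT FK_Samples_Locations REFERENCES Locations (LocationID));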
Turning to the assay results, a suitable table for this would be:
AssayResults
....SampleNumber
....ResultCategory
....Result
The primary key of this table would be a composite one made up of the two
columns SampleNumber and ResultCategory, each of which is also a foreign key,
the former referencing the primary key of Samples, the latter the primary
key of a ResultCategories table:
ResultCategories
....ResultCategory
This table would have one row for each type of assay result, so the values
might be Au_ppm, Pt_ppm etc.
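Again as a sketch in Jet SQL with assumed types, and with the composite
primary key declared as a table-level constraint:

CREATE TABLE ResultCategories
(ResultCategory TEXT(20) NOT NULL
CONSTRAINT PK_ResultCategories PRIMARY KEY);

CREATE TABLE AssayResults
(SampleNumber TEXT(20) NOT NULL
CONSTRAINT FK_AssayResults_Samples REFERENCES Samples (SampleNumber),
ResultCategory TEXT(20) NOT NULL
CONSTRAINT FK_AssayResults_Categories REFERENCES ResultCategories (ResultCategory),
Result DOUBLE,
CONSTRAINT PK_AssayResults PRIMARY KEY (SampleNumber, ResultCategory));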
When it comes to importing the results data from the .csv file you'd link to
the file and use a set of 'append' queries to insert rows into AssayResults,
with a separate query per result category, so for gold you'd use:
INSERT INTO AssayResults
(SampleNumber, ResultCategory, Result)
SELECT [sample number], "Au_ppm", Au_ppm
FROM [TheLinkedCSVFile]
WHERE Au_ppm IS NOT NULL;
The "Au_ppm" in quotes is a constant which inserts the text value 'Au_ppm'
into the ResultCategory column, the Au_ppm without quotes is the column in
the linked file which contains the result for gold ppm. You might have a
similar append query for platinum for instance:
INSERT INTO AssayResults
(SampleNumber, ResultCategory, Result)
SELECT [sample number], "Pt_ppm", Au_ppm
FROM [TheLinkedCSVFile]
WHERE Pt_ppm IS NOT NULL;
Whenever you receive a .csv file with new assay results it is simply a case
of linking to the .csv file and executing the set of append queries, which
can easily be automated so that they can all be run at a single click of a
button on a form. Even if the same queries were accidentally executed more
than once for the same samples no harm would be done, as the violation of
the composite primary key of AssayResults would prevent the same row from
being inserted more than once.
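If you'd rather suppress the key-violation warnings altogether when a query
is re-run, one option (not something you have to do) is to add a NOT EXISTS
predicate so that rows already in AssayResults are filtered out before
insertion, e.g. for gold:

INSERT INTO AssayResults
(SampleNumber, ResultCategory, Result)
SELECT C.[sample number], "Au_ppm", C.Au_ppm
FROM [TheLinkedCSVFile] AS C
WHERE C.Au_ppm IS NOT NULL
AND NOT EXISTS
(SELECT *
FROM AssayResults AS A
WHERE A.SampleNumber = C.[sample number]
AND A.ResultCategory = "Au_ppm");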
When it comes to making the data available to MapInfo, I have no experience
of that particular product. In my own work with environmental data of a
broadly similar structure to yours we used ArcInfo as the GIS. However,
given a set of correctly normalized tables as outlined above it should be a
simple task to create a query to return the data in a format compatible with
MapInfo's requirements.
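As a sketch of the sort of thing I mean (and assuming Locations carries
Northing and Easting columns, which are not in my outline above), a crosstab
query can pivot the result categories back into one column per category, so
each exported row carries the coordinates alongside each assay value:

TRANSFORM First(A.Result)
SELECT L.LocationID, L.Northing, L.Easting, L.Region, S.SampleNumber
FROM (Locations AS L
INNER JOIN Samples AS S ON L.LocationID = S.LocationID)
INNER JOIN AssayResults AS A ON S.SampleNumber = A.SampleNumber
GROUP BY L.LocationID, L.Northing, L.Easting, L.Region, S.SampleNumber
PIVOT A.ResultCategory;

The result of such a query can then be exported to a .csv file for MapInfo
in the usual way.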
Ken Sheridan
Stafford, England
hi Jeff,
Thanks for answering. Sorry that wasn't clear. I also got some more
information from the geologist who is going to be using this db, so here it
is.
The lab assay .csv will have a column for the sample number and the assay
results:
sample number | Au_ppm | result2 | result3 | ...etc.
I want to bring in the relevant result (in this case, gold in parts per
million) into my database, which contains columns like:
location ID | northing | easting | region name | sample number |
So I want to link up the two SampleNumber fields and bring in the relevant
Au_ppm information (and not every location ID will have a sample taken). The
goal is to be able to export this information into another .csv to display
the information in MapInfo.
I tried this with some made-up data and the problem that I had was that when
I tried to append the data to the table, I ended up with duplicates in the
SampleNumber field. So I'm not sure how to fix that. And, more generally,
I'm not sure what the most appropriate way to do this will be: link the
table, import the data, or append a copy. We're going to have a lot of these
assay results, so there will be a lot of data to handle.
Hope this helps.
thanks,
alia
To connect a sample to a location, you need to have a way to, well, connect
the sample to the location...
[quoted text clipped - 14 lines]
Jeff Boyce
Microsoft Access MVP