this might be about linking tables

  • Thread starter Thread starter alia
  • Start date Start date
A

alia

At work we are going to be setting up a MS Access database for some
geological data. We are going to be getting some other data in .csv format
that we would like to add to this database.

What we have is a collection of location ID's and corresponding sample
numbers. Not every location ID will have a sample taken from it. We will be
getting assay results in from the lab on each sample taken, and I want to
link the assay sample results (the .csv files) to the appropriate sample IDs
in the database.

How do I do this? Is it as easy as making a table for the .csv data and
linking that to a table in the DB? My Access knowledge is very rusty, so I
apologize in advance if this is the most basic question ever, or if there's
not enough information - just ask and I will try to provide more. :)

thanks and cheers,
alia
 
To connect a sample to a location, you need to have a way to, well, connect
the sample to the location...

Does the .csv sample data include a locationID, or LocationName, or some
such?

Do you have a list of all possible Locations? ?with IDs?

Can you have more than one assay result per sample? How do you connect an
assay result to a sample ... do you have a SampleID (i.e., a unique
identifier for each sample)?

More info, please...

Regards

Jeff Boyce
Microsoft Access MVP

--
Disclaimer: This author may have received products and services mentioned
in this post. Mention and/or description of a product or service herein
does not constitute endorsement thereof.

Any code or pseudocode included in this post is offered "as is", with no
guarantee as to suitability.

You can thank the FTC of the USA for making this disclaimer
possible/necessary.
 
hi Jeff,

Thanks for answering. Sorry that wasn't clear. I also got some more
information from the geologist who is going to be using this db, so here it
is.

The lab assay .csv will have a column for the sample number and the assay
results:

sample number | Au_ppm | result2 | result3 | ...etc.

I want to bring in the relevant result (in this case, gold in parts per
million) into my database, which contains columns like:

location ID | northing | easting | region name | sample number |

So I want to link up the two SampleNumber fields and bring in the relevant
Au_ppm information (and not every location ID will have a sample taken). The
goal is to be able to export this information into another .csv to display
the information in MapInfo.

I tried this with some made-up data and the problem that I had was that when
I tried to append the data to the table, I ended up with duplicates in the
SampleNumber field. So I'm not sure how to fix that. And, more generally, I'm
not sure what the most appropriate way to do this will be - link the table,
import the data, or append a copy. We're going to have a lot of these assay
results so there will be a lot of data to handle.

Hope this helps.

thanks,
alia
 
I'm not sure I'm correctly visualizing your data structure yet...

To get the best use of Access' relationally-oriented features/functions, you
need to 'feed' it well-normalized data.

When you described your database, the example you gave showed a LocationID
in the same record as a SampleNumber, along with several other fields that
appear to be information about the location. If this is an accurate
paraphrase of your data structure, you may need to brush up on
"normalization". It may be because I don't understand your situation, but
my first impression is that a "location" table would only have information
about locations, not about samples.

Let me try something here -- the following is just a wild guess, so feel
free to correct it ...

You have locations. Many of them.

Each of your locations can have multiple samples. ... or just one, or none!

Each of your samples is assayed (but only once). (... but I can't tell
what's involved in an assay -- a single thing being measured or multiple
things being measured?)

If these are accurate statements, then a well-normalized table structure in
Access might look something like (untested):

tblLocation
LocationID
LocationName
LocationCoordinates
... any other fields that are ONLY about Location
(... and I don't understand how "regions" relate to "locations")

tblSample
SampleID
SampleDate
Sampler
... any other fields that are ONLY about samples

trelLocationSample
LocationSampleID
LocationID (a foreign key, pointing back to the location table)
SampleID (a foreign key, pointing back to the sample table)

(... and because I don't know enough about assays, I don't know how they
relate to the above)

Good luck!

Regards

Jeff Boyce
Microsoft Access MVP

--
Disclaimer: This author may have received products and services mentioned
in this post. Mention and/or description of a product or service herein
does not constitute endorsement thereof.

Any code or pseudocode included in this post is offered "as is", with no
guarantee as to suitability.

You can thank the FTC of the USA for making this disclaimer
possible/necessary.
 
hi Jeff,

Thanks for your detailed response! Obviously I was more out of my depth than
I thought. :) I think I have a lot of reading to do tonight and tomorrow to
digest yours and Ken's responses, and I will see if your instructions help me
solve my problem, or if I have to come back with more clarifications.

cheers,
alia
 
hi Ken,

Thanks so much for taking the time to respond. I'm going to have to go over
your response carefully and do a lot of background reading I think. :) I'm
going to go over your instructions, and if they help me figure everything
out, I will let you know - otherwise if I still have questions I will come
back with some more clarifications.

thanks again and cheers,
alia

KenSheridan via AccessMonster.com said:
alia:

The principle issue here is that the structure of the .csv file in which the
assay results are being presented is much more akin to a spreadsheet than a
database table. Rather than having a separate column for each result
category these would, in a relational database, be stored as separate rows in
a related table. However, it's not difficult to take the data from the .csv
file and recast it in a format suitable for a database table. Before coming
to that, though, it looks to me like your existing database might be in need
of a little remodelling.

The design of tables in a relational database is governed by a process known
as normalization. I won't go into the details of this (The Wikipedia article
on the subject is pretty good), but essentially normalization is a set of
rules (normal forms) which ensure the elimination of any redundancies which
can leave the database open to inconsistent data. The end result is a set of
related table, each of which models an entity type and whose tables model the
attributes of each entity type without any redundancy. This is achieved by a
process of 'decomposition' of tables, breaking them down into separate tables
so that the attributes (columns) of each contain no redundant information.
Assuming a location might have more than one sample taken a table of the
structure you cite:

location ID | northing | easting | region name | sample number |

contains redundancies because for each sample per location we are told the
easting, northing and region of the location. This allows for
inconsistencies as there is nothing to prevent different value of one or more
of these attributes being entered in separate rows for the same location.
Separate Locations, Regions and Samples tables are needed, e.g.

Regions
....Region

Locations
....LocationID
....LocationName
....Region

Samples
....SampleNumber
....SampleDate
....LocationID

Region in Locations is a foreign key referencing the primary key of Regions,
and LocationID in samples is a foreign key referencing the primary key of
Samples. The tables might well have other non-key columns of course, but the
important thing is that each of these must be a specific attribute of the
entity type which the table models. In the language of the relational model
each must be 'functionally dependent' solely on the whole of the primary key
of the table.

Turning to the assay results a suitable table for this would be:

AssayResults
....SampleNumber
....ResultCategory
....Result

The primary key of this table would be a composite one made up of the two
columns SampleNumber and ResultCategory, each of which are foreign key
columns, the former referencing the primary key of Samples, the latter the
primary key of a ResultCategories table:

ResultCategories
....Resultcategory

This table would have one row for each type of assay result, so the values
might be Au_ppm, Pt_ppm etc.

When it comes to importing the results data from the .csv file you'd link to
the file and use a set of 'append' queries to insert rows into AssayResults,
with a separate query per result category, so for gold you'd use:

INSERT INTO AssayResults
(SampleNumber, ResultCategory, Result)
SELECT [sample number], "Au_ppm", Au_ppm
FROM [TheLinkedCSVFile]
WHERE Au_ppm IS NOT NULL;

The "Au_ppm" in quotes is a constant which inserts the text value 'Au_ppm'
into the ResultCategory column, the Au_ppm without quotes is the column in
the linked file which contains the result for gold ppm. You might have a
similar append query for platinum for instance:

INSERT INTO AssayResults
(SampleNumber, ResultCategory, Result)
SELECT [sample number], "Pt_ppm", Au_ppm
FROM [TheLinkedCSVFile]
WHERE Pt_ppm IS NOT NULL;

Whenever you receive a .csv file with new assay results it simply a case of
linking to the .csv file and executing the set of append queries, which can
easily be automated so that they can all be run at a single click of a button
on a form. Even if the same queries were accidentally executed more than
once for the same samples no harm would be done as the violation of the
composite primary key of assay results would prevent the same row being
inserted more than once.

When it comes to making the data available to MapInfo I have no experience of
that particular product. In my own work with environmental data of a broadly
similar structure to yours we used ArcInfo as the GIS. However, given a set
of correctly normalized tables as outlined above it should a simple task to
create a query to return the data in a format compatible with MapInfo's
requirements.

Ken Sheridan
Stafford, England
hi Jeff,

Thanks for answering. Sorry that wasn't clear. I also got some more
information from the geologist who is going to be using this db, so here it
is.

The lab assay .csv will have a column for the sample number and the assay
results:

sample number | Au_ppm | result2 | result3 | ...etc.

I want to bring in the relevant result (in this case, gold in parts per
million) into my database, which contains columns like:

location ID | northing | easting | region name | sample number |

So I want to link up the two SampleNumber fields and bring in the relevant
Au_ppm information (and not every location ID will have a sample taken). The
goal is to be able to export this information into another .csv to display
the information in MapInfo.

I tried this with some made-up data and the problem that I had was that when
I tried to append the data to the table, I ended up with duplicates in the
SampleNumber field. So I'm not sure how to fix that. And, more generally, I'm
not sure what the most appropriate way to do this will be - link the table,
import the data, or append a copy. We're going to have a lot of these assay
results so there will be a lot of data to handle.

Hope this helps.

thanks,
alia
To connect a sample to a location, you need to have a way to, well, connect
the sample to the location...
[quoted text clipped - 14 lines]
Jeff Boyce
Microsoft Access MVP

--
Message posted via AccessMonster.com


.
 
hello Ken,

I did some reading through the Wiki entry on normalization and it helped
clarify things for me, so I've taken a stab at reorganizing the db I was
working on based on that and your comments.

I have separate tables for Location, Sample, and AssayResult. The 'region'
column refers to the named area a location is found in, and I think isn't
really necessary for the Access side of things. (I'm not sure it's even
relevant in MapInfo, because the points all have UTM coordinates that could
be pulled out with a query that would pick out point locations inside
specified polygon regions. So I may just remove that information all
together.)

The 'append' process was the part I was especially wondering about, as far
as getting the data in to the db, so thank you for that as well.

:

[snip]
Assuming a location might have more than one sample taken a table of the
structure you cite:

location ID | northing | easting | region name | sample number |

contains redundancies because for each sample per location we are told the
easting, northing and region of the location. This allows for
inconsistencies as there is nothing to prevent different value of one or more
of these attributes being entered in separate rows for the same location.
Separate Locations, Regions and Samples tables are needed, e.g.

I checked into this and I've been told that there will only ever be one
sample per location, because if the geos are going to take a new rock sample
they will take a new GPS reading for the new location. Does that change any
of your suggestions?

[snip]
Turning to the assay results a suitable table for this would be:

AssayResults
....SampleNumber
....ResultCategory
....Result

The primary key of this table would be a composite one made up of the two
columns SampleNumber and ResultCategory, each of which are foreign key
columns, the former referencing the primary key of Samples, the latter the
primary key of a ResultCategories table:

ResultCategories
....Resultcategory

This table would have one row for each type of assay result, so the values
might be Au_ppm, Pt_ppm etc.

When it comes to importing the results data from the .csv file you'd link to
the file and use a set of 'append' queries to insert rows into AssayResults,
with a separate query per result category, so for gold you'd use:

INSERT INTO AssayResults
(SampleNumber, ResultCategory, Result)
SELECT [sample number], "Au_ppm", Au_ppm
FROM [TheLinkedCSVFile]
WHERE Au_ppm IS NOT NULL;

The "Au_ppm" in quotes is a constant which inserts the text value 'Au_ppm'
into the ResultCategory column, the Au_ppm without quotes is the column in
the linked file which contains the result for gold ppm. You might have a
similar append query for platinum for instance:

INSERT INTO AssayResults
(SampleNumber, ResultCategory, Result)
SELECT [sample number], "Pt_ppm", Au_ppm
FROM [TheLinkedCSVFile]
WHERE Pt_ppm IS NOT NULL;

If I wanted to bring in all the result columns for a particular assay, would
I have to set an individual append query like the above up for each column,
like the following:

INSERT INTO AssayResults
(SampleNumber, ResultCategory, Result)
SELECT [sample number], "Column_name", Column_name
FROM [TheLinkedCSVFile]
WHERE Column_name IS NOT NULL;

or is there a way to bring in all the assay data for each sample?
Whenever you receive a .csv file with new assay results it simply a case of
linking to the .csv file and executing the set of append queries, which can
easily be automated so that they can all be run at a single click of a button
on a form. Even if the same queries were accidentally executed more than
once for the same samples no harm would be done as the violation of the
composite primary key of assay results would prevent the same row being
inserted more than once.

When it comes to making the data available to MapInfo I have no experience of
that particular product. In my own work with environmental data of a broadly
similar structure to yours we used ArcInfo as the GIS. However, given a set
of correctly normalized tables as outlined above it should a simple task to
create a query to return the data in a format compatible with MapInfo's
requirements.

So it would be a matter of creating a query that would in effect combine all
the tables I broke out into separate entities into one query result, and then
exporting that result into a .csv (MapInfo has no trouble with .csv or .xls
or any of that sort of data, which is nice).

I have a couple of screenshots of how I've set things up so far; if they
would be more helpful in seeing what I'm working with, I can post the links.
 
Back
Top