Crosstab query question

  • Thread starter Thread starter Dale Peart
  • Start date Start date
D

Dale Peart

I posted a question last week but my example was less complicated than
reality so I'm posting again. Here is the situation:

I have a table that looks (in effect) like this:

Sample Lab Analyte Result Dilution Method
S1 L1 Al 120 1 M1
S1 L1 Al 118 1 M2
S1 L2 Al 122 1 M1
S1 L3 Al 119 2 M3
S1 L1 Cu 3.5 1 M1
S1 L1 Cu 3.3 1 M2
S1 L2 Cu 3.6 1 M1
S1 L3 Cu 3.8 1 M2
S2 L1 Al 135...
S2 L2 Al 140
S2 L3 Al 138
S2 L1 Cu 4.3
S2 L2 Cu 4.5
S3 L1 Al 89
S3 L3 Al 92
....

And I would like to transpose this to a table that looks like this:
Sample Lab Al AlDilution AlMethod Cu CuDilution CuMethod
S1 L1 120 1 M1 3.5 1
M1
S1 L1 118 1 M2 3.3 1
M2
S1 L2 122 1 M1 3.6 1
M1
S1 L3 119 2 M3 3.8 1
M2
S2 L1 135 4.3
S2 L2 140 4.5
S2 L3 138
S3 L1 89
S3 L3 92

Can this be done without a macro? If so how? There are many more element
columns in the real table but I think this should do for an example.

Thanks,

Dale
 
Dear Dale:

I thought at first what you want is based on each combination of
Sample, Lab, and Method. But in the fourth row of the results, you
have associated the Al Method 'M3' with the Cu Method 'M2'. So now I
cannot see how you are associating the rows, except that they are
paired according to the order they are "in the table."

This makes it a nasty problem, because the rows in the table are NOT
in any order, except as you choose to sort them on the data.
Specifically, there is no guarantee they will keep any physical order
over time. You MUST provide an order to them logically, based on the
values in the columns.

Looking at your rows 3-4 and 7-8, how do you propose to cause the
query to know that 3 associates with 7 and 4 with 8?

It looks to me that this is the real problem, one to do with an
insufficiency in the data recorded. Why does it not associate 3 with
8 and 4 with 7?

There are two approaches that occur to me.

Create a query that shows only Al, and another that shows only Cu.
Make a full outer join (more on that later if you use Jet) between
them on Sample and Lab. That does it, except that you will get a
cross product of the 3-4 / 7-8 sets, being 37, 38, 47, and 48
combinations. That points out the insufficiency of the data. There's
no way to filter out the 38 and 47 combinations.

Another way does this in a single query, with subqueries, but that's
not going to help the overall problem.

I don't see this as a crosstab at all. Your results do not create new
column names out of data.

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts
 
Tom,

Thanks for your response. I think your confusion is in the chemistry (and
perhaps my incomplete table.) The various elements can be analyzed in
different ways and are not necessarily tied to any one Method. For example:
M1 is a method that can analyze for multiple elements simultaneously as is
M2 and perhaps M3. So when Lab 3 (L3) used M3 to get it's Al value and M2
to get it's Cu value that doesn't conflict with anything. All I want to do
is make columns out of the different Analyte data and keep whatever Sample,
Lab, Dilution, and Method go with that particular Result value. There is no
intention or any way to know that line 1 in the original table actually goes
with line 5 in the original table even though when the crosstab is finished
they end up on the same line. I just want the Query to take the first Al
value for L1, & S1, keep the associated Dilution and Method and stick it on
a line with the first Cu value for L1/S1 and the first Cd value, ... Zn
value. It doesn't matter that different methods will be on the same result
line, they will be tied to the particular element.

If you look at the original table you will see that Lab 1 analyzed for Al 4
times (two different methods) and for Cu 3 times (two different methods) so
there is no Cu result for L1 on line 8 of the result table.
Similarly L2 has 2 results for both Al & Cu so both L2 lines are full. L3
has 3 Al values and 1 Cu so the last 2 L3 lines are blank for Cu.

The original table filled out:
Sample Lab Analyte Result Dilution Method
S1 L1 Al 120 1 M1
S1 L1 Al 118 1 M2
S1 L2 Al 122 1 M1
S1 L3 Al 119 2 M3
S1 L1 Cu 3.5 1 M1
S1 L1 Cu 3.3 1 M2
S1 L2 Cu 3.6 1 M1
S1 L3 Cu 3.8 1 M2
S2 L1 Al 135 1 M1
S2 L2 Al 140 2 M2
S2 L3 Al 138 1 M1
S2 L1 Cu 4.3 1 M2
S2 L2 Cu 4.5 1 M1
S3 L1 Al 89 1 M2
S3 L3 Al 92 1 M1


Desired Result:
Sample Lab Al AlDil AlMeth Cu CuDil CuMeth
S1 L1 120 1 M1 3.5 1 M1
S1 L1 118 1 M2 3.3 1 M2
S1 L2 122 1 M1 3.6 1 M1
S1 L3 119 2 M3 3.8 1 M2
S2 L1 135 1 M1 4.3 1 M2
S2 L2 140 2 M2 4.5 1 M1
S2 L3 138 1 M1
S3 L1 89 1 M2
S3 L3 92 1 M1


Dale
 
Dear Dale:

I seriously doubt there's anything about the chemistry here where I'm
confused, but enough said.

There does seem to be some confusion about how to make the data work,
and it's in that direction I'm trying to direct my attention.

For reference, I have printed and labelled the two sets of data you
presented. I numbered the rows of your data 1-15 and lettered the
rows of your results A-I.

It appears that the data are associated into results as follows:

A 1/5
B 2/6
C 3/7
D 4/8
E 9
F 10
G 11
H 14
I 15

Does this look correct?

Now, assuming I have that correct, I notice the following.

- In the four results A-D there have been pairings of data based in
large part on having the Sample/Lab/Method identical. You show two
Method columns, but in the cases shown these two are always the same.
Are they potentially different? If so, how does this affect your
intended matching of the Al/Cu pairs?

- In matching these pairs, I do not see a rule that can be used to
match 3/7 and 4/8 but not match 3/8 and 4/7. Can you propose such a
rule? This is exactly the point at which I suggested there may be
additional information that needs to be stored to permit the
association to be made.

I see where 3/4 differ in the dilution used in the Al analysis and the
methods also differ. This partially blows the association (JOIN) I
was considering.

In order to understand how the database can operate on this data,
pretend you are the computer. Someone hands you a bag containing 15
pieces of cardboard, on each of which is written one of the 15 rows of
data.

You must pair them so you have 1/5, 2/6, 3/7, and 4/8. However, the
numbers 1-8 are NOT written on the pieces.

You must do this job based on nothing but what is written on each
piece.

I can readily associate 1/5 and 2/6. But I don't see how to get 3/7
and 4/8 and not 3/8 and 4/7. Do you know how?

If you do know, then please explain. If there is such a rule,
probably it can be queried.

The illustration I give here is fairly accurate as to how a database
engine does its job, but is not necessarily rigorously exact. It is
meant as a helpful illustration only.

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts
 
Tom,

Thanks again for trying to help me with this. I'm probably not doing a very
good job of explaining this however. There is nothing magical or sacred
about matching 1 with 5. If for example the first two lines of the original
table were switched then it would still be 1/5 except the Result value and
the Method value for the first two Al/Method values in the resulting table
would be switched but the Cu values/methods would be the same. This is
fine. However, if lines 2 & 5 were switched then the pairing would be 1/2.
What is critical is that since there is only one Lab column that no L2 data
get stuck in a L1 row and that the Result, Dilution & Method go with the
Lab, Sample and Result value.

What I envision is that as the database chugs along reading the original
table it will create a row for L1/S1, create a column heading based on the
value of Analyte and put that value in place, then create two other columns
to hold the values for Dilution and Method for that Analyte. As it reads
the second line it will create a new row if the Lab/Sample/Analyte are the
same (as in line 2) or create new columns for row 1 if the Lab/Sample are
the same but the Analyte is different (row 5). Since row 5 is the first
time a new Analyte is introduced for L1/S1 it gets appended to the first
Lab/Sample row that matches the values for that original row (row 1 in this
case).

What I don't know is how to get Access to create the AlDilution/AlMethod
column headings because I don't know how to tell it to use the value of
Analyte in the original table to create those headings.

Again the only pairing issue that is important is that of Lab/Sample.
Matching methods between Analytes in not important.


Dale
 
Dear Dale:

OK, then trying to dispell some of the mist, here's a proposed query.
I think it won't be what you want, but it does answer some of your
problem.

You need to join the table on itself. This means you must get the
database engine to look at the table twice, so you must alias it to
keep two instance of the same table distinct.

SELECT T1.Sample, T1.Lab, T1.Result AS Al, T1.Dilution AS AlDilution,
T1.Method AS AlMethod, T2.Result AS Cu, T2.Dilution AS CuDilution,
T2.Method AS CuMethod
FROM YourTable T1
LEFT JOIN YourTable T2
ON T2.Sample = T1.Sample AND T2.Lab = T1.Lab
WHERE T1.Analyte = "Al" AND T2.Analyte = "Cu"

Hopefully, this solves the problem of combining the data and creating
the columns for which you asked.

You may want to review my earlier posts to see what I was talking
about in trying to get the correct rows to match. But perhaps getting
past the other difficulties will help us to be able to move on to the
issues I thought were your problem at first.

Be sure to change YourTable to the actual name of your table, but,
initially, leave the rest alone and see what it does.

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts
 
Tom,

Thanks once again. This did solve some of the issues that didn't seem to be
possible with a crosstab query. One problem that remains is that the left
join is duplicating Al values in rows 1 & 2, eliminating Al values in row 14
& 15 of the original table and duplicating the Cu values in rows 5 & 6. The
resultant table looks like this:

Query3 Sample Lab Al AlDilution AlMethod Cu CuDilution CuMethod
S1 L1 118 1 M2 3.5 1 M1
S1 L1 120 1 M1 3.5 1 M1
S1 L2 122 1 M1 3.6 1 M1
S1 L3 119 2 M3 3.8 1 M2
S2 L1 135 1 M1 4.3 1 M2
S2 L2 140 2 M2 4.5 1 M1
S1 L1 118 1 M2 3.3 1 M2
S1 L1 120 1 M1 3.3 1 M2


This may be a more manual approach but if this isn't easily solvable I could
just do one element at a time in a new table. Export each table to Excel,
sort by Sample and Lab and cut and paste.

Do you have a better idea?

Thanks
Dale
 
Dear Dale:

Using the same nomenclature, your results are for these combinations,
in order: 2/5, 1/5, 3/7, 4/8, 9/12, 10/13, 2/6, 1/6

So, the "duplicates" are 1/5 and 1/6 plus 2/5 and 2/6. That is
because the query picks up all possible combinations. This is
expected behavior, and is what I was commenting on when we started.

Now, you showed 1/5 and 2/6. On what basis is that pairing better
than 1/6 and 2/5? If there is a rule that straightens this out, I
need to know that rule.

Your proposed solution originally eliminated rows 12 and 13, rather
than 14 and 15. Would that be better? Or do you want both sets?

Changing from LEFT JOIN to RIGHT JOIN will make a difference. A FULL
JOIN (which Jet won't do) eliminates none of the rows. But, the FULL
JOIN can by synthesized if thats what you need.

The manual work in Excel is fine, as long as you're only doing one or
two of these and they aren't any more complex than this one. But if
this is a small sample, or if it's going to be done many times, it
would be better to automate, don't you think?

Nice challenge, really. Shall we continue?

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts
 
Dear Tom,

If you're willing to contine, I'm very grateful. I know just enough about
queries to be dangerous. Thanks for your time.

You said:
....Now, you showed 1/5 and 2/6. On what basis is that pairing better than
1/6 and 2/5? If there is a rule that straightens this out, I need to know
that rule.

I say:
Either 1/5, 2/6 or 1/6, 2/5 is OK. There is nothing in the original data
that would make one pairing better than the other or that could specifically
tie one Al value to a particular Cu value. I just need to be able to
eliminate the duplication because the statistical treatment of these data
will be skewed if I have extra weight given some analyses because of
duplication. This gets more complicated when I add the other 30 or so
elements to this example (i.e. the real data file).

You said:
Your proposed solution originally eliminated rows 12 and 13, rather than 14
and 15. Would that be better? Or do you want both sets?

I say:
I need both sets!

Thanks again!

Dale
 
Dear Dale:

Good! I rather enjoy this, possibly because of the chemistry angle,
on of my passions.

You say you want 1/5 and 2/6 or 1/6 and 2/5, but not both. Here, I'm
going to 'Invent' a principle of query design, because I've seen it
happen often.

You cannot get a query to make arbitrary decisions. You must find a
way to specify how it is to decide, or you're not going to get the
results you expect.

This is a prime example. If 1/5 is valid and 1/6 is valid, then
you're going to get both. The rows in the source don't get "used up"
and disappear.

One thing that occurs to me is that you aren't really wanting to
associate these pairs of Al/Cu analyses, but that you just want to see
them together. This would be possible if you just went vertical with
the data, sorting by Sample/Lab/Analyte for example. The person
reading the data could then decide how he wants to associate them, if
at all. If you build a report, you can leave a blank line where
Sample/Lab changes and see this plainly.

The other problem, where you want to see "both sets" implies a full
join, assuming you still want a join at all. If you implement what I
suggested in the previous paragraph, you wouldn't have a join at all,
and everything will show up. I'll wait to hear from you about this
before suggesting a way to do that.

If you do still want a join, we can randomly associate the rows.
After generating the random number on each row, we would rank this,
then JOIN on the ranking. This would randomly pair the rows 1/5 and
1/6 by 2/5 and 2/6 (by meaning a cross product, like what you're
seeing now.

The kind of processes were discussing here, the full join and
expecially the random/rank/join are a bit of a mess, but I think Jet
can handle it. I'll have to set up an actual test database here to
get that far with any assurance, so I'd like to clear up just which
options we're going to use before continuing, as this will be a bit of
work on my part (well, yours, too!)

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts
 
Dear Tom,

I replied a couple of days ago but it must have vanished into the Ether or I
clicked the wrong button. Anyway...

You said:
Now, you showed 1/5 and 2/6. On what basis is that pairing better
than 1/6 and 2/5? If there is a rule that straightens this out, I
need to know that rule.

I say:
Either 1/5 & 2/6 OR 1/6 & 2/5 are acceptable. There is nothing in the
original data that would allow us to know which Al value actually went with
a particular Cu value so either paring is acceptable but not both. The
duplication will screw up the statistics by adding more weight to the
duplicated values. This gets even worse when we add the other 30 elements
that we must consider. My example only dealt with 2 but there are 30+. By
the time we get from Al through the periodic table to Zn you can imagine the
duplication mess that would result. What we must preserve is the Lab/Sample
combo so that no value from L1/S1 (for example) gets attached to L1/S2 etc.

You said:
Your proposed solution originally eliminated rows 12 and 13, rather
than 14 and 15. Would that be better? Or do you want both sets?

I say:
I need both sets. I cannot have any elimination or duplication of data from
any table.

Thanks again for your willingness to help.

Dale

P.S. Should we continue this offline (via email?)
 
Back
Top