Eliminating Cartesian Effect when Querying Multiple Tables

  • Thread starter Thread starter esn
  • Start date Start date
E

esn

I have a database of vegetation data for a series of geographic points
that contains multiple tables based on different categories of data,
such as "trees," "stumps," "saplings," etc. I have set it up so that
the tables are all related to one "plot description" table, which
contains a few general characteristics for each point for each year.
My problem is that when I try to query the data stored in multiple
tables I get cartesian results when I don't want them.

For example, if I try to design a query to tell me how many trees are
in a particular plot, all is well. I can use the count function and
return the correct number of trees. But if I try to include the
number of stumps as well I get a cartesian effect, wherein the totals
are multiplied by each other at each site. This is clearly not what
I'm looking for.

I've tried changing the properties of the joins between tables and had
no success, in fact it only caused more problems by excluding sites
completely where the count in one or the other table was zero. I've
also tried creating direct joins between the trees and stumps tables
in addition to the relationships they each have with the "plot
description" table, but then the query won't run. The "unique
records" and "unique values" properties don't seem to help either.

As an added complication, my goal is to create a database that's easy
for someone without a solid background in Access to use. I know I
could get the correct query results by using subqueryies, but if the
people using the database have to write several subqueries to get some
basic statistics from the data they won't be happy.

So my question is this: is there a better way for me to organize this
database so that data from multiple tables can be queried at once
without cartesian results? Or is there something I'm overlooking when
writing these queries that will avoid this problem?
 
How depends on what ...

If you'll provide a more detailed description of the data structure you
currently are using, and a copy of the SQL statement for the query you've
tried, folks here may be able to offer more detailed suggestions...

Regards

Jeff Boyce
Microsoft Access MVP

--
Disclaimer: This author may have received products and services mentioned
in this post. Mention and/or description of a product or service herein
does not constitute endorsement thereof.

Any code or pseudocode included in this post is offered "as is", with no
guarantee as to suitability.

You can thank the FTC of the USA for making this disclaimer
possible/necessary.
 
As you have said, you must usually use subqueries to achieve such a result.
Also, don't think that your users will be able to build queries - with our
without subqueries - to query your database. Probably that you'll have to
build reports and one or more forms with all the necessary filters to
produce these reports.

Another solution would be to switch to OLAP. The usual database design that
you are using is called an OLTP for Transactional system while OLAP is for
Analysis. By using the Pivot Table and the Pivot Form - that I don't really
know about - Access can provide limited OLAP capabilities but they are more
powerful tools out there for doing OLAP. Excell can also be used for some
limited OLAP capabilities.

You should search the web for things like "MS-Access OLAP".

--
Sylvain Lafontaine, ing.
MVP - Windows Live Platform
Blog/web site: http://coding-paparazzi.sylvainlafontaine.com
Independent consultant and remote programming for Access and SQL-Server
(French)
 
Thanks so much for your response. I will look into OLAP for future
use - for now it looks like I'll be saving a long list of subqueries
in this database and crossing my fingers that they end up being used
correctly.
 
The problem is that you have "multiple tables based on different categories
of data,
such as "trees," "stumps," "saplings," etc.". You data is not normalized
properly and will always give you problems.

I highly recommend getting some relational database training or reading
"Database Design for Mere Mortals" by Hernandez before proceeding any further
on this database.
 
I'm not so sure it's a normalization issue. Even with a well-designed db,
you can't include multiple, unrelated 1:M joins in a single query and have
meaningful results. The original example of a Plot having many Trees and
many Stumps sounds like a reasonable design, but then you have to combine
separate queries to count trees and stumps, or logically separate queries by
using independent, correlated subqueries.

Select P.plotCode, P.plotName, T.treeCount, S.stumpCount
From Plots as P
Inner Join (Select plotCode, count(*) as treeCount From Trees Group By
plotCode) As T On T.plotCode=P.plotCode
Inner Join (Select plotCode, count(*) as stumpCount From Stumps Group By
plotCode) As S On S.plotCode=P.plotCode

You could redesign the structure to have a Plot containing many PlotItems,
each with a PlotItemType, and subcategory tables discriminated by
PlotItemTypeCode, but I'm not sure that's a better design. I think it would
depend on the details of those PlotItemTypes.
 
I have set it up so that the tables are all related to one "plot
description" table,
Silly question but do you have them joined to the "plot description" table?
It should be left joined to the others.
 
Yeah the left joins are in place and I've experimented with several
ways of manipulating the joins with no good results. On the plus
side, it looks like I'll be doing most of the analysis myself, so I
can let things get a little more complicated in terms of combining
several subqueries. That creates a new (and hopefully much simpler)
problem for me, though. When I try to analyze the data for each plot
by combining results from several subqueries I end up with null values
in several of the columns. For example, if a plot has five trees and
no stumps, all of the statistics drawn from the "Stumps" table for
that plot will be null. That makes sense, since there are no records
in the "Stumps" table that match that plot ID. The problem is that
these summaries are themselves subject to more general analysis - for
example the average number of stumps per plot. If I calculate that
based on the query results I have now, I will get a highly
overestimated average because plots with no stumps will be ignored,
rather than factored into the average with a count of zero.

I know you can use the Nz function to interpret nulls as zeros when a
record exists, but that doesn't seem to help here. What I really want
to do is interpret the lack of any records at all as a zero. I've
played around with joining the subquery used to count stumps to the
plot description table and grouping by the plot ID from plot
description, rather than the stumps table. Combining this join with
the Nz function produces a list of all plot IDs and the number of
stumps in each plot, with a zero where applicable. but that only works
if there are no criteria involved. As soon as I try to count stumps
over a certain height, for example, I end up only the plots that
contain records that fit the criteria. Is there some way to force
Access to interpret the lack of any records as a zero rather than a
null?

I'll try to post a sample as I think that might clear this up a
little, but any help in the meantime is much appreciated.
 
Wait nevermind. I can just apply the Nz function to the final query
(the one that compiles several subqueries) before I compile the
general statistics. Thanks anyway!
 
Back
Top