Is this query possible?

  • Thread starter Thread starter dano
  • Start date Start date
D

dano

Given a list of records such as this one with five fields:


X Y Z A M

W X G O B

A Q P Y R

Z W G T C


Can I do a query that will show all records (and only those records) in
which "W" appears regardless of which field?

The result the query should give would be:

W X G O B

Z W G T C


If there is likewise a way of getting Excel to show only records that meet
such a criteria, that would work for me, too. But so far, I haven't found
anything in the "Easy Excel/Access" type books yet that shows how to do it.

Thanks!
 
Dear Dano:

The reason this is difficult is because the data is structured as a
spreadsheet, not as it needs to be organized for a database.

For a database, your sample data should be stored:

1 X
1 Y
1 Z
1 A
1 M
2 W
2 X
2 G
2 O
2 B

Thus, the set numbered 1 contains X/Y/Z/A/M and the set numbered 2
contains X/X/B/O/B, just as you desired. This way of storing the data
allows a great deal of flexibility. You can have sets of varying
size, not just sets of 5 elements. For example, imagine a set with
1000 values. That would require 1000 columns. Structured as above,
even a set with a million elements would not be difficult.

From this structure, you can easily filter for any value and see
exactly which numbered set contains the desired value. From that you
could then list the other elements.

You CAN transform your existing data into the above. Do you have a
column that uniquely keys each set - that is a value to use as I have
used 1, 2, 3, . . .?

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts
 
Dear Dano:

The reason this is difficult is because the data is structured as a
spreadsheet, not as it needs to be organized for a database.

For a database, your sample data should be stored:

1 X
1 Y
1 Z
1 A
1 M
2 W
2 X
2 G
2 O
2 B

Thus, the set numbered 1 contains X/Y/Z/A/M and the set numbered 2
contains X/X/B/O/B, just as you desired. This way of storing the data
allows a great deal of flexibility. You can have sets of varying
size, not just sets of 5 elements. For example, imagine a set with
1000 values. That would require 1000 columns. Structured as above,
even a set with a million elements would not be difficult.

From this structure, you can easily filter for any value and see
exactly which numbered set contains the desired value. From that you
could then list the other elements.

You CAN transform your existing data into the above. Do you have a
column that uniquely keys each set - that is a value to use as I have
used 1, 2, 3, . . .?

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts


Yes, I have tried setting it up the way you suggested, but if I can set it
up the way I described in this post and have the query I am interested in
work, then that would be better. I'm not sure what you mean by spreadsheet
vs database though. Here is an example of a similar view:

http://databases.about.com/library/weekly/aa120300a.htm?PM=ss11_databases
 
Dear Dano:

You said:

"I'm not sure what you mean by spreadsheet vs database though."

That was really the whole point. If you are searching for the same
information in any of 5 columns, then you have 5 columns in a table
that are repeating the same thing. This is a violation of one of the
3 basic rules of database design. The fact that doing such a query is
then unnecessarily difficult is one of the reasons this is a bad
design - because it does not then function well as a database. It
would, however, function well as a spreadsheet. So, it is more of a
spreadsheet design than a database design.

I looked at your link but I don't see where that has repeating
columns.

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts
 
Dear Dano:

You said:

"I'm not sure what you mean by spreadsheet vs database though."

That was really the whole point. If you are searching for the same
information in any of 5 columns, then you have 5 columns in a table
that are repeating the same thing. This is a violation of one of the
3 basic rules of database design.

However, if the kind of query I am asking about is possible, then it
wouldn't matter if I was violating such a rule, and I wouldn't have to go
through the hassel of typing in a number at the beginning that repeats for
numerous records (and for some reason, doing a "fill down" is not something
they saw fit to include as a tool in Access as it is with Excel).


You see, your suggestion of :

was actually the first thing I tried, but I thought it would save me one
field entry (eg. the repeated 1's and 2's shown above) to do it the other
way, not to mention make it an easier printout.

Specifically what I'm doing is a list of cdrs to which I have save mp3s.
For each cdr, I would enter the artists that appear. I finally found an
exceptable solution to me by entering all artists in the same field
seperate by "/" marks as such :

janis joplin / stone roses / rodan

then I set up a query for the artists field with criteria: LIKE "*rodan*"
And when I want to search for a specific artist, I open the query with
design view and type over rodan with whatever artist I want to find. The
results will show me all cdrs on which the artist appears. Clumsy, but it
works, and I get the kind of printout I like (I haven't learned how to do a
similar printout using your method if its possible).

....

All that said, you got me thinking. How would you handle something like
this, (assume no outfielders for the purpose of this example - assuming I
can get it to appear legibly in the post):

date pitcher 1b 2b 3b catcher statistician manager
10-01 smith jones doe ray fox doe doe
10-02 jones smith cox fields fox doe green
10-03 king doe fox poe davis king fox
10-04 ray fox jones green king ray ray

You'll note that in this example, the same players don't play every day,
plus they can have both a position and be the statistician *and* manager.

I suppose you would say to set it up this way:


player date pitcher 1b 2b 3b catcher stats manager
smith 10-01 yes
jones 10-01 yes
doe 10-01 yes yes yes
smith 10-02 yes


Then if you wanted to get a list of records where Doe played, yes you could
do it. However, what if your object was not just to see a list of dates
Doe played and his position, but rather was to see a list of the teams that
played on the dates when Smith played? Using the first database above, you
could get just such a list, *if* you could use the type of query my
original post mentioned. I'm not sure how you would do it with the second
example database above.
 
Naturally, it screwed up. Let's see if this prints any clearer:

date pitcher 1b 2b 3b catcher statistician manager
10-01 smith jones doe ray fox doe doe
10-02 jones smith cox fields fox doe green
10-03 king doe fox poe davis king fox
10-04 ray fox jones green king ray ray
 
Dear Dano:

For the baseball problem I would have:

Date Position Player
10-01 pitcher smith
10-01 1b jones
10-01 2b doe
10-01 3b ray
10-01 catcher fox
10-01 statistician doe
10-01 manager doe

Always try to go with more rows, not more columns. Design the table
so you do not put the same type of information into a row more than
once. That is, one player per row. You can transform this into an
output like what you show from the simpler raw data. Going the other
way is much harder to program, and will usually perform poorly.

For example, in order to search for a player using your design, and do
so quickly, you would need to index every column containing a player
name. In the above, there is only one column to index, and to search.
After the computer has found every instance of a player this way, you
can limit it to only certain positions quite rapidly.

To anyone accustomed to working with a spreadsheet, this is probably a
shocking thing. In a spreadsheet you commonly see the data entered in
the way that seems most natural to view it. In a database, everything
is quite different. But database is grounded in information theory,
and the tools provided to manipulate the data are designed around the
requirements of the theory. If you don't design using a different
paradigm that is adapted to the way databases work, then you will be
constantly fighting the tools you are given to do the job. If you
design withing these rules, you will be much happier, and more
productive.

For what it's worth, I can even design a screen that can enter the
data in the pivoted form, with columns for the positions, even though
storing the data in proper normalized form.

For example, what if you need to support a new position, say
scorekeeper. You would end up adding a column to the table and have
to change the search queries to search this new column. Using the
proper design I have shown, you have a table of positions. Just add
the new position to the data, making no changes to the tables or to
the queries. This is one of the reasons why designs follow such
rules. You'll be way ahead in the long run.

But, as with any paradigm shift, it can be a painful transition to
make in your thinking. You current travail (if I don't assume too
much about your mental processes) is not uncommon. You may be about
to "cross the bridge."

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts
 
dano said:
However, if the kind of query I am asking about is possible, then it
wouldn't matter if I was violating such a rule, and I wouldn't have to go
through the hassel of typing in a number at the beginning that repeats for
numerous records (and for some reason, doing a "fill down" is not something
they saw fit to include as a tool in Access as it is with Excel).


You see, your suggestion of :




was actually the first thing I tried, but I thought it would save me one
field entry (eg. the repeated 1's and 2's shown above) to do it the other
way, not to mention make it an easier printout.

Specifically what I'm doing is a list of cdrs to which I have save mp3s.
For each cdr, I would enter the artists that appear. I finally found an
exceptable solution to me by entering all artists in the same field
seperate by "/" marks as such :

janis joplin / stone roses / rodan

then I set up a query for the artists field with criteria: LIKE "*rodan*"
And when I want to search for a specific artist, I open the query with
design view and type over rodan with whatever artist I want to find. The
results will show me all cdrs on which the artist appears. Clumsy, but it
works, and I get the kind of printout I like (I haven't learned how to do a
similar printout using your method if its possible).

...

All that said, you got me thinking. How would you handle something like
this, (assume no outfielders for the purpose of this example - assuming I
can get it to appear legibly in the post):

date pitcher 1b 2b 3b catcher statistician manager
10-01 smith jones doe ray fox doe doe
10-02 jones smith cox fields fox doe green
10-03 king doe fox poe davis king fox
10-04 ray fox jones green king ray ray

You'll note that in this example, the same players don't play every day,
plus they can have both a position and be the statistician *and* manager.

I suppose you would say to set it up this way:


player date pitcher 1b 2b 3b catcher stats manager
smith 10-01 yes
jones 10-01 yes
doe 10-01 yes yes yes
smith 10-02 yes

I'd suggest this:

player date position
smith 10-01 pitcher
jones 10-01 1b
doe 10-01 2b
doe 10-01 statistician
doe 10-01 manager
smith 10-02 1b



Then if you wanted to get a list of records where Doe played, yes you could
do it. However, what if your object was not just to see a list of dates
Doe played and his position, but rather was to see a list of the teams that
played on the dates when Smith played? Using the first database above, you
could get just such a list, *if* you could use the type of query my
original post mentioned. I'm not sure how you would do it with the second
example database above.

The days when smith played are (assuming 'play' means play a position)

select date from yourTable
where player = 'smith'
and position <> 'statistician'
and position <> 'manager'

The players that played on days that smith played are (with the dates
for convenience)

select date, player, position
from yourTable T1
where exists (
select * from yourTable T2
where T2.date = T1.date
and T2.position = 'smith'
and T2.position <> 'statistician'
and T2.position <> 'manager'
)

which can be written in other ways as well, as a self-join or with an IN
clause, like

select date, player, position
from yourTable T1
where date in (
select date from yourTable
where player = 'smith'
and position <> 'statistician'
and position <> 'manager'
)

It may seem harder, but to get the players that played on the days
smith played with the other format, you don't get the players to appear
all in the same column. What if you wanted information about them from
another table? The days smith played are easy enough:

select date
from yourTable
where 'smith' in (pitcher, "1b", "2b", "3b", catcher)

but to get the players from those dates in the same column,
you would need something messier, I think:

select "1b", '1b' as position
from yourTable
where 'smith' in (pitcher, "2b", "3b", catcher)
union all
select "2b", '2b'
from yourTable
where 'smith' in (pitcher, "1b", "3b", catcher)
union all
....

or

select
IIF(position = 'pitcher', pitcher
IIF(position = '1b', "1b",
IIF(position = 'catcher', "2b",
IIF(position = 'catcher', "3b",
IIF(position = 'catcher', catcher,''))))), position
from yourTable inner join (
select 'pitcher' as position
union all select '1b'
union all select '2b'
union all select '3b'
union all select 'catcher'
) as T
where 'smith' in (pitcher, "1b", "2b", "3b", catcher)

SK
 
SELECT F1, F2, F3, F4, F5
WHERE F1 = "W" OR F2 = "W" OR ...




I tried this out, but a message popped up about syntax and using
parentheses. I'm going to have to brush up on this with a how to book.




--
 
For example, in order to search for a player using your design, and do
so quickly, you would need to index every column containing a player
name.


Yes. But... so?? ;)

In other words, computer processing power seems to double every couple of
years, yet they can't have a database do a search for multiple columns???

The advantage I was trying to gain by using multiple columns was simply to
not have to type in the same cd number repeatedly and use autonumbering
instead. Not a big deal in my case, but as I see it, if you can add a
perk, why not?

At any rate, the fact of the matter is your telling me the program can't
search more than one column. That answers my question, and I've found a
way around it which I like (typing in the artists names for each cd in a
single cell seperated by " / ".) Since I'm going this route, Excel works
better for me because I can insert rows easier and manipulate the numbering
column better.

Still, I want to continue to learn about Excel and Access for other things,
so I intend to follow up on your suggestion about my baseball example in
another post after I give it a look over. I picked up an Access Plain and
Simple book at the library, but I don't think its going to go into query
statements in much detail. I'll probably have to find another book just on
queries. It's nothing urgent, but it may come in handy on a resume to say
I have a decent grasp of the programs.
 
Yes. But... so?? ;)

In other words, computer processing power seems to double every couple of
years, yet they can't have a database do a search for multiple columns???

No, it can search multiple columns. However, it does not create a
single index with all the values from multiple columns. So, such a
search does not use the indexes. This means a "table scan" must be
performed, reading all the data from every row in the table to do the
job. This takes thousands, maybe millions of times as long to do.
With a significant amount of data, and numerous users accessing the
data, this could choke your network or server. Even if computer
processing power doubles every year, a design that is a million times
less efficient will require 20 years of such improvement. So, you can
write a program like this but won't work as well till 2023 as it would
today if written correctly.
The advantage I was trying to gain by using multiple columns was simply to
not have to type in the same cd number repeatedly and use autonumbering
instead. Not a big deal in my case, but as I see it, if you can add a
perk, why not?

In my designs, the user would certainly NOT need to type the CD number
repeatedly, but that's another issue. When the user completes the
entry and tabs, it would simply move DOWN the column instead of
across. And a column can contain thousands, even millions of entries
- no limit is imposed by the design.
At any rate, the fact of the matter is your telling me the program can't
search more than one column. That answers my question, and I've found a
way around it which I like (typing in the artists names for each cd in a
single cell seperated by " / ".) Since I'm going this route, Excel works
better for me because I can insert rows easier and manipulate the numbering
column better.

You've hit it on the head. When you don't want to use database, stick
to spreadsheet. If the amount of data is that small, and your
thinking inclines to spreadsheets, then use spreadsheets.
Still, I want to continue to learn about Excel and Access for other things,
so I intend to follow up on your suggestion about my baseball example in
another post after I give it a look over. I picked up an Access Plain and
Simple book at the library, but I don't think its going to go into query
statements in much detail. I'll probably have to find another book just on
queries. It's nothing urgent, but it may come in handy on a resume to say
I have a decent grasp of the programs.

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts
 
For the baseball problem I would have:

Date Position Player
10-01 pitcher smith
10-01 1b jones
10-01 2b doe
10-01 3b ray
10-01 catcher fox
10-01 statistician doe
10-01 manager doe

But if you did this and did a search for "smith", would not the results
displayed be:

date position player
10-01 pitcher smith
10-02 1b smith

If so, that doesn't accomplish what I said I wanted in my original
question.

Using your set up, the results I would be looking for from a query is:


Date Position Player
10-01 pitcher smith
10-01 1b jones
10-01 2b doe
10-01 3b ray
10-01 catcher fox
10-01 statistician doe
10-01 manager doe
10-02 pitcher jones
10-02 1b smith
10-02 2b cox
10-02 3b field
10-02 catcher fox
10-02 statistician doe
10-02 manager green
 
But if you did this and did a search for "smith", would not the results
displayed be:

date position player
10-01 pitcher smith
10-02 1b smith

That would depend on how you write the query. Based on the data the
way I suggested, you can obtain any information desired that you could
from the alternative design you proposed. The queries would also be
easier to write in most cases, and would perform better with
appropriate indexing in all cases.

So just what results were you expecting?
If so, that doesn't accomplish what I said I wanted in my original
question.

Using your set up, the results I would be looking for from a query is:


Date Position Player
10-01 pitcher smith
10-01 1b jones
10-01 2b doe
10-01 3b ray
10-01 catcher fox
10-01 statistician doe
10-01 manager doe
10-02 pitcher jones
10-02 1b smith
10-02 2b cox
10-02 3b field
10-02 catcher fox
10-02 statistician doe
10-02 manager green

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts
 
Dear Dano:

We have a rule we use very consistently on how we design forms. As a
result of this, our forms would not have CD# in the continuous form
for Artist. It would just be a single column.

Above that subform, we would have a subform for CD. It would list
many CDs, and the Artist subform would list only those Artists for the
currently selected CD. When the Artist has been entered and the user
tabs, the BeforeInsert event would grab the CD# from the CD subform
and update that before the insertion is performed. This makes it
pretty easy to use. But the process to get to that point is not so
simple. Our subforms do not use parent/child links, but rather they
rewrite the RecordSource for each dependent subform. The result is
similar to using parent/child links, but we are permitted to have
continuous subforms on the parent side which would not otherwise be
allowed.

Even using parent/child links and being limited to a single form for
the CD, you could still do this in a slightly less effective manner.
The bottom line is, you want to just enter Artist - tab - Artist - tab
- Artist and have CD# filled in for you from the CD already selected
in another form.

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts

That sounds like what I'm after as far as doing the entry. thanks. I'll
have to get a more thorough tutorial book.
 
Is this the result to which you refer:

player date pitcher 1b 2b 3b catcher stat manager
smith 10-01 yes
jones 10-01 yes
doe 10-01 yes yes yes
smith 10-02 yes


No, the result I seek would be one in which you enter a single name into a
query, and it then lists all dates for which that person was a player
and/or manager and/or statistician, and lists *all* participants and their
duties for those dates. So the results of a search for smith would have
to contain all of the information below (in whatever configuration):

Date Position Player
10-01 pitcher smith
10-01 1b jones
10-01 2b doe
10-01 3b ray
10-01 catcher fox
10-01 statistician doe
10-01 manager doe
10-02 pitcher jones
10-02 1b smith
10-02 2b cox
10-02 3b field
10-02 catcher fox
10-02 statistician doe
10-02 manager green
 
Dear Dano:

So that could be:

SELECT *
FROM ThatTable
WHERE [Date] IN (SELECT DISTINCT [Date]
FROM ThatTable
WHERE Player = [Enter Player Name])

You can put the results from this in crostab form if you like.

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts
 
Dear Dano:

So that could be:

SELECT *
FROM ThatTable
WHERE [Date] IN (SELECT DISTINCT [Date]
FROM ThatTable
WHERE Player = [Enter Player Name])

I see, thanks. The basic books I have looked at don't cover any of this
terminology. I'll have to find some more thorough books. Thanks for the
help.
 
Dear Dano:

So are we finished with this thread? Already? Well, it's been fun!

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts

Dear Dano:

So that could be:

SELECT *
FROM ThatTable
WHERE [Date] IN (SELECT DISTINCT [Date]
FROM ThatTable
WHERE Player = [Enter Player Name])

I see, thanks. The basic books I have looked at don't cover any of this
terminology. I'll have to find some more thorough books. Thanks for the
help.


You can put the results from this in crostab form if you like.

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts
 
Dear Dano:

So are we finished with this thread? Already? Well, it's been fun!

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts

Yes. Thanks for the help. When I learn more about the syntax of more
involved queries I'll no doubt come up with some more questions. A little
knowledge is a dangerous thing. Happy Holidays.
 
Back
Top