Evaluate only the first word in a field

  • Thread starter Thread starter Nick Mirro
  • Start date Start date
N

Nick Mirro

One column in a table contains 3 to 4 words with various other characters
interspersed (dashes, periods, etc.) I would like a "find duplicates" query
to evaluate only the first word encountered, ignoring everything following
the first blank space. How might this be done? Thanks.

Nick
 
Hi Tom. Thanks for taking your time to help me with this. I am trying to
compare a genus/species list from the Dallas World Aquarium, with a large
taxonomic database from "Natl cntr for biotech info" - NCBI. The goal is to
produce taxonomic hierarchies for the 114 species at the aquarium/zoo, using
the parentID structure of NCBI's data.

I currently have an Access database that can generate the hierarchies
(thanks Graham Mandeno!) so I am now first trying to match genus' (genuses?)
between the small and large datasets, in order to assign NCBI NodeIDs to the
new organisms. Sorry to bore you with the details here, but this is why I
need to be able to run a find duplicates query.

The large db has the name list with up to 4 words, etc. Anyway, I tried the
new statement and it ran, but returned no records. My first question is:
Is this statement going to return the first word it encounters for its
column or is there any "duplicate finding" code included? I ran it only for
the NCBI NodeName column hoping it would return just the first words, which
could be pasted into a new table.

I'm sorry if I have been unclear here. Here are a few facts that hopefully
clarify my goal.


1. I first need a query to return the first word in each record in the
NodeName column (190,000 records)

2. I would like to paste these new single word records into a new table.

3. Worry about how to "find duplicates" between the small table (114
records) and the newly created one.

4. Assign the NCBI NodeIDs to the 114 aquarium names.

5. Generate taxonomic hierarchies for the 114 critters (monkeys, stingrays,
toucans, etc.)

Nick
 
Dear Nick:

Well, it sounds like fun. It may take a genius (genii) to figure it
out!

So, you should be past steps 1 and 2, right?

As for the later steps, without becoming intimately acquainted with
your data, it's hard to guess just how those will work out.

Would there be any sense in just adding the "first name" results to a
new column in the existing table (or a copy of that table) and
indexing it before proceeding. I'm thinking your later steps will
need to reference other data once the first name has been matched.
That should make it run much faster I think.

But I'm at a loss to imagine what the rest of your data looks like and
how you will handle any more of this. But you should be able to learn
a lot from trying! You'll learn about genies if not also about Access
as well.

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts
 
Genii : ))

I'm still at step 1. I think I buried this too deeply in my message:
"Anyway, I tried the new statement and it ran, but returned no records." I
thought that the sql statement would take the string from each record and
return just the first word. Something must have gone wrong.

The reason for pulling the first word from each string is that I have
filtered the entire NCBI names list for genus. Nodes are ranked genus even
if they include species (binomial). This means that the first word will
always be a genus. The aquarium list is just single word genus. Running a
"find duplicates" will give me the NodeIDs for the aquarium list. I need
the NCBI NodeIDs for the new creatures.

Was the sql supposed to only return the first word in each record string?
I added a right parentheses to the end to even the number.

Nick
 
Back
Top