Dear Jonathan:
First of all, when you say there are duplicate rows, are you saying
that this means two rows with every column identical? Or are you
defining "duplicate" as being based on a subset of all the columns?
The typical approach is to aggregate (create a totals query) grouped
on all the rows that are in your definition of what is a duplicate.
Then, for every other column in the table, you must have some rule
that exactly specifies which value will go into those columns in the
row to be retained. This can be done by aggregating (sum, min, max,
etc.) the values column-by-column or by choosing a specific row from
which the values are to be retained. If you wish to choose a specific
row from which to retain the values, then the method of selecting this
row MUST specify only one row from which to choose.
Your post suggest you have a ModDate by which you would want to choose
the row to be retained. I'll assume you want to keep the most recent
ModDated row for each PatientID.
SELECT PatientID, ModDate, <other columns go here>
FROM PatientTreatment PT
WHERE ModDate = (SELECT MAX(ModDate)
FROM PatientTreatment PT1
WHERE PT1.PatientID = PT.PatientID)
The above MAY give you just one row for each PatientID. However, if
there exists a patient where the most recent ModDate is found in more
than 1 row then ALL the rows with that PatientID and ModDate will be
represented. That would be because you have not provided sufficient
data to make this unambiguous (we would say "not unique")
The need to have PatientID / ModDate unique may be a requirement in
your database. In fact, you may have implemented this as the Primary
Key to this table, or as a Unique Key to the table. This would
prevent there being duplicates in the table on this combination. In
order to test whether there is such ambiguity in choosing the most
recent record for each patient, you could add a Unique Index on this
key. It will fail if it is not unique. If it succeeds, it not only
proves this combination is unique, but will enforce that against
future entries into the table (whether insertions or updates).
If the PatientID / ModDate does not prove to be unique, you would have
to find additional rules to determine uniquely the values to be
extracted from the existing data into the new data. The query simply
cannot be made to work until such ambiguities are removed by rules you
define. For this reason, sufficient constraints on the original
database (especially the PatientID / ModDate unique index I suggested)
are invaluable.
Finally, I suggest that, rather than delete data from the existing
table, you keep this table intact and create a new table for the data
you are preparing. Create this table empty and use the above SELECT
query (or one modified according to the rules you develop) to INSERT
into that table (an append query). Put the necessary constraints on
this table (Primary Key of PatientID for example) to prove the
integrity of the process thus created. You can then run the process
again and again as you test it without lowing any of the original
data.
Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts