database cleanup

  • Thread starter: Angela Byars

Angela Byars

I am trying to clean up a medical records database of electronic images (not
Access) which contains many, many duplicate records and images. The fields
in the database are:
Date, MedicalRecordNo, AcctNo, PatientType, PatientName, DocumentID,
DocumentPath and DocumentPageNo.
I have imported into Access an index file of the contents of the database
and am trying to write a query which will identify the duplicate records.
My problem is two-fold: even though the first five fields are identical,
the ID field and the Path field are different because we actually have two
copies of the same .tiff file. I would like to include the Page# field in
the query but, because these are single-page TIFFs, I have an entry for
each page rather than one entry which shows the total number of pages. So
if I have two documents, one with 4 pages and one with 14, pages 1, 2, 3 and
4 show as duplicates even though they are not. My question is: is there any
way to force Access to look at all of the page number entries and count as
duplicates only those documents that have the same number of pages? Or does
anyone have any other ideas for identifying the duplicates? We are talking
about a 354 GB database, which is why I'm working from the index file.

Hope I explained this clearly enough. TIA - Angela
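A minimal sketch of the page-count idea in the question, assuming the imported index file sits in a table named `IndexFile` (a hypothetical name) with the field names listed above. The inner query collapses the one-row-per-page entries into one row per DocumentID with a page count; the outer query then groups on the five shared fields plus that count, so a 4-page and a 14-page document no longer collide. It runs against SQLite from Python purely for illustration, and the sample rows are made up. `GROUP_CONCAT` is SQLite-specific; in Access the inner part could instead be saved as its own totals query and used as the source of the outer one.

```python
import sqlite3

# The five fields that match across copies; [Date] is bracket-quoted the way
# Access does it (SQLite accepts the same quoting).
FIELDS = "[Date], MedicalRecordNo, AcctNo, PatientType, PatientName"

# IndexFile is a hypothetical table name for the imported index file.
DUPLICATE_QUERY = f"""
SELECT {FIELDS}, PageCount, COUNT(*) AS Copies,
       GROUP_CONCAT(DocumentID) AS DocumentIDs
FROM (
    -- Step 1: collapse one row per page into one row per document.
    SELECT {FIELDS}, DocumentID, COUNT(*) AS PageCount
    FROM IndexFile
    GROUP BY {FIELDS}, DocumentID
) AS doc_pages
-- Step 2: keep groups of documents that share the five fields AND the page count.
GROUP BY {FIELDS}, PageCount
HAVING COUNT(*) > 1
"""


def make_rows(doc_id, pages, folder):
    """Made-up index rows: one row per single-page TIFF, identical first five fields."""
    return [
        ("2004-01-05", "MR100", "AC200", "IN", "DOE, JANE",
         doc_id, f"{folder}/{doc_id}_p{p}.tif", p)
        for p in range(1, pages + 1)
    ]


def find_duplicate_documents(conn):
    """Return (five fields..., PageCount, Copies, sorted DocumentIDs) per duplicate group."""
    result = []
    for row in conn.execute(DUPLICATE_QUERY):
        # GROUP_CONCAT order is unspecified in SQLite, so sort the IDs for stable output.
        result.append(row[:-1] + (sorted(row[-1].split(",")),))
    return result


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IndexFile ([Date] TEXT, MedicalRecordNo TEXT, AcctNo TEXT, "
    "PatientType TEXT, PatientName TEXT, DocumentID TEXT, DocumentPath TEXT, "
    "DocumentPageNo INTEGER)"
)
# D1 (4 pages) and D3 (4 pages, other folder) are the copies; D2 has 14 pages.
rows = make_rows("D1", 4, "vol1") + make_rows("D2", 14, "vol1") + make_rows("D3", 4, "vol2")
conn.executemany("INSERT INTO IndexFile VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)

dups = find_duplicate_documents(conn)
print(dups)
# [('2004-01-05', 'MR100', 'AC200', 'IN', 'DOE, JANE', 4, 2, ['D1', 'D3'])]
```

Only D1 and D3 are reported; the 14-page D2 sits alone in its group even though its pages 1 through 4 carry the same index values. Matching on page count narrows the candidates but does not prove duplication: two genuinely different 4-page documents filed for the same patient on the same date would still pair up.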
 
Hi Angela,

I'm not quite clear on what constitutes a duplicate.
> if I have two documents, one with 4 pages and one with 14, pages 1,2,3, and
> 4 show as duplicates even though they are not.

In this example, would the two documents have the same DocumentID? If not,
then they're not duplicates, no?


Immanuel Sibero
 
Therein lies the problem. We are not certain how the duplicates came to be,
but we have identical .tiff files with two different document IDs and
file names. LOTS of them. If they were multi-page TIFFs, I could
"assume" that if all fields except document ID and path were the same, then
I'd have a duplicate. I could then include the "page" field, as it would
identify the number of pages in the document instead of the page number.
But as it stands, with these being single-page TIFFs, I cannot use the page
field, since a document with, say, 4 pages and a document with 14 pages
would have identical entries in that field for pages 1 through 4. Does that
make sense? Thanks for the input - Angela
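Since the copies are described as identical .tiff files, another option (a different technique from an index-only query, not something proposed in the thread) is to compare file contents directly. This sketch hashes each page file named in DocumentPath with SHA-256, builds an ordered per-document signature of page hashes (its length doubles as the page count), and reports DocumentIDs whose signatures match. The `(DocumentID, DocumentPath, DocumentPageNo)` row shape and the helper names are hypothetical, and the demo writes small stand-in files rather than real TIFFs.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large images are not loaded whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def group_identical_documents(index_rows):
    """index_rows: iterable of (DocumentID, DocumentPath, DocumentPageNo).

    Returns lists of DocumentIDs whose page files are byte-identical, page by page.
    """
    pages = defaultdict(list)
    for doc_id, path, page_no in index_rows:
        pages[doc_id].append((page_no, path))
    by_signature = defaultdict(list)
    for doc_id, doc_pages in pages.items():
        # Ordered tuple of page hashes; a 4-page and a 14-page document can never match.
        signature = tuple(file_digest(path) for _, path in sorted(doc_pages))
        by_signature[signature].append(doc_id)
    return [sorted(ids) for ids in by_signature.values() if len(ids) > 1]


def write_doc(root, folder, doc_id, pages):
    """Hypothetical helper: write stand-in page files and return their index rows."""
    rows = []
    for p in range(1, pages + 1):
        path = root / folder / f"{doc_id}_p{p}.tif"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(f"fake tiff page {p}".encode())  # not a real TIFF
        rows.append((doc_id, str(path), p))
    return rows


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # D3 is a byte-for-byte copy of D1 under another name; D2 has 14 pages.
    rows = (write_doc(root, "vol1", "D1", 4) + write_doc(root, "vol1", "D2", 14)
            + write_doc(root, "vol2", "D3", 4))
    groups = group_identical_documents(rows)

print(groups)  # [['D1', 'D3']]
```

Reading every page file of a 354 GB store takes a while; one way to keep the I/O down is to hash only documents that already agree on the five index fields and the page count. Identical hashes mean byte-identical files, so two separate scans of the same paper page would still show up as different.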
 