Finding Duplicate Entries

  • Thread starter Thread starter MarkC
  • Start date Start date
M

MarkC

Using XP Pro SP2 and Excel XP (2002)

Like to know if it is possible find all duplicate entries?. Can not use a
starting string because it's unknown having all kinds of possibilities.

I text pasted a file directory with paths and filenames, I can have
duplicate filenames in different directories, and like to search for
duplicate names if possible. The list was entered in delimited based on "\"
which will be in it's own column, dependent on how many sub-folders.

At this stage, I am interested in only finding duplicate file naming. Not
concerned if these files are identical files, or different files having the
same name.

Thanks,

Mark
 
Mark,

I'm assuming that your list is in column A, starting in cell A2.

In Cell B2, array enter (Enter using Ctrl-Shift-Enter) the formula:

=MID(A2,MAX(IF(ISERROR(SEARCH("\",A2,ROW(INDIRECT("A1:A"&LEN(A2))))),0,SEARCH("\",A2,ROW(INDIRECT("A1:A"&LEN(A2))))))+1,256)

in Cell C2, enter the formula:

=IF(COUNTIF(B:B,B2)>1,"Duplicate File Name","")

Then copy B2:C2 down to match your list of file names in column A, and duplicate filenames will be
highlighted by column C's value.

HTH,
Bernie
MS Excel MVP
 
(TOP-POSTING CORRECTED)

Mark,

I'm assuming that your list is in column A, starting in cell A2.

In Cell B2, array enter (Enter using Ctrl-Shift-Enter) the formula:

=MID(A2,MAX(IF(ISERROR(SEARCH("\",A2,ROW(INDIRECT("A1:A"&LEN(A2))))),0,SEARCH("\",A2,ROW(INDIRECT("A1:A"&LEN(A2))))))+1,256)

in Cell C2, enter the formula:

=IF(COUNTIF(B:B,B2)>1,"Duplicate File Name","")

Then copy B2:C2 down to match your list of file names in column A, and
duplicate filenames will be highlighted by column C's value.

HTH,
Bernie
MS Excel MVP


That doesn't appear to work (at least not in Excel 2003).

Here, however, is a low-tech solution that does work in Excel 2003:

Assuming that A1 is a column heading and your name entries begin in A2,

1. select column A
2. data> filter > advanced filter> check Unique records only > click OK
3. Alt; (selects visible cells)
4. Data > filter > show all

Step 2 will hide all the duplicates.
Step 3 will highlight the remaining unique records.
Step 4 will make the entire column A reappear, with Unique records
highlighted (grey) and the duplicates un-highlighted (white).

Hope that helps.
 
Thanks, I will give it a try!
M


Bernie Deitrick said:
Mark,

I'm assuming that your list is in column A, starting in cell A2.

In Cell B2, array enter (Enter using Ctrl-Shift-Enter) the formula:

=MID(A2,MAX(IF(ISERROR(SEARCH("\",A2,ROW(INDIRECT("A1:A"&LEN(A2))))),0,SEARCH("\",A2,ROW(INDIRECT("A1:A"&LEN(A2))))))+1,256)

in Cell C2, enter the formula:

=IF(COUNTIF(B:B,B2)>1,"Duplicate File Name","")

Then copy B2:C2 down to match your list of file names in column A, and
duplicate filenames will be highlighted by column C's value.

HTH,
Bernie
MS Excel MVP
 
First, it's the custom of the microsoft.public.excel newsgroups to top post.
It's different than most of Usenet.

And try Bernie's array formula once more. That long formula returns just the
filename and drops the drive and path (all folders). The formula returns
everything after the last backslash.

It should work for you in xl2003--it worked for me.

Second, your formula looks for exact duplicates for the whole string--not just
the filename. Your technique doesn't address the OP's problem.
 
I tried both methods and got both of them to work. Bernie's formula is the
one that I needed for this particular project. However, it is good know
about the advance filter explained by Alan. It may come in handy for some
other projects down the road.

I do have one more issue, I have 10,000 rows, is there a quicker way to
"copy down" the formula cells other than holding down the bottom right
corner and dragging? I have other duplicate filenames sheet tabs that have
20,000+ rows and there's got to be a faster way. Using Excel 2002.

Thanks,

M
 
Mark

Enter the formula then double-click on the fill handle to copy down as far as
you have data in an adjacent column.

Or type the formulas in B2 and C2 then in namebox type B2:C10000 then ENTER key.

Then CTRL + d to fill.


Gord Dibben MS Excel MVP

I tried both methods and got both of them to work. Bernie's formula is the
one that I needed for this particular project. However, it is good know
about the advance filter explained by Alan. It may come in handy for some
other projects down the road.

I do have one more issue, I have 10,000 rows, is there a quicker way to
"copy down" the formula cells other than holding down the bottom right
corner and dragging? I have other duplicate filenames sheet tabs that have
20,000+ rows and there's got to be a faster way. Using Excel 2002.

Thanks,

M
 
Thanks Gord, I got the namebox method to work. But for some reason couldn't
get the double-click on the fill handle to work. Can you explain that one a
little more? And is it available in Excel 2002?

Thanks,

M


Gord Dibben said:
Mark

Enter the formula then double-click on the fill handle to copy down as far
as
you have data in an adjacent column.

Or type the formulas in B2 and C2 then in namebox type B2:C10000 then
ENTER key.

Then CTRL + d to fill.


Gord Dibben MS Excel MVP
 
Too ALL:

I have played around with the duplicates and found out with Bernie's method,
it does find all filename duplicates, but the problem is, when deleting the
rows, it removes "ALL" duplicates. I need to keep one on record to show
that it's on file.

With Alan's method, it looks for exact duplicates for the whole string. The
good part is when deleting the marked duplicates, it leaves one on record.

M
 
I think that the difference is how people understand your original question.

If you want all duplicates marked, then use Bernie's formula:
=IF(COUNTIF(B:B,B2)>1,"Duplicate File Name","")

If you want all duplicates marked but not the first entry, you can use:
=IF(COUNTIF(B$2:B2,B2)>1,"Duplicate File Name","")
 
Yes, it is available in 2002

Enter the formula in top cell of column.

If adjacent column has continuous data, double-clicking on the little black
square at lower right corner of the cell should copy down as far as you have
data in the adjacent column.

Note: you must have "allow drag and drop" checked in Tools>Options>Edit in
order to get the fill handle.

You did say you could drag/copy so I assume it is checked.


Gord

Thanks Gord, I got the namebox method to work. But for some reason couldn't
get the double-click on the fill handle to work. Can you explain that one a
little more? And is it available in Excel 2002?

Thanks,

M
 
I am retyping Alan's method for quick reference:

*****Assuming that A1 is a column heading and your name entries begin in A2,

1. select column A
2. data> filter > advanced filter> check Unique records only > click OK
3. Alt; (selects visible cells)
4. Data > filter > show all

Step 2 will hide all the duplicates.
Step 3 will highlight the remaining unique records.
Step 4 will make the entire column A reappear, with Unique records
highlighted (grey) and the duplicates un-highlighted (white).*****

My Questions:
Is there an opposite way in doing this? That is, is there a way to select
non-visible cells? Or does Excel have a reverse highlight/un-highlight
feature? Highlight becomes Un-highlighted and vice-versa.

Thanks,
M
 
Got IT!!!

Thanks!
M


Gord Dibben said:
Yes, it is available in 2002

Enter the formula in top cell of column.

If adjacent column has continuous data, double-clicking on the little
black
square at lower right corner of the cell should copy down as far as you
have
data in the adjacent column.

Note: you must have "allow drag and drop" checked in Tools>Options>Edit
in
order to get the fill handle.

You did say you could drag/copy so I assume it is checked.


Gord
 
Back
Top