Deleting near-duplicate records that differ only by blanks

  • Thread starter Thread starter toadflax
  • Start date Start date
T

toadflax

Hi,

I am struggling with a dataset of 9699 rows and 29 columns in Excel
2003. I have already removed exact-duplicate records but now seem to
have some near-duplicates that differ only by the value in one column.
Specifically, they appear to be identical except that the duplicate
records have a blank in Column M rather than text. Is there any way to
easily delete records where the only difference between it and another
record is that one (or perhaps even more than one) of the fields is
blank? I'm only interested in retaining records with actual
differences in the text in each column, not those that only differ by
having some data missing. I've searched on several terms but haven't
come across anything that quite matches what I'm trying to do.


Thanks very much,
Michelle
 
Hi,

I am struggling with a dataset of 9699 rows and 29 columns in Excel
2003. I have already removed exact-duplicate records but now seem to
have some near-duplicates that differ only by the value in one column.
Specifically, they appear to be identical except that the duplicate
records have a blank in Column M rather than text. Is there any way to
easily delete records where the only difference between it and another
record is that one (or perhaps even more than one) of the fields is
blank? I'm only interested in retaining records with actual
differences in the text in each column, not those that only differ by
having some data missing. I've searched on several terms but haven't
come across anything that quite matches what I'm trying to do.

Thanks very much,
Michelle

At least show us your layout. You may send the file to dguillett1
@gmail. com with a complete explanation.
 
It might be useful to apply Trim() to all of your cell values. It's
rare for leading/trailing spaces, or values which contain only spaces,
to be significant, so a good first step is to remove them.

Eg:

Dim c as Range
For Each c in ActiveSheet.Range("A1:AC10000").Cells
If not c.HasFormula Then c.Value = Trim(c.Value)
Next c

Tim
 
Hi,

Sorry if I wasn't clear, I'm very much an Excel novice in this area. I
can't send the data since it doesn't actually belong to me but the way
I currently have it sorted it's along the lines of:

Row 1) URL:Name1:<ten columns, including some which are blank>:Text
Info:<sixteen columns, including some which are blank>
Row 2) URL:Name1:<ten columns identical to those in Row
1>:BLANK:<sixteen columns identical to those in Row 1>
Row 3) URL:Name2:<ten columns, including some which are blank>:Text
Info:<sixteen columns, including some which are blank>
Row 4) URL:Name2:<ten columns identical to those in Row
3>:BLANK:<sixteen columns identical to those in Row 3>
Row 5) URL:Name3:<twenty-seven columns, including some which are
blank>
Row 6) URL:Name4:<twenty-seven columns, including some which are
blank>

So it's a mix of records with near-duplicates and some without. I was
hoping to find a way to delete the near-duplicates that differ from
another record in the sheet only by these blank cells rather than
having to check them all manually. I hope that's clearer?


Thanks very much,
Michelle
 
Back
Top