Logic to check a range for an existing sequence?

  • Thread starter Thread starter Ray
  • Start date Start date
R

Ray

I need help checking to see if an entry in column A pre-
exists within that same column, but in the rows above the
current location. This data is potentially non-sequential.

What I need to do is generate a unique cell entry that
contains a portion of a last and first name. I started
simple, like this;

All examples are entered in Cell A2, with row 1
containing a heading.

Column C = last name
Column D = first name
Column A = formula as seen below
=CONCATENATE(LEFT(C2,4),LEFT(D2,1))

There is a conditional formula that highlights
duplicates in RED
=IF(AND(NOT(A1="")),(A1)=(A2))

I then modified the formula to check for duplicates in
the cell portions I would prefer to pull from, and, if
applicable, change the format from 4LAST/1FIRST to
3LAST/2FIRST, like this;

=IF(CONCATENATE(LEFT(C1,4),LEFT(D1,1))=(CONCATENATE(LEFT
(C2,4),LEFT(D2,1))),CONCATENATE(LEFT(C2,3),LEFT
(D2,2)),CONCATENATE(LEFT(C2,4),LEFT(D2,1)))

Which works great, if there is only one duplicate. Of
course, my data contains several potential duplicates.

RODRIGUEZ ABUNDIO
RODRIGUEZ ANDRES
RODRIGUEZ ANDRES
RODRIGUEZ ANTONIO
RODRIGUEZ JOSEPH
RODRIGUEZ MIQUEL
RODRIGUEZ RAUL
RODRIGUEZ RICARDO
RODRIGUEZ RICARDO
RODRIGUEZ RUBEN
RODRIGUEZ VALERY

I was thinking I could use something like the LOOKUP or
INDEX function and copy a range that anchors the first
cell so the range continually expands from A$2 to the
cell above the current entry, but I'm not sure how to go
about this.

Grant it, I would prefer to generate this unique cell
entry entirely by formula, but I assume that due to
Excel's limitations, I may have to settle for overcoming
a few duplicates and flagging then rest for manual
intervention.

Further more, I would love to conditionally format (if
not correct) the cell based on validation for proper
charector length, but mayne I am dreming here.

Any help would be greatly appreciated!
 
Ray,

In Cell A2, use this formula:

=LEFT(C2,4)&LEFT(D2,1)&SUMPRODUCT((LEFT($A$1:A1,5)=(LEFT(C2,4)&LEFT(D2,1)))*
1)+1

Copy it down to match. It will create codes like

Rodrr1
Rodrr2
Roddr3

and so on...

HTH,
Bernie
MS Excel MVP
 
Yeah, I had thought of that, but the codes need to be
exclusively alpha.

Although each of these would autogenerate and duplicate
RODRA, what I need would actually be something like;
RODRA - RODRIGUEZ ABUNDIO
RODAN - RODRIGUEZ ANDRES
ROAND - RODRIGUEZ ANDRES
ROANT - RODRIGUEZ ANTONIO
Of course, it is possible that these codes exist
elsewhere, which is why I am seeking some sort of lookup
function.
 
Ray,

Then simply change the number to a character, using this formula (entered
without any linebreaks)
=LEFT(C2,4)&LEFT(D2,1)&CHAR(SUMPRODUCT((LEFT($A$1:A1,5)=(LEFT(C2,4)&LEFT(D2,
1)))*1)+65)

You'll get
RODRAA
RODRAB
RODRAC
RODDAD

Etc...

HTH,
Bernie
MS Excel MVP
 
...
...
What I need to do is generate a unique cell entry that
contains a portion of a last and first name. I started
simple, like this; ...
course, my data contains several potential duplicates.

RODRIGUEZ ABUNDIO
RODRIGUEZ ANDRES
RODRIGUEZ ANDRES
RODRIGUEZ ANTONIO
RODRIGUEZ JOSEPH
RODRIGUEZ MIQUEL
RODRIGUEZ RAUL
RODRIGUEZ RICARDO
RODRIGUEZ RICARDO
RODRIGUEZ RUBEN
RODRIGUEZ VALERY
...

There's no one, two or three rules that will give you distinct 5-letter IDs for
all individuals in a given surname-given name list. Generally, with names
deriving from European languages, there's more variation in first names than
last names, so the best default is the first char of the surname plus the first
four chars of the given name. Given your data, this gives

RABUN
RANDR
RANDR
RANTO
RJOSE
RMIQU
RRAUL
RRICA
RRICA
RRUBE
RVALE

with duplicates for RANDR and RRICA. The next simplest rule is to replace the
last char of the second duplicate with the 5th char of the given name. That
leads to

RABUN
RANDR
RANDE
RANTN
RJOSE
RMIQU
RRAUL
RRICA
RRICR
RRUBN
RVALE

which has no duplicates. I'll generalize this a bit. With surnames in C1:C#,
given names in D1:D#, you'd enter the following formula in col A.

A1:
=LEFT(C1,1)&LEFT(D1,4)

A2: [array formula]
=INDEX(LEFT(C2,1)&LEFT(D2,3)&MID(D2,Seq+3,1),
MATCH(0,COUNTIF(A$1:A1,LEFT(C2,1)&LEFT(D2,3)&MID(D2,Seq+3,1)),0))

and fill A2 down as far as needed. You may still have duplicates, but there'll
be fewer of them. You'll also need rules about how to handle given names that
have 3 or fewer chars, e.g., MIA.

Another alternative would be using initials plus 3 effectively random chars.

A1:
=LEFT(C1,1)&LEFT(D1,1)&CHAR(65+INT((ROW()-1)/26^2))
&CHAR(65+MOD(INT((ROW()-1)/26),26))&CHAR(65+MOD(ROW()-1,26))

That said, there are well-tested, public domain admin scripts written in
languages such as Perl, Python, Ruby etc. and even shell scripts which can
generate distinct IDs from lists of names subject to various configurable
constraints. You're reinventing the wheel, and you're using a particually
inappropriate tool to do so. Spreadsheets aren't really meant for text
processing applications such as this.
 
Back
Top