Extract unique values with criteria, and sort them

  • Thread starter Thread starter Stephen Dunn
  • Start date Start date
S

Stephen Dunn

Hi All,

Okay, forget about all the manual methods, as well as VBA and add-ins,
this is a CHALLENGE to build a single formula from standard functions.
At this point I have a formula that will extract and sort unique items
in one column. However, I can't seem to build in any form of criteria
based on other columns. It can be done in two formulae, but that's
missing the point.


Take the following example list (in A2:A12):

a
ab
15
BLANKCELL
2
d
aaaab
1
100
2
aaaac


With this formula (which, I must admit, I'm quite proud of):

=INDEX($A$2:$A$12,MATCH(ROWS($A$2:$A$12)-
SUM(COUNTIF($A$2:$A$12,C$1:C1))-COUNTBLANK($A$2:$A$12),
COUNTIF($A$2:$A$12,">="&$A$2:$A$12)+ISNUMBER($A$2:$A$12)*
SUM(--ISTEXT($A$2:$A$12)),0))

typed into C2 as an array formula and copied down to C12, we get this:

1
2
15
100
a
aaaab
aaaac
ab
d
0
0

I'm not worried about anything that comes after the desired list (0's,
errors, etc), that can all be cleared up later.

Curiously, the formula to sort the full list is a little bit longer, but
that's beside the point.

Now, the problem (that is costing me hair, and hours of social life) is
with a list such as:

d 1
c 2
b 3
a 1
a 2
b 3
c 1
d 2
d 3
c 1
b 2

(simplified - blanks and numbers must still be accounted for)

Pulling and sorting the unique values that have a 1 in the next column
should produce:

a
c
d


Even though I started this as a personal challenge, I'm running out of
ideas and need fresh thoughts. All suggestions (except rude ones <g>)
are welcome, remembering the boundaries of the challenge.

Anyone that can solve this one before I flip my lid will earn a night
out in Newcastle, England
Drinks and Club entrance only
Travel and Accomodation, etc at own cost...
(option to crash at my flat)

Regards,
Steve D.

p.s. this is from home (unusual for me recently) but it is me, and the
offer is genuine - I look forward to meeting any of the regulars,
however it doesn't exclude anyone else.
 
Improvement to given formula:

=INDEX($A$2:$A$12,MATCH(SUM(COUNTIF($A$2:$A$12,C$1:C1)),
COUNTIF($A$2:$A$12,"<"&$A$2:$A$12)+ISTEXT($A$2:$A$12)*
SUM(--ISNUMBER($A$2:$A$12))-ISBLANK($A$2:$A$12),0))

Quest still stands.
 
Hi Alan,

Hmmm, I'm using xl2000, so that isn't the problem. The formula is
designed to deal with any data in one column, so any variation there
wouldn't matter, and I've just tried pasting it directly from the post
without a problem.

The one time I got a list of 0's was in a list with several blanks,
before the addition of -COUNTBLANK, or in the case of the revised
formula -ISBLANK.

Would you be willing to accept a copy of the workbook?

Steve D.
 
As I emailed you, I recopied the formula you originally posted, and it
worked as advertised; I don't know what I did wrong the first time.
Sorry for the misinfo.

Alan Beban
 
No problem, Alan. I just hope that you (or anyone) can provide the
missing piece.

Alan Beban said:
As I emailed you, I recopied the formula you originally posted, and it
worked as advertised; I don't know what I did wrong the first time.
Sorry for the misinfo.

Alan Beban

<snip>
 
Stephen Dunn said:
Improvement to given formula:

=INDEX($A$2:$A$12,MATCH(SUM(COUNTIF($A$2:$A$12,C$1:C1)),
COUNTIF($A$2:$A$12,"<"&$A$2:$A$12)+ISTEXT($A$2:$A$12)*
SUM(--ISNUMBER($A$2:$A$12))-ISBLANK($A$2:$A$12),0))
....

Good, it fixes a bug in the first formula when there are zero-length strings
(which COUNTBLANK includes as blank cells, but ISTEXT considers text, which
I'll call 'quasiblank'). However, it shares another bug with the first
formula when there are numeric zeros in the range. Replace your blank cell
with a zero, and you'll see.

If quasiblank cells may be treated the same as truly blank cells, the best
I've come up with so far is

=INDEX(X,MATCH(COUNTA(X)-SUM(COUNTIF(X,"="&OFFSET(Y,0,0,SUM(ROW()
-ROW(Y)),1))),(X<>"")*(COUNTIF(X,">="&X)+(COUNTA(X)-COUNT(X))
*ISNUMBER(X)+SUM((X="")*(X<>0))*ISTEXT(X)+1)-1,0))


where X refers to the list and Y refers to the topmost result's cell as a
row absolute (or fully absolute) reference. It's longer than yours, but it's
fully selc-contained. It doesn't need a blank cell above the topmost result
cell to seed the COUNTIF.

Also note: SUM(--ISNUMBER($A$2:$A$12)) == COUNT($A$2:$A$12).
 
Hi Harlan,

I hoped you might like getting into this one.

Harlan Grove said:
...

Good, it fixes a bug in the first formula when there are zero-length strings
(which COUNTBLANK includes as blank cells, but ISTEXT considers text, which
I'll call 'quasiblank'). However, it shares another bug with the first
formula when there are numeric zeros in the range. Replace your blank cell
with a zero, and you'll see.


Sorry, but you'll have to explain the problem, as it works just as I
would expect it. The 0 appears at the top of the list before other
numbers. Do you get something different?


If quasiblank cells may be treated the same as truly blank cells, the best
I've come up with so far is

=INDEX(X,MATCH(COUNTA(X)-SUM(COUNTIF(X,"="&OFFSET(Y,0,0,SUM(ROW()
-ROW(Y)),1))),(X<>"")*(COUNTIF(X,">="&X)+(COUNTA(X)-COUNT(X))
*ISNUMBER(X)+SUM((X="")*(X<>0))*ISTEXT(X)+1)-1,0))


where X refers to the list and Y refers to the topmost result's cell as a
row absolute (or fully absolute) reference. It's longer than yours, but it's
fully selc-contained. It doesn't need a blank cell above the topmost result
cell to seed the COUNTIF.


My thinking was that, once the mechanics of the overall problem were
sorted, the topmost cell would replace SUM(COUNTIF($A$2:$A$12,C$1:C1))
with 0, then that cell could be used as the starting reference from the
second onwards. It seemed an unnecessary complication to explain that,
but I can see how my stipulation of 'one' formula may have obstructed
that idea.

Also note: SUM(--ISNUMBER($A$2:$A$12)) == COUNT($A$2:$A$12).

Doh! I was stuck in one train of thought, after starting with a version
that put text at the top and used ISNUMBER(...)*SUM(--ISTEXT(...))

So, what's your opinion on whether the challenge can be met? I keep
making steps forward, then finding myself right back at the same point.

Regards,
Steve D.
 
Works for the ilustrated table. It seems to provide erratic results for
inclusions in the first column of a blank cell, 0 cell and/or ="" cell.

Alan Beban
 
I used A1:B11 as the table. Using the formula on the table as provided,
entered into 4 cells, returns a,c,d,#NUM! as advertised.

Enter 0 in A4 and delete contents (i.e., blank in this discussion) in A7
and it returns 0,0,c,d; one 0 for the blank and one for the 0.

Leaving those changes except entering ="" into A1 returns
0,blank,c,#NUM! It no longer returns a 0 for each of 0 and blank.

Leaving those changes except entering blank in A1 and ="" in A7 returns
0,0,c,#NUM! It no longer returns blank for =""

Leaving those changes except entering ="" in A1 returns 0,blank,c,#NUM!
It no longer returns 0 for the blank cell A7.

I didn't spend a lot of time trying to rationalize the results, but they
seemed idiosyncratic to me.

Alan Beban
 
Alan Beban said:
Enter 0 in A4 and delete contents (i.e., blank in this discussion) in A7
and it returns 0,0,c,d; one 0 for the blank and one for the 0.

Are you using the revised formula from my most resent response? With the
table

d 1
c 2
b 3
0 1
a 2
b 3
<b> 1
d 2
d 3
c 1
b 2

where <b> stands for blank, I get the 4-cell result

0
c
d
#NUM!

which is what I want.
Leaving those changes except entering ="" into A1 returns
0,blank,c,#NUM! It no longer returns a 0 for each of 0 and blank.

I'd argue these were the correct results.
Leaving those changes except entering blank in A1 and ="" in A7 returns
0,0,c,#NUM! It no longer returns blank for =""

So the table now looks like

<b> 1
c 2
b 3
0 1
a 2
b 3
' 1
d 2
d 3
c 1
b 2

? If so, my revised formula gives

0
c
#NUM!
#NUM!

which I'd argue were correct results.
Leaving those changes except entering ="" in A1 returns 0,blank,c,#NUM!
It no longer returns 0 for the blank cell A7.

Again, looks correct to me.
I didn't spend a lot of time trying to rationalize the results, but they
seemed idiosyncratic to me.

I'm now getting the results I meant to get. Only Stephen can answer whether
they're what he's looking for.
 
You had stated a single formula for all cells in the result range. You're
changing the rules now. I agree that your *TWO* new formulas are more efficient
than my *ONE* formula.


So, you're ignoring the last *ONE* formula that I gave, and my final
comments...

I just noticed another problem. COUNTIF not only fubars when there are truly
blank and quasiblank cells in a range, it also fubars when there are numeric
strings and any other strings beginning with ASCII characters that precede 0,
i.e., any strings that sort before numeric strings in lexical order. If X
evaluates to {"a";"#";"0";"c"}, then COUNTIF(X,"<"&X) evaluates to
{2;0;0;3}


In xl2k, this gives {1;0;0;2} so that has been fixed, and may explain why
you are finding a lack of robustness in my formulae.


=INDEX(TBL,MATCH(SMALL(IF(--IF(INDEX(TBL,0,2)=N,
MATCH(INDEX(TBL,0,1)&CHAR(127)&TRUE,TRANSPOSE(INDEX(TBL,0,1)&CHAR(127)
&(INDEX(TBL,0,2)=N)),0))>ROW(TBL)-CELL("Row",TBL),
MMULT(--(INDEX(TBL,0,1)>=TRANSPOSE(INDEX(TBL,0,1))),ROW(TBL)^0)),
ROW()-ROW(YY)+1),IF(--IF(INDEX(TBL,0,2)=N,MATCH(INDEX(TBL,0,1)&CHAR(127)
&TRUE,TRANSPOSE(INDEX(TBL,0,1)&CHAR(127)&(INDEX(TBL,0,2)=N)),0))>ROW(TBL)
-CELL("Row",TBL),MMULT(--(INDEX(TBL,0,1)>=TRANSPOSE(INDEX(TBL,0,1))),
ROW(TBL)^0)),0),1)


I had a funny feeling that you'd end up with MMULT in there, I'm just
This is
obviously a task for which an advanced filter is far more appropriate.


Agreed, and as I stated at the beginning, this started purely as a personal
challenge (based on repeated requests in this newsgroup), before I got
desperate for new ideas.
 
Hi both,

It's difficult to come down firmly on either side of this argument, I think
that would have to be a decision by whoever applies it (remember, this was
just a challenge, not a need). It would be nice to have the option for
formula-returned blanks to be within the results, and truly blank cells to
be ignored.

However, your final formula certainly meets the challenge as set, I still
have some hair left, and the offer is open for whenever you want to take it.
Maybe we can even rope in some of the other regulars (at their own expense
of course <VBG>).

Thanks and congrat's
Steve D.
 
Stephen Dunn said:
Hi both,

It's difficult to come down firmly on either side of this argument


I take that back, after testing it fully, Harlan's formula definitely works
in the way I would expect it to.

, I think
that would have to be a decision by whoever applies it (remember, this was
just a challenge, not a need). It would be nice to have the option for
formula-returned blanks to be within the results, and truly blank cells to
be ignored.


I realise now, that's exactly what it does.


<snip>
 
Stephen Dunn said:
Would you explain why:

MATCH(col1&CHAR(127)&TRUE,TRANSPOSE(col1&CHAR(127)&(col2=N)),0))

couldn't be reduced to:

MATCH(col1&CHAR(127)&N,col1&CHAR(127)&col2,0))

Probably could be. This is my SUM(--ISNUMBER(..)).
 
So, you're ignoring the last *ONE* formula that I gave, and my final
comments...
...

Yup, I missed it.
...>=INDEX(X,MATCH(SUM(COUNTIF(X,OFFSET(Y,0,0,SUM(ROW()-ROW(Y))))),
...> COUNTIF(X,"<"&X)+ISTEXT(X)*COUNT(X)-ISBLANK(X),0))

Comments. Given the data range (X) with <b> representing blank cells and ' zero
length strings

x
1
<b>
c
5
'
3
2
0
d
a
c
5
<b>
9

filling your formula into the 15 cells in the column adjacent to X (with the
topmost cell of this second range named Y) gives

0
1
2
3
5
9

#N/A
#N/A
#N/A
#N/A
#N/A
#N/A
#N/A
#N/A

under Excel 97 SR-2. This may be the nasty COUNTIF 'bug' in XL97. Changing the X
range to

x
1
'
c
5
'
3
2
0
d
a
c
5
<b>
9

gives the result range

0
1
2
3
5
9

c
d
x
#N/A
#N/A
#N/A
#N/A
#N/A

Any combination of truly and quasi- blank cells in X causes this formula to
fubar in Excel 97 SR-2. What are the results in Excel 2000/2002?
 
Unfortunately, these datasets remain problematic under xl2k also, so your
formula is the more effective. I'll have to wait until this evening to test
it under 2002, but I suspect that the results will be the same.

COUNTIF(A1,B1) it seems, is very inconsitent where blank cells of either
type are concerned.

COUNTIF(A1,TRIM(B1)) works because TRIM has a consistent return with a blank
or quasiblank, but

COUNTIF(A1,"<="&TRIM(B1)) falls over again, proving that COUNTIF is still
bugged.


Regards,
Steve D.
 
Back
Top