Question for Mr. T Valko

  • Thread starter Thread starter Demosthenes
  • Start date Start date
D

Demosthenes

Hi Mr. Valko,

You answered my question earlier this week on an Excel '07 discussion
board, and your answer was by far the best and most elegant. Mine was
the question about counting cells in a series, and you answered with a
sum over a frequency. If you don't mind, I have another question
that's based off an elaboration of my first one. Say you have the
following data:

A C
B D
B D
B E
A C
B E
A E
B D
B E
A C
B D

And you want to count how many times an "A C" is followed by a "B D,"
and how many (these could be 2 columns of data or one). So, output for
the above would be:

1: 1
2: 1

I think the way to do it would be to sum over a frequency, but I can't
figure out how to set the bin array the way I need it. The bins would
have to start at "A C" and reset after any "A <>C"

Thoughts? Maybe a way to do it that way or a better way entirely?

Thanks for your help,
 
Yes, I saw that post.
A C
B D
B D
B E

Is this data in one or two columns?

It will be *extremely* complicated to do this with a single formula if it
can be done at all.
 
The data is in two columns, but would be easy enough to concatenate it.

I spent quite a while trying to figure this out, without any success.

Thanks,
 
With your data in cols A & B, in C1 enter 0 and in C2 enter:

=IF(AND(A2="B",B2="D",A1="A",B1="C"),1+MAX($C$1:C1),0)

and elsewhere:

=MAX(C:C)

For example:

A C 0 2
B D 1
B D 0
B E 0
A C 0
B E 0
A E 0
B D 0
B E 0
A C 0
B D 2


So A C is followed by B D two times.
 
Gary''s Student said:
With your data in cols A & B, in C1 enter 0 and in C2 enter:

=IF(AND(A2="B",B2="D",A1="A",B1="C"),1+MAX($C$1:C1),0)

and elsewhere:

=MAX(C:C)

For example:

A C 0 2
B D 1
B D 0
B E 0
A C 0
B E 0
A E 0
B D 0
B E 0
A C 0
B D 2


So A C is followed by B D two times.


=SUMPRODUCT((A1:A10="A")*(B1:B10="C")*(A2:A11="B")*(B2:B11="D"))
 
I couldn't come up with a single formula that would do this so I guess
you'll need to use helper column *and* concatenate your data (unless someone
else can do it without the use of a helper column).

Let's assume the concatenated data is in the range A2:A12

Leave B2 empty

Enter this formula in B3 and copy down to B12:

=IF(OR(AND(A2="ac",A3="bd"),AND(A3=A2,B2<>"")),1,"")

That'll identify the cells to count.

D2:Dn = 1,2,3,4,5 etc

Enter this array formula** in E2 and copy down as needed:

=SUM(--(FREQUENCY(IF(B$2:B$12=1,ROW(B$2:B$12)),IF(B$2:B$12<>1,ROW(B$2:B$12)))=D2))

** array formulas need to be entered using the key combination of
CTRL,SHIFT,ENTER (not just ENTER). Hold down both the CTRL key and the SHIFT
key then hit ENTER.
 
I couldn't come up with a single formula

i haven't time to test it thoroughly but maybe with a similar setup...
try this in E2 with CTRL+SHIFT+ENTER and fill down:

=COUNT(1/(A$2:A$12="AC")/(FREQUENCY(IF(A$2:A$12="BD",
-ROW(A$2:A$12)),IF(A$2:A$12<>"BD",-ROW(A$2:A$12),0))=D2))
 
Seems to work. Nice one!

Thanks - a "trick" that occured to me is to use negative values.
This simplifies things as it puts the frequencies in the relevant
rows, and it shifts the zero count past the last row which means
you don't need the IF statements. This should allow for non-array
formulas but i haven't found a simple way to do it.

Also, with unconcatenated data in B2:C12 you could try {array-entered}:

=COUNT(1/(B$2:B$12="A")/(C$2:C$12="C")/(FREQUENCY(-ROW(B$2:B$12),
-(1-(B$2:B$12="B")*(C$2:C$12="D"))*ROW(B$2:B$12))=D2+1))
 
Your formula on unconcatenated string does not seem to work correctly.

Worked ok for me, I suggest you try it again. You can change
(B$3:B$13="A")/(C$3:C$13="C") to MMULT((B$3:B$13="A")/(C$3:C$13="C"),1)
to make it a normal formula with out array-entry.
formula solutions are about 20 times slower than my VBA code.

Formula solutions are always preferred in the worksheet functions group,
many people don't want the hassle of enabling macros and the files are easier
to distribute. If you want to do these sort of operations efficiently and
much faster try
downloading cygwin and write a one line awk command.
 
Oops, i think there was a typo: Change (B$3:B$13="A")/(C$3:C$13="C") to
MMULT((B$3:B$13="A")*(C$3:C$13="C"),1) for non-array entry.
 
Hello Lori,

Apologies, your solution works correctly.

Its a nice worksheet formula solution. But I think its suboptimal with
regards to runtime. If you have several thousand rows of data you
would certainly prefer a VBA approach.

If you have to expect up to 4 repetitions of the second string your
formula is slower than my VBA code for even less than 100 rows of
data.

Regards,
Bernd
 
Back
Top