Counting and deleting

  • Thread starter Thread starter KevinUK
  • Start date Start date
K

KevinUK

I have a spreadsheet of over 40,000 lines for analysis. The first
column contains phrases from the British National Corpus. The other
columns contain various categorical descriptions of each phrase. This
is what I need to do (but don't know how!):

1. How to count the number of words in each phrase and write the
result to a new column.

2. How to count the number of letters in a single-word phrase and
write the result to a new column.

3. How to delete every line whose phrase contains more than one word.

4. How to delete every line whose phrase contains one word of less
than x letters.

5. How to delete lines whose phrases consist of one specific word
(e.g. 'it')

I would be very grateful for any guidance anyone can give. Many
thanks.

Kevin Glover
 
1. The trick here is to count the number of spaces and add one. This
formula will do so, say in C2:

=LEN(A2) - LEN(SUBSTITUTE(A2," ","")) + 1

assuming your string is in A2. You might like to put Num_words in C1
as a heading.

2. LEN returns the number of characters, so you could put this formula
in, say, D2:

=IF(C2=1,LEN(A2),"")

which only returns a value for single-word entries in column A. Put
Num_letters as the heading in D1.

3. A formula cannot delete a row. You can use a formula to mark a row
for deletion, and then delete all marked rows later in one operation
using a filter (or you could use a macro to do it for you). You could
put this formula in E2:

=IF(C2>1,"Delete","OK")

4. Put this formula in F2:

=IF(AND(D2<>"",D2<5),"Delete","ok")

I've assumed 5 letters for your "x".

5. Put this formula in G2:

=IF(AND(C2=1,A2="it"),"Delete","ok")

All these formulae can be copied down for as many phrases as you have.
Then you can apply autofilter (Data | Filter | Autofilter), and then
you can use the filter drop-down in column E and select Delete from
the list. Then highlight all the visible rows and click on Edit |
Delete and then select All in that filter. Do the same for columns F
and for G.

Hope this helps.

Pete
 
I have a spreadsheet of over 40,000 lines for analysis. The first
column contains phrases from the British National Corpus. The other
columns contain various categorical descriptions of each phrase. This
is what I need to do (but don't know how!):

1. How to count the number of words in each phrase and write the
result to a new column.

2. How to count the number of letters in a single-word phrase and
write the result to a new column.

3. How to delete every line whose phrase contains more than one word.

4. How to delete every line whose phrase contains one word of less
than x letters.

5. How to delete lines whose phrases consist of one specific word
(e.g. 'it')

I would be very grateful for any guidance anyone can give. Many
thanks.

Kevin Glover

I would be willing to try to provide a one click macro solution. A
very clear explanation and

"If desired, send your file to dguillett @gmail.com I will only look
if:
1. You send a copy of this message on an inserted sheet
2. You give me the newsgroup and the subject line
3. You send a clear explanation of what you want
4. You send before/after examples and expected results."
 
Pete, I have a sneaking suspicion that you just did Kevin's homework
assignment for him (I could be wrong about this though).

Rick Rothstein (MVP - Excel)




"Pete_UK" wrote in message

1. The trick here is to count the number of spaces and add one. This
formula will do so, say in C2:

=LEN(A2) - LEN(SUBSTITUTE(A2," ","")) + 1

assuming your string is in A2. You might like to put Num_words in C1
as a heading.

2. LEN returns the number of characters, so you could put this formula
in, say, D2:

=IF(C2=1,LEN(A2),"")

which only returns a value for single-word entries in column A. Put
Num_letters as the heading in D1.

3. A formula cannot delete a row. You can use a formula to mark a row
for deletion, and then delete all marked rows later in one operation
using a filter (or you could use a macro to do it for you). You could
put this formula in E2:

=IF(C2>1,"Delete","OK")

4. Put this formula in F2:

=IF(AND(D2<>"",D2<5),"Delete","ok")

I've assumed 5 letters for your "x".

5. Put this formula in G2:

=IF(AND(C2=1,A2="it"),"Delete","ok")

All these formulae can be copied down for as many phrases as you have.
Then you can apply autofilter (Data | Filter | Autofilter), and then
you can use the filter drop-down in column E and select Delete from
the list. Then highlight all the visible rows and click on Edit |
Delete and then select All in that filter. Do the same for columns F
and for G.

Hope this helps.

Pete
 
Pete,

You are a star. Thanks so much for the guidance. I was amused by the
suggestion that you might have done my homework. At age 60, it's been
a while since I did homework, so I'm afraid I won't be getting that A!
Thanks again.

Kevin
 
You are welcome, Kevin - thanks for feeding back.

When you said you had 40,000 records I thought that would be a bit
much for a homework assignment, so I'm glad I was able to help you in
your task.

Pete
 
Back
Top