How do you remove all the symbols?

E

elaine

Hello

I have 2 problems here:


1. How do you remove all the symbols from a column?


Like, all of these: !"£$%^&*()_+-={}[]:mad:~;'#<>?,./¬`|\ without
deleting them one by one from a cell?


eg. Get "Titanic DVD Box Set" from "Titanic DVD [Box Set]"


Get only "5.99 pounds" from "£5.99 pounds"


2. How do you remove all the numbers from a cell?


eg. I only want to get the street names from a list of address:


Get only " oxford street" from "123 oxford street"
Get only "abbey road" from "2 abbey road"


Please help.
 
G

Guest

For the second problem try:

Sub de_number()
For Each r In Selection
If Application.WorksheetFunction.IsText(r.Value) Then
s = r.Value
For i = 49 To 57
s = Replace(s, Chr(i), "")
Next i
r.Value = s
End If
Next
End Sub

This will the numerical characters from cells containing a mixture of text
and numbers. It will not clear cells that are purely numeric. You can use a
variation of this to remove symbols as well.
 
P

Peter T

One way (adapted from a suggestion by Nate Oliver) -

Function RepChars(strIn As String, _
bMisc As Boolean, _
bNos As Boolean)
Dim i As Long
Dim bArr() As Byte
Dim sTmp As String
On Error GoTo errH

bArr = StrConv(strIn, vbFromUnicode)
For i = LBound(bArr) To UBound(bArr)
If bMisc Then
Select Case bArr(i)

Case 33 To 47 ' !#$%&'()*+,-./
bArr(i) = 35

Case 58 To 64 ' :;<=>?@

bArr(i) = 35
Case 91 To 96, 123 To 126, 163, 172
' [\]^_`{|}~£¬
bArr(i) = 35
End Select
End If
If bNos Then
Select Case bArr(i)
Case 48 To 57 ' 0-9
bArr(i) = 35
End Select
End If
Next
sTmp = StrConv(bArr, vbUnicode)

RepChars = Replace(sTmp, "#", "")
Exit Function
errH:

RepChars = "error"
End Function

Sub test()
Dim s As String

s = " !£h$%i^&d*(d)_e+n- ={m}[e]:mad:s~;s'#<a>?g,./¬`|e\ "
s = s & vbCr & "123 oxford street"

MsgBox s & vbCr & vbCr & _
RepChars(s, True, True)
End Sub

Can use the function as a UDF in cells

Regards,
Peter T


Hello

I have 2 problems here:


1. How do you remove all the symbols from a column?


Like, all of these: !"£$%^&*()_+-={}[]:mad:~;'#<>?,./¬`|\ without
deleting them one by one from a cell?


eg. Get "Titanic DVD Box Set" from "Titanic DVD [Box Set]"


Get only "5.99 pounds" from "£5.99 pounds"


2. How do you remove all the numbers from a cell?


eg. I only want to get the street names from a list of address:


Get only " oxford street" from "123 oxford street"
Get only "abbey road" from "2 abbey road"


Please help.
 
E

elaine

Wow! It works Very well!! although I know nothing about Macros, VBA,
but it works!!!
I suppose the 49 to 57 are ASCII code (I remember something from uni!).

What if I want to delete all the numbers apart from 5, what should i
change? (as 5 = 53 in ascii)

i = 48 To 52

and i = 54 To 57???
 
P

Peter T

What if I want to delete all the numbers apart from 5, what should i

If you always want to keep 5's

change
'Case 48 To 57 ' 0-9
Case 48 To 52, 54 To 57 ' 0-4, 6-9

If normally you want to remove 5's but have an option to keep them, add an
additional optional argument to the function

Function RepChars(strIn As String, _
bMisc As Boolean, _
bNos As Boolean, _
Optional bKeep5 As Boolean)

'code

If bNos Then
Select Case bArr(i)
'Case 48 To 57 ' 0-9
If bKeep5 And bArr(i) = 53 Then
'do nothing
Else
bArr(i) = 35
End If
End Select
End If

When you want to keep 5's include the third argument True when calling the
function.

If you want to see a list of the character codes in your sheet -

fill the numbers 32 to 255 starting in A1 down the column. In adjacent B1
=CHAR(A1) (returns a space)

fill down to B224

If you have a quick read about Select Case in Help you will probably be able
to adapt the function as needs change.

Regards,
Peter T
 
P

Peter T

Typo
If bNos Then
Select Case bArr(i)
'Case 48 To 57 ' 0-9
If bKeep5 And bArr(i) = 53 Then
'do nothing
Else
bArr(i) = 35
End If
End Select
End If

remove the comment ' before Case 48 To 57 ' 0-9

Peter T
 
E

elaine

hi pete,
the first code you posted wasnt working... and I dont really understand
how it works.. and after i run the code, a message box pops up with all
the symbols and 123 oxford st, with hidden message oxford st. I guess
its just show that cos its in the code... but it didnt delete any of my
sumbols or numbers in the column...
 
P

Peter T

The fact the msgbox shows "hidden message oxford st" shows the function is
working.

The function as it stands does not change the original string, it returns a
new changed string. I wrote that way so you can use it as a UDF, eg formula
in a cell, say B1

=RepChars(A1)

If you are only using macros and want to change your original string you can
either work with the original function like this -

Sub Test2()
Dim sOld As String
Dim sNew As String

sOld = "!#£$abc!£$ def)(&*-"
sNew = RepChars(sOld, True, True)
sOld = sNew

MsgBox sOld

End Sub

Or you can dispense with sOld and simply

sOld= RepChars(sOld, True, True)

A different approach is to amend the function to change the original string
within the function. Start by copying the entire function and give it a new
name, eg RepString. Remove "As String" at the end of the function line. Then
change the following two lines

' sTmp = StrConv(bArr, vbUnicode)
' RepString= Replace(sTmp, "#", "")

strIn = StrConv(bArr, vbUnicode)
strIn = Replace(strIn, "#", "")

Also comment or delete the line after the error handler,
' RepString= "error"
While testing also comment the line
On Error Goto errH

To call the function and directly change your string try something like

Sub Test3()
Dim s As String
s = "!#£$abc!£$ def)(&*-"

RepString s, True, True

MsgBox s

End Sub


But remember you can't use this version as a UDF in worksheet formulas.

Hope you understood the purpose of the arguments True, True in the example.
This first instructs remove all the misc characters, the second True/False
to remove the numbers. You would want to pass at least one True or the
function will do nothing.

Regards,
Peter


If you want to change the original string to the new string there are
different ways
 
P

Peter T

but it didnt delete any of my
sumbols or numbers in the column...

I overlooked what you said about 'column', so you want a macro to process
all cells in your selection, eg a column.

Try this with the original function

Sub RepCellChar()
Dim rng As Range
Dim cel As Range
Dim s As String

On Error GoTo errH
Set rng = Selection

If MsgBox(rng.Address & vbCr & _
"Replace misc characters and 0-9", vbOKCancel) <> vbOK Then
Exit Sub
End If
Set rng = Intersect(ActiveSheet.UsedRange, rng)

For Each cel In rng
s = CStr(cel.Value)
If Len(s) Then
cel.Value = RepChars(s, True, True)
End If
Next

Exit Sub
errH:
MsgBox "an error occured"

End Sub

Adapt the two boolean arguments to remove misc charactors and/or numbers.
Adapt further to keep 5's along the lines I described earlier.

Regards,
Peter T
 
G

Guest

Hi elaine,

Here's a function you can try. It takes a different approach in that it
filters for valid characters only, excluding all others. By default it
includes upper/lower case alpha characters and numbers from 0 to 9. It also
provides that you can optionally include additional valid characters that you
want to keep, such as the space character, the hyphen, and so on. The
function can be used as a UDF. (modify to suit)

Function FilterString2(ByVal text As String, Optional ValidChars As String)
As String
' Filters out all unwanted characters in a string.
' Arguments: Text The string being filtered
' ValidChars [Optional] Any additional characters to keep
' Returns: String containing only the valid characters.

Const sSource As String = "FilterString2()"

Dim i As Long
Dim sResult As String, sAlphaChrs As String, sNumbers As String

'The basic characters to keep
sAlphaChrs = "abcdefghijklmnopqrstuvwxyz"
sNumbers = "0123456789"

ValidChars = ValidChars & sAlphaChrs & UCase(sAlphaChrs) & sNumbers
For i = 1 To Len(text)
If InStr(ValidChars, Mid$(text, i, 1)) Then sResult = sResult &
Mid$(text, i, 1)
Next
FilterString2 = sResult

End Function

HTH
Regards,
Garry
 
R

Ron Rosenfeld

Hello

I have 2 problems here:


1. How do you remove all the symbols from a column?


Like, all of these: !"£$%^&*()_+-={}[]:mad:~;'#<>?,./¬`|\ without
deleting them one by one from a cell?


eg. Get "Titanic DVD Box Set" from "Titanic DVD [Box Set]"


Get only "5.99 pounds" from "£5.99 pounds"


2. How do you remove all the numbers from a cell?


eg. I only want to get the street names from a list of address:


Get only " oxford street" from "123 oxford street"
Get only "abbey road" from "2 abbey road"


Please help.

You can do this by using regular expressions.

But you need to be accurate in setting them up.

For example, in your list of symbols to be removed, above, you include the dot
'.' However, in your second example, it seems you wish to retain the dot
(5.99).

Peter's solution will remove the dot and give you "599 pounds" as a result.

To use Regular Expressions, the simplest method is to download and install
Longre's free morefunc.xll add-in from http://xcell05.free.fr/ This will only
work if your strings are less than 256 characters in length.

(If they are longer, the pattern will be the same, but you'll have to use VBA.)

You can then use the following **array** formula:

=MCONCAT(REGEX.MID(A1,Pattern,ROW(INDIRECT("1:"&REGEX.COUNT(A1,Pattern)))))

To keep all the letters, digits, spaces and any "dots" that are followed by a
number, then Pattern = (\w+|\s|\.\d)

(You can use a named cell or a Name for Pattern. If not in a cell, enclose it
in quotation marks "(\w+|\s|\.\d)"

To eliminate the symbols and numbers (i.e. keep only letters and spaces), then
Pattern = [A-Za-z ]+
(note the <space> within Pattern)


If you want to keep all the letters and spaces and also the number "5", then
Pattern = [A-Za-z5 ]+


This addin can be distributed with your workbook, if that is an issue.


--ron
 
R

Ron Rosenfeld

Hello

I have 2 problems here:


1. How do you remove all the symbols from a column?


Like, all of these: !"£$%^&*()_+-={}[]:mad:~;'#<>?,./¬`|\ without
deleting them one by one from a cell?


eg. Get "Titanic DVD Box Set" from "Titanic DVD [Box Set]"


Get only "5.99 pounds" from "£5.99 pounds"


2. How do you remove all the numbers from a cell?


eg. I only want to get the street names from a list of address:


Get only " oxford street" from "123 oxford street"
Get only "abbey road" from "2 abbey road"


Please help.

You can do this by using regular expressions.

But you need to be accurate in setting them up.

For example, in your list of symbols to be removed, above, you include the dot
'.' However, in your second example, it seems you wish to retain the dot
(5.99).

Peter's solution will remove the dot and give you "599 pounds" as a result.

To use Regular Expressions, the simplest method is to download and install
Longre's free morefunc.xll add-in from http://xcell05.free.fr/ This will only
work if your strings are less than 256 characters in length.

(If they are longer, the pattern will be the same, but you'll have to use VBA.)

You can then use the following **array** formula:

=MCONCAT(REGEX.MID(A1,Pattern,ROW(INDIRECT("1:"&REGEX.COUNT(A1,Pattern)))))

To keep all the letters, digits, spaces and any "dots" that are followed by a
number, then Pattern = (\w+|\s|\.\d)

(You can use a named cell or a Name for Pattern. If not in a cell, enclose it
in quotation marks "(\w+|\s|\.\d)"

To eliminate the symbols and numbers (i.e. keep only letters and spaces), then
Pattern = [A-Za-z ]+
(note the <space> within Pattern)


If you want to keep all the letters and spaces and also the number "5", then
Pattern = [A-Za-z5 ]+


This addin can be distributed with your workbook, if that is an issue.


--ron

To use the above within VBA, so as to process a range, you could also use the
following macro (again after installing the morefunc add-in):

==========================================
Sub RemSymbols()
Dim c As Range, rg As Range
Dim t As String
Dim Pattern As String
Dim i As Long

'The following patterns may be useful
' "[A-Za-z5 ]+" letters, spaces & the number 5
' "(\w+|\s|\.\d)" letters, numbers, spaces, dot if _
followed by a number
' "[A-Za-z ]+" letters and spaces

Pattern = "[A-Za-z5 ]+"

Set rg = Selection
' you may wish to code the selection differently

For Each c In rg
t = ""
For i = 1 To Run([regex.count], c.Text, Pattern)
t = t & Run([regex.mid], c.Text, Pattern, i)
Next i
c.Value = t
Next c

End Sub
======================================


--ron
 
P

Peter T

Peter's solution will remove the dot and give you "599 pounds" as a
result.

Can easily be adapted to retain a dot in front a number or remove otherwise
(should the OP require)

Case 46
If i < UBound(bArr) Then
If bArr(i + 1) < 48 Or bArr(i) > 57 Then
bArr(i) = 35
End If
Else
bArr(i) = 35
End If

If incorporated in the original function I posted would need to change the
first test from

Case 33 To 47 ' !#$%&'()*+,-./
to
Case 33 To 45, 47 ' !#$%&'()*+,-/

also probably want to remove the dot if also removing numbers (not shown in
this eg)

RegExp is indeed a powerful tool. But for many things I find the byte array
method works just as well, and once understood doesn't require much in the
way of learning as does the RegExp syntax (for me that's relevant <g>).
Execution speed is on a par with RegExp, if anything faster for short to
medium length strings.

However setting the object ref to RegExp is very slow if done in repeated
calls to a function which makes the byte array significantly faster (unless
of course the object ref is static or at global level). If Laurant Longre's
function is C* then that wouldn't be an issue I suppose.

Regards,
Peter T


Ron Rosenfeld said:
Hello

I have 2 problems here:


1. How do you remove all the symbols from a column?


Like, all of these: !"£$%^&*()_+-={}[]:mad:~;'#<>?,./¬`|\ without
deleting them one by one from a cell?


eg. Get "Titanic DVD Box Set" from "Titanic DVD [Box Set]"


Get only "5.99 pounds" from "£5.99 pounds"


2. How do you remove all the numbers from a cell?


eg. I only want to get the street names from a list of address:


Get only " oxford street" from "123 oxford street"
Get only "abbey road" from "2 abbey road"


Please help.

You can do this by using regular expressions.

But you need to be accurate in setting them up.

For example, in your list of symbols to be removed, above, you include the dot
'.' However, in your second example, it seems you wish to retain the dot
(5.99).

Peter's solution will remove the dot and give you "599 pounds" as a result.

To use Regular Expressions, the simplest method is to download and install
Longre's free morefunc.xll add-in from http://xcell05.free.fr/ This will only
work if your strings are less than 256 characters in length.

(If they are longer, the pattern will be the same, but you'll have to use VBA.)

You can then use the following **array** formula:

=MCONCAT(REGEX.MID(A1,Pattern,ROW(INDIRECT("1:"&REGEX.COUNT(A1,Pattern)))))

To keep all the letters, digits, spaces and any "dots" that are followed by a
number, then Pattern = (\w+|\s|\.\d)

(You can use a named cell or a Name for Pattern. If not in a cell, enclose it
in quotation marks "(\w+|\s|\.\d)"

To eliminate the symbols and numbers (i.e. keep only letters and spaces), then
Pattern = [A-Za-z ]+
(note the <space> within Pattern)


If you want to keep all the letters and spaces and also the number "5", then
Pattern = [A-Za-z5 ]+


This addin can be distributed with your workbook, if that is an issue.


--ron
 
R

Ron Rosenfeld

RegExp is indeed a powerful tool. But for many things I find the byte array
method works just as well, and once understood doesn't require much in the
way of learning as does the RegExp syntax (for me that's relevant <g>).
Execution speed is on a par with RegExp, if anything faster for short to
medium length strings.

However setting the object ref to RegExp is very slow if done in repeated
calls to a function which makes the byte array significantly faster (unless
of course the object ref is static or at global level). If Laurant Longre's
function is C* then that wouldn't be an issue I suppose.

Regards,
Peter T

Longre's add-in is an XLL. And it's pretty fast although I've not compared the
speed. The 255 byte limitation can be a PITA, but I'm told it's a limitation
of the data type XLOPER and there is no workaround.

If I were dealing with longer strings, I would advise setting a reference to
Microsoft VBScript Regular Expressions (Tools/References) although one can set
it with each function call.

I have been finding that learning regular expression syntax to be both
interesting and rewarding, and more flexible than other methods of parsing out
data.


--ron
 
N

Nate Oliver

Hi Peter,

Peter said:
RegExp is indeed a powerful tool. But for many things I find the byte array
method works just as well, and once understood doesn't require much in the
way of learning as does the RegExp syntax (for me that's relevant <g>).
Execution speed is on a par with RegExp, if anything faster for short to
medium length strings.

Also, there is some overhead with StrConv(), it's not the fastest
function in the World, so you can (in all likelihood) pick up some
speed by dumping it. The resulting array isn't quite as neat, but the
results are consistent and manageable, e.g.,

http://www.mrexcel.com/board2/viewtopic.php?p=901966#901966

And, I've done a little bit of time-testing here, if interested:

http://www.xtremevbtalk.com/showpost.php?p=1069150

It seems to me that you get a pretty nice gain on short strings by
dumping StrConv(), and I wouldn't think it would cost you on long
strings, either, even though the array is larger than what you might
really want. But, your best bet would be to compare the two before
drawing any conclusions, which I haven't.

The bonus to the Byte Array approach is that sure, it's probably
faster, but, it's 100% native, i.e., it ships by itself. Well, I
consider that a bonus...

Cheers,
Nate Oliver
 
P

Peter T

Well hello again Nate

Guess you noticed I blamed this method on you, er I mean credited!

I never saw the post you referred to but I recall well the thread in which
you introduced it

http://tinyurl.com/mxp2l

On my own I had done some testing chucking strConv. Like you I find

Dim bArr() as Byte
bArr = strIn
strOut = bArr
vs
bArr = StrConv(strIn, vbFromUnicode)
strOut = StrConv(bArr, vbUnicode)

is say 4 times faster, varies slightly with string length but not by much

I notice you ReDim the bArr() to 2 x string length, that works but doesn't
seem necessary.

However I don't regard StrConv as a significant overhead. Even in a long
loop I find with doing all the other stuff the overall difference is
trivial.

But - there's a nasty catch if avoiding strConv when it comes to processing
char's in the range chr(128 - 159), possibly others in non standard
char-sets, look at the dump in this range -

Sub test2()
Dim strIn As String, strOut As String, s As String
Dim bArr() As Byte
Dim i As Long, rw As Long

For i = 32 To 255
strIn = strIn & Chr(i)
Next

bArr = strIn
strOut = bArr

For i = 1 To Len(strIn)
s = Mid(strIn, i, 1)
Cells(i, 1) = Asc(s)
Cells(i, 2) = s
Next

rw = 1
For i = LBound(bArr) To UBound(bArr) Step 2
Cells(rw, 3) = bArr(i)
Cells(rw, 4) = bArr(i + 1)
rw = rw + 1
Next

End Sub

IOW, if changing those bytes in a Step 2 loop the return will be wrong.

You successfully sold me the Byte array + StrConv combination and not sure
I'm ready to cash in the StrConv!

It is worth noting that Help says the strConv method will not work in Mac.

Regards,
Peter T
 
N

Nate Oliver

Hello again Peter,

I did catch my name! <grin>

Okay, I'm going to snip a bunch of quotes in order to respond to this
properly, if I take the odd one out of context, I apologize, because I
see this as an interesting discussion versus a debate, really.
is say 4 times faster, varies slightly with string length but not by much

Good, this where we start, there's a reason to explore this
opportunity, eh.
I notice you ReDim the bArr() to 2 x string length, that works but doesn't
seem necessary.

It's absolutely necessary if you're going to go straight from Unicode,
to ASCII, to Byte Array, which you can do if you dump StrConv(). And
remember that 4x! See the following:

http://vb.mvps.org/hardcore/html/whatisunicode.htm
However I don't regard StrConv as a significant overhead. Even in a long
loop I find with doing all the other stuff the overall difference is
trivial.

I thought we just said 4x? It's significant in terms of % performance,
isn't 4x significant? 4x can't be trivial, can it? I'd have to
argue that this is the big gainer, last time around we were diddling
around with inline vs. modulated... Small potatoes, that actually
convoluted the algorithm, with all due respect... Don't diddle, make
the algorithm better, and this is a very good start.

Again, this is to make the gain on smaller strings, on longer strings
you won't see it as much.... And, we should use a timer on varying
strings, in terms of length, to be sure, we got into some trouble in
our last thread on this subject with conjecture...
But - there's a nasty catch if avoiding strConv when it comes to processing
char's in the range chr(128 - 159), possibly others in non standard
char-sets, look at the dump in this range -
IOW, if changing those bytes in a Step 2 loop the return will be wrong.

It looks like it's wrong to me, either way... Why did you add one to
the element in the loop? Check out the dump in this procedure:

Sub foo()
Dim strIn As String, strOut As String, s As String
Dim bArr() As Byte
Dim i As Long, rw As Long

Application.ScreenUpdating = True

For i = 32 To 255
strIn = strIn & ChrW$(i)
Next


Let bArr = strIn
Let strOut = bArr


For i = 1 To Len(strIn)
Let s = Mid$(strIn, i, 1)
Let Cells(i, 1) = Asc(s)
Let Cells(i, 2) = s
Next


Let rw = 1
For i = LBound(bArr) To UBound(bArr) Step 2
Let Cells(rw, 3) = bArr(i)
Let Cells(rw, 4) = ChrW$(bArr(i))
Let rw = rw + 1
Next
Application.ScreenUpdating = True

End Sub

And this one:

Sub bar()
Dim strIn As String, strOut As String, s As String
Dim bArr() As Byte
Dim i As Long, rw As Long

Application.ScreenUpdating = True

For i = 32 To 255
strIn = strIn & ChrW$(i)
Next


Let bArr = StrConv(strIn, vbFromUnicode)

Let strOut = bArr


For i = 1 To Len(strIn)
Let s = Mid$(strIn, i, 1)
Let Cells(i, 1) = Asc(s)
Let Cells(i, 2) = s
Next


Let rw = 1
For i = LBound(bArr) To UBound(bArr)
Let Cells(rw, 3) = bArr(i)
Let Cells(rw, 4) = ChrW$(bArr(i))
Let rw = rw + 1
Next
Application.ScreenUpdating = True

End Sub


It seems that you were working with the wrong element, start with 0 and
go every other element. Or am I missing something? Either way, it
doesn't look like StrConv() is going to bail you out on those
Chars...

By the way, and this is part of the point, these days I have a serious
distaste for anything resembling the following:
For i = 32 To 255
strIn = strIn & ChrW$(i)
Next

I replicated it for the sake of addressing the question, but this is
slow!! It's String Ops like this that make VB[A] appear to be very
slow, once you get to ~40 concatenations this begins to degrade in a
way where you should consider doing something else. See the following:

http://msdn.microsoft.com/library/en-us/dngenlib/html/heap3.asp

"Slowdown as a result of frequent allocs and reallocs. This is a very
common phenomenon when you use scripting languages. The strings are
repeatedly allocated, grown with reallocation, and freed up. Don't do
this. Try to allocate large strings, if possible, and use the buffer.
An alternative is to minimize concatenation operations."

Note the 'Don't do this' part. Much like using ReDim Preserve or Union
in a loop, concatenating in a haphazard fashion is pretty hard on your
resources, more important on longer procedures... If the point is to
build one seriously fast algorithm, concatenating in a Loop need not
apply. This is why we use an array, with a buffer, except even better
in this case, we know the size of the array, Chars * 2.
You successfully sold me the Byte array + StrConv combination and not sure
I'm ready to cash in the StrConv!

Get ready!! I'm not selling anything, mind you. Never was, actually...
Please go about this as you wish, I'm simply sharing my experiences
with Excel and VB[A]...

StrConv() is handy, expensive, but handy, no doubt about it. And like
in the original thread you just mentioned, I used it as a starting
point, it's easier to work with an array that isn't *2. But, if the
goal is to write something with some serious chops, then dump it,
because you can, and you'll be rewarded!

Regards,
Nate Oliver
 
P

Peter T

Hi Nate,

Thanks for your detailed follow-up. My ReDim comment was misplaced but I'm
not (yet) with you on those 'problem' char's.

1.
It's absolutely necessary if you're going to go straight from Unicode,
to ASCII, to Byte Array, which you can do if you dump StrConv(). And
remember that 4x! See the following:

I think I got confused with your timer test to make a random string, in
which clearly do need to ReDim the array first. When I first played with
this I tried

ReDim bArr2(0 To Len(strIn) * 2 - 1)
bArr2 = strIn

Though that works it seems unnecessary and faster to omit the ReDim. I
assume you agree with that, you did same in the other thread you referred
to. Or am I missing something?


2. Re my comment 4 x faster to avoid strConv yet trivial overall.
I thought we just said 4x? It's significant in terms of % performance,
isn't 4x significant? 4x can't be trivial,

When I initially did that test of course the result jumped out. But when I
put it in context with a full function overall speed was at most a few %
faster and dropping off with longer strings. Testing in different ways could
of course give very different results.

Despite impressions I may have given to the contrary I do take performance
seriously if going to impact on the user, even if merely perceptibly!


3. Re my dump arrays to cells Test2(), my previous post
Why did you add one to the element in the loop?

I didn't mean to and not sure where I have. The loop goes from lBound to
Ubound Step 2 dumping pairs of bytes in each row. I did increment the row
counter by 1 in the loop to cater for the Step 2. I think it correctly
extracts the array contents. Quite possible I can't see for looking!


4. re problem char's 128-159
Either way, it
doesn't look like StrConv() is going to bail you out on those
Chars...

But I think it does. I notice in the your Subs foo & bar you use ChrW vs
Chr. These are very different particularly in the 128 - 159 range.

Debug.Print Asc(Chr(128)), Asc(ChrW(128)) ' 128 63

I think Chr correctly places those char's into a test string. StrConv then
puts the correct numbers into the array and again correctly converts to the
string on the way out.

This highly contrived example appears to work correctly only if using
StrConv in/out, also not using strConv can accidentally change char's -

Sub TestProblemChars()
Dim bArr1() As Byte, bArr2() As Byte
Dim i As Long
Dim strIn$, strOut1$, strOut2$, sCorrect$

strIn = "€‡!‹9" ' 128 135 33 139 57

'change Euro(128) to E(69), !(33) to B(66), 9(57) to X(88)
'retain 135 and 139

sCorrect = "E‡B‹X" ' 69 135 66 139 88

bArr1 = StrConv(strIn, vbFromUnicode)
For i = LBound(bArr1) To UBound(bArr1)
Select Case bArr1(i)
Case 128
bArr1(i) = 69
Case 33
bArr1(i) = 66
Case 57
bArr1(i) = 88
End Select
Next
strOut1 = StrConv(bArr1, vbUnicode)

bArr2 = strIn
For i = LBound(bArr2) To UBound(bArr2) Step 2
Select Case bArr2(i)
Case 128
bArr2(i) = 69
Case 33
bArr2(i) = 66
Case 57
bArr2(i) = 88
End Select
Next
strOut2 = bArr2

For i = 1 To Len(sCorrect)
Debug.Print Asc(Mid$(sCorrect, i, 1)), _
Asc(Mid$(strOut1, i, 1)), _
Asc(Mid$(strOut2, i, 1))
Next
' 69 69 128
' 135 135 63
' 66 66 66
' 139 139 63
' 88 88 88
'strOut1 all correct, strOut2 3/5 wrong
End Sub


5.
By the way, and this is part of the point, these days I have a serious
distaste for anything resembling the following:

Sure, I only did that to make a test string and avoid typing all those
char's in the post, but with Chr not ChrW.

Get ready!! I'm not selling anything, mind you. Never was, actually...
Please go about this as you wish, I'm simply sharing my experiences
with Excel and VB[A]...

And much appreciated !

Regards,
Peter T
pmbthornton gmail com


Nate Oliver said:
Hello again Peter,

I did catch my name! <grin>

Okay, I'm going to snip a bunch of quotes in order to respond to this
properly, if I take the odd one out of context, I apologize, because I
see this as an interesting discussion versus a debate, really.
is say 4 times faster, varies slightly with string length but not by
much

Good, this where we start, there's a reason to explore this
opportunity, eh.
I notice you ReDim the bArr() to 2 x string length, that works but doesn't
seem necessary.

It's absolutely necessary if you're going to go straight from Unicode,
to ASCII, to Byte Array, which you can do if you dump StrConv(). And
remember that 4x! See the following:

http://vb.mvps.org/hardcore/html/whatisunicode.htm
However I don't regard StrConv as a significant overhead. Even in a long
loop I find with doing all the other stuff the overall difference is
trivial.

I thought we just said 4x? It's significant in terms of % performance,
isn't 4x significant? 4x can't be trivial, can it? I'd have to
argue that this is the big gainer, last time around we were diddling
around with inline vs. modulated... Small potatoes, that actually
convoluted the algorithm, with all due respect... Don't diddle, make
the algorithm better, and this is a very good start.

Again, this is to make the gain on smaller strings, on longer strings
you won't see it as much.... And, we should use a timer on varying
strings, in terms of length, to be sure, we got into some trouble in
our last thread on this subject with conjecture...
But - there's a nasty catch if avoiding strConv when it comes to processing
char's in the range chr(128 - 159), possibly others in non standard
char-sets, look at the dump in this range -
IOW, if changing those bytes in a Step 2 loop the return will be wrong.

It looks like it's wrong to me, either way... Why did you add one to
the element in the loop? Check out the dump in this procedure:

Sub foo()
Dim strIn As String, strOut As String, s As String
Dim bArr() As Byte
Dim i As Long, rw As Long

Application.ScreenUpdating = True

For i = 32 To 255
strIn = strIn & ChrW$(i)
Next


Let bArr = strIn
Let strOut = bArr


For i = 1 To Len(strIn)
Let s = Mid$(strIn, i, 1)
Let Cells(i, 1) = Asc(s)
Let Cells(i, 2) = s
Next


Let rw = 1
For i = LBound(bArr) To UBound(bArr) Step 2
Let Cells(rw, 3) = bArr(i)
Let Cells(rw, 4) = ChrW$(bArr(i))
Let rw = rw + 1
Next
Application.ScreenUpdating = True

End Sub

And this one:

Sub bar()
Dim strIn As String, strOut As String, s As String
Dim bArr() As Byte
Dim i As Long, rw As Long

Application.ScreenUpdating = True

For i = 32 To 255
strIn = strIn & ChrW$(i)
Next


Let bArr = StrConv(strIn, vbFromUnicode)

Let strOut = bArr


For i = 1 To Len(strIn)
Let s = Mid$(strIn, i, 1)
Let Cells(i, 1) = Asc(s)
Let Cells(i, 2) = s
Next


Let rw = 1
For i = LBound(bArr) To UBound(bArr)
Let Cells(rw, 3) = bArr(i)
Let Cells(rw, 4) = ChrW$(bArr(i))
Let rw = rw + 1
Next
Application.ScreenUpdating = True

End Sub


It seems that you were working with the wrong element, start with 0 and
go every other element. Or am I missing something? Either way, it
doesn't look like StrConv() is going to bail you out on those
Chars...

By the way, and this is part of the point, these days I have a serious
distaste for anything resembling the following:
For i = 32 To 255
strIn = strIn & ChrW$(i)
Next

I replicated it for the sake of addressing the question, but this is
slow!! It's String Ops like this that make VB[A] appear to be very
slow, once you get to ~40 concatenations this begins to degrade in a
way where you should consider doing something else. See the following:

http://msdn.microsoft.com/library/en-us/dngenlib/html/heap3.asp

"Slowdown as a result of frequent allocs and reallocs. This is a very
common phenomenon when you use scripting languages. The strings are
repeatedly allocated, grown with reallocation, and freed up. Don't do
this. Try to allocate large strings, if possible, and use the buffer.
An alternative is to minimize concatenation operations."

Note the 'Don't do this' part. Much like using ReDim Preserve or Union
in a loop, concatenating in a haphazard fashion is pretty hard on your
resources, more important on longer procedures... If the point is to
build one seriously fast algorithm, concatenating in a Loop need not
apply. This is why we use an array, with a buffer, except even better
in this case, we know the size of the array, Chars * 2.
You successfully sold me the Byte array + StrConv combination and not sure
I'm ready to cash in the StrConv!

Get ready!! I'm not selling anything, mind you. Never was, actually...
Please go about this as you wish, I'm simply sharing my experiences
with Excel and VB[A]...

StrConv() is handy, expensive, but handy, no doubt about it. And like
in the original thread you just mentioned, I used it as a starting
point, it's easier to work with an array that isn't *2. But, if the
goal is to write something with some serious chops, then dump it,
because you can, and you'll be rewarded!

Regards,
Nate Oliver
 
N

Nate Oliver

Hello again Peter,
Thanks for your detailed follow-up. My ReDim comment was misplaced but I'm
not (yet) with you on those 'problem' char's.

Nor should you be, as it turns out, I now follow the problem, back to
that in a minute.
1.

I think I got confused with your timer test to make a random string, in
which clearly do need to ReDim the array first. When I first played with
this I tried

ReDim bArr2(0 To Len(strIn) * 2 - 1)
bArr2 = strIn

Though that works it seems unnecessary and faster to omit the ReDim. I
assume you agree with that, you did same in the other thread you referred
to. Or am I missing something?

There's a difference in how we're stacking the Byte Array. With
SCase(), we want to quick-stack, go directly from String to Byte Array.
With rndStr() I was stacking a Byte Array with 1 random char at a time
from keyArr(), so in order to do that we need to dim-out the array
first, we're not coercing.
2. Re my comment 4 x faster to avoid strConv yet trivial overall.


When I initially did that test of course the result jumped out. But when I
put it in context with a full function overall speed was at most a few %
faster and dropping off with longer strings. Testing in different ways could
of course give very different results.

Despite impressions I may have given to the contrary I do take performance
seriously if going to impact on the user, even if merely perceptibly!

Not sure, my results have been different over the testing I have done
in the past, but they don't move linerally, i.e., the big-gainer on
direct String->Byte array, and I suspect the two would converge as your
String/Array grow...
3. Re my dump arrays to cells Test2(), my previous post


I didn't mean to and not sure where I have. The loop goes from lBound to
Ubound Step 2 dumping pairs of bytes in each row. I did increment the row
counter by 1 in the loop to cater for the Step 2. I think it correctly
extracts the array contents. Quite possible I can't see for looking!

Here is what threw me in your intial test:

rw = 1
For i = LBound(bArr) To UBound(bArr) Step 2
Cells(rw, 3) = bArr(i)
Cells(rw, 4) = bArr(i + 1)
rw = rw + 1
Next

I didn't follow what we were dumping in Column 4... But neither here
nor there, I now see the problem with the character range you're
talking about, and it's a real mapping issue.

Still, we might be able to dump StrConv(), if it really does have this
much overhead, and look at the elements, 2 at a time, with MidB$().
Check out the dump on this:

'--------------
Sub foo()
Dim strIn As String, strOut As String, s As String
Dim bArr() As Byte, i As Long

Application.ScreenUpdating = False

For i = 32 To 255
Let strIn = strIn & Chr$(i)
Next

Let bArr = strIn
Let strOut = bArr

Debug.Print strIn
Debug.Print strOut

For i = 1 To Len(strIn)
Let s = Mid$(strIn, i, 1)
Let Cells(i, 1).Value = Asc(s)
Let Cells(i, 2).Value = s
Next


For i = LBound(bArr) To UBound(bArr) Step 2
Let s = MidB$(bArr, i + 1, 2)
Let Cells(i \ 2 + 1, 3).Value = Asc(s)
Let Cells(i \ 2 + 1, 4).Value = s
Next

Application.ScreenUpdating = True

End Sub
'---------------

So, here we have two dumps. It seems with our Immediate Window dump,
going straight from String to Byte Array and back to String without
StrConv() works. The problem is only peaking at the first element (out
of 2) when peaking at individual chars in our Byte Array.

But, it appears that by dumping Chr()/ChrW$() and taking a peak, two
elements at a time with MidB$() we can accurately get back to the
individual chars, should we want to.

Note that MidB$() wants to peak at our Byte Array as if it were
1-based, like a String. So we still might be able to get rid of
StrConv(), and I would guess on short Strings, you should see a pretty
decent gain in performance.

Concur?

Cheers,
Nate Oliver
 
P

Peter T

Hi Nate,

Do I concur - I'm thinking about it <g>

OK so we need to cater for potential two byte ANSI char's in the range
128-255, in particular those with 2nd byte non-zero and first byte not as
expected.

Perhaps a quick recap for anyone(?) trying to follow this

Char's 0-127 are single byte, 128-255 are double byte ANSI.

bArr1 = StrConv(strIn, vbFromUnicode)
the byte array is conveniently populated with single bytes 0-255

But if we do: bArr2 = strIn

The byte array is populated with pairs of bytes for each character. If strIn
includes ansi char's we cannot simply process only the first byte in a 'Step
2' loop as the first byte may not be the expected 128-255, two problems -

1. if the first byte may be same as other char's so we could mess up other
characters with the same first byte value
2. we will fail to correctly process such a character

So back to the issue of whether to or not it's still faster to dump strConv
and include a bit of extra work. One approach might be along the lines of
this revision of the TestProblemChars routine I posted last time, which
incorrectly worked without using StrConv

Option Explicit

Public Declare Function QueryPerformanceFrequency _
Lib "kernel32.dll" ( _
lpFrequency As Currency) As Long

Public Declare Function QueryPerformanceCounter _
Lib "kernel32.dll" ( _
lpPerformanceCount As Currency) As Long


Sub TestProblemChars2()
Dim bArr1() As Byte, bArr2() As Byte
Dim i As Long, j As Long, k As Long
Dim strIn$, strOut1$, strOut2$, sCorrect$
Const C256 As Long = &H100 ' 256

Dim Ctr1 As Currency, Ctr2 As Currency, Freq As Currency
Dim Overhead As Currency

'timer from a post of Nate Oliver
QueryPerformanceFrequency Freq
QueryPerformanceCounter Ctr1
QueryPerformanceCounter Ctr2
Overhead = Ctr2 - Ctr1 ' determine API overhead


strIn = "??!<9"
'asci/chr 128 135 33 139 57
'ansi (172,32) (33,32) (33,0) (57,32) (57,0)

'change Euro(128) to E(69), !(33) to B(66), 9(57) to X(88)
'retain 135 and 139

sCorrect = "E?B<X" ' 69 135 66 139 88

For k = 1 To 2
'k=1 short string, k=2 longer string

QueryPerformanceCounter Ctr1 ' time loop

For j = 1 To 10000
bArr1 = StrConv(strIn, vbFromUnicode)
For i = LBound(bArr1) To UBound(bArr1)
Select Case bArr1(i)
Case 128
bArr1(i) = 69
Case 33
bArr1(i) = 66
Case 57
bArr1(i) = 88
End Select
Next
strOut1 = StrConv(bArr1, vbUnicode)
Next
QueryPerformanceCounter Ctr2
Debug.Print "with StrConv len: " & Len(strIn), _
(Ctr2 - Ctr1 - Overhead) / Freq


' Euro: asci=128, ansii=172,32 or 172+(32*256) = 8364

QueryPerformanceCounter Ctr1 ' time loop
For j = 1 To 10000
bArr2 = strIn
For i = LBound(bArr2) To UBound(bArr2) Step 2
Select Case bArr2(i) + (bArr2(i + 1) * C256)
Case 8364
bArr2(i) = 69: bArr2(i + 1) = 0
Case 33
bArr2(i) = 66
Case 57
bArr2(i) = 88
End Select
Next
strOut2 = bArr2

Next
QueryPerformanceCounter Ctr2
Debug.Print "omit StrConv len: " & Len(strIn), _
(Ctr2 - Ctr1 - Overhead) / Freq

'make string bigger by ^4 for the second test
If k = 1 Then
For i = 1 To 4
strIn = strIn & strIn
Next
End If

Next

For i = 1 To Len(sCorrect)
Debug.Print Asc(Mid$(sCorrect, i, 1)), _
Asc(Mid$(strOut1, i, 1)), _
Asc(Mid$(strOut2, i, 1))
Next
'both with or without StrConv work correctly

End Sub


In this contrived test I found dumping StrConv was 2 x faster with the short
string, slightly slower with the longer string. In a more realistic
repetitive call to a function the overall the difference would be much less,
I think.

A different approach might be something like

If bArr2(I + 1) then
'Ah, it's a problem char with a 2nd byte value. Don't get these often so we
can afford to handle this one in a different and much slower way
Else
'only look at the first byte

Could also try something with MidB$ but I imagine any type of string
function like that on each character in the loop would defeat the speed of
the byte array method, or are you thinking of something different ?

Regards,
Peter T
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top