Find a Pattern and Parse Text

  • Thread starter Thread starter ryguy7272
  • Start date Start date
R

ryguy7272

How can I use VBA to find a pattern such as ‘Checked Out To:’ and
everything between the quotes?

The Excel file will have 16 characters, 20 with the .xls extension.
Then there is a Return. Then ‘Checked Out To:’ with last name comma
first name and there MAY be some brackets after that. Everything will
ALWAYS be between quotes.

“9900010X10G00904.xls
Checked Out To: Jupree, Jasper [C-TO]”

“9900010X10G11208.xls
Checked Out To: Malfree, Jim”

I’m experimenting with a concept such as this:
If InStr(1, my_var, "Checked Out To: " & activeWB = "Checked Out To: "
& activeWB, vbTextCompare) > 0 Then
‘Something . . . MsgBox ()
End If

There is a pattern, as described above, but I’m not sure how to parse
everything between the quotes.

Just so you know, all data comes from a URL; the my_var variable comes
from this URL . . . all of this works fine!
(Thanks Ron . . . )

Any ideas on how to handle this?

Thanks!!
 
How can I use VBA to find a pattern such as ‘Checked Out To:’ and
everything between the quotes?

The Excel file will have 16 characters, 20 with the .xls extension.
Then there is a Return.  Then ‘Checked Out To:’ with last name comma
first name and there MAY be some brackets after that.  Everything will
ALWAYS be between quotes.

“9900010X10G00904.xls
Checked Out To: Jupree, Jasper [C-TO]”

“9900010X10G11208.xls
Checked Out To: Malfree, Jim”

I’m experimenting with a concept such as this:
If InStr(1, my_var, "Checked Out To: " & activeWB = "Checked Out To: "
& activeWB, vbTextCompare) > 0 Then
‘Something . . . MsgBox ()
End If

There is a pattern, as described above, but I’m not sure how to parse
everything between the quotes.

Just so you know, all data comes from a URL; the my_var variable comes
from this URL . . . all of this works fine!
(Thanks Ron . . . )

Any ideas on how to handle this?

Thanks!!

How about something like...

pos_1 = instr(1, my_var, "Checked Out To", vbTextCompare)

Do until pos_1 = 0
pos_2 = instrrev(my_var, "xls", pos_1, vbTextCompare)
pos_3 = instr(pos_1, my_var, """", vbTextCompare)
my_text = mid(my_var, -17 + pos_2, pos_3 - (-17 + pos_2))
' might need a line like
' my_text=replace(my_text, chr(13),"")
' in order to remove line breaks
pos_1 = instr(pos_3, my_var, "Checked Out To", vbTextCompare)
Loop

....Ron
 
How can I use VBA to find a pattern such as ‘Checked Out To:’ and
everything between the quotes?
The Excel file will have 16 characters, 20 with the .xls extension.
Then there is a Return.  Then ‘Checked Out To:’ with last name comma
first name and there MAY be some brackets after that.  Everything will
ALWAYS be between quotes.
“9900010X10G00904.xls
Checked Out To: Jupree, Jasper [C-TO]”
“9900010X10G11208.xls
Checked Out To: Malfree, Jim”
I’m experimenting with a concept such as this:
If InStr(1, my_var, "Checked Out To: " & activeWB = "Checked Out To: "
& activeWB, vbTextCompare) > 0 Then
‘Something . . . MsgBox ()
End If
There is a pattern, as described above, but I’m not sure how to parse
everything between the quotes.
Just so you know, all data comes from a URL; the my_var variable comes
from this URL . . . all of this works fine!
(Thanks Ron . . . )
Any ideas on how to handle this?

It's not clear to me, from what you write, exactly what is in my_var,
and exactly what result you want returned.  The examples you give make
it appear as if you just want to return my_var; probably without the
quotation marks, provided the sub-string Checked Out To: is present in
the string.

Is it really that simple?

If so, just use Instr to check for the presence of Checked Out To:
and, if present, return my_var (or mid(my_var,2, len(my_var)-2 if you
need to remove the quotes).

A more detailed description and examples of what you want would be
helpful.- Hide quoted text -

- Show quoted text -

Thanks guys!! I can't use a loop; it will be too slow. I was suing a
loop before and had to rethink this due to performance issues.


Here’s what I’m trying to do:
URL = "https:// . . . " < -- secure web site
Set IE = Nothing
Set IE = CreateObject("InternetExplorer.Application")

With IE
.navigate URL
.Visible = False
.Top = 50
.Left = 50
.Height = 50
.Width = 50
'Wait for page to load
While .Busy Or .readyState <> 4 Or IE.Busy = True: Wend
Set HTMLdoc = .document
End With

Dim my_var As String
Dim my_file As String
Dim my_user As String
my_var = ""
my_var = IE.document.body.innerHTML

Now, my_var is a HUGE string. It is supposed to have everything from
the URL. If I go to View > Source on my web browser, well, that’s the
‘my_var’

Your code gave the this:
‘javascript:return WebForm_OnSubmit();’

I can’t figure out why. It’s in the site, close to the top, but it
has nothing to do with what I’m looking for. I’m looking for
something like this:
8834310G10X09999.xls
Checked Out To: COLE, TIMMY

Basically, this is the text ‘Checked Out To:’ and the name of the
Excel file.

The name of the Excel file comes from here:
activeWB = strFullString & ".xls"

I need to think about this a bit. If you come up with something else
in the meantime, please do post back.

Thanks!!
 
Post from a few lines before, through a few lines after, the segment
that you wish to extract.

This is pretty much what I’m searching for:

Lines Before:
<tr class="ms-alternating">
<td class="ms-vb-icon">
<a tabIndex="-1" onclick="return
DispEx(this,event,'TRUE','FALSE','TRUE','SharePoint.OpenDocuments.
3','0','SharePoint.OpenDocuments','','','11','11','0','1','0x400001f07fff1bff')"
href="/sites/P/SharedDoc/8834310G10X09999.xls">
<img title="8834310G10X09999.xls
Checked Out To: COLE, TIMMY" alt="8834310G10X09999.xls
Checked Out To: COLE, TIMMY" src="/_layouts/images/icxls.gif"
border="0" complete="complete"/>

Line After:
<td height="100%" class="ms-vb-title">

I want to extract this, and pop it into a MsgBox:
8834310G10X09999.xls
Checked Out To: COLE, TIMMY

There’s always a hard return after the .xls

One HUGE string should be in here:
my_var = IE.document.body.innerHTML

Then, I want to look for the name of the Excel file and the name of
the person, in the 'my_var'

Does this make sense?
 
This is pretty much what I’m searching for:
Lines Before:
<tr class="ms-alternating">
 <td class="ms-vb-icon">
   <a tabIndex="-1" onclick="return
DispEx(this,event,'TRUE','FALSE','TRUE','SharePoint.OpenDocuments.
3','0','SharePoint.OpenDocuments','','','11','11','0','1','0x400001f07fff1­bff')"
href="/sites/P/SharedDoc/8834310G10X09999.xls">
   <img title="8834310G10X09999.xls
Checked Out To: COLE, TIMMY" alt="8834310G10X09999.xls
Checked Out To: COLE, TIMMY" src="/_layouts/images/icxls.gif"
border="0" complete="complete"/>
Line After:
<td height="100%" class="ms-vb-title">
I want to extract this, and pop it into a MsgBox:
8834310G10X09999.xls
Checked Out To: COLE, TIMMY
There’s always a hard return after the .xls
One HUGE string should be in here:
my_var = IE.document.body.innerHTML
Then, I want to look for the name of the Excel file and the name of
the person, in the 'my_var'
Does this make sense?

In your example, you have two (2) instances of Checked Out To preceded
by an excel file name that ends with a hard return.

The following code will return both of them:

====================================
Option Explicit
Sub GetCheckedOut()
  Dim re As Object, mc As Object, m As Object
  Dim my_var As String, s As String
  Const sPat As String = _
"\b\w+\.xls[\r\n]Checked Out To:[\sA-Z,]+\b"

my_var = Range("$A$3").Text
Set re = CreateObject("vbscript.regexp")
    re.Pattern = sPat
    re.Global = True
    re.ignorecase = False

If re.test(my_var) = True Then
    Set mc = re.Execute(my_var)
        For Each m In mc
            s = s & vbLf & vbLf & m
        Next m
        s = Mid(s, 3)
    MsgBox (s)
End If
End Sub
====================================- Hide quoted text -

- Show quoted text -


Ok, this may be it, or VERY close. However, my_var is NOT Range("$A
$3").Text
my_var = ""
my_var = IE.document.body.innerHTML

I set it to a blank first because I saw some weird stuff in there one
time.

Question: What does this do?
If re.test(my_var) = True Then

As I step through the code, that condition doesn't seem to go to True,
so the code skips to . . . End If . . . and treminates.

The Loop won't work; tried it already and there is sooooooooo much
stuff in that 'my_var' string and the performace is super-slow.

I'm trying something like this:
If InStr(1, my_var, activeWB & vbClrf & "Checked Out To: " &
UCase(username), vbTextCompare) Then
.. . . . .

This may do it too:
If InStr(1, my_var, activeWB & vbClrf & "Checked Out To: " & activeWB,
vbTextCompare) Then

Active Workbook is assigned liek this:
activeWB = strFullString & ".xls"

Can I use this concept. I already know the loop will be too
slow . . . . .

Thanks for your time and consideration with this, Ron!!
 
In message <[email protected]
s.com> of Fri, 20 Aug 2010 14:12:19 in microsoft.public.excel.programmin
g said:
On Fri, 20 Aug 2010 13:32:29 -0700 (PDT), ryguy7272
[snip]
  Const sPat As String = _
"\b\w+\.xls[\r\n]Checked Out To:[\sA-Z,]+\b"

my_var = Range("$A$3").Text
Set re = CreateObject("vbscript.regexp")
    re.Pattern = sPat
    re.Global = True
    re.ignorecase = False

If re.test(my_var) = True Then
[snip]

Question: What does this do?
If re.test(my_var) = True Then

As I step through the code, that condition doesn't seem to go to True,
so the code skips to . . . End If . . . and treminates.
[snip]

Thanks for your time and consideration with this, Ron!!

As Ryan (Is it Ryan?) did not understand re.test(my_var), I suspect
"\b\w+\.xls[\r\n]Checked Out To:[\sA-Z,]+\b" is also opaque to him/her.
cf. <http://en.wikipedia.org/wiki/Regular_expression> for a description
of Regular Expressions(RE). They are HARD, but POWERFUL.
<http://msdn.microsoft.com/en-us/library/ms974570.aspx> describes the
Microsoft dialect.
I use them a lot and they give me much grief until I get them right.
They then work well until I find an example which breaks them.

One technique in my standard arsenal is the following function:

Private Function GetSub(ByVal from As String, ByVal Match As String,
ByVal Part As String) As String
RE.Pattern = Match
GetSub = RE.Replace(from, Part)
End Function

which I use it from the Immediate window (opened with Ctrl+G) as
?GetSub(my_var, "\b\w+\.xls[\r\n]Checked Out To:[\sA-Z,]+\b", "yes")

If the RE matches, this will write
yes
It may write "fooyesbar".
If the RE does not match, it will write nothing or the value of my_var -
I can't remember which.
A different RE is needed.
I would try a complete match of my_var, with codes to skip to the match
and from the match. i.e. with
?GetSub(my_var, "^.*\b\w+\.xls[\r\n]Checked Out To:[\sA-Z,]+\b.*$", _
"yes")

That won't match. Find something that will. e.g.
?GetSub(my_var, "^.*Checked.*$", "yes")

To see what matches, use
?GetSub(my_var, "^.*(Checked).*$", "$1")

Do a binary chop between small things which match and large things which
don't until the desired match is found.
Then apply that pattern to Ron's code.

I will ignore "What does this do?" when the answer is RTFM <http://en.wi
kipedia.org/wiki/Rtfm>

I wish Ryan? luck.
 
In message <[email protected]> of Sat, 21 Aug
2010 06:47:29 in microsoft.public.excel.programming, Ron Rosenfeld
As Ryan (Is it Ryan?) did not understand re.test(my_var), I suspect
"\b\w+\.xls[\r\n]Checked Out To:[\sA-Z,]+\b" is also opaque to him/her.

Given this comment from him:
my_var = ""
my_var = IE.document.body.innerHTML

I set it to a blank first because I saw some weird stuff in there one
time

I suspect there are other areas about VB programming that Ryan needs
to study.
I use them a lot and they give me much grief until I get them right.
They then work well until I find an example which breaks them.

If you aren't aware of it, I would suggest a program called RegexBuddy
(www.regexbuddy.com). It is very helpful in trying to devise regular
expressions that work.

[snip]

Thanks for that Ron. I don't think I had heard of RegexBuddy.
Looks good. RegexMagic <http://www.regexbuddy.com/regexmagic.html> looks
even better for generating REs. Ryan might buy you a copy. ;) ~30USD.

I am very impressed they seem to support a wide variety of RE dialect.
e.g. both the primitive REs of sed and the sophisticated ones of perl.
If I was still being paid to develop software, I would buy both.
Meanwhile, I have noted the reference.
 
I'm not so sure about Regex Magic.  I've been trying it out a bit, but
I don't know it well enough to do much more than generate regex's that
will accept only a defined range of numbers.  Probably if I became
more familiar with it, it would be more useful to me.

Hello everyone! Thanks for all the help with this. I think Regular
expressions may be the key to solving this puzzle. I've got a bit
more work to do on this. I'll try to get my arms around it by mid
week and post back with a solution as soon as possible.

Thanks again!
Ryan--
 
Back
Top