Web Query? Is This Possible?

  • Thread starter Thread starter ryguy7272
  • Start date Start date
R

ryguy7272

I am wondering how easy, or difficult, it will be to do a web query on this
site:
http://www.dsireusa.org/

I’m not sure if there is a pattern or not; doesn’t really seem to be one and
that may prevent this project from getting off the ground. Anyway, I wanted
to try to get Excel to ‘drill down’ to each state, such as CA, where you
would see this
http://www.dsireusa.org/incentives/index.cfm?re=1&ee=1&spv=0&st=0&srp=1&state=CA

Then drill down to each link, between ‘Financial Incentives’ and ‘Related
Programs & Initiatives’ (but not below this). In each of those links I want
to Import .WebTables = "6,7,8", offset 2 or 3 rows and then back out of the
page, drill down to the next link, and so on and so forth. This may be a
massive undertaking, and simply not worth the effort, or it may not be bad at
all. I can’t tell, but if someone could assess, and even give me an idea of
how to get started, I’d be very appreciative.

Thank in advance!
Ryan---
 
there is nothing to download on the main page. Is there another page you
want to download. what method do you want to used. Just a Data -Import
External Import Data - New Web Query or do you want to open a n internet
explorer and download the daa using the internet explorted. have you tried
the Qery. Have you tried the query while recording a macro?

I need a little bit more info to be able to answer your question. if the
worksheet menu option works then it is simple. If you don't get the results
you are looking for then it is harder.
 
Thanks for taking an interest in this Joel. I've seen your work with web
queries; quite remarkable. If anyone can do this, it's probably you.

Ok, this is the main page for the site:
http://www.dsireusa.org/

If you click on 'CA', for instance, you end up here:
http://www.dsireusa.org/incentives/index.cfm?re=1&ee=1&spv=0&st=0&srp=1&state=CA

What I want to do on this page, is get Excel to drill down to each link,
between 'Financial Incentives' and 'Related Programs & Initiatives' (not
including the information below this).

Clicking on the first hyperlink takes you here:
http://www.dsireusa.org/incentives/incentive.cfm?Incentive_Code=CA164F&re=1&ee=1

I want to import the date from .WebTables = "6,7,8", offset 2 or 3 rows and
then back out of the page, drill down to the next link, and so on and so
forth. I've done lots of web queries and recorded macros many times to
automate the process of importing this data from the web, but the loop is the
tricky thing here. I don't know how to handle the looping part. Does it
make sense so far?

First, I'd like to import the date from here:
http://www.dsireusa.org/incentives/incentive.cfm?Incentive_Code=CA164F&re=1&ee=1

The link is named: 'Costa Mesa - Fee Waiver for Green Building'

After importing data from .WebTables = "6,7,8" and offsetting offset 2 or 3
rows, I'd like Excel to go back one web page, to get back here:
http://www.dsireusa.org/incentives/index.cfm?re=1&ee=1&spv=0&st=0&srp=1&state=CA

Then, move to the next hyperlink, which is this:
http://www.dsireusa.org/incentives/incentive.cfm?Incentive_Code=CA08F&re=1&ee=1

The link is named: 'Marin County - Green Building Incentive Program'

Then, I'd like excel to import the date from .WebTables = "6,7,8", offset 2
or 3 rows and then back out of the page, drill down to the next link, and so
on and so forth. Does it make sense? I'm pretty sure there is a way to do
this, but it may be more complicated than it is worth. What do you think?
You are the expert in these matters!!

Thanks,
Ryan---
 
I'm thinking that you may need to open two internet explorer applications.
the links will probably href properties. I probably won't get to look at
this until late tonight or tomorrow morning.

to get to each state webpage I would simply create a table of each atates
abbreviation and use it as part of the URL

California webpage is this

http://www.dsireusa.org/incentives/index.cfm?re=1&ee=1&spv=0&st=0&srp=1&state=CA

New York is this

http://www.dsireusa.org/incentives/index.cfm?re=1&ee=1&spv=0&st=0&srp=1&state=NY

So you can put all the state abbreviation on a spreadsheet

Then your loop would be

BaseName = "http://www.dsireusa.org/incentives/index.cfm?" &
"_re=1&ee=1&spv=0&st=0&srp=1&state="
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True

for RowCount = 1 to 50
URL = BaseName & Range("A" & RowCount)
IE.Navigate2 URL
Do While IE.busy = True And _
IE.readystate <> 4

DoEvents
Loop
Set document = IE.document
'add your code here
next rowCount
 
This is what I have so far. I'm only doing California and NY as a test.
California is working. New York I 'm having problems with one of the links.
It is not working like the others and giving me an error. Will Look at it
tomorrow.

You need to create a worksheet called USA.

Sub USA()

'define states in searching webpage
Const FindCategories = 0
Const ExtractCategory = 1

USAStates = Array("CA", "NY")
BaseName = "http://www.dsireusa.org/incentives/index.cfm?" & _
"re=1&ee=1&spv=0&st=0&srp=1&state="
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True

With Sheets("USA")
.Cells.ClearContents
RowCount = 1

For Each USAState In USAStates
URL = BaseName & USAState
IE.Navigate2 URL
Do While IE.busy = True And _
IE.readystate <> 4

DoEvents
Loop

.Range("A" & RowCount) = USAState
RowCount = RowCount + 1

Set document = IE.document
'Call Dump(document)
'code for extracting table
State = FindCategories
For Each itm In IE.document.all
Select Case itm.classname

Case "categorytype"
Select Case Trim(itm.innertext)
Case "Financial Incentives", _
"Rules, Regulations & Policies"

.Range("B" & RowCount) = itm.innertext
RowCount = RowCount + 1
State = ExtractCategory

Case Else
State = FindCategories
End Select

End Select

If State = ExtractCategory Then
Select Case itm.classname
Case "copybold"
.Range("C" & RowCount) = itm.innertext
RowCount = RowCount + 1

Case "copy"
.Range("D" & RowCount) = itm.innertext
.Range("E" & RowCount) = _
itm.FirstChild.href
RowCount = RowCount + 1
End Select

End If

Next itm

.Columns.AutoFit

Next USAState
End With
IE.Quit
Set IE = Nothing

End Sub

Sub Dump(document)

With Sheets("dump")
.Cells.ClearContents
RowCount = 1
For Each itm In document.all
.Range("A" & RowCount) = itm.tagname
.Range("B" & RowCount) = itm.classname
.Range("C" & RowCount) = itm.ID
.Range("D" & RowCount) = Left(itm.innertext, 1024)

RowCount = RowCount + 1
Next itm
End With
End Sub
 
Hre is the final results. Use two macros. I wrote the 1st which gets you
all the URL's. Then write the 2nd macro which gets the actual data .

Sub USA()

'define states in searching webpage
Const FindCategories = 0
Const ExtractCategory = 1

USAStates = Array("AK", "AL", "AR", "AZ", _
"CA", "CO", "CT", "DC", "DE", "FL", "GA", _
"HI", "IA", "ID", "IL", "IN", "KS", "KY", _
"LA", "MA", "MD", "ME", "MI", "MN", "MO", _
"MS", "MT", "NC", "ND", "NE", "NH", "NJ", _
"NM", "NV", "NY", "OH", "OK", "OR", "PA", _
"RI", "SC", "SD", "TN", "TX", "UT", "VA", _
"VT", "WA", "WI", "WV", "WY")

BaseName = "http://www.dsireusa.org/incentives/index.cfm?" & _
"re=1&ee=1&spv=0&st=0&srp=1&state="
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True

With Sheets("USA")
.Cells.ClearContents
RowCount = 1
.Range("A" & RowCount) = "State"
.Range("B" & RowCount) = "Category"
.Range("C" & RowCount) = "Topic"
.Range("D" & RowCount) = "SubTopic"
.Range("E" & RowCount) = "URL"

RowCount = 2

For Each USAState In USAStates
URL = BaseName & USAState
IE.Navigate2 URL
Do While IE.busy = True Or _
IE.readystate <> 4

DoEvents
Loop

Set document = IE.document
'Call Dump(document)
'code for extracting table
State = FindCategories
For Each itm In IE.document.all
Select Case itm.classname

Case "categorytype"
Select Case Trim(itm.innertext)
Case "Financial Incentives", _
"Rules, Regulations & Policies"

Category = Trim(itm.innertext)
State = ExtractCategory

Case Else
State = FindCategories
End Select

End Select

If State = ExtractCategory Then
Select Case itm.classname
Case "copybold"
Topic = Trim(itm.innertext)

Case "copy"
.Range("A" & RowCount) = USAState
.Range("B" & RowCount) = Category
.Range("C" & RowCount) = Topic
Subtopic = Trim(itm.innertext)
.Range("D" & RowCount) = Subtopic
HREF = itm.FirstChild.HREF
.Range("E" & RowCount) = HREF
RowCount = RowCount + 1
End Select

End If

Next itm

.Columns.AutoFit

Next USAState
End With
IE.Quit
Set IE = Nothing

End Sub

Sub Dump(document)

With Sheets("dump")
.Cells.ClearContents
RowCount = 1
For Each itm In document.all
.Range("A" & RowCount) = itm.tagname
.Range("B" & RowCount) = itm.classname
.Range("C" & RowCount) = itm.ID
.Range("D" & RowCount) = Left(itm.innertext, 1024)

RowCount = RowCount + 1
Next itm
End With
End Sub
 
Thanks for the Effort Joel!!! This is awesome!!!

The code looks right, but does actually this work for you? When I run it
here IE opens and hangs for a long time. Then, I close the browser and this
line is yellow:
Do While IE.busy = True Or IE.readystate <> 4

No data was imported in about 7 minutes or so. I have a cable connection;
pretty fast. I imagine it will take a while to run, but I don't think it was
actually doing anything. unless, everything has to import and then update
all at onece. however, I don't think that's what it is doing. Any thoughts?
Suggestions? I'm going to try to troubleshoot, but if you can think of
something that may prevent this from running (maybe a setting in IE), please
let me know.

Kindest of regards,
Ryan---
 
Ah! There WAS something weird with my IE settings. Ok, got it working
now!!! This is totally awesome!!! The only things is…I wanted the code to
drill down one more level. If you run the code, you will see a link in cell
E2. When E2 is selected, hit F2, then hit enter, now click on the
link…that’s what I wanted; that data right there. I wanted to import
everything there; looks like .WebTables = "4". With all you’ve done, you’ve
made me believe that this is actually possible!! Can you please modify the
code to go one level deeper, import that data, then continue until all
imports are complete?

Please send me an email: (e-mail address removed)

Thanks a ton for all of this!!
You have totally made my weekend!!
This is going to save me so much time!!


Ryan---
 
I'm using 2003 on a high speede dsl line and the code runs in under 2 minutes
(much faster than I thought it would). The webpage may be out of service.
did you try using just the internet explorer to get to the page. If the IE
explorer works manually then the code should also run. did a IE application
run and did the page get to the correct site? I've used this code with IR 6,
7, and 8 in the past. I'm know using 7.

The code will hang on that line when the browser is closed since you killed
the IE object manually. while the code is running you should be able to
switch betwen the webpage and the worksheet and actually watch the
spreadsheet filling up as the macro is running. I have a clearcontents
msthod in the code so if you restart it will clear the old data.
 
I just retested the code and it works perfectly. I also notice if you want
the US teritories add "TR" to the array definiation. I already had the
District of columbia in the List.
 
yes, yes, yes. Did you see my last post? There WAS something weird with my
IE settings. Ok, got it working now!!! This is totally awesome!!! The only
things is...I wanted the code to drill down one more level. If you run the
code, you will see a link in cell E2. When E2 is selected, hit F2, then hit
enter, now click on the
link...that’s what I wanted; that data right there. I wanted to import
everything there; looks like .WebTables = "4". With all you’ve done, you’ve
made me believe that this is actually possible!! Can you please modify the
code to go one level deeper, import that data from there, then continue
until all
imports are complete?

Please send me an email: (e-mail address removed)

Thanks a ton for all of this!!
You have totally made my weekend!!
This is going to save me so much time!!


Ryan---
 
How do you want the data. when I've done this before I put all the data into
one worksheet. In this case adding the data starting at column G of the
present worksheet. I think you need a two row header. The first Row would
be the 3 main categories
1) Last Desire Review
2) Summary
3) Contact

the 1st and 3rd category has types which are the items to the right of the
Colon on each line or group of lines like this

1) State - skip since it is already in column A
2) Amount
3) Terms
4) WebSite

And the details to the right of the column will be in the actual row cell.


The other choice is to put data into one cell or multiple cells. I just
need to know which column you need and how you need them orgainized.
 
I think the first choice is better, but I don't completely understand what
you mean. I think data in separate cells is best; this is for a database
that I'm developing. Basically, this is going into a SQL Server DB. I need
to be able to query by different criteria. I don't know how to do this kind
of thing using SQL Server, or even Access, so I opted for Excel. I think if
everything is in one sheet, with distinct data in distinct cells, it will
just be easier to Import into SQL Server; one single 'flat file'. Do you
know if this can be done using SQL Server or Access? Excel is perfectly fine
as an interface; just one more step to transfer to SQL Server, and this is
going to save me soooo much time.

Can you send me an email? (e-mail address removed)


Thanks for everything!!
Ryan--
--
 
Back
Top