Parsing XML/RSS weather feeds using Perl

  • Thread starter: Richard in Va.

Richard in Va.

Hello,

Where might I find help with parsing XML/RSS weather feeds using Perl Compatible
Regular Expressions?

I've downloaded the desktop widget called Rainy's Rainmeter, found at...

http://www.ipi.fi/~rainy/index.php?pn=projects&project=rainmeter

One of the widgets provides the current weather conditions. (reconfigure for your
location).

I've edited it to make it a little bigger and to show more current information like wind
speed, wind gust, wind chill and so on.

My problem is that I can't parse the XML/RSS feed beyond current conditions. The feed I
download includes the 5-day forecast, so I know it's available. I just don't know how to
parse the 5-day forecast using Perl regular expressions.

Here is the rss feed...

http://xoap.weather.com/weather/local/24522?cc=*&unit=m&dayf=5

Below is the code I'm using to parse the current weather conditions down to "Dew Point".
This is working well for me. I assume this is an uncompiled "long-hand" approach. But even
I can understand and edit it later if needed. I'm sure someone will ask what's with all
the extra (.+)'s. I read somewhere in the Rainy forums that this method might prevent some
parsing issues. So I tried it and it works; the only side effect is that the web parser
skips a number when it assigns values to the string indexes.

RegExp="(?siU)<weather ver="(.+)">(.+)<dnam>(.+)</dnam>(.+)<tm>(.+)</tm>(.+)<sunr>(.+)</sunr>(.+)<suns>(.+)</suns>(.+)<lsup>(.+)</lsup>(.+)<obst>(.+)</obst>(.+)<tmp>(.+)</tmp>(.+)<flik>(.+)</flik>(.+)<t>(.+)</t>(.+)<icon>(.+)</icon>(.+)<r>(.+)</r>(.+)<d>(.+)</d>(.+)<s>(.+)</s>(.+)<gust>(.+)</gust>(.+)<d>(.+)</d>(.+)<t>(.+)</t>(.+)<hmid>(.+)</hmid>(.+)<vis>(.+)</vis>(.+)<dewp>(.+)</dewp>"

The RegExp above is treated all as one (1) line.
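
To see why the filler (.+) groups make the parser "skip a number", here is a minimal Python sketch of the same idea. It is not the Rainmeter parser itself: the sample XML is a shortened, made-up stand-in for the feed, and because Python's `re` module has no equivalent of the PCRE `(?U)` ungreedy flag, each group is written as the lazy `(.+?)` instead.

```python
import re

# Shortened, made-up sample in the shape of the feed (values are illustrative).
SAMPLE = """<weather ver="2.0">
  <loc id="24522">
    <dnam>Sample Town, VA (24522)</dnam>
    <tm>1:20 PM</tm>
  </loc>
  <cc>
    <tmp>31</tmp>
    <flik>33</flik>
  </cc>
</weather>"""

# (?s) lets "." match newlines and (?i) ignores case. The (.+?) between two
# tags is a "filler" group: it swallows whatever sits between them.
PATTERN = re.compile(
    r'(?si)<weather ver="(.+?)">(.+?)<dnam>(.+?)</dnam>(.+?)<tm>(.+?)</tm>'
    r'(.+?)<tmp>(.+?)</tmp>(.+?)<flik>(.+?)</flik>'
)

match = PATTERN.search(SAMPLE)

# Every group gets an index, fillers included, so the useful values
# end up on every other index (1, 3, 5, ...).
for index, value in enumerate(match.groups(), start=1):
    kind = "value " if index % 2 == 1 else "filler"
    print(index, kind, repr(value))
```

One thing this sketch makes visible: a filler written as (.+) needs at least one character between the two tags, so it only matches when the feed has a line break or other text there.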

The Rainy Webparser will generate a log file I can view to see what values it has assigned
to the string indexes.

If I could get help with parsing "Today's Forecast", which is day #0 (day d="0") in the
RSS feed, I could continue the code myself to parse days #1-4 of the 5-day forecast.

<dayf>

<lsup>8/24/06 1:08 PM EDT</lsup>

<day d="0" t="Thursday" dt="Aug 24">

<hi>33</hi>

<low>18</low>

and so-on.

The "<day d="0" t="Thursday" dt="Aug 24">" line is what I can't seem to get past.

I would like to parse ALL the available information for the given day. The Rainy web
parser will assign a string index # for each element parsed. Then I can pick-and-choose
the ones I want displayed on the widget.

By the way, the clock is a piece of art. Love it! Also the calendar, found on the Rainy
web site. Look to the left under Projects|Rainlendar. That is a must-have in my opinion!

Nice stuff!

Yes I know, a little OT. But I've read through the Rainy manuals, read through his
discussion forums, searched Google and tried several of the on-line code testers with
little luck.

I'm not a programmer, I'm a tinker-er. So I know little to nothing about scripting.

Is there a newsgroup for Perl scripting help?

Best regards,

Richard in VA.

++++++++++++++++++++++++++++++++
 
I spent several hours one night trying to modify the Rainy weather widget. It's not complex, but it's quite difficult to get
working.

I've got to go out now, but I'll take a look at it and see what I can do later today.
 
Hello Alec,

I worked on it some more last weekend and I think I might have solved most
of my problem.

I reviewed the Rainy News widget (ini) file, it's installed along with the
Tranquil group so you should have this too.
It uses (3) "RegExp=" to parse 3 different rss feeds.

So here is what I did...

[Variables]
URL1=http://xoap.weather.com/weather/local/24522?cc=&dayf=3
URL2=http://xoap.weather.com/weather/local/24522?cc=&dayf=3
URL3=http://xoap.weather.com/weather/local/24522?cc=&dayf=3
URL4=http://xoap.weather.com/weather/local/24522?cc=&dayf=3
(url has changed slightly since original post)
("24522" = zip code for location)
("dayf=3" = current condition + 3-day forecast)

Url=#URL1#
RegExp="(?siU)<weather ver="(.+)">(.+)<dnam>(.+)</dnam>(.+).....thru
to......<t>(.+)</t>"
StringIndex=1
Debug=2

(This parsed the current conditions and created StringIndex #1-45 or so)

I then...
Defined my Measures and Meters for the current conditions.

I then...

Url=#URL2#
RegExp="(?siU)<day d="0" t="(.+)"(.+)<hi>(.+)</hi>(.+)......thru
to......<hmid>(.+)</hmid>"
StringIndex=1
Debug=1

(This parsed the Forecast for Today Day "0", Parts Day and Night and created
StringIndex #1-45 or so)

I then...
Defined my Measures and Meters for Today Day "0".

Repeated the same for days "1" thru "2".
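
As a rough illustration of the per-day pattern, here is a minimal Python sketch that pulls one `<day>` block out by its d="..." number. It is not Rainmeter syntax: `parse_day` is a hypothetical helper, only the day "0" values come from the feed excerpt earlier in the thread, and the day "1" values are made up. Since Python has no `(?U)` flag, the groups are written as lazy `(.+?)`.

```python
import re

# Day "0" matches the excerpt posted above; day "1" is invented for the demo.
SAMPLE = """<dayf>
  <lsup>8/24/06 1:08 PM EDT</lsup>
  <day d="0" t="Thursday" dt="Aug 24">
    <hi>33</hi>
    <low>18</low>
  </day>
  <day d="1" t="Friday" dt="Aug 25">
    <hi>31</hi>
    <low>19</low>
  </day>
</dayf>"""


def parse_day(xml_text, day_index):
    """Return weekday, date, high and low for one <day d="N"> block, or None."""
    # The literal double quotes around the attribute values are part of the
    # pattern; only the text between them is captured.
    pattern = re.compile(
        r'(?s)<day d="%d" t="(.+?)" dt="(.+?)">'
        r'.+?<hi>(.+?)</hi>.+?<low>(.+?)</low>' % day_index
    )
    match = pattern.search(xml_text)
    if match is None:
        return None
    weekday, date, high, low = match.groups()
    return {"weekday": weekday, "date": date, "high": high, "low": low}


for day in (0, 1):
    print(day, parse_day(SAMPLE, day))
```

Only the day number changes between calls, which is the same "repeat it for each day" approach described above.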

This approach downloads and parses the same rss feed 4 times, which is not
good. There is a way around this, and I suppose it will be my next hurdle.
There is supposed to be a way to have RegExp #2, #3, #4 reparse what was
downloaded by RegExp #1.

Haven't tried it yet, but maybe it's as simple as parsing
"C:\WebParserDump.txt" that was saved by "Debug=2" in RegExp #1.
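
The "fetch once, parse many times" idea can be sketched outside Rainmeter in a few lines of Python. This is only an illustration of the principle, not how the web parser does it: the file name, the sample XML and the `parse_feed` helper are all assumptions, and a temporary file stands in for the saved download.

```python
import re
import tempfile
from pathlib import Path

# Made-up stand-in for the saved feed (values are illustrative).
SAMPLE = """<weather ver="2.0">
  <cc>
    <tmp>31</tmp>
  </cc>
  <dayf>
    <day d="0" t="Thursday" dt="Aug 24">
      <hi>33</hi>
      <low>18</low>
    </day>
    <day d="1" t="Friday" dt="Aug 25">
      <hi>31</hi>
      <low>19</low>
    </day>
  </dayf>
</weather>"""

CURRENT_RE = re.compile(r'(?s)<cc>.+?<tmp>(.+?)</tmp>')
DAY_RE = re.compile(
    r'(?s)<day d="(\d+)" t="(.+?)" dt="(.+?)">'
    r'.+?<hi>(.+?)</hi>.+?<low>(.+?)</low>'
)


def parse_feed(xml_text):
    """Run every pattern against the same already-loaded text."""
    current = CURRENT_RE.search(xml_text)
    days = [
        {"day": m.group(1), "weekday": m.group(2), "high": m.group(4), "low": m.group(5)}
        for m in DAY_RE.finditer(xml_text)
    ]
    return {"temperature": current.group(1) if current else None, "days": days}


with tempfile.TemporaryDirectory() as folder:
    saved = Path(folder) / "24522.xml"          # hypothetical saved download
    saved.write_text(SAMPLE, encoding="utf-8")
    text = saved.read_text(encoding="utf-8")    # the feed is read exactly once
    result = parse_feed(text)                   # ...and parsed as often as needed

print(result)
```

However many patterns are applied, the feed itself is only loaded one time, which is the part that matters for staying on weather.com's good side.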

The actual url I'm testing with references a local xml file (downloaded and
saved rss feed) so that I don't upset weather.com; they don't like you to
request an rss feed more than once per hour.

Also, I'm not sure where I left "FinishAction=!RainmeterRedraw". Instead of
Redrawing at the end of each RegExp, I may have moved it to the end of the
last one. (Away from that PC right now)

Rainy has a great product (love the desktop calendar).

Found a way (Rainy's Forum) to have a small (2"x2") window display a public
webcam image that updates every 30 minutes (know anyone that maintains a
traffic or beach cam?). I modified it to show a local weather radar map; been
wanting to do that for a long time.

Thanks for your interest!

Best regards,

Richard in Va.

######################
 
Richard In Va. said:
Hello Alec,

I worked on it some more last weekend and I think I might have solved most
of my problem.

I reviewed the Rainy News widget (ini) file, it's installed along with the
Tranquil group so you should have this too.
It uses (3) "RegExp=" to parse 3 different rss feeds.


Yup, it took me a while too. Like I said, it's not complex but it sure is difficult to get working. :)

Hopefully you've got it worked out.
 
Thank you Alec,

Yes, I am further along now than I was last week. But I am still parsing a
local "C:\24522.xml" file. This week I'll plan on re-directing it to the
on-line rss feed.

Seems that last week, while it was parsing from the rss feed, it had a
tendency to get the stringIndex numbers out of order. Hmmm...!

Anyway, I can only spend so much time on it between other (real world)
obligations.

So maybe later this week I'll put it back on the rss feed.

Neat stuff!

Best regards,

Richard in VA.
###################
 