JimTheAverage
I want to write myself a little app that grabs RSS feeds from Google
and makes them human-readable for me, then sends me text messages
when an item contains words that I have listed as interesting in the app.
The problem is all of the special characters and HTML tags in the
data. Here is an example:
<table border="0" cellpadding="2" cellspacing="7" style="vertical-
align:top;"><tr><td width="80" align="center" valign="top"><font
style="font-size:85%;font-family:arial,sans-serif"><a href="http://
news.google.com/news/url?fd=R&sa=T&url=http%3A%2F
%2Fwww.google.com%2Fhostednews%2Fafp%2Farticle
%2FALeqM5gUDc1OjvV66ZWAJ0sVkcvxEAeC_g&usg=AFQjCNHOZ9w9BRN3suaUKX5NtYyvXwu1Hg"><img
src="http://nt0.ggpht.com/news/tbn/JId-xOOMV8PujM/6.jpg" alt=""
border="1" width="80" height="80" /><br /><font size="-2">AFP</font></
a></font></td><td valign="top" class="j"><font style="font-size:
85%;font-family:arial,sans-serif"><br /><div style="padding-top:
0.8em;"><img alt="" height="1" width="1" /></div><div class="lh"><a
href="http://news.google.com/news/url?fd=R&sa=T&url=http:/
%2Fwww.bloomberg.com%2Fapps%2Fnews%3Fpid%3D20601087%26sid
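%3DaTDY1sk77rm0%26pos%3D6&usg=AFQjCNFpZEDNQOe3IGg-ywaHRs4QbhY0-
g"><font size="-1"><b><font color="#6f6f6f">Bloomberg</font></b></
font><br /><font size="-1">Nov. 16 (Bloomberg) -- Mitsubishi UFJ
Financial Group Inc. may announce Japan's biggest secondary share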
sale this week as it prepares for stricter global capital rules,
according to a survey of analysts. The nation's largest bank by
<b>...</b></font><br /><font size="-1"><a href="http://news.google.com/
news/url?fd=R&sa=T&url=http%3A%2F%2Fwww.reuters.com%2Farticle
%2FrbssFinancialServicesAndRealEstateNews
%2FidUST33395120091116&usg=AFQjCNEmza8jguldmVwYZHQbCgrNWeEr9g">MUFG
shares fall nearly 5 pct on share issue plan</a><font size="-1"
color="#6f6f6f"><nobr>Reuters</nobr></font></font><br /><font
size="-1"><a href="http://news.google.com/news/url?
fd=R&sa=T&url=http%3A%2F%2Fonline.wsj.com%2Farticle
%2FSB10001424052748704431804574537433259423484.html%3Fmod
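%3Dgooglenews_wsj&usg=AFQjCNHnBOh1rt7oohVtBHRwwRmTYEr4Qg">Mitsubishi
UFJ Weighs Raising $11 Billion
<font size="-1"><a href="http://news.google.com/news/url?
fd=R&sa=T&url=http%3A%2F%2Fwww.thestreet.com%2Fstory
%2F10626902%2F1%2Fmitsubishi-ufj-ponders-stock-sale.html%3Fcm_ven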
%3DGOOGLEFI&usg=AFQjCNGhb0g_PzOJYIRcpSDeA6Xv7hBu8w">Mitsubishi UFJ
Ponders Stock Sale</a><font size="-1"
color="#6f6f6f"><nobr>TheStreet.com</nobr></font></font><br /><font
size="-1" class="p"><a href="http://news.google.com/news/url?
fd=R&sa=T&url=http%3A%2F%2Fwww.marketwatch.com%2Fstory
%2Fmitsubishi-ufj-plans-11-bln-share-issue-
report-2009-11-14&usg=AFQjCNHCJpoOUHe_gaVZIV1gk0j42MkDPg"><nobr>MarketWatch</
nobr></a> -<a href="http://news.google.com/news/url?
fd=R&sa=T&url=http%3A%2F%2Ftopnews.us%2Fcontent%2F28359-mufg-
looking-raise-11-billion-through-public-offering-common-
shares&usg=AFQjCNESYgvq1nwyiLyos9-_bdhvmHn18w"><nobr>TopNews
United States</nobr></a> -<a href="http://news.google.com/news/
url?fd=R&sa=T&url=http%3A%2F%2Fwww.forexyard.com%2Fen
%2Freuters_inner.tpl%3Faction
%3D2009-11-16T013439Z_01_T122800_RTRIDST_0_MARKETS-JAPAN-STOCKS-
UPDATE-2&usg=AFQjCNFXGg95rPlOBIA2bKc0jd2Q3yFk2g"><nobr>Forexyard</
nobr></a></font><br /><font class="p" size="-1"><a class="p"
href="http://news.google.com/news/more?
ned=us&topic=b&ncl=dPy1T0LK7URGP1MyRL2FQL4se2MkM"><nobr><b>all
69 news articles »</b></nobr></a></font></div></font></td></
tr></table>
I want to make this human-readable. To do so, I first need to replace
all HTML special characters (entities) with their human-readable
equivalents and then remove all HTML tags.
A quick look at http://www.degraeve.com/reference/specialcharacters.php
shows that replacing all of these special characters by hand is no simple task.
Is there an easy way of doing this that I am overlooking?
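
For reference, the two steps described above (decode the entities, then drop the tags) can be sketched with nothing but the standard library. This is only a rough sketch under a few assumptions: the post does not name a language, so Python 3 is assumed here; `TextExtractor` and `html_to_text` are made-up names for illustration; and the sample string is a shortened stand-in for the feed data above, not the real item.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Keeps only the text between tags (hypothetical helper, not a library class)."""

    def __init__(self):
        # convert_charrefs=True tells the parser to decode entities such as
        # &amp; or &raquo; in text before handle_data() is called.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of text between tags; the tags themselves
        # (and their attributes, including the long hrefs) are ignored.
        self.chunks.append(data)


def html_to_text(markup):
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered at the end
    # Join the text runs with spaces, then collapse repeated whitespace
    # and line breaks left over from the original wrapping.
    return " ".join(" ".join(parser.chunks).split())


# Shortened, made-up sample in the style of the feed item above.
sample = ('<font size="-1"><a href="http://example.com/?a=1&amp;b=2">MUFG\n'
          'shares fall nearly 5 pct on share issue plan</a>'
          '<nobr>Reuters</nobr></font><br />'
          '<b>all 69 news articles &raquo;</b>')

print(html_to_text(sample))
```

The parser already knows the full list of named entities, so no lookup table has to be built by hand. If a feed delivers the markup itself in escaped form (`&lt;table&gt;` rather than `<table>`), running `html.unescape` on it once before calling `html_to_text` should bring it into the shape shown above.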