yonido
Hello,
My goal is to extract patterns from email files - say "message
forwarding" patterns (message forwarded from: xx to: yy subject: zz).
Now let's say there are tons of these patterns (from Gmail, Outlook,
etc.) - and I want to create some rules for getting them out of the
mail's HTML body.
At first I tried using regular expressions, for example: "any
pattern that starts with a <p> and contains "from:"..." etc.
Then I understood that it's not that simple, because different engines
change the content of the HTML - and I can't expect specific tags (what
if a <p> is added? or a <span>?).
Then I was advised to use an HTML parser, and heard of GOLD and ANTLR,
but I have no clue how they can help.
HTML parsing sounds better - because I really care about what the final
SEEN result is, and not the STRUCTURE of it.
Any light shed on how to approach this problem would be appreciated.
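
To make the "seen result, not structure" idea concrete, here is a minimal sketch of one possible approach. It assumes Python and its standard-library html.parser (not GOLD or ANTLR). The names VisibleText, visible_text, FORWARD and find_forward_header are hypothetical, and so are the two sample bodies, which are made up for illustration and not taken from any real mail client. The idea is to throw the tags away first, then run the pattern on the text that is left:

```python
import re
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect only the text a reader would see, ignoring which tags wrap it."""

    SKIP = {"script", "style"}  # contents of these are never displayed

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def visible_text(html):
    """Return the text of an HTML body with whitespace runs collapsed."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    # Join pieces with a space so adjacent blocks do not run together,
    # then collapse all whitespace so layout differences do not matter.
    return " ".join(" ".join(parser.parts).split())


# Hypothetical rule for one "forwarded message" wording.
# Note: the subject group runs to the end of the text; a real rule
# would need a stop condition.
FORWARD = re.compile(
    r"from:\s*(?P<sender>.+?)\s+to:\s*(?P<to>.+?)\s+subject:\s*(?P<subject>.+)",
    re.IGNORECASE,
)


def find_forward_header(html):
    """Return the matched fields as a dict, or None if the rule does not match."""
    match = FORWARD.search(visible_text(html))
    return match.groupdict() if match else None


# Two made-up bodies with different markup but the same visible text.
body_a = "<p>From: alice@example.com</p><p>To: bob@example.com</p><p>Subject: Lunch</p>"
body_b = ("<div><span><b>From:</b> alice@example.com</span><br>"
          "<span><b>To:</b> bob@example.com</span><br>"
          "<span><b>Subject:</b> Lunch</span></div>")

print(find_forward_header(body_a))
print(find_forward_header(body_b))
```

Because the rule only ever sees flattened text, an extra <p> or <span> added by a mail engine should not change the match. Each additional client wording would still need its own pattern.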