Parse an HTML file as an XML file

  • Thread starter: Stan SR

Stan SR

Hi,

I need to read an HTML file and parse it as an XML file.

All my HTML files have this structure:
<html>
<head>
<title>
</title>
<script language="javascript">
</script>
</head>
<body>
</body>
</html>

My code has to read some sections (title, script, body).
Everything works when the script section (the JavaScript code) contains little
or no code, but parsing sometimes fails when the code contains characters like ;
(especially in "for" statements).
To make it work, I had to "decorate" the script section with
<![CDATA[ ]]>, so that it looks like this:

<script language="javascript">
<![CDATA[

]]>
</script>

Is there a way to parse the file without using the <![CDATA[ ]]> section?

Stan
 
Try wrapping the script in <!-- and -->, which is standard practice in HTML. I
imagine some parsers will still puke on this approach, but it should solve the
major issue.
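
To make the trade-off concrete, here is a rough sketch using Python's xml.etree.ElementTree as a stand-in for whatever XML parser the original code uses (that choice, the TEMPLATE string and the parses() helper are assumptions for illustration only). It suggests the likely culprits in a "for" statement are < and & rather than ; itself, and that a comment wrapper has its own limit: XML does not allow -- inside a comment.

```python
import xml.etree.ElementTree as ET

# Hypothetical helper: True if the string is well-formed XML.
def parses(doc):
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

# Minimal page skeleton modelled on the structure in the question.
TEMPLATE = "<html><head><script>{}</script></head><body/></html>"

# Bare script: the '<' in the loop condition is read as markup.
raw = TEMPLATE.format("for (i = 0; i < 10; i++) {}")

# CDATA wrapper: the parser treats the content as plain text.
cdata = TEMPLATE.format("<![CDATA[for (i = 0; i < 10; i++) {}]]>")

# Comment wrapper: '<' is tolerated inside a comment...
comment = TEMPLATE.format("<!-- for (i = 0; i < 10; i++) {} -->")

# ...but '--' is not, so a decrement operator breaks it.
comment_decrement = TEMPLATE.format("<!-- for (i = 10; i > 0; i--) {} -->")

for name, doc in [("raw", raw), ("cdata", cdata),
                  ("comment", comment),
                  ("comment with --", comment_decrement)]:
    print(name, "->", "parses" if parses(doc) else "fails")

# With CDATA the script is available as element text.
script_text = ET.fromstring(cdata).find("head/script").text
print(script_text)
```

Note that with the comment wrapper the script becomes a comment node rather than element text, so code that reads the script section would need to look at comments instead.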

Can you solve this without changing the files at all? Probably not with an XML
parser. Script blocks are freeform sections, and an XML parser does not treat
them the way an HTML parser does, because XML's rules are stricter.
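
If changing the files is not an option, one alternative is to read them with an HTML parser instead of an XML parser. The sketch below uses Python's standard html.parser module purely as an illustration; the original code is probably in another language, and SectionExtractor, extract_sections and SAMPLE are hypothetical names invented for this example. It collects the text of the title, script and body sections without any CDATA wrapper.

```python
from html.parser import HTMLParser

class SectionExtractor(HTMLParser):
    """Collects the text found inside the title, script and body tags."""

    WANTED = ("title", "script", "body")

    def __init__(self):
        super().__init__()
        self.sections = {name: "" for name in self.WANTED}
        self._open = []  # wanted tags that are currently open

    def handle_starttag(self, tag, attrs):
        if tag in self.WANTED:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if tag in self._open:
            self._open.remove(tag)

    def handle_data(self, data):
        # Script content arrives here as plain text; the HTML parser
        # does not try to interpret '<' or '&' inside a script block.
        for tag in self._open:
            self.sections[tag] += data

def extract_sections(html_text):
    parser = SectionExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.sections

# Sample page following the structure from the question (contents invented).
SAMPLE = """<html>
<head>
<title>Demo</title>
<script language="javascript">
for (var i = 0; i < 10; i++) { if (i > 2 && i < 5) { total += i; } }
</script>
</head>
<body>
<p>Hello</p>
</body>
</html>"""

sections = extract_sections(SAMPLE)
print(sections["title"])
print(sections["script"].strip())
print(sections["body"].strip())
```

This keeps only text, not nested markup, so the body section loses its tags. If the body's structure matters, a fuller HTML library would be a better fit than this sketch.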

--
Gregory A. Beamer
MVP, MCP: +I, SE, SD, DBA

*************************************************
| Think outside the box!
|
*************************************************
 