Reading XML document with blank lines at top

Ryan S

I am trying to read an XML document generated by a web server using the
XmlTextReader class, but the generated document appears to have some blank
lines at the top that are causing problems.

If I connect directly to the URL and call the Read method on the
XmlTextReader, I get the message:

"The XML declaration is unexpected. Line 8, Column 3"

If I open the URL in my browser, copy and paste the XML (starting
with <?xml...) into a text file, save it locally and then access that file
using the XmlTextReader, all is well. It appears that a bunch (7,
actually) of blank lines are prepended to the document by the web
server, so that they appear before the <?xml... declaration.

Is this an example of a non-well-formed document?

Is there a way to have the XmlTextReader skip over these lines, or is there
a way for me to intercept the stream and only start parsing at the
appropriate point?

Thanks,
Ryan

BTW, I am specifically trying to call the APIs from BMC's Patrol Express
product which return an XML document.
 
Ryan said:
I am trying to read an XML document generated by a web server using
the XmlTextReader class, but the generated document appears to have
some blank lines at the top that are causing problems.

If I connect directly to the URL and call the Read method on
the XmlTextReader, I get the message:

"The XML declaration is unexpected. Line 8, Column 3"

If I open the URL in my browser, copy and paste the XML
(starting with <?xml...) into a text file, save it locally and then
access that file using the XmlTextReader, all is well. It appears
that a bunch (7, actually) of blank lines are prepended to the
document by the web server, so that they appear before the
<?xml... declaration.

Is this an example of a non-well-formed document?

It should work, as far as I know.

Are you sure they're blank lines, and not HTTP headers? Headers would
confuse the hell out of an XmlReader, and they would also not show up if you
copy/paste the document from a browser.
 
Sven,

It does not appear that they are HTTP headers. When I view the source
(using both IE and Mozilla), I definitely see 7 blank lines above the <?xml
.... > line. I also packet-sniffed the transaction between the server and my
browser, and I definitely see a bunch of CrLf characters being sent just prior to
<?xml. I also see some HTTP headers like Server:, Content-Type: and
others. Will these headers cause trouble? If so, any suggestions on
passing over them to get to the XML stuff?

-Ryan
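
To make the headers-versus-blank-lines question concrete: in an HTTP response the headers end at the first empty line, and everything after it is the body. An HTTP client library normally hands only the body to the caller, so the Server: and Content-Type: lines seen in a packet capture would not usually reach the XML parser, whereas extra CrLf pairs after that first empty line would. The following is a minimal sketch in Python (chosen only for illustration); the raw response, the server name and the helper split_http_message are hypothetical stand-ins, not output from Patrol Express.

```python
# Hypothetical raw response, laid out as it might appear in a packet capture.
raw_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Server: ExampleServer\r\n"
    b"Content-Type: text/xml\r\n"
    b"\r\n"            # first empty line: the headers end here
    b"\r\n\r\n"        # these CrLf pairs already belong to the body
    b'<?xml version="1.0"?><root/>'
)


def split_http_message(raw: bytes):
    """Split a raw HTTP response into status line, headers and body."""
    # Everything before the first empty line is the status line plus headers.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.split(b"\r\n")
    headers = dict(line.split(b": ", 1) for line in header_lines)
    return status_line, headers, body


status_line, headers, body = split_http_message(raw_response)
print(headers)   # the header fields, separated from the body
print(body)      # the body still starts with the stray CrLf pairs
```

In this sketch the stray line breaks end up in the body, which is the part an XML reader would be given.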
 
Ryan said:
If I open the URL in my browser, copy and paste the XML (starting
with <?xml...) into a text file, save it locally and then access that file
using the XmlTextReader, all is well. It appears that a bunch (7,
actually) of blank lines are prepended to the document by the web
server, so that they appear before the <?xml... declaration.

Is this an example of a non-well-formed document?

Yep. In a well-formed XML document, nothing (except a Unicode byte-order mark)
can precede the XML declaration. It must be the very first thing in the
document.
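
The rule is easy to check with any conforming parser. As a small illustration (in Python rather than .NET, and with a made-up payload), the sketch below feeds the same document to a parser twice: once with line breaks ahead of the declaration, and once with them stripped. The helper name parses is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: blank lines sent ahead of the XML declaration.
payload = b'\r\n\r\n<?xml version="1.0"?><root><item>1</item></root>'


def parses(data: bytes) -> bool:
    """Return True if the parser accepts the data as well-formed XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


print(parses(payload))            # line breaks precede the declaration
print(parses(payload.lstrip()))   # declaration is the first thing in the data
```

The first call reports a parse failure and the second succeeds, which matches the behaviour described in the original question.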
 
Thanks for everyone's help and comments. I solved my problem by reading the
HTTP stream and dumping it, starting with "<?xml", into a temp file, which I
then load with DataSet.ReadXml.

-Ryan
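
The approach described above (discard everything before "<?xml", then parse the rest) can be sketched roughly as follows. This is an illustration in Python, not Ryan's actual code: the response body is a made-up stand-in, the helper skip_to_xml_declaration is hypothetical, and the byte search assumes an ASCII-compatible encoding such as UTF-8.

```python
import xml.etree.ElementTree as ET

XML_DECL = b"<?xml"


def skip_to_xml_declaration(body: bytes) -> bytes:
    """Drop everything that precedes the first '<?xml' marker."""
    start = body.find(XML_DECL)
    if start == -1:
        raise ValueError("no XML declaration found in response body")
    return body[start:]


# Hypothetical response body: seven blank lines ahead of the declaration.
raw_body = b"\r\n" * 7 + b'<?xml version="1.0"?><report><status>ok</status></report>'

root = ET.fromstring(skip_to_xml_declaration(raw_body))
print(root.find("status").text)   # prints "ok"
```

Trimming the bytes in memory avoids the temp file; writing the trimmed bytes to disk first, as in the post, works the same way.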
 