J
Jon Davis
I have a software application I've written called PowerBlog (PowerBlog.net)
that takes the editing capability of the Internet Explorer WebBrowser
control (essentially a DHTMLTextBox), extracts the user-typed HTML, assigns
it as an XML node's InnerText property (using C#: System.Xml.XmlDocument
obj; obj.InnerText = myHTML). Then I later get the InnerText as a string and
write to disk.
When this text is displayed in a web browser, special characters that are
beyond the standard ASCII charset are not rendered correctly. Frequently, I
have copied text from a web site, pasted in the DHTMLTextbox, saved, and
published it, and my published output has corrupt characters. However, prior
to publishing, when previewing my document it looks fine -- it is only when
it is published (extracted, written to disk, uploaded to the server via FTP,
downloaded via HTTP) that the corruption occurs.
There are several places where this problem could be occurring, and I don't
know how to figure it out.
- A "design feature" in the XmlNode's InnerText property that converts the
&###; encoding into an actual character.
- An encoding flaw when written to disk (currently I'm using the default,
UTF-8 I guess).
- A flaw in the FTP client class where the file is being corrupted during
upload (I think I'm using binary upload format but perhaps I should
double-check).
- A flaw in IIS (no known strange settings exist)
I still need to do some homework on this but I was wondering if anyone has
any bright ideas before I continue searching this out?
Thanks,
Jon
that takes the editing capability of the Internet Explorer WebBrowser
control (essentially a DHTMLTextBox), extracts the user-typed HTML, assigns
it as an XML node's InnerText property (using C#: System.Xml.XmlDocument
obj; obj.InnerText = myHTML). Then I later get the InnerText as a string and
write to disk.
When this text is displayed in a web browser, special characters that are
beyond the standard ASCII charset are not rendered correctly. Frequently, I
have copied text from a web site, pasted in the DHTMLTextbox, saved, and
published it, and my published output has corrupt characters. However, prior
to publishing, when previewing my document it looks fine -- it is only when
it is published (extracted, written to disk, uploaded to the server via FTP,
downloaded via HTTP) that the corruption occurs.
There are several places where this problem could be occurring, and I don't
know how to figure it out.
- A "design feature" in the XmlNode's InnerText property that converts the
&###; encoding into an actual character.
- An encoding flaw when written to disk (currently I'm using the default,
UTF-8 I guess).
- A flaw in the FTP client class where the file is being corrupted during
upload (I think I'm using binary upload format but perhaps I should
double-check).
- A flaw in IIS (no known strange settings exist)
I still need to do some homework on this but I was wondering if anyone has
any bright ideas before I continue searching this out?
Thanks,
Jon