Peter Webb
I have to do some simple text editing on largish (2 MB) HTML files
generated by Word. They are, I believe, in UTF-8.
It is the !%^&* problem where single apostrophes become sequences of funny
characters, some spaces are shown as unprintable characters, etc.
The following code just reads a file and writes it back out, and that alone
shows the problem. It happens whether I use a String or a StringBuilder, and
whether or not I explicitly force UTF-8 encoding.
Can somebody just tell me how to copy an HTML file by reading it in and then
writing it out? That is all the following methods are supposed to do:
private StringBuilder getHtml(string fullfilename)
{
    StringBuilder concatenated = new StringBuilder();
    string line;
    // Read the file line by line, appending CRLF after each line.
    using (System.IO.StreamReader htmlFile =
        new System.IO.StreamReader(fullfilename, System.Text.Encoding.UTF8))
    {
        while ((line = htmlFile.ReadLine()) != null)
            concatenated.Append(line).Append("\r\n");
    }
    return concatenated;
}
private void writehtmltofile(string outputfilenamewithpath,
    StringBuilder HTMLstring)
{
    // The using block closes and disposes the writer.
    using (StreamWriter sw = new StreamWriter(outputfilenamewithpath,
        false, System.Text.Encoding.UTF8))
    {
        // Write, not WriteLine: the text already ends with "\r\n".
        sw.Write(HTMLstring.ToString());
    }
}
Any assistance greatly appreciated.
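For reference, the symptoms described above are what an encoding mismatch usually looks like, so here is a minimal sketch of a round trip that reads and writes with one explicitly chosen encoding. It rests on an assumption I have not verified for these files: that Word may have saved them in a Windows code page such as windows-1252 rather than UTF-8 (the charset named in the file's meta tag would be worth checking). The names HtmlCopy and CopyWithEncoding are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class HtmlCopy
{
    // Hypothetical helper: reads the whole file with the given encoding and
    // writes it back with the same encoding, so the text is decoded and
    // re-encoded consistently.
    public static void CopyWithEncoding(string inputPath, string outputPath,
                                        Encoding encoding)
    {
        // ReadAllText decodes the bytes using the supplied encoding
        // (a byte order mark, if present, takes precedence).
        string html = File.ReadAllText(inputPath, encoding);

        // Any text edits would go here, working on the decoded string.

        // WriteAllText re-encodes with the same encoding and adds no
        // trailing newline.
        File.WriteAllText(outputPath, html, encoding);
    }
}
```

If the files do turn out to be windows-1252, the call would be something like HtmlCopy.CopyWithEncoding(inPath, outPath, Encoding.GetEncoding(1252)). On .NET Core and later, Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) has to be called once beforehand, since code page encodings are not registered by default there. If the files really are UTF-8, passing Encoding.UTF8 gives the same behaviour as the original methods, and the mismatch would more likely lie in how the output is being viewed.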