Bug in My.Computer.FileSystem.WriteAllText

Guest

I've found that My.Computer.FileSystem.WriteAllText in .NET 2.0 writes three
additional, non-text bytes in front of the string it's supposed to be
writing. These bytes are not in the normal text range of values, and don't
show up if you load the file into Notepad or other text viewers, but can be
easily seen with Visual Studio 2005's Binary editor. Additionally, if you use
this method to write out an HTML file to be embedded in other HTML (like an
Outlook signature), these bytes get rendered. Finding this bug was a pain, to
say the least. I have an example VS2005 solution folder which shows the bug,
and would be happy to send it to someone at MS if anyone is interested, but
you can verify in about 2 minutes with a simple "Hello World!" output to a
new text file using the above method.
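
A minimal sketch of that two-minute check, for anyone who wants to reproduce it. It assumes a VB console project where the My namespace is available; the module name and the temp-file name hello.txt are made up for illustration.

```vb
Imports System
Imports System.IO

Module BomCheck
    Sub Main()
        ' Hypothetical scratch file in the user's temp directory.
        Dim fileName As String = _
            System.IO.Path.Combine(System.IO.Path.GetTempPath(), "hello.txt")

        ' Three-argument overload: no encoding is specified.
        My.Computer.FileSystem.WriteAllText(fileName, "Hello World!", False)

        ' Read the raw bytes back and print the first few in hex.
        Dim bytes As Byte() = File.ReadAllBytes(fileName)
        Console.WriteLine(BitConverter.ToString(bytes, 0, Math.Min(bytes.Length, 6)))
    End Sub
End Module
```

If the extra bytes are there, the output starts with EF-BB-BF, followed by 48-65-6C for "Hel".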
 
Bo said:
I've found that My.Computer.FileSystem.WriteAllText in .Net 2.0 writes three
additional, non-text bytes in front of the string it's supposed to be
writing. [...]

My guess is that you are seeing the text-encoding signature (the "byte
order mark") for the text file. It's not a bug, it's by design. If you
don't want the bytes, you need to write the file with an encoding that
emits no BOM, such as plain ASCII.

From http://msdn2.microsoft.com/en-us/library/27t17sxs.aspx

"When no encoding is specified, UTF-8 is used. The byte order mark (BOM)
for the encoding is written to the file unless you specify
System.Text.Encoding.Default, which uses the system's current ANSI code
page."
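
A rough sketch of what that looks like with the four-argument overload, assuming a VB project where the My namespace is available. The module name, file name, and HTML snippet are placeholders.

```vb
Imports System.Text

Module WriteWithoutBom
    Sub Main()
        ' Hypothetical output file; substitute the real signature path.
        Dim fileName As String = _
            System.IO.Path.Combine(System.IO.Path.GetTempPath(), "signature.htm")
        Dim html As String = "<p>Hello World!</p>"

        ' Option 1: Encoding.Default, as named in the documentation quote above.
        My.Computer.FileSystem.WriteAllText(fileName, html, False, Encoding.Default)

        ' Option 2: plain ASCII, which has no byte order mark.
        ' Characters outside the ASCII range are replaced with "?".
        My.Computer.FileSystem.WriteAllText(fileName, html, False, Encoding.ASCII)
    End Sub
End Module
```

Both calls overwrite the same file (append is False); pick whichever encoding suits the text being written.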

Pete
 
Missed that, thanks. Too bad about the effect on things like Outlook. If my
opinion counted, the straightforward option (you know, text + standard
whitespace in a text file) would be the default. Thanks for the response, it
helps to know to look out for these things. If I'd seen it earlier, I'd have
saved myself a lot of time.

Bo.

 
Hi Bo,

My.Computer.FileSystem.WriteAllText has two overloads:

Public Sub WriteAllText( _
ByVal file As String, _
ByVal text As String, _
ByVal append As Boolean _
)
' -or-
Public Sub WriteAllText( _
ByVal file As String, _
ByVal text As String, _
ByVal append As Boolean, _
ByVal encoding As System.Text.Encoding _
)

The first overload uses System.Text.Encoding.UTF8 as its default encoding.

System.Text.Encoding.UTF8 returns an instance of System.Text.UTF8Encoding.

The System.Text.UTF8Encoding class has a constructor which accepts a
Boolean parameter named "encoderShouldEmitUTF8Identifier" (see
http://msdn2.microsoft.com/en-us/library/s064f8w2(VS.80).aspx). When it's
True (which is the case for the instance returned by
System.Text.Encoding.UTF8), the encoding supplies a Unicode byte order mark
(BOM for short).
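
If you want to keep UTF-8 but drop the BOM, one way is to construct your own UTF8Encoding with that parameter set to False and pass it to the second overload. This is a sketch, assuming a VB project where the My namespace is available; the module name and file name are placeholders.

```vb
Imports System.Text

Module Utf8NoBom
    Sub Main()
        ' Hypothetical output file in the temp directory.
        Dim fileName As String = _
            System.IO.Path.Combine(System.IO.Path.GetTempPath(), "hello-nobom.txt")

        ' encoderShouldEmitUTF8Identifier := False, so no BOM is supplied.
        Dim utf8NoBom As New UTF8Encoding(False)

        ' Four-argument overload with the explicit encoding.
        My.Computer.FileSystem.WriteAllText(fileName, "Hello World!", False, utf8NoBom)
    End Sub
End Module
```

Unlike ASCII or the ANSI code page, this keeps the full Unicode range in the output.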

Basically, a BOM at the beginning of a file helps applications that read
the file determine its correct encoding.

You may find more information about BOM here:

#Byte-order mark - Wikipedia, the free encyclopedia
http://en.wikipedia.org/wiki/Byte_Order_Mark

#FAQ - UTF-8, UTF-16, UTF-32 & BOM
http://unicode.org/faq/utf_bom.html

#Byte Order Mark
http://msdn2.microsoft.com/en-us/library/ms776429.aspx

In .NET, when an encoding specifies a BOM, the IO classes write those
extra bytes at the beginning of the file.

Notepad tries to detect the encoding from the BOM:
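
The bytes in question are what Encoding.GetPreamble() returns. A small sketch to inspect them for a few encodings (the module name is a placeholder):

```vb
Imports System
Imports System.Text

Module PreambleDemo
    Sub Main()
        ' The shared UTF8 instance carries the three-byte BOM.
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetPreamble()))   ' EF-BB-BF

        ' A UTF8Encoding built with False has an empty preamble.
        Console.WriteLine(New UTF8Encoding(False).GetPreamble().Length)         ' 0

        ' ASCII has no BOM either.
        Console.WriteLine(Encoding.ASCII.GetPreamble().Length)                  ' 0
    End Sub
End Module
```

An empty preamble means nothing extra is written ahead of the text.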

#The Old New Thing : Some files come up strange in Notepad
http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx

Hope this helps.


Sincerely,
Walter Wang ([email protected], remove 'online.')
Microsoft Online Community Support

 