System.IO writers and the BOM getting in the way of integration pr

Guest · Jul 6, 2005

I'm working on an integration project to move data from my own XML format to
a legacy file format, so this file may be manually imported into the
downstream app.

The downstream app internally recognizes Unicode characters. However, the
import engine on the downstream app chokes on the byte-order mark (BOM) in
files that I write with anything but ASCII encoding.

I have no means for upgrading or fixing this issue in the downstream app.

If I use Excel to save the Unicode / UTF-8 file as ANSI, the lone Unicode
character in the file is preserved but the BOM is removed. The ANSI file
imports without problem in the downstream app, and the Unicode character is
recognized and processed as expected.

In the System.IO classes, every file I create gets a BOM. How can I avoid
this, or how can I strip off the BOM? If I write the file with the ASCII
encoding, I get a ? instead of my Unicode character. How does Excel preserve
the character but strip the BOM?

I realize this is not an elegant design ... but the legacy format is critically
important to my customer. Thanks to all in advance,

-David

Joerg Jooss · Jul 6, 2005

dbaldi said:
> I'm working on an integration project to move data from my own XML
> format to a legacy file format, so this file may be manually imported
> into the downstream app.
>
> The downstream app internally recognizes Unicode characters. However,
> the import engine on the downstream app chokes on the
> byte-order mark (BOM) in the files created with anything but ASCII
> encoding when I write the file.
>
> I have no means for upgrading or fixing this issue in the downstream
> app.
>
> If I use Excel to save the Unicode / UTF8 file as ANSI, the lone
> unicode character in the file is preserved but the BOM is removed.

There's no concept like BOMs in Windows-125x (aka ANSI) nor in ASCII,
so this isn't really surprising.

> The ANSI file imports without problem in the downstream app, and the
> unicode character is recognized and is processed as expected.
>
> In the System.IO classes, every file I create gets a BOM.

What classes/methods do you use?

> How can I avoid this, or, how can I strip off the BOM? If I write the
> file with the ASCII encoding, I get ? instead of my unicode character.
> How does Excel preserve the character but strip the BOM?

Writing a BOM is purely optional for UTF-8. You can suppress it by
creating your own UTF8Encoding object:

Encoding utf8 = new UTF8Encoding(false); // false suppresses the BOM

Note that the default UTF8Encoding instance exposed as Encoding.UTF8
*does* emit a BOM. That's probably the root cause for all your problems.
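
To make the difference concrete, here is a minimal sketch (not from the original poster's code) that writes the same text through a StreamWriter with each encoding and dumps the resulting file bytes. The class name NoBomDemo, the helper WriteAndReadBack, and the use of a temp file are assumptions made purely for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class NoBomDemo
{
    // Hypothetical helper: writes text to a temp file with the given
    // encoding and returns the raw bytes that ended up on disk.
    public static byte[] WriteAndReadBack(string text, Encoding encoding)
    {
        string path = Path.GetTempFileName();
        try
        {
            using (StreamWriter writer = new StreamWriter(path, false, encoding))
            {
                writer.Write(text);
            }
            return File.ReadAllBytes(path);
        }
        finally
        {
            File.Delete(path);
        }
    }

    public static void Main()
    {
        string bullet = "\u2022"; // U+2022 BULLET

        // Encoding.UTF8 carries a preamble, so the file starts with EF BB BF.
        Console.WriteLine(BitConverter.ToString(
            WriteAndReadBack(bullet, Encoding.UTF8)));           // EF-BB-BF-E2-80-A2

        // UTF8Encoding(false) has an empty preamble: only the character bytes.
        Console.WriteLine(BitConverter.ToString(
            WriteAndReadBack(bullet, new UTF8Encoding(false)))); // E2-80-A2
    }
}
```

Whether the downstream importer accepts BOM-less UTF-8 is a separate question that only a test against that app can answer.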

Cheers,

Guest · Jul 6, 2005

Duh. I can't believe I didn't see that constructor overload. Thanks!

Guest · Jul 7, 2005

Still having issues... not able to exactly reproduce what Excel does, which
suggests I misinterpreted the issue in the first place.

When I use this to create my StreamWriter:

_writer = new System.IO.StreamWriter( file, false, new UnicodeEncoding());

Then my "bullet" character comes out OK. But, of course, it's got the BOM, so
the file fails to import into the downstream app.

BUT, if I use this, to avoid the BOM:

_writer = new System.IO.StreamWriter( file, false, new UnicodeEncoding(false, false));

Then the bullet character is replaced by [space] in the output. I tried big
and little endian on a lark, same results.
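
One possible explanation, which the thread itself does not confirm: UnicodeEncoding is UTF-16, so the bullet (U+2022) is written as the two bytes 0x22 and 0x20. Without a BOM, a reader that treats the file as single-byte text would see those as a double quote and a space. The sketch below just prints the bytes each encoding produces for the bullet. The class name BulletBytes and the helper Hex are illustrative names, not anything from the original code.

```csharp
using System;
using System.Text;

public static class BulletBytes
{
    // Hypothetical helper: hex dump of the bytes an encoding produces
    // for a string. GetBytes never includes the BOM/preamble.
    public static string Hex(Encoding encoding, string text)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    public static void Main()
    {
        string bullet = "\u2022"; // U+2022 BULLET

        // UTF-16 little endian, no BOM: low byte first.
        Console.WriteLine(Hex(new UnicodeEncoding(false, false), bullet)); // 22-20

        // UTF-16 big endian, no BOM: high byte first.
        Console.WriteLine(Hex(new UnicodeEncoding(true, false), bullet));  // 20-22

        // UTF-8, no BOM: three bytes, none of them ASCII.
        Console.WriteLine(Hex(new UTF8Encoding(false), bullet));           // E2-80-A2
    }
}
```

For comparison, Windows-1252 (what Excel calls ANSI) encodes the bullet as the single byte 0x95, which would explain how Excel keeps the character without needing a BOM. If the downstream importer expects that code page, an encoding obtained via Encoding.GetEncoding(1252) may be closer to what Excel writes than any Unicode encoding. That is a guess about the importer, not something established in this thread.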

