System.Text.StreamWriter encoding issue (Framework 1.1)

  • Thread starter Thread starter dm_dal
  • Start date Start date
D

dm_dal

Just wondering if anyone has any suggestions.

I'm writing out a csv file that needs to be opened in Excel. If I set the
StreamWriters encoding to UTF8 (default) or ASCII, some of the decimal
fields in the file are written out incorrectly.

Example: 10.00 gets written as 1000 (the decimal is removed but trailing
zeros are left.

This doesn't happen to all decimal numbers in the file so it appeares to be
a random issue. I have noticed that if I change the StreamWriters encoding
to Unicode, the issue disappears, but, the file doesn't parse correctly when
opened by Excel by double clicking the file.

Also, when I run this process on another app server (not production), the
decimals do not get dropped (when using UTF8 or ASCII), which makes me think
it's a configuration issue on the production server.

Here's some sample code:

string filePath = "c:\\temp\\myfile.csv";
System.IO.StreamWriter w = new
System.IO.StreamWriter(filePath,false,System.Text.Encoding.Unicode);
w.NewLine = "\r\n";
StringBuilder crsb = new StringBuilder();
foreach(DataColumn dc in ds.MyTable.Columns)
{
crsb.Append(dc.ColumnName + ",");
}
w.WriteLine(crsb.ToString());
foreach(SampleDataSet.MyTableRow row in ds.MyTable.Rows)
{
StringBuilder sb = new StringBuilder();
foreach(DataColumn dc in row.Table.Columns)
{
if(row[dc] != null)
{
sb.Append(row[dc].ToString() + ",");
}
else
{
sb.Append(",");
}
}
w.WriteLine(sb.ToString());
}
w.Flush();
w.Close();

I'd really like to be able to use the UTF8 encoding (works better with
Excel) but I'm not sure where to look on the server configuration side.
 
dm_dal said:
Just wondering if anyone has any suggestions.

I'm writing out a csv file that needs to be opened in Excel. If I set the
StreamWriters encoding to UTF8 (default) or ASCII, some of the decimal
fields in the file are written out incorrectly.

Example: 10.00 gets written as 1000 (the decimal is removed but trailing
zeros are left.

I find that hard to believe, to be honest.

Could you post a short but complete program which demonstrates the
problem?

See http://www.pobox.com/~skeet/csharp/complete.html for details of
what I mean by that.

Have you looked at the file produced using notepad instead of just
Excel? The file not opening properly in Excel doesn't necessarily mean
that it's the encoding which is at fault.
 
I'll work on putting that together. However, all that's missing is a
populated strongly typed dataset with some decimal columns.

Also, just in case you missed it in my original post, It's random and
arbitrary. If I run this program on my desktop, it doesn't happen. If I
run this program on our development server, it doesn't happen. If I run
this program on our staging server, it doesn't happen. It only happens when
it's running on the production server. Also, when we test it on the
development and staging environments, we are running it against the
production database, but still, it only happens when the program is run on
production.

We've verified the data in the column does have the decimal point. If you
Debug.Write the value out to the debugger window before writing to the file,
you get the decimal point. But, when you look at the file after writing,
the decimal point is omitted. Again, it doesn't happen to all decimal
columns and it doesn't always happen with the same decimal column.

David Young
 
dm_dal said:
I'll work on putting that together. However, all that's missing is a
populated strongly typed dataset with some decimal columns.

You shouldn't need to have a dataset at all - just feeding it actual
decimals should do just as well.
Also, just in case you missed it in my original post, It's random and
arbitrary. If I run this program on my desktop, it doesn't happen. If I
run this program on our development server, it doesn't happen. If I run
this program on our staging server, it doesn't happen. It only happens when
it's running on the production server. Also, when we test it on the
development and staging environments, we are running it against the
production database, but still, it only happens when the program is run on
production.

If this only happens on one server, you might want to have a look at
what culture that server is using. Is it the same as on the other
servers, or does it maybe have a different decimal point character?
We've verified the data in the column does have the decimal point. If you
Debug.Write the value out to the debugger window before writing to the file,
you get the decimal point. But, when you look at the file after writing,
the decimal point is omitted. Again, it doesn't happen to all decimal
columns and it doesn't always happen with the same decimal column.

Does it always happen to the same actual values? i.e. do all (say)
10.00s end up as 1000, or can you get one 10.00 correctly and one
messed up?

Is there any threading in the app which could be having strange
effects?

If you write to a StringWriter and then print the contents out in a
message box, do the same problems happen?
 
Jon Skeet said:
You shouldn't need to have a dataset at all - just feeding it actual
decimals should do just as well.

Thanks, I'll just do that.
If this only happens on one server, you might want to have a look at
what culture that server is using. Is it the same as on the other
servers, or does it maybe have a different decimal point character?

I originally thought about that, but, since it doesn't happen to all of the
decimal values, I abandoned that idea. I will look though.
Does it always happen to the same actual values? i.e. do all (say)
10.00s end up as 1000, or can you get one 10.00 correctly and one
messed up?

No, it doesn't alwasy happen to the same actual values. 10.00 is sometimes
10.00 and sometimes it's 1000
Is there any threading in the app which could be having strange
effects?

Threading is not used.
If you write to a StringWriter and then print the contents out in a
message box, do the same problems happen?

I just gave this a shot and here what happened. I instrumented my code with
a MessageBox.show at three points.
a) Where the data was being put into the dataset, and;
b) Where the data was being pulled out of the dataset to be written, and;
c) Just prior to the call to StringBuilder.WriteLine [ used
MessageBox.Show(StringBuilder.ToString()) ]

On this particular record, the value in the database was 10000.00. In all
three locations, the messagebox showed a value of 10000.00 However, when I
open the resulting csv file with my text editor and look at that row, it's
value is written as 1000000 Again, with the decimal missing.
 
Jon Skeet said:
You shouldn't need to have a dataset at all - just feeding it actual
decimals should do just as well.
I created a console app that builds a decimal array with 32 decimal
elements. Pumped them through the same StreamWriter and all the decimal
values were written out correctly. I'll try building a dataset within the
code and using that instead of an array, If the decimal values are written
incorrectly there, then I'll post the code.

David Young
 
dm_dal said:
If you write to a StringWriter and then print the contents out in a
message box, do the same problems happen?

I just gave this a shot and here what happened. I instrumented my code with
a MessageBox.show at three points.
a) Where the data was being put into the dataset, and;
b) Where the data was being pulled out of the dataset to be written, and;
c) Just prior to the call to StringBuilder.WriteLine [ used
MessageBox.Show(StringBuilder.ToString()) ]

On this particular record, the value in the database was 10000.00. In all
three locations, the messagebox showed a value of 10000.00 However, when I
open the resulting csv file with my text editor and look at that row, it's
value is written as 1000000 Again, with the decimal missing.

Could you mail me one of these sample files, and tell me where exactly
the problem is? I'm just wondering if the problem lies somewhere else -
it certainly sounds bizarre to be where it is.

Do you have many non-ASCII characters in the file? Is there any pattern
with them appearing "near" the problematic numbers?
 
dm_dal said:
Two sample files are included in the zip file.

Encoding.UTF8.csv - written using the StreamWriter and by setting the
StreamWriter's encoding property to System.Text.Encoding.UTF8
Encoding.Unicode.csv - written using the StreamWriter and by setting the
StreamWriters's encoding property to System.Text.Encoding.Unicode

If you look at line 736, character 228 of Encoding.UTF8.csv, the value
written is 1000000. That same record written to Encoding.Unicode.csv (shown
on line 735, character 228) is written as 10000.00

Another interesting anomolie is the fact that line 723 is split at character
124. That line is continued on line 724. This only occurs when the
encoding is set to UTF8.

Both of these files were created by the same program, the only difference
being the encoding used when writing the file.

Hmm. That's certainly very odd. Have you had any luck in duplicating
the results using the "dodgy" server but with a short but complete
program which we can see in its entirety?

Out of interest, is there any difference in the number of processors
between the development server and the "dodgy" one?
 
Jon Skeet said:
Hmm. That's certainly very odd. Have you had any luck in duplicating
the results using the "dodgy" server but with a short but complete
program which we can see in its entirety?

I'm working on that now, If you saw my post from 5:03 yesterday, you'll see
that I added 32 decimal values to an array and used the StringBuilder and
StreamWriter to write them out in csv format (writing 200 lines), but that
didn't produce any of the same errors. My next step is to create a DataSet
and populate it with decimal values and use the same approach to write out
the files. I'll have that today.
Out of interest, is there any difference in the number of processors
between the development server and the "dodgy" one?

Yes
The suspect server has 4 - 2.4Ghz Xeon procs, 1 Gb Ram, Win2k ver.
5.00.2195, Sp 3
The test server has 2 - 2.4Ghz. Xeon procs, 1 Gb Ram, Win2k ver. 5.00.2195,
Sp 4

That's interesting....I just noticed that the test server, dev server and my
workstation (it's working fine on all of these) are at SP4, while the
production server (where the errors are) is at SP3....hummmm Guess I'll
troll the Service Pack release notes to see if there's anything of interest
there.

David Young
 
dm_dal said:
I'm working on that now, If you saw my post from 5:03 yesterday, you'll see
that I added 32 decimal values to an array and used the StringBuilder and
StreamWriter to write them out in csv format (writing 200 lines), but that
didn't produce any of the same errors. My next step is to create a DataSet
and populate it with decimal values and use the same approach to write out
the files. I'll have that today.

Great. Apologies in advance if it doesn't help in the end - it just
seems the best way of going.
Yes
The suspect server has 4 - 2.4Ghz Xeon procs, 1 Gb Ram, Win2k ver.
5.00.2195, Sp 3
The test server has 2 - 2.4Ghz. Xeon procs, 1 Gb Ram, Win2k ver. 5.00.2195,
Sp 4

Hmm. I was hoping for more of a "dual proc" vs "single proc" split, but
it *might* still be relevant. Not sure I'd like to say exactly how
though...
That's interesting....I just noticed that the test server, dev server and my
workstation (it's working fine on all of these) are at SP4, while the
production server (where the errors are) is at SP3....hummmm Guess I'll
troll the Service Pack release notes to see if there's anything of interest
there.

Ah - definitely worth a try.
 
Back
Top