Writing special characters out to Excel in HTML

  • Thread starter: Matthew Shaw

Matthew Shaw

We have a web-based reporting application written in J2EE
that writes out to Excel using response.setContentType
("application/vnd.ms-excel; ")….

The problem is that any special characters in our report
result set, e.g. umlauts and accents (ASCII values 128 to
165), are corrupted and do not appear correctly.

The standard font family used throughout our web reports
is Arial. I have seen this handled by using Verdana;
however, we are reluctant to change the fonts in all our
reports.

I believe this relates to Excel performing a Unicode
translation. Unfortunately, we require Excel functionality
so that users can perform operations on the finished
reports.

Is there a font family similar in appearance to Arial that
will handle the Unicode character set?
Or is there a mechanism to tell Excel not to perform this
conversion?

thanks.
 
Matthew Shaw said:
We have a web-based reporting application written in J2EE
that writes out to Excel using response.setContentType
("application/vnd.ms-excel; ")….

The problem is that any special characters in our report
result set, e.g. umlauts and accents (ASCII values 128 to
165), are corrupted and do not appear correctly.

There *are* no ASCII values 128-165. ASCII is 7-bit.

Now, when you say the data is "corrupted", what exactly happens?
Perhaps it's writing it out in UTF-8 or something similar? How are you
writing out the data in the first place, exactly? (i.e what file
format, etc.)
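
To make the question concrete, here is a minimal Java sketch of what happens to a single "u with umlaut" when the writer and the reader disagree about the encoding. The class and method names (EncodingDemo, byteValues, roundTrip) are made up for illustration and are not taken from the application under discussion.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; none of these names come from the original application.
final class EncodingDemo {

    // Returns the bytes of `text` in `charset` as unsigned decimal values, space separated.
    static String byteValues(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(b & 0xFF);
        }
        return sb.toString();
    }

    // Encodes with one charset and decodes with another, which is what
    // effectively happens when the producer and the consumer disagree.
    static String roundTrip(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        String u = "\u00FC"; // u with umlaut

        // US-ASCII is 7-bit, so the character cannot be encoded and becomes '?'.
        System.out.println(byteValues(u, StandardCharsets.US_ASCII));   // 63
        // ISO-8859-1 uses a single byte.
        System.out.println(byteValues(u, StandardCharsets.ISO_8859_1)); // 252
        // UTF-8 uses two bytes.
        System.out.println(byteValues(u, StandardCharsets.UTF_8));      // 195 188

        // Written as UTF-8 but read as ISO-8859-1: two wrong characters
        // (A with tilde, then the one-quarter sign) instead of one right one.
        String garbled = roundTrip(u, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length()); // 2
    }
}
```

The "?" and square-box symptoms are consistent with this kind of mismatch, although the sketch alone cannot tell you which side of your setup is using which encoding.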
The standard font family used throughout our web reports
is Arial. I have seen this handled by using Verdana;
however, we are reluctant to change the fonts in all our
reports.

I believe this relates to Excel performing a Unicode
translation. Unfortunately, we require Excel functionality
so that users can perform operations on the finished
reports.

Is there a font family similar in appearance to Arial that
will handle the Unicode character set?

I'd be very surprised if it were the font which was at fault here -
although I could certainly be wrong.
Or is there a mechanism to tell Excel not to perform this
conversion?

How exactly are you exporting from Excel? Or are you only *importing*
into Excel? If you can specify somewhere which character encoding to
use, and make sure you use the same one everywhere, you should be okay.
 
We are only importing into Excel. You can explicitly provide a character
encoding...

e.g. application/vnd.ms-excel;charset=ISO-8859-1

which I believe is the default; others include charset=windows-1251.

They do appear to produce slightly different results; however, none
of the ones I have tried can handle umlauts...

thanks.
 
Matthew Shaw said:
We are only importing into Excel. You can explicitly provide a character
encoding...

e.g. application/vnd.ms-excel;charset=ISO-8859-1

Right - but if you explicitly provide the charset there, do you also
make sure your J2EE app is actually *using* that character set?
which I believe is the default; others include charset=windows-1251.

Ah - if you're using 1251, that may well give different results from
ISO-8859-1 (windows-1251 is a Cyrillic code page, so it has no Latin
letters with umlauts). If you can get both sides to use UTF-8 I believe
that's the most likely to work for everything in a simple fashion.
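
As a rough sketch of "both sides use UTF-8": the Java below writes an HTML table through a Writer that is explicitly bound to UTF-8 and declares the same charset inside the document. Utf8ReportWriter, writeReport, render and escape are hypothetical names. In a servlet you would presumably pass response.getOutputStream() and also put charset=UTF-8 in the content type. Whether Excel honours the meta declaration when it opens the HTML is an assumption here, not something confirmed in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the names are illustrative, not from the original application.
final class Utf8ReportWriter {

    // Writes a one-column HTML table. The Writer is bound to UTF-8 explicitly,
    // so the bytes never depend on the platform default encoding.
    static void writeReport(OutputStream out, String... cells) throws IOException {
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        w.write("<html><head>");
        // Assumption: the consumer reads this declaration and decodes accordingly.
        w.write("<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">");
        w.write("</head><body><table>");
        for (String cell : cells) {
            w.write("<tr><td>" + escape(cell) + "</td></tr>");
        }
        w.write("</table></body></html>");
        w.flush();
    }

    // Minimal HTML escaping for cell text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Convenience wrapper that renders into memory instead of a servlet response.
    static byte[] render(String... cells) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            writeReport(out, cells);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = render("M\u00FCller", "caf\u00E9");
        // Decoding with the same charset gives the original text back.
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(decoded.contains("M\u00FCller")); // true
    }
}
```

The point of the sketch is only that the encoding used to produce the bytes and the encoding announced to the reader are the same one, stated in both places rather than left to defaults.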
They do appear to produce slightly different results; however, none
of the ones I have tried can handle umlauts...

Hmm... well, I hope the above is helpful...
 
I have tried the following:
"application/vnd.ms-excel;charset=windows-1251", and likewise 1250 and 1252.

I believe the default is charset=ISO-8859-1

They do look as though they are altering the imported characters,
although the characters appear either as ".", or "?", or just those
weird square things that you get when you open a file using an editor
that doesn't support the file format.
 
Matthew Shaw said:
I have tried the following:
"application/vnd.ms-excel;charset=windows-1251", and likewise 1250 and 1252.

I believe the default is charset=ISO-8859-1

They do look as though they are altering the imported characters,
although the characters appear either as ".", or "?", or just those
weird square things that you get when you open a file using an editor
that doesn't support the file format.

Hmm. Thing is, if it's really writing an Excel spreadsheet then it's a
binary file to start with, which is part of what confuses me - unless
it's actually just writing CSV data and using the content-type to
direct it to Excel...
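
One way to settle the binary-versus-text question is to look at the first bytes of what the application actually sends. The sketch below checks for the OLE2 compound-file signature that legacy binary .xls workbooks start with; anything else (HTML, CSV) is text that Excel is merely being asked to open. ExcelPayloadCheck and looksLikeBinaryXls are hypothetical names, and the check covers only the legacy binary format.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the names are illustrative only.
final class ExcelPayloadCheck {

    // Signature of an OLE2 compound file, the container used by binary .xls workbooks.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // True if the payload starts with the OLE2 signature.
    static boolean looksLikeBinaryXls(byte[] payload) {
        if (payload.length < OLE2_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < OLE2_MAGIC.length; i++) {
            if (payload[i] != OLE2_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] html = "<html><body><table></table></body></html>"
                .getBytes(StandardCharsets.UTF_8);
        // HTML served with an Excel content type is still text, not a workbook.
        System.out.println(looksLikeBinaryXls(html)); // false
    }
}
```

If the check comes back false, the charset questions above apply directly, because the payload is text whose encoding Excel has to work out.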
 