Page encoding and browsers (IE in particular)

  • Thread starter: Guest

Guest

Hi...

I have another thread going in scripting.jscript, trying to work around some
deficiencies in the way IE and IIS interact.

The nub of it is this: ASP.Net explicitly sets an output encoding header,
which IE seems to ignore most of the time. At the same time, ASP.Net
emits a properly encoded stream *but* doesn't output the BOM on utf-* output,
which IE would use and respect if it were there.

We already derive a lot of things from System.Web.UI.Page, and I was
wondering if there was any combination of properties that could be overridden
so that, when the page's encoding settings are made or changed, we could take a
look and output a BOM if it's utf-*. I looked at Page.ResponseEncoding
(string, doesn't appear to be virtual), Page.CodePage (int, doesn't appear to be
virtual), etc. I didn't see anything easy to grab onto that would control
the interaction.

To give a little more detail, we found that (by the usual default) our
ASP.Net pages output utf-8. We're also serving client-side script via an
aspx page. The output is properly encoded utf-8 and the "; charset=utf-8"
part of the Content-Type header is right, but when the client-side script is
executed, IE ignores that and interprets the script as latin1, so accented
characters in string literals come out as runs of Ã (A-tilde) and © symbols
(é, for example, shows up as Ã©).
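To make the symptom concrete: UTF-8 encodes é as the two bytes C3 A9, and a decoder that assumes Latin-1 turns those into Ã and ©. The standalone sketch below (Mojibake is a made-up name, not ASP.NET code) reproduces that with System.Text. IE's real fallback is probably the Windows code page (often windows-1252) rather than strict ISO-8859-1, but those two bytes map to the same characters in both.

```csharp
using System.Text;

// Hypothetical helper: reproduces the garbling by decoding UTF-8 bytes as Latin-1.
public static class Mojibake
{
    public static string MisreadAsLatin1(string text)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(text);    // "é" -> C3 A9 (no BOM added)
        Encoding latin1 = Encoding.GetEncoding(28591); // ISO-8859-1
        return latin1.GetString(utf8);                 // C3 -> 'Ã', A9 -> '©'
    }
}
```

For example, MisreadAsLatin1("café") returns "cafÃ©", which is exactly the pattern seen in the string literals.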

I found that if the output stream included the BOM, IE was smarter and
interpreted the script correctly. I'm trying to implement a more rigorous
answer: some hook that tells me when a Page's encoding is being set, so that I
can output the BOM whenever it's utf-*.
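For reference, the BOM is the character U+FEFF encoded at the very start of the stream, and .NET reports the bytes each encoding would use through Encoding.GetPreamble(). A small standalone sketch (BomTable is a made-up name, not ASP.NET code) that lists them for the UTF encodings discussed here:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: shows the byte order mark (preamble) each UTF encoding uses.
public static class BomTable
{
    // Formats bytes as space-separated upper-case hex, e.g. "EF BB BF".
    public static string Hex(byte[] bytes) =>
        BitConverter.ToString(bytes).Replace("-", " ");

    // One "webname: bytes" row per encoding.
    public static IEnumerable<string> Rows()
    {
        Encoding[] encodings =
        {
            Encoding.UTF8,             // UTF-8
            Encoding.Unicode,          // UTF-16 little-endian
            Encoding.BigEndianUnicode, // UTF-16 big-endian
            Encoding.UTF32,            // UTF-32 little-endian
        };
        foreach (Encoding enc in encodings)
            yield return enc.WebName + ": " + Hex(enc.GetPreamble());
    }
}
```

Running foreach (string row in BomTable.Rows()) Console.WriteLine(row); should show EF BB BF for utf-8, FF FE and FE FF for the two UTF-16 byte orders, and FF FE 00 00 for UTF-32. The WebName strings themselves can differ between .NET versions.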

I'm still a little perplexed about why ASP.Net and IE aren't playing better with
each other; Firefox doesn't need the extra help of the BOM. Having the BOM
doesn't appear to hurt Firefox, though, so it seems like a decent work-around.

Thanks
Mark
 
Hello Mark,
Thanks for your investigation.

I agree with you. This seems like an issue with IE. IE ignores the encoding
header (UTF-8) sent by the ASP.NET page and interprets the script as
latin1. As a result, some characters are garbled.
I think the workaround for this issue is to specify the encoding directly in
the HTML page. For example, we can add charset="UTF-8" to the script tag. After
that, IE will interpret the script as UTF-8.

<script src="Default2.aspx" language="JavaScript" charset="UTF-8"></script>

Would it be possible to suggest that your customers specify the charset in
their pages? I think this may be a workaround. Could you please try my
suggestion and let me know if it works on your side?

Hope this helps. Please feel free to update here again, if there is
anything we can help with. We are glad to assist you.

Have a great day,
Best regards,

Wen Yuan
Microsoft Online Community Support
==================================================
This posting is provided "AS IS" with no warranties, and confers no rights.
 
Hi Wen Yuan...

Specifying the charset in the <script> tag as well does appear to work, but
I don't know if we can get all of our customers to modify all of their pages.

That's why I was wondering if there was any wiring in the ASP.Net Page
infrastructure through which all of the encoding options ran. If we can
override that function, we can make it issue the BOM whenever the encoding is
set to a utf-* family.

Not that I have great hope that there is such a spot. I mean, one can
always set Response.Charset in the code after the page is well under way, but
I wanted to give at least a good default behavior.

Thanks
Mark
 
Hello Mark,
Thanks for your reply.

I tried it again. Cool! You are right. Adding a BOM to the ASPX output does
resolve this issue.

We don't need to override anything. To add a BOM at the beginning of the
ASPX output, we can use Response.BinaryWrite to emit the bytes EF BB BF (if
the encoding is UTF-8).

Below is the code snippet which I used to achieve that. I tried it and it
works fine.

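// UTF-8 byte order mark; must be the very first bytes of the response
byte[] bom = new byte[] { 0xEF, 0xBB, 0xBF };
Response.BinaryWrite(bom);
// ...followed by the script text itself
Response.Write("document.write('<div>");
.....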

Hope this helps. Please let me know if there is anything unclear. We are
glad to assist you.

Have a great day,
Best regards,

Wen Yuan
Microsoft Online Community Support
 
Hi Wen Yuan...

I was hoping to implement something in a direct descendant of Page that
would react intelligently to a change of the page encoding (say, by overriding
the .CodePage or .ResponseEncoding properties). When the encoding being set is
a utf-* variant, I could output a BOM; that sort of thing.

It doesn't look like those properties are overridable, so I suppose I could
put it in Page.Load or something like that.

Given that someone can set Page.Response.Charset="" anywhere along the line,
there's no way to button it down completely, but I'm trying to get the best
default behavior.

Thanks
Mark
 
Hi Wen Yuan...

Just FYI, here's the incantation I came up with:

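// Only emit U+FEFF when the response encoding is one of the utf-* family
if (Response.ContentEncoding.WebName.StartsWith("utf-"))
{
    char BOM = (char)0xfeff;  // the response encoder turns this into the right bytes
    Response.Write(BOM);
}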

The BOM char will be serialized as 0xEF, 0xBB, 0xBF for utf-8, and as the
appropriate byte sequence for utf-16 and utf-32. It also will *not* output a
BOM unless the page encoding is set to a utf-* encoding.
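One hedge on the WebName check: the prefix test is a little loose. "utf-7" (Encoding.UTF7) also starts with "utf-", and on some .NET versions the big-endian UTF-16 encoding reports its WebName as "unicodeFFFE", so it would be skipped. A sketch of an alternative that keys on the code page instead; BomPolicy and its methods are hypothetical names, not part of ASP.NET:

```csharp
using System.Text;

// Hypothetical helper: decides whether a BOM should precede text written in the
// given encoding, based on Encoding.CodePage rather than Encoding.WebName.
public static class BomPolicy
{
    public static bool IsUnicodeFamily(Encoding enc)
    {
        switch (enc.CodePage)
        {
            case 65001: // UTF-8
            case 1200:  // UTF-16 little-endian
            case 1201:  // UTF-16 big-endian
            case 12000: // UTF-32 little-endian
            case 12001: // UTF-32 big-endian
                return true;
            default:    // includes UTF-7 (65000) and single-byte code pages
                return false;
        }
    }

    // The bytes that writing U+FEFF through the encoding produces,
    // or an empty array when no BOM should be written.
    public static byte[] BomBytes(Encoding enc) =>
        IsUnicodeFamily(enc) ? enc.GetBytes("\uFEFF") : new byte[0];
}
```

In a page it might be used as if (BomPolicy.IsUnicodeFamily(Response.ContentEncoding)) Response.Write('\uFEFF'); which keeps the trick of letting the response encoder produce the bytes.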

Mark
 
Hello Mark,
Thanks for your reply.

I'm sorry, but I don't think we can override the encoding properties of the
Page class to output a BOM.
As you are doing now, putting it in the Page.Load or Page.Init event seems to
be the right approach.
My suggestion is to add it in the Page.Init event. That way, the BOM should be
written at the beginning of the page.

Hope this helps. Please feel free to update here again, if there is
anything we can help with. We are glad to assist you.

Have a great day,
Best regards,

Wen Yuan
Microsoft Online Community Support
 
Hi Wen Yuan,

Is there any way to tell if an HttpResponse output stream has been written
to? I tried looking at HttpResponse.OutputStream.Length and .Position but
both throw not-implemented errors.

The problem with issuing a BOM in Page.Init or Page.Load is that some pages
don't emit just text. If the page is going to, say, do a Response.BinaryWrite,
putting a BOM in front of it is a bad thing.

If someone has already *done* a Response.Write somewhere above, putting out
a BOM would be bad too.

That's why I was looking for some way of telling if a stream has already
been used.

Thanks
Mark
 
Hello Mark,

The Length and Position properties are not supported by the response output
stream. As a result, we cannot read the length of the response output from
the page events. This is a limitation.

The workaround is to write a custom class derived from Stream that counts the
bytes passed to its Write method itself, and then assign an instance of it to
HttpResponse.Filter. All HTTP output will then pass through this filter.
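A minimal sketch of such a filter, assuming the usual pattern of wrapping the stream that Response.Filter already returns. CountingStream and BytesWritten are made-up names, and this is not the code from the article linked here. With buffering on, output only reaches the filter when the response is flushed, which is consistent with the Response.Flush() advice.

```csharp
using System;
using System.IO;

// Hypothetical write-only filter: forwards everything to an inner stream
// and counts the bytes that have passed through.
public sealed class CountingStream : Stream
{
    private readonly Stream _inner;

    public CountingStream(Stream inner)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    // Total number of bytes successfully written through this filter.
    public long BytesWritten { get; private set; }

    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;

    // Like the response stream itself, the filter cannot report length or position.
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        _inner.Write(buffer, offset, count);
        BytesWritten += count; // count only after the inner write succeeded
    }

    public override void Flush() => _inner.Flush();

    public override int Read(byte[] buffer, int offset, int count) =>
        throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) =>
        throw new NotSupportedException();
    public override void SetLength(long value) =>
        throw new NotSupportedException();

    // Closing the filter closes the wrapped stream so the response completes.
    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}
```

Hypothetical wiring: in Page.Init, create var counter = new CountingStream(Response.Filter); assign Response.Filter = counter; and keep the reference. A BytesWritten of 0 after a flush means nothing has been sent yet. Bear in mind that flushing sends the HTTP headers, so headers can no longer be changed afterwards.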
You may refer to the following document.
http://www.ericis.com/posts/default.aspx?id=244
[Obtaining Response.OutputStream.Length..]

By the way, if you have enabled Response.Buffer, please call the
Response.Flush() method before checking the length. Thanks.

Hope this helps, please feel free to update here again, if you have any
more concern. We are glad to assist you.

Have a great day,
Best regards,

Wen Yuan
Microsoft Online Community Support
 
Hello Mark,

This is Wen Yuan again. How is the issue going now?
I just want to check if there is anything we can help with. Please feel
free to update here again.
We are glad to assist you.

Have a great day,
Best regards,

Wen Yuan
Microsoft Online Community Support
 