Which Character Set to Specify (UTF-8 or ISO-8859-1) in ASP.NET Pages?

  • Thread starter: Jordan S.

Jordan S.

I have observed that there are a couple of character sets that are
frequently specified in the meta tags. From my cursory review of public Web
sites, it appears that the character set is either UTF-8 or ISO-8859-1.

I am wondering what the importance of this meta tag is, and what the
important implications are of specifying one character set over the other in
ASP.NET pages. What does it matter to the server, if anything? What does it
matter to the browser, if anything?

FWIW: Here is what I have observed from different Web sites:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" >
CodeProject.com

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1" />
NBA.COM

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
ASP.NET

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
microsoft.com

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
redhat.com

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
projectseven.com

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
oracle.com

<meta http-equiv="content-type" content="text/html; charset=UTF-8">
sun.com

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
wikipedia.org

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
amamotocross.com

<meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
bankofamerica.com


Thanks!
 
Hi Jordan,
The character set is primarily used to let the browser know which encoding
was used for the web page being rendered. If there is no explicit charset
parameter, the browser falls back to the reader's preferred encoding.

regards,
Joy
 
I have observed that there are a couple of character sets that are
frequently specified in the meta tags. From my cursory review of public Web
sites, it appears that the character set is either UTF-8 or ISO-8859-1

I am wondering what is the importance of this meta tag, and what are the
important implications of specifying one character set over the other in
ASP.NET pages? What does it matter to the server, if anything? What does it
matter to the browser, if anything?

Very short answer: that meta tells the browser what the encoding of the web
page is. If the browser gets the code page wrong, some characters will
appear damaged.
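
A minimal Python sketch of what "damaged" looks like here. The sample string is made up for illustration; the point is only that bytes written as UTF-8 and read back as ISO-8859-1 turn each non-ASCII character into two or more wrong ones.

```python
# Hypothetical sample text; any non-ASCII character would show the effect.
text = "caf\u00e9"  # "café"

# The server writes the page as UTF-8 bytes.
utf8_bytes = text.encode("utf-8")

# A browser that wrongly assumes ISO-8859-1 maps every byte to one character.
misread = utf8_bytes.decode("iso-8859-1")

# Decoding with the encoding that was actually used restores the text.
correct = utf8_bytes.decode("utf-8")

print(repr(misread))
print(repr(correct))
```

The misread string is one character longer than the original, because the two UTF-8 bytes of the accented letter are shown as two separate Latin-1 characters.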

ISO-8859-1 can only be used for Western European languages, and even there it
is lacking (no copyright, trademark, smart quotes, em dash, en dash, etc.).
(It is possible to use any character in an ISO-8859-1 page by using HTML
entities, some of them symbolic (think &copy;), some with Unicode values
(think &#6541;).)
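
As a rough illustration of the entity fallback, Python's built-in `xmlcharrefreplace` error handler does the same thing a page author would do by hand: anything ISO-8859-1 cannot represent is written as a numeric character reference. The helper name `to_latin1_html` is made up for this sketch.

```python
def to_latin1_html(text: str) -> bytes:
    """Encode text as ISO-8859-1, writing unsupported characters
    as numeric character references (&#NNNN;)."""
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


# U+2122 (trademark sign) is outside ISO-8859-1, so it becomes a reference.
print(to_latin1_html("Widget\u2122"))

# U+00E9 (e with acute accent) is inside ISO-8859-1, so it stays a single byte.
print(to_latin1_html("caf\u00e9"))
```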

UTF-8 can support any language, no problems. Recently UTF-8 passed 50% of
all web pages: http://news.cnet.com/8301-13580_3-9936329-39.html

In the ASP.NET world, everything is Unicode. In the Windows
NT/2000/Vista/XP/2003 world, everything is Unicode again.
So going through ISO-8859-1 has a performance penalty on both the server
side and the client side, plus the risk of corrupted characters.

In this day and age, there is no good reason to stay with ISO-8859-1, unless
you are forced to by some bad legacy stuff (and even then, you should consider
going Unicode and converting "at the edge").

And you will be happy with the decision when marketing comes with the
first request for a language not supported by ISO-8859-1 :-)
 
For both redhat.com and amamotocross.com, the server's HTTP response header
declares Content-Type to be "text/html; charset=UTF-8".
That overrides the meta in the page (according to the web standards).
(In my opinion, a bad idea in the standard. Why the web server administrator
at some ISP should have the power to override what I declare in my
file about my file, I don't know!)
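
A simplified Python sketch of that precedence, assuming only two inputs: the HTTP Content-Type header and the page's meta tag. Real browsers also look at a byte order mark and apply their own heuristics, which this ignores. The function names and the `fallback` value are made up for illustration.

```python
import re

# Matches charset=... in either an HTTP header value or a <meta> tag.
_CHARSET = re.compile(r"""charset\s*=\s*["']?([\w.:-]+)""", re.IGNORECASE)
_META_CHARSET = re.compile(r"""<meta[^>]*charset\s*=\s*["']?([\w.:-]+)""",
                           re.IGNORECASE)


def effective_charset(http_content_type, html, fallback="windows-1252"):
    """Return the charset a client would use under the simplified rule
    'HTTP header first, then meta tag, then a browser default'."""
    if http_content_type:
        match = _CHARSET.search(http_content_type)
        if match:
            return match.group(1).lower()   # header wins when it names one
    match = _META_CHARSET.search(html)
    if match:
        return match.group(1).lower()       # otherwise the meta tag is used
    return fallback                         # assumption: browser-specific


page = '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'
print(effective_charset("text/html; charset=UTF-8", page))  # header wins
print(effective_charset("text/html", page))                 # meta is used
```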


As for bankofamerica.com and NBA.COM, they are both clearly American,
and clearly non-technical. I would say "they don't care, and don't have a
clue"

And same for the ones that are inconsistent between the server and the page.
Their stuff works "by mistake" :-)
 
Mihai N. said:
For both redhat.com and amamotocross.com the server HTTP header response
declare Content-Type to be "text/html; charset=UTF-8"
That overrides the meta in the page (according to the web standards)
(In my opinion bad standard idea. Why the web server administrator
at some ISP should have the power to override what I declare in my
file about my file, I don't know!)


I don't agree. The purpose of an HTTP-EQUIV meta tag is to substitute for
the header when the resource is being loaded from a non-HTTP source.

For example, it's useful if you intend your HTML to be opened from the
Windows file system. In that case the HTTP headers that might modify what
the browser does (such as how it interprets the octets as characters) are
not available. The HTTP-EQUIV meta tags allow you to fill in those details.

Unless you really want to allow the user to save the HTML for reloading
independently of your web site (a rare requirement), there is no need to use
HTTP-EQUIV meta tags in ASPX pages.

Instead use real HTTP headers. These can be set using the object exposed by
the ASPX page's Response property.

In the specific case of ContentType and CharSet, these can be configured in
the <%@ Page directive, and this is the best place to do it.
 
Jordan S. said:
I have observed that there are a couple of character sets that are
frequently specified in the meta tags. From my cursory review of public Web
sites, it appears that the character set is either UTF-8 or ISO-8859-1

I am wondering what is the importance of this meta tag, and what are the
important implications of specifying one character set over the other in
ASP.NET pages? What does it matter to the server, if anything? What does it
matter to the browser, if anything?

Don't bother with it. By default ASP.NET sends Content-Type: "text/html;
charset=utf-8" and you should leave it as that.
 
<snip>

Regarding:
Instead use real HTTP headers. These can be set using the object exposed by
the ASPX page's Response property.

How can I view these "real HTTP headers"? Do they appear in the rendered
page's markup? I would like to be able to verify the values sent to the
browser.

I'm working with a Page-derived class served via custom http handler
factory - so there is no physical aspx page to speak of, much less set @Page
directives on the traditional way. So I'm setting properties directly on the
Response object and want to verify that what I set server-side is what the
client is getting.
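
The headers are not part of the rendered markup; they travel in the HTTP response ahead of the body, so they have to be inspected with the browser's developer tools or with an HTTP client. A minimal Python sketch of the client approach follows. The tiny local server is only a stand-in for the real application so that the example is self-contained; `DemoHandler`, `fetch_content_type`, and `demo` are names made up here.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Stand-in for the real application: always answers with a UTF-8 page."""

    def do_GET(self):
        body = "<html><body>caf\u00e9</body></html>".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress the default request logging


def fetch_content_type(url):
    """Return (raw Content-Type header, parsed charset) as the client sees them."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url) as resp:
        return resp.headers.get("Content-Type"), resp.headers.get_content_charset()


def demo():
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return fetch_content_type(f"http://127.0.0.1:{port}/")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Pointing `fetch_content_type` at the real site's URL instead of the local stand-in shows what the server actually sends.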

Thanks.
 
I don't agree. The purpose of a HTTP-EQUIV meta tag is to substitute for
the header when the resource is being loaded from a non-HTTP source.

You are entitled to your opinion.
I guess this was also the thinking of the W3C.

But it is bad, and this is why:
The HTTP response header is decided by the HTTP server, usually based on a
global configuration controlled by the ISP.

The HTTP-EQUIV meta tag is something I add, it applies to the content of
a file that I created, it's mine, and I know way better what's inside.
So why would the ISP decision override mine?

Use case:
1. I have international customers who are fully Unicode aware.
2. For them I save the HTML files as UTF-8 (less pain overall).
3. My ISP configured its server to send ISO-8859-1 in the HTTP response.
Why should the ISP's decision take precedence over mine?

What is the solution here? Ask the ISP to switch to UTF-8?
They might have thousands of customers that don't care, or the change might
break something for them.
Change the ISP? Ok, possible.

But why all this pain? Because of a bad decision in the standard.
 
In the specific case of ContentType and CharSet these can be configured in
the <% @Page line and this is the best place to do this.

And by the way: read the specs again.
The <%@ Page ContentType and CharSet settings do not affect the HTTP response
in any way. They affect only the generated .html.

In order to change that "you will need the appropriate administrative rights"
(http://www.w3.org/International/O-HTTP-charset)
 
Mihai N. said:
And by the way: read again the specs.
<% @Page ContentType and CharSet do not affect the HTTP response in any way.
Just affect the generated .html only.

In order to change that "you will need the appropriate administrative rights"
(http://www.w3.org/International/O-HTTP-charset)


FWIW: I am the ISP (and therefore have complete administrative access...
IIS7 on Windows Server 2008).

-J
 
FWIW: I am the ISP (and therefore have complete administrative access...
IIS7 on Windows Server 2008).

Then you have no problem :-)
But the discussion was about the principle: why I think the standard got it wrong.
 
Also note that by default the ASPX compiler converts all text literals
to UTF-8, so that no UTF-16 to UTF-8 conversion has to waste CPU
resources at runtime. Literals are streamed directly from resources
in the DLL to the output stream.

Which is another reason to try and stick to UTF-8 for web pages.

And copyright *is* part of ISO-8859-1 aka Latin-1

http://www.unicode.org/Public/MAPPINGS/ISO8859/8859-1.TXT
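
A quick way to check which of the characters mentioned earlier in the thread are actually in ISO-8859-1, sketched in Python. The `CANDIDATES` table and the helper `in_latin1` are made up for this check.

```python
# Characters named earlier in the thread, written as escapes for clarity.
CANDIDATES = {
    "copyright": "\u00a9",
    "trademark": "\u2122",
    "left smart quote": "\u201c",
    "em dash": "\u2014",
    "en dash": "\u2013",
}


def in_latin1(ch):
    """True if the character can be encoded as a single ISO-8859-1 byte."""
    try:
        ch.encode("iso-8859-1")
    except UnicodeEncodeError:
        return False
    return True


support = {name: in_latin1(ch) for name, ch in CANDIDATES.items()}
print(support)
```

Only the copyright sign (byte 0xA9) is present; the other four are outside ISO-8859-1.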

The next concern is the UTF-8 BOM (byte order mark): some browsers seem
to handle it incorrectly, some web servers do not send it, etc.

Sometimes you can see the BOM by forcing a UTF-8 page to display as
Latin-1
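
A small Python sketch of that effect, using a made-up one-tag page: the three BOM bytes (EF BB BF) show up as three visible characters when the bytes are read as Latin-1, and Python's `utf-8-sig` codec is one way to strip the BOM when decoding.

```python
import codecs

# A made-up page saved as UTF-8 with a leading byte order mark.
page = codecs.BOM_UTF8 + "<html>".encode("utf-8")

as_latin1 = page.decode("iso-8859-1")  # the BOM becomes three visible characters
as_utf8 = page.decode("utf-8")         # the BOM survives as U+FEFF
stripped = page.decode("utf-8-sig")    # the BOM is removed

print(repr(as_latin1))
print(repr(as_utf8))
print(repr(stripped))
```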
 