Which Unicode encoding to use for saving?

  • Thread starter: THY

THY

Hi,

I am developing a website in both English and Chinese. Whenever I save, I am
required to set the encoding in Advanced Save Options. I found that there are
four options related to Unicode. Can anyone tell me what the differences are
and which one to choose?

Unicode (UTF-8 without signature) - Codepage 65001
Unicode (UTF-8 with signature) - Codepage 65001
Unicode - Codepage 1200
Unicode (Big-Endian) - Codepage 1201

Thanks,
Tee
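
For reference, the four options differ in the byte encoding used and in whether a signature (byte order mark) is written at the start of the file. Here is a minimal Python sketch of what each one would produce. The mapping from the save-dialog labels to Python codec names is an assumption, as is the premise that the two "Unicode" options write a byte order mark. `saved_bytes` and `sample` are hypothetical names used only for illustration.

```python
import codecs

sample = "Hi 中文"

# Assumed mapping: save-dialog label -> (Python codec, signature bytes).
options = {
    "Unicode (UTF-8 without signature) - Codepage 65001": ("utf-8", b""),
    "Unicode (UTF-8 with signature) - Codepage 65001": ("utf-8", codecs.BOM_UTF8),
    "Unicode - Codepage 1200": ("utf-16-le", codecs.BOM_UTF16_LE),
    "Unicode (Big-Endian) - Codepage 1201": ("utf-16-be", codecs.BOM_UTF16_BE),
}


def saved_bytes(text, codec, signature):
    """Return the bytes a file would hold: optional signature, then the encoded text."""
    return signature + text.encode(codec)


for label, (codec, signature) in options.items():
    data = saved_bytes(sample, codec, signature)
    # Show the total size and the first few bytes, where any signature appears.
    print(f"{label}: {len(data)} bytes, starts with {data[:4].hex(' ')}")
```

The two UTF-8 options encode the text identically and differ only in the three signature bytes at the start. The two "Unicode" options are UTF-16 with opposite byte order.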
 
Joerg Jooss said:

That is, indeed, a very nice article. It does have one problem, however. It
implies that using UTF-8 for all web pages is an OK thing to do because most
browsers have UTF-8 support. This is true, but UTF-8 inflates the byte count
for some languages. Chinese is a great example. In my opinion
page size still matters, and you can greatly optimize the page size in many
cases if you customize the encoding to match the primary language of the
page.

For example, a typical block of Chinese text will take about one and a half
times as much space in UTF-8 as it will using Big5, because UTF-8 needs three
bytes per Chinese character where Big5 needs two. Characters that don't exist
in Big5 can be encoded as numeric character references (&#...;). Browsers that
people use to read Chinese are very likely to support Big5, so in my opinion
you should use Big5 encoding for Chinese pages. ASP.NET makes this very easy
to do. This conserves Internet bandwidth, saves space in proxy servers, saves
space in your local cache, reduces download times for those unfortunate modem
and ISDN users, etc.

In web pages there are going to be a lot of ASCII characters (HTML tags and so
forth) mixed in with the Chinese, so your actual savings will be less than
3-to-2, but for the bulk of Chinese content pages there will be a
significant savings. I'm just using Chinese as an example; pick any
non-European language and the result will likely be similar.
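
The size comparison is easy to check for yourself. Here is a minimal Python sketch that measures the same text in UTF-8 and in Big5, with and without surrounding markup. It assumes Python's built-in `big5` codec is a fair stand-in for what a server would send. `encoded_size` and the sample strings are hypothetical, chosen only for illustration.

```python
def encoded_size(text, codec):
    """Return the encoded length in bytes.

    Characters the codec cannot represent are written as numeric
    character references (&#NNNN;) instead of raising an error.
    """
    return len(text.encode(codec, errors="xmlcharrefreplace"))


chinese = "中文網頁"  # four Traditional Chinese characters
page = "<p>" + chinese + "</p>"  # the same text wrapped in ASCII markup

for label, text in [("text only", chinese), ("with markup", page)]:
    utf8 = encoded_size(text, "utf-8")
    big5 = encoded_size(text, "big5")
    print(f"{label}: UTF-8 {utf8} bytes, Big5 {big5} bytes, ratio {utf8 / big5:.2f}")
```

Running it on a real page rather than a four-character sample gives a better picture of how much the ASCII markup dilutes the difference.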
 