Convert to UTF-8

  • Thread starter: Guest
Guest

hello,

I'd like to convert an existing database to UTF-8 encoding because the
contents don't display properly on my new hosting provider's server.

They told me I have to convert the database to UTF-8.
How can I do this?

I searched for some kind of option/setting for language and encodings
but had no luck.

Thanks
 
Current versions of Access use UTF-16 (or something very similar)
natively and I don't think you can change that.

When exporting and importing data you can select other encodings
including UTF-8. This isn't my area of expertise; the best I can suggest
is that you search groups.google.com for something like
"microsoft access" internationalization OR i18n
and/or
"access" utf-8 externaldata
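Since the internal format can't be changed, getting UTF-8 means converting on the way out. Below is a minimal Python sketch of that last step. It assumes the table has already been exported as a "Unicode" (UTF-16) text file; the file names are hypothetical, and the sketch only re-encodes exported text. It does not touch the database itself.

```python
from pathlib import Path


def utf16_to_utf8(src: Path, dst: Path) -> None:
    """Re-encode a UTF-16 text file (e.g. a hypothetical Access export) as UTF-8."""
    # The "utf-16" codec uses the byte-order mark to pick the byte order
    # and drops it from the decoded text. Working on raw bytes keeps the
    # original line endings (e.g. \r\n) exactly as they were.
    text = src.read_bytes().decode("utf-16")
    # Write plain UTF-8 without a byte-order mark.
    dst.write_bytes(text.encode("utf-8"))


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "export_utf16.txt"  # hypothetical export file
        dst = Path(tmp) / "export_utf8.txt"
        src.write_bytes("Café;€12\r\n".encode("utf-16"))
        utf16_to_utf8(src, dst)
        print(dst.read_bytes())  # b'Caf\xc3\xa9;\xe2\x82\xac12\r\n'
```

If the export was written in some other encoding, change the codec name passed to `decode` accordingly.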
 
I think that data is stored in an Access database
as UTF-2, or as compressed UTF-2. I think that UTF-2
is the native format for Windows. There is no way
to change the internal representation, and it is
entirely unrelated to what you see anyway.

When it comes out of the database, it is translated
into a Font (for screen display or printing) or into
a string variable, or into UTF-8 (ANSI for Americans).

For Americans, UTF-8 is ANSI: ANSI is UTF-8. This
is because the encoding for the characters used by
Americans is exactly the same in UTF-8 and ANSI.
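To make this concrete: on a US-English Windows system "ANSI" normally means code page 1252 (`cp1252` in Python). The sketch below checks whether a string produces the same bytes in both encodings. The match holds only for plain 7-bit ASCII text (U+0000 to U+007F); an accented letter such as é already differs.

```python
def same_bytes(text: str) -> bool:
    """True if `text` encodes to identical bytes in UTF-8 and Windows-1252 ("ANSI")."""
    try:
        return text.encode("utf-8") == text.encode("cp1252")
    except UnicodeEncodeError:
        # The text contains a character that cp1252 cannot represent at all.
        return False


for sample in ["Hello, world", "Price: $5", "Café"]:
    print(f"{sample!r}: {same_bytes(sample)}")
```

This prints `True` for the two ASCII samples and `False` for "Café", because é is the single byte E9 in cp1252 but the two bytes C3 A9 in UTF-8.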

As soon as you want to do anything international
(for example, the euro symbol), you start using characters
which are mapped differently in ANSI, in UTF-8, and in
the old Windows/DOS code pages (we don't do code pages
any more).
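The euro sign is a good illustration. The Python sketch below shows its bytes in Windows-1252 ("ANSI"), in UTF-8, and in code page 437 (the original IBM PC/DOS code page, which has no euro sign), and what happens when bytes are read back with the wrong encoding.

```python
from typing import Optional

EURO = "\u20ac"  # the euro sign


def euro_bytes(codec: str) -> Optional[bytes]:
    """Bytes for the euro sign in `codec`, or None if that code page lacks it."""
    try:
        return EURO.encode(codec)
    except UnicodeEncodeError:
        return None


# cp1252 = Windows "ANSI" (Western European), cp437 = original DOS code page.
for codec in ["cp1252", "utf-8", "cp437"]:
    print(codec, euro_bytes(codec))

# Reading bytes with the wrong encoding produces the familiar garbage:
print(EURO.encode("utf-8").decode("cp1252"))                    # UTF-8 read as ANSI: â‚¬
print(EURO.encode("cp1252").decode("utf-8", errors="replace"))  # ANSI read as UTF-8: U+FFFD
```

The loop prints `b'\x80'` for cp1252, `b'\xe2\x82\xac'` for UTF-8 and `None` for cp437, which is exactly the "mapped differently" problem described above.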

You don't care how your contents display on your
hosting provider's server. What you do care about
is what is displayed in your users' Internet browsers.

Internet browsers render characters using fonts. The
font is controlled by the style sheet and the browser.
The character encoding is declared by a line at
the top of each web page:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

If you have a line like that, the server must send the
characters to the browser in utf-8.
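A minimal sketch of the server side, using Python's standard WSGI interface purely as a stand-in for whatever the hosting provider actually runs (an assumption). The page declares UTF-8 in its meta tag, the HTTP `Content-Type` header says the same (browsers normally give the header precedence over the meta tag), and the body is encoded as UTF-8 before it is sent.

```python
PAGE = """<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Prices</title>
</head>
<body><p>Price: 12 \u20ac</p></body>
</html>
"""


def app(environ, start_response):
    """WSGI application that serves PAGE as UTF-8."""
    body = PAGE.encode("utf-8")  # the bytes must match the declared charset
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


if __name__ == "__main__":
    from wsgiref.util import setup_testing_defaults

    # Call the app directly with a fake request instead of starting a server.
    environ = {}
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers):
        captured["status"], captured["headers"] = status, dict(headers)

    body = b"".join(app(environ, start_response))
    print(captured["status"], captured["headers"]["Content-Type"], len(body))
```

To try it in a browser, the same `app` can be served locally with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`.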

If you are sending a plain HTML file, that means you must
have created and saved the file in UTF-8 (for Americans,
any text editor which saves in ANSI format will do).
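If the static page is generated by a script rather than saved from an editor, the encoding has to be given explicitly when writing. Otherwise Python falls back to the platform default, which is often cp1252 on Windows. A small sketch, with hypothetical file names, that saves a page as UTF-8 and checks whether a file decodes cleanly as UTF-8:

```python
from pathlib import Path


def save_page(path: Path, html: str) -> None:
    """Write an HTML page as UTF-8 so its bytes match a charset=utf-8 meta tag."""
    # An explicit encoding avoids the platform default (often cp1252 on Windows).
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(html)


def looks_like_utf8(path: Path) -> bool:
    """True if the file's bytes decode without error as UTF-8."""
    try:
        path.read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    import tempfile

    html = ('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
            "<p>Café, 12 €</p>\n")
    with tempfile.TemporaryDirectory() as tmp:
        good = Path(tmp) / "page_utf8.html"  # hypothetical file names
        bad = Path(tmp) / "page_ansi.html"
        save_page(good, html)
        bad.write_bytes(html.encode("cp1252"))  # what an ANSI-saving editor produces
        print(looks_like_utf8(good), looks_like_utf8(bad))  # True False
```

A file containing only ASCII characters passes the check whichever way it was saved, which is why an ANSI editor is fine for purely American text.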

If you are using Access to create DAPs (data access pages)
with non-American characters, the DAP encoding must be set
to UTF-8.

If you are using server-side VBScript in ASP to create your
web pages, you will need to ask a VBScript/ASP person.


(david)
 
> I think that data is stored in an Access database
> as UTF-2, or as compressed UTF-2. I think that UTF-2
> is the native format for Windows.

I'm happy to be proved wrong, but AIUI UTF-2 is the same as UTF-8.
Inspection of MDB files suggests that text is stored as UTF-16 if
Unicode compression is not enabled, and a mix of UTF-8 and UTF-16 (or
possibly of ANSI and UTF-16) if Unicode compression is enabled.
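The byte pattern behind this observation is easy to reproduce outside Access. In UTF-16 little-endian (the byte order Windows uses), every character from U+0000 to U+00FF is stored as one byte followed by a zero byte, which is the kind of redundancy a compression scheme can exploit. This Python sketch only shows plain encodings; it does not parse an MDB file.

```python
def utf16_bytes(text: str) -> bytes:
    """Encode as UTF-16 little-endian without a byte-order mark."""
    return text.encode("utf-16-le")


for sample in ["Nurick", "Café", "\u0152uvre"]:  # the last one starts with Œ (U+0152)
    raw = utf16_bytes(sample)
    zero_high = all(b == 0 for b in raw[1::2])  # is every high byte 0x00?
    print(f"{sample!r}: {raw.hex(' ')}  high bytes all zero: {zero_high}")
```

"Nurick" and "Café" have all-zero high bytes (é is U+00E9, stored as `e9 00`); "Œuvre" does not, because Œ is stored as `52 01`.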
> There is no way
> to change the internal representation, and it is
> entirely unrelated to what you see anyway.

Entirely agree.
> When it comes out of the database, it is translated
> into a Font (for screen display or printing) or into
> a string variable, or into UTF-8 (ANSI for Americans)
>
> For Americans, UTF-8 is ANSI: ANSI is UTF-8. This
> is because the encoding for the characters used by
> Americans is exactly the same in UTF-8 and ANSI.

Make that "characters most commonly used": for instance, current versions of Arial
include glyphs for 1/3, 1/8, 3/8 etc., which are of use to Americans but
aren't in the US ANSI character set.
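A quick way to check which fraction characters fit in US ANSI (Windows-1252) is to try encoding them. This Python sketch uses the standard `unicodedata` module only to print the official character names.

```python
import unicodedata


def in_ansi(ch: str) -> bool:
    """True if the character exists in Windows-1252 ("US ANSI")."""
    try:
        ch.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


for ch in ["\u00bd", "\u2153", "\u215b"]:  # one half, one third, one eighth
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {in_ansi(ch)}")
```

One half (U+00BD) is in cp1252, along with one quarter and three quarters; one third (U+2153) and one eighth (U+215B) are not.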

<snip>
 
> I'm happy to be proved wrong, but AIUI UTF-2 is the same as UTF-8.

I'm no expert: I think I read something that was wrong. I meant
UTF-16, and I thought UTF-2 = UTF-16, rather than UTF-8. On further
reading, I wonder if Windows natively uses something like UCS-2,
i.e., a version of UTF-16 limited to 2-byte characters?

(david)
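The difference between UCS-2 and UTF-16 shows up only for characters above U+FFFF. UTF-16 stores such a character as a surrogate pair (two 16-bit units, 4 bytes), while UCS-2 has no way to represent it. A small Python sketch; the choice of U+1D11E (MUSICAL SYMBOL G CLEF) as the test character is arbitrary.

```python
from typing import List


def utf16_units(text: str) -> List[int]:
    """Return the 16-bit code units of `text` in UTF-16 (little-endian)."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def fits_ucs2(text: str) -> bool:
    """UCS-2 can only hold code points up to U+FFFF, one 16-bit unit each."""
    return all(ord(ch) <= 0xFFFF for ch in text)


for ch in ["A", "\u20ac", "\U0001D11E"]:
    units = " ".join(f"{u:04X}" for u in utf16_units(ch))
    print(f"U+{ord(ch):04X}: UTF-16 units {units}; fits UCS-2: {fits_ucs2(ch)}")
```

The G clef comes out as the pair `D834 DD1E`. A system that treats text as UCS-2 would see those as two separate (unpaired) units rather than one character, which is the kind of behaviour a font-based test could expose.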
 
I'm inclined to agree with your speculation about UCS-2. To test it I
suppose one would need to obtain or create a font containing some
characters that don't have 2-byte Unicode representations.

As for Access's Unicode compression: I wonder whether that's just a
smart term for using UTF-8 for strings in which every character has a
single-byte UTF-8 representation, and UTF-16 (or UCS-2) otherwise.
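To make that speculation concrete, here is a toy Python version of the scheme as described: one byte per character when every character is plain ASCII (single-byte UTF-8), otherwise UTF-16. This is a hypothetical illustration of the guess, not Access's actual on-disk format; the leading flag byte and its values are inventions of this sketch.

```python
ONE_BYTE, TWO_BYTE = 0x01, 0x02  # hypothetical flag values, this sketch only


def toy_compress(text: str) -> bytes:
    """Hypothetical scheme: flag byte, then UTF-8 if all ASCII, else UTF-16LE."""
    if all(ord(ch) < 0x80 for ch in text):
        return bytes([ONE_BYTE]) + text.encode("utf-8")  # same as ASCII here
    return bytes([TWO_BYTE]) + text.encode("utf-16-le")


def toy_decompress(data: bytes) -> str:
    """Reverse toy_compress by reading the flag byte."""
    flag, payload = data[0], data[1:]
    if flag == ONE_BYTE:
        return payload.decode("utf-8")
    if flag == TWO_BYTE:
        return payload.decode("utf-16-le")
    raise ValueError(f"unknown flag byte {flag:#04x}")


for sample in ["Convert to UTF-8", "Prix: 12 \u20ac"]:
    packed = toy_compress(sample)
    assert toy_decompress(packed) == sample
    print(f"{sample!r}: {len(packed)} bytes vs "
          f"{len(sample.encode('utf-16-le'))} as plain UTF-16")
```

Under this guess an all-ASCII string takes roughly half the space of plain UTF-16, while a string with even one non-ASCII character gains nothing.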

 
A big thanks to both of you for the detailed clarifications and explanations!
Much appreciated.


John Nurick said:
I'm inclined to agree with your speculation about UCS-2. To test it I
suppose one would need to obtain or create a font containing some
characters that don't have 2-byte Unicode representations.

As for Access's Unicode compression: I wonder whether that's just a
smart term for using UTF-8 for strings that don't contain characters
that don't have 8-bit UTF-8 representations, and UTF-16 (or UCS-2)
otherwise.
 