Unicode/UTF-8 decoding

  • Thread starter: Bill Nguyen

Bill Nguyen

Below is some text I extracted from a MySQL database. How can I decode it
so that I can read it as Unicode?
Thanks

Bill
 
Bill Nguyen said:
Below is some text I extracted from a MySQL database. How can I decode
it so that I can read it as Unicode?


If you have Visual Studio .NET 2003 or 2005, go to Help/Index/Visual Basic
and enter "Unicode UTF-8" in the Search box. It will show you the whole
section, with program examples of UTF-8, UTF-16, and other
encoding/decoding.
 
Bill said:
Below is some text I extracted from a MySQL database. How can I decode
it so that I can read it as Unicode?
Thanks

Bill

This text looks as if it has been decoded with a different encoding than
the one used to encode it. It might be possible to recreate the data if you
know which encodings were used to encode and decode it. Then you might be
able to encode it back to its previous state and use the proper encoding
to decode it. There is a great risk that some data has been lost,
though, and that you can't recreate the original data at this stage.
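
As a rough illustration of that encode-back-then-decode idea, here is a minimal Python sketch. It assumes the wrong encoding was Windows-1252 (cp1252), which the thread never actually states, and it uses a fragment of the Vietnamese sample as a stand-in. The variable names are made up for the example.

```python
# Assumption: the "wrong" encoding was Windows-1252 (cp1252).
# The thread never names it, so treat this as a guess.
original = "sân ga Tokyo"

# How the garbling happens: UTF-8 bytes get decoded as cp1252.
utf8_bytes = original.encode("utf-8")
garbled = utf8_bytes.decode("cp1252")

# How to undo it: encode with the same wrong encoding to get the
# original bytes back, then decode those bytes as UTF-8.
recovered_bytes = garbled.encode("cp1252")
repaired = recovered_bytes.decode("utf-8")

print(garbled)
print(repaired)
```

This only works when every byte survived the first, wrong decoding step, which is the risk described above.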

If you want to store Unicode strings in the MySQL database, the database
has to be set up to use a Unicode character set.
 
I set UTF-8 as the default encoding in MySQL.
I don't really know how this works, but the IE or Firefox browser can decode
it easily.
This is the test:
I put the lines below in an HTML document and viewed it in IE, and it
worked. (Make sure to set the encoding to UTF-8 under View.)
I have included test.htm for your testing. (The text is in Vietnamese.)
So I think what I need is a utility that does the same thing, which
might already be available out there. Any help is greatly appreciated.

Bill

----------------
<html>

<head></head>

<body>


Virginia Hamilton Adair / Lâm Thị Mỹ Dạ
Lấp lánh há»"n thÆ¡ Việt trên sân ga Tokyo chiá»Âu cuối năm


</body>

</html>
 
You are doing exactly what I was talking about. If you read the data
using the wrong encoding, then save it using the same wrong encoding, you
can then open it using the correct encoding, provided that the process
hasn't removed any data.

If you have set up your MySQL database to use Unicode and still get the
string out in that form, the error occurred before the string was even
saved in the database in the first place. What you have done is basically:

unicode -> bytes -> wrong encoding -> MySQL -> wrong encoding -> html ->
bytes -> browser -> unicode

While this gives the correct result for some strings, some byte values
used in UTF-8 don't represent a single character by themselves, so if
you continue to store mis-decoded strings as Unicode, you will sooner or
later experience corrupted strings.
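
To make that failure mode concrete, the sketch below simulates the chain above in Python. Several things here are assumptions: cp1252 stands in for the unnamed wrong encoding, `round_trip` is a hypothetical helper, and the behaviour shown for byte 0x81 is that of Python's cp1252 codec, which treats it as undefined. Other platforms may map that byte differently.

```python
# Assumption: cp1252 stands in for the unnamed "wrong encoding".
WRONG = "cp1252"

def round_trip(text):
    """Simulate: UTF-8 bytes -> wrong decode -> wrong encode -> UTF-8 decode."""
    # What ends up stored in the database (mis-decoded text).
    stored = text.encode("utf-8").decode(WRONG, errors="replace")
    # The bytes written out again, e.g. into the HTML file.
    sent = stored.encode(WRONG, errors="replace")
    # What a browser set to UTF-8 would display.
    return sent.decode("utf-8", errors="replace")

safe = "sân ga Tokyo"
# "chi" + U+1EC1 + "u": the UTF-8 form of U+1EC1 is E1 BB 81,
# and Python's cp1252 codec has no character for byte 0x81.
risky = "chi\u1ec1u"

print(round_trip(safe) == safe)    # survives the chain
print(round_trip(risky) == risky)  # does not survive
print(0x81 in risky.encode("utf-8"))
```

The first string comes back intact because all of its UTF-8 bytes happen to map to cp1252 characters. The second loses a byte on the way in, and nothing downstream can restore it.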
 
Göran,

I think you are correct. However, there is not much I can do, since I
cannot change the host server parameters.
I am using SQLyog to access MySQL remotely. What I need is to be able to
read the data in its correct format/encoding scheme. Is it possible with
.NET?

Thanks

Bill
 
Bill said:
Göran,

I think you are correct. However, there is not much I can do, since I
cannot change the host server parameters.
I am using SQLyog to access MySQL remotely. What I need is to be able to
read the data in its correct format/encoding scheme. Is it possible with
.NET?

Thanks

Bill

Yes, it's possible in .NET.

Strictly speaking, you can't read it using the correct encoding, as it's
not stored using the correct encoding. You can only read it the same way
it's stored; then you have to reverse the process by encoding it using
the same wrong encoding and decoding it using the correct encoding.

As I said earlier, this will not work for all strings, so if you want a
system that works correctly, you have to change how the data is stored
in the database.
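
Since the reversal will not work for every string, one way to apply it is per row, with a check for failure. The Python sketch below is only an illustration under assumptions: `try_repair` is a hypothetical helper, cp1252 is a guess at the wrong encoding, and the two rows are simulated rather than taken from the real database. In .NET the analogous calls would presumably be `Encoding.GetEncoding(1252).GetBytes(...)` followed by `Encoding.UTF8.GetString(...)`, though the handling of undefined bytes may differ from what Python does here.

```python
def try_repair(garbled, wrong="cp1252", right="utf-8"):
    """Return (text, ok); on failure the input comes back unchanged with ok=False."""
    try:
        # Encode with the wrong encoding to recover the raw bytes,
        # then decode those bytes with the encoding that produced them.
        return garbled.encode(wrong).decode(right), True
    except (UnicodeEncodeError, UnicodeDecodeError):
        return garbled, False

# Simulated rows (assumption: this mimics how the stored text was garbled).
rows = [
    # All bytes survived the wrong decoding, so this one is repairable.
    "sân ga Tokyo".encode("utf-8").decode("cp1252"),
    # Byte 0x81 was lost during the wrong decoding, so this one is not.
    "chi\u1ec1u".encode("utf-8").decode("cp1252", errors="replace"),
]

results = [try_repair(row) for row in rows]
for text, ok in results:
    print(ok, text)
```

Rows that come back with `ok` set to `False` are the ones where data was already lost, so they would need to be re-entered from the original source rather than repaired.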
 