Problem with Unicode

Hi all

I read a string from a .txt file that was saved as UTF-8.
The file contains the string "frédéric". My problem is that when I
read this file, the bytes of é come out as 101, 180.

é is 2 bytes long in the UTF-8 file. How can I change this length from 2 to 1?

I want to use é as the single byte 233, not as the 2 bytes 101, 180.

Please help me.

Can I correct this problem after reading the .txt file?
 
Thus wrote tvin,
> I read a string from a .txt file that was saved as UTF-8.
> The file contains the string "frédéric". My problem is that when I
> read this file, the bytes of é come out as 101, 180.
> é is 2 bytes long in the UTF-8 file. How can I change this length from 2 to 1?

That's how UTF-8 works.

> I want to use é as the single byte 233, not as the 2 bytes 101, 180.

Then use an 8-bit encoding like Windows-1252 or ISO-8859-1 or -15.

But why on earth do you want *one* byte? It's not 1976 anymore, and no
8-bit encoding on this planet has coverage similar to Unicode's. BTW, all
functionality in the BCL to process text files uses UTF-8 by default.
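
To illustrate the difference between the encodings named above, here is a minimal VB.NET sketch that is not part of the original posts. It assumes é is stored as the single code point U+00E9, and the module and function names are made up for the example. It prints the bytes that UTF-8 and ISO-8859-1 produce for that one character.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module EncodingBytesSketch
    ' Hypothetical helper: returns the bytes of value in the named
    ' encoding as a comma-separated list of decimal numbers.
    Function BytesOf(ByVal value As String, ByVal encodingName As String) As String
        Dim bytes As Byte() = Encoding.GetEncoding(encodingName).GetBytes(value)
        Dim parts(bytes.Length - 1) As String
        For i As Integer = 0 To bytes.Length - 1
            parts(i) = bytes(i).ToString()
        Next
        Return String.Join(",", parts)
    End Function

    Sub Main()
        ' Assumption: é is the precomposed code point U+00E9.
        Dim eAcute As String = ChrW(&HE9)
        Console.WriteLine(BytesOf(eAcute, "utf-8"))      ' 195,169
        Console.WriteLine(BytesOf(eAcute, "iso-8859-1")) ' 233
    End Sub
End Module
```

The single byte 233 only exists in the 8-bit encoding; in UTF-8 the same character always takes two bytes.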

Cheers,
 
Joerg

Joerg Jooss said:
> That's how UTF-8 works.
>
> Then use an 8-bit encoding like Windows-1252 or ISO-8859-1 or -15.
>
> But why on earth do you want *one* byte? It's not 1976 anymore, and no
> 8-bit encoding on this planet has coverage similar to Unicode's. BTW, all
> functionality in the BCL to process text files uses UTF-8 by default.

Joerg, I can use 2 bytes or 3 bytes, I don't have a problem with that.
But the length of frédéric comes out as len(frédéric) = 10.

The length of frédéric should be 8 for it to be inserted correctly into the SQL database.
Please help me, Joerg.
 
Thus wrote tvin,
> I can use 2 bytes or 3 bytes, I don't have a problem with that.
> But the length of frédéric comes out as len(frédéric) = 10.
> The length of frédéric should be 8 for it to be inserted correctly into
> the SQL database. Please help me, Joerg.

You're confusing bytes and characters. Frédéric has 10 characters, but it
may have 10 or more bytes depending on the character encoding being used
-- if you were using UTF-32, it would be a whopping 40 ;-)

Doesn't your database support the nvarchar type for Unicode characters?
 
Hi joerg

Joerg Jooss said:
> You're confusing bytes and characters. Frédéric has 10 characters, but it
> may have 10 or more bytes depending on the character encoding being used
> -- if you were using UTF-32, it would be a whopping 40 ;-)
>
> Doesn't your database support the nvarchar type for Unicode characters?

Joerg, my parameters are nvarchar in the SQL database. But the Frédéric
that was inserted into SQL looks like this: "Frédéric ". I used
trim(chr(31),chr(30)), but the result is the same.
What should I do so that Frédéric is inserted into SQL like this: "Frédéric"?
I feel that the problem occurs when I open the .txt file to read Frédéric: I
read it as 10 characters, but it should be 8 to be inserted correctly into the
SQL database.
 
tvin said:
> Joerg, my parameters are nvarchar in the SQL database. But the Frédéric
> that was inserted into SQL looks like this: "Frédéric ". I used
> trim(chr(31),chr(30)), but the result is the same.
> What should I do so that Frédéric is inserted into SQL like this: "Frédéric"?
> I feel that the problem occurs when I open the .txt file to read Frédéric: I
> read it as 10 characters, but it should be 8 to be inserted correctly into the
> SQL database.

When you say "when I open the .txt file" - what are you using to open
it? You need to use something that understands UTF-8.
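
For reference, a minimal VB.NET sketch of reading a file with the encoding spelled out. It is not from the original posts: the temp-file name and the ReadAllUtf8 helper are hypothetical, and the file is written by the sketch itself so that its content is known.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic

Module ReadUtf8Sketch
    ' Hypothetical helper: reads the whole file, decoding its bytes as UTF-8.
    Function ReadAllUtf8(ByVal fileName As String) As String
        Using reader As New StreamReader(fileName, Encoding.UTF8)
            Return reader.ReadToEnd()
        End Using
    End Function

    Sub Main()
        ' Assumption: a throwaway file in the temp directory stands in for test.txt.
        Dim fileName As String = Path.Combine(Path.GetTempPath(), "utf8-sketch.txt")
        Dim content As String = "fr" & ChrW(&HE9) & "d" & ChrW(&HE9) & "ric"
        ' UTF8Encoding(False) writes no byte order mark, so only the text is stored.
        File.WriteAllText(fileName, content, New UTF8Encoding(False))

        Dim words As String = ReadAllUtf8(fileName)
        Console.WriteLine(words.Length)                  ' 8 characters in the string
        Console.WriteLine(New FileInfo(fileName).Length) ' 10 bytes on disk
        File.Delete(fileName)
    End Sub
End Module
```

With this content the string has 8 characters even though the file holds 10 bytes, because each é takes two bytes on disk but is one character once decoded.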
 
hi jon

Jon Skeet said:
> When you say "when I open the .txt file" - what are you using to open
> it? You need to use something that understands UTF-8.

I am using this code to read the test.txt file, which was saved as UTF-8:
Dim file As New System.IO.StreamReader("c:\test.txt")
Dim words As String = file.ReadToEnd()
Console.WriteLine(words)
file.Close()
I think the problem occurs when I read the file, because the length of
"frédéric" is 10 (len(words) = 10) after reading. I think "words" should
have length 8 (because I have 8 characters) to be converted correctly to
Unicode and inserted correctly into the SQL database.

After reading test.txt, I convert this word to Unicode with this code:

Dim uni As New UnicodeEncoding()
Dim encodedBytes As Byte() = uni.GetBytes(words)
Dim decodedString As String = uni.GetString(encodedBytes)

Please help me.
 
tvin said:
> I am using this code to read the test.txt file, which was saved as UTF-8:
> Dim file As New System.IO.StreamReader("c:\test.txt")
> Dim words As String = file.ReadToEnd()
> Console.WriteLine(words)
> file.Close()

Okay - that will be reading it as a UTF-8 file.

> I think the problem occurs when I read the file, because the length of
> "frédéric" is 10 (len(words) = 10) after reading. I think "words" should
> have length 8 (because I have 8 characters) to be converted correctly to
> Unicode and inserted correctly into the SQL database.

I suspect the problem isn't the accents at all - my guess is that
you've got a carriage return and line feed at the end of the file, and
*that's* what's making the length 10.

> After reading test.txt, I convert this word to Unicode with this code:
>
> Dim uni As New UnicodeEncoding()
> Dim encodedBytes As Byte() = uni.GetBytes(words)
> Dim decodedString As String = uni.GetString(encodedBytes)

No, you don't need to do that. The above is effectively a no-op - *all*
strings in .NET are in Unicode.
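
The no-op claim can be checked with a short VB.NET sketch. This is only an illustration: RoundTrip is a hypothetical wrapper around the three quoted lines, and the sample string is assumed to be the name with precomposed é characters.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module RoundTripSketch
    ' Hypothetical wrapper: encodes to UTF-16 bytes and decodes straight
    ' back, mirroring the GetBytes/GetString pair quoted in the thread.
    Function RoundTrip(ByVal value As String) As String
        Dim uni As New UnicodeEncoding()
        Dim encodedBytes As Byte() = uni.GetBytes(value)
        Return uni.GetString(encodedBytes)
    End Function

    Sub Main()
        Dim words As String = "fr" & ChrW(&HE9) & "d" & ChrW(&HE9) & "ric"
        Dim decoded As String = RoundTrip(words)
        ' Ordinal comparison checks the strings character by character.
        Console.WriteLine(String.Equals(words, decoded, StringComparison.Ordinal)) ' True
        Console.WriteLine(decoded.Length)                                          ' 8
    End Sub
End Module
```

The decoded string equals the input, so the conversion neither shortens nor repairs anything.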
 
Jon, thanks for helping me,
but
when I use another word instead of "frédéric" I don't have any problem.
When I insert "frédéric" into SQL, it looks like this: "frédéric ".

Well, I don't know what the problem is.

Thanks, Jon.
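
One way to test the guess about a trailing line break is to list the code of every character in the string, then trim carriage return (13) and line feed (10) from the end. The VB.NET sketch below is an illustration only. It assumes the file content is the name followed by CR LF, which the thread does not confirm, and CodePoints and StripLineEnd are hypothetical helper names.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module TrailingNewlineSketch
    ' Hypothetical helper: lists the code of every character so that
    ' invisible ones (such as CR and LF) show up.
    Function CodePoints(ByVal value As String) As String
        Dim parts(value.Length - 1) As String
        For i As Integer = 0 To value.Length - 1
            parts(i) = AscW(value.Chars(i)).ToString()
        Next
        Return String.Join(" ", parts)
    End Function

    ' Hypothetical helper: removes carriage returns (13) and line feeds (10)
    ' from the end of the string only.
    Function StripLineEnd(ByVal value As String) As String
        Return value.TrimEnd(ChrW(13), ChrW(10))
    End Function

    Sub Main()
        ' Assumption: the text read from the file ends with CR LF.
        Dim words As String = "fr" & ChrW(&HE9) & "d" & ChrW(&HE9) & "ric" & vbCrLf
        Console.WriteLine(words.Length)               ' 10
        Console.WriteLine(CodePoints(words))          ' 102 114 233 100 233 114 105 99 13 10
        Console.WriteLine(StripLineEnd(words).Length) ' 8
    End Sub
End Module
```

Codes 30 and 31 are different control characters, so trimming those would leave a trailing CR LF in place. If the listing shows other codes at the end of the string, the cause is something else and the trim set would need to change accordingly.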
 
Thus wrote Joerg,
> You're confusing bytes and characters. Frédéric has 10 characters, but
> it may have 10 or more bytes depending on the character encoding being
> used

Oops, the old counting issue crept up again... make that 8 characters ;-)
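
With the corrected count, the character and byte figures can be checked with a minimal VB.NET sketch. It is an illustration, not part of the original posts: ByteCount is a hypothetical helper, and the name is assumed to use precomposed é characters (U+00E9).

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module CharsVersusBytesSketch
    ' Hypothetical helper: number of bytes needed to store value in enc.
    Function ByteCount(ByVal value As String, ByVal enc As Encoding) As Integer
        Return enc.GetByteCount(value)
    End Function

    Sub Main()
        Dim firstName As String = "Fr" & ChrW(&HE9) & "d" & ChrW(&HE9) & "ric"
        Console.WriteLine(firstName.Length)                       ' 8 characters
        Console.WriteLine(ByteCount(firstName, Encoding.UTF8))    ' 10 bytes
        Console.WriteLine(ByteCount(firstName, Encoding.Unicode)) ' 16 bytes (UTF-16)
        Console.WriteLine(ByteCount(firstName, Encoding.UTF32))   ' 32 bytes
    End Sub
End Module
```

The character count stays at 8 whatever the encoding; only the byte count changes, and for 8 characters the UTF-32 figure is 32 rather than 40.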
 