UTF-8 and ASCII

  • Thread starter: Meenu Mehta

Meenu Mehta

I have a question: how can I generate two files, one in UTF-8 and the other in
ASCII, with the same column length, so that when I convert from UTF-8 to ASCII
or vice versa, the column length does not change? Any help is appreciated.
Thanks
 

It is quite important to get the terminology right:

The two files are identical, because (by definition) ASCII uses only characters
in the range 0-127, and in that range UTF-8 is identical to ASCII.
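A quick way to check this claim, sketched in Python (the thread itself does not use any particular language here): for text made only of characters in the 0-127 range, encoding it as ASCII and as UTF-8 gives byte-for-byte the same result. The sample string is the one from later in the thread with the cedilla removed, so that it stays pure ASCII.

```python
# Pure-ASCII sample text (code points 0-127 only).
text = "Republique Francaise"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# In the 0-127 range UTF-8 uses one byte per character, exactly like ASCII.
print(ascii_bytes == utf8_bytes)          # True
print(len(ascii_bytes), len(utf8_bytes))  # 20 20
```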

If what you want is not ASCII but ANSI, we have to make clear what you mean
by "column length": do you mean the number of bytes, or the number of characters?

If you mean the number of characters, it is not changed by ANSI-to-UTF-8
conversion either.

If you mean the number of bytes, it is not possible, because a character
above 127 in an ANSI encoding takes between 2 and 4 bytes in UTF-8
(depending on the ANSI code page), but never 1 byte.
So the number of bytes in UTF-8 is guaranteed to be greater than or equal to
the number of bytes in the ANSI encoding.
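The difference between counting characters and counting bytes shows up with the string from this thread. This Python sketch assumes Windows code page 1252 (Python codec name "cp1252") as the ANSI code page; other code pages can give other byte counts.

```python
text = "Republique Française"

# Character count: the same no matter which encoding is used later.
chars = len(text)

# Byte counts: ç is one byte in cp1252 but two bytes (c3 a7) in UTF-8.
ansi_len = len(text.encode("cp1252"))
utf8_len = len(text.encode("utf-8"))
print(chars, ansi_len, utf8_len)  # 20 20 21

# Per-character sizes for a few non-ASCII characters in cp1252 vs UTF-8.
for ch in "çé€":
    print(ch, len(ch.encode("cp1252")), len(ch.encode("utf-8")))
# ç 1 2
# é 1 2
# € 1 3
```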
 
Sorry for the confusion. Here is what I meant to say.
I am generating a file (a .txt file, which is being opened with Notepad);
the file has some data from some tables. The tables have fixed column
lengths, yet when I open the file in Notepad the column length changes. For example,
the data in one of the columns is Republique Française. Now, suppose the field
length in the table (FoxPro database) is 75. Yet when I open it
in Notepad it becomes 74. My problem is that when the encoding
changes from ASCII to UTF-8, the field length (or the column length)
for that value also changes. I know it is happening because the number of bits
used in ASCII and UTF-8 is different. Is there some way I can keep the
column length fixed at 75?
Any help is appreciated.
 
Meenu Mehta said:
Sorry for the confusion. Here is what I meant to say.
I am generating a file (a .txt file, which is being opened with Notepad);
the file has some data from some tables. The tables have fixed column
lengths, yet when I open the file in Notepad the column length changes. For example,
the data in one of the columns is Republique Française.

Then it's not ASCII, to start with. There's no cedilla in ASCII.
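To confirm that ç really is outside ASCII, a short Python check: its code point is above 127, and encoding the string as ASCII fails at the position of the ç.

```python
text = "Republique Française"

# ç is U+00E7 (231), outside the 0-127 ASCII range.
print(ord("ç"))  # 231

bad_pos = None
try:
    text.encode("ascii")
except UnicodeEncodeError as err:
    # The error records the index of the first character ASCII cannot represent.
    bad_pos = err.start
    print("not ASCII:", err.reason, "at position", bad_pos)
```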
Now, suppose the field length in the table (FoxPro database) is 75. Yet
when I open it in Notepad it becomes 74. My problem is that when the
encoding changes from ASCII to UTF-8, the field length (or the column length)
for that value also changes. I know it is happening because the number of bits
used in ASCII and UTF-8 is different. Is there some way I can keep the
column length fixed at 75?

I suspect you just want to create the file using Encoding.Default
instead of either Encoding.ASCII or Encoding.UTF8.
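Encoding.Default is a .NET property: in the .NET Framework it returns the encoding for the system's ANSI code page (on .NET Core and later it returns UTF-8 instead). A rough Python analogue, assuming the ANSI code page is Windows-1252 ("cp1252"), is to name that codec explicitly when writing the file. The file name output.txt and the helper size_on_disk are hypothetical; the width of 75 comes from the thread.

```python
import os
import tempfile

FIELD_WIDTH = 75  # column width used as the example in the thread
row = "Republique Française".ljust(FIELD_WIDTH)  # padded to 75 characters

def size_on_disk(encoding: str) -> int:
    """Write `row` with the given encoding and return the file size in bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "output.txt")  # hypothetical file name
        with open(path, "w", encoding=encoding) as f:
            f.write(row)
        return os.path.getsize(path)

ansi_size = size_on_disk("cp1252")  # assumed ANSI code page: 1 byte per char
utf8_size = size_on_disk("utf-8")   # ç takes 2 bytes here

print(ansi_size, utf8_size)  # 75 76
```

With a single-byte code page the byte size on disk matches the character width, which is why writing with the ANSI encoding keeps the 75-column layout.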
 
I am generating a file (a .txt file, which is being opened with Notepad);
the file has some data from some tables. The tables have fixed column
lengths, yet when I open the file in Notepad the column length changes.
If the column changes only visually, this might just be a font issue.
Try using a fixed-width font (such as Courier).
For example,
the data in one of the columns is Republique Française. Now, suppose the field
length in the table (FoxPro database) is 75. Yet when I open it
in Notepad it becomes 74. My problem is that when the encoding
changes from ASCII to UTF-8, the field length (or the column length)
for that value also changes. I know it is happening because the number of bits
used in ASCII and UTF-8 is different. Is there some way I can keep the
column length fixed at 75?
If you are comparing an export from an ANSI database ("ANSI" is the correct
term to use here instead of "ASCII") with an export from a UTF-8 database,
then the cause is deeper.
The size of the database field counts bytes, while Notepad (and the users)
count characters. There is no solution here except padding the columns with
spaces to the desired width.
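A sketch of the padding approach in Python, assuming the column widths are meant in characters (what Notepad shows) rather than bytes, and that each character is a single precomposed code point, as ç normally is. The helper name pad_field and the width of 8 are illustrative.

```python
def pad_field(value: str, width: int) -> str:
    """Pad (or cut) a value to exactly `width` characters, not bytes."""
    return value[:width].ljust(width)

rows = ["XXX", "ççç", "Republique Française"]
lines = [pad_field(v, 8) + "|" for v in rows]

for line in lines:
    # Same character width on every line; the UTF-8 byte count still varies.
    print(line, len(line), len(line.encode("utf-8")))
# XXX     | 9 9
# ççç     | 9 12
# Republiq| 9 9
```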

Example:
Database field size: 8 bytes

XXX = 3 characters, 3 bytes
Field contains "58 58 58 20 20 20 20 20"
Output to text "XXX     " (8 characters wide)

ççç = 3 characters, 6 bytes
Field contains "c3 a7 c3 a7 c3 a7 20 20"
Output to text "ççç  " (5 characters wide)
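The byte patterns in the example above can be reproduced in Python by encoding each value as UTF-8 and padding it with spaces (0x20) to the 8-byte field size. The helper field_bytes is hypothetical, and bytes.hex with a separator argument needs Python 3.8 or later.

```python
FIELD_SIZE = 8  # field size in bytes, as in the example

def field_bytes(value: str) -> bytes:
    """Encode as UTF-8 and pad with spaces to the byte size of the field."""
    return value.encode("utf-8").ljust(FIELD_SIZE, b" ")

for value in ["XXX", "ççç"]:
    raw = field_bytes(value)
    shown = raw.decode("utf-8")
    # Same byte count for both fields, different character count on screen.
    print(raw.hex(" "), "->", repr(shown), len(shown), "characters")
```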
 