Writing German characters to sequential file

Martin

Hi all,

I've made an application for Windows Mobile 5.0 on the .NET Framework 3.5.
The app reads and writes sequential files with StreamReader and
StreamWriter.
My problem is that some of the files contain German names, which may
include special characters such as a, o and u with dots on top (umlauts).
One input file is created on another programming platform, and there the
names contain these characters in the correct format. When I display such a
string in the VB app, it also shows correctly. However, when I write it to
another sequential file with the StreamWriter, the German characters get
replaced by squares.
Does anyone know how I can preserve the original characters?

Thanks,
Martin
 
How exactly do you write the file, and which encoding do you use? Are you
sure that the same encoding is used to read the file?
What do you use to view the file when you see the "squares"?
 
From your description, I'd suspect that the "problem" lies with the program
that you are using to view the file. If you can read it correctly with your
VB program, then the data is OK. If you are using Notepad, for example, and
the Windows code page isn't set to a German character set, then what you
are seeing might not be unexpected.

Dick

--
Richard Grier (Microsoft MVP - Visual Basic) Hard & Software 12962 West
Louisiana Avenue Lakewood, CO 80228 303-986-2179 (voice) Homepage:
www.hardandsoftware.net Author of Visual Basic Programmer's Guide to Serial
Communications, 4th Edition ISBN 1-890422-28-2 (391 pages) published July
2004, Revised July 2006.
 
Hi Martin and Dick,

Thanks for your responses. The viewer can't be the problem. I use a file
comparison program to compare the before and after file, and it shows
exactly where the differences are, always in those German characters.

The commands I use to read and write are these:

Read:
------
Dim sr As New System.IO.StreamReader(DataPath & "mdeusr.dat", System.Text.Encoding.Default)

' ReadLine returns Nothing at the end of the file; stop before adding it
Line = sr.ReadLine()
Do While Line IsNot Nothing
    AddItem(Line)
    Line = sr.ReadLine()
Loop

sr.Close()


Write:
------
Dim sw As New System.IO.StreamWriter(DataPath & FileName, False, System.Text.Encoding.Default)

For Each ThisUser As User In Users
    sw.WriteLine(ThisUser.CSVString)
Next

sw.Close()

Thanks again,
Martin
 
Is this the only code that writes the file? Or has the file been written
by other code before you read it? If so, which character encoding was
used? Have a look at the file with a hex editor and tell us the
byte value of one of the umlauts (äöü).


Armin
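
If no hex editor is at hand, the same check can be done in code. The following is only a sketch: the module and function names are made up for illustration, and it assumes the file is small enough to load into memory. It reads the raw bytes without any text decoding and lists every byte above 127, which is where the umlauts will be in a single-byte code page.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module HexDumpSketch
    ' Returns one line per byte above 127, giving its offset and value.
    ' Plain ASCII bytes (0 to 127) are skipped.
    Function DescribeHighBytes(ByVal data() As Byte) As String
        Dim sb As New StringBuilder()
        For i As Integer = 0 To data.Length - 1
            If data(i) > 127 Then
                sb.AppendLine("offset " & i.ToString() & ": " & _
                              data(i).ToString() & " (0x" & data(i).ToString("X2") & ")")
            End If
        Next
        Return sb.ToString()
    End Function

    ' Reads the whole file as raw bytes, without any text decoding.
    Function ReadRawBytes(ByVal path As String) As Byte()
        Using fs As New FileStream(path, FileMode.Open, FileAccess.Read)
            Dim buffer(CInt(fs.Length) - 1) As Byte
            Dim offset As Integer = 0
            While offset < buffer.Length
                Dim n As Integer = fs.Read(buffer, offset, buffer.Length - offset)
                If n = 0 Then Exit While
                offset += n
            End While
            Return buffer
        End Using
    End Function
End Module
```

Calling DescribeHighBytes(ReadRawBytes(path)) on the original file and on the rewritten file shows at a glance which byte values changed.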
 
Are reading and writing done on the same system, so that Encoding.Default
is guaranteed to be the same in both cases?
 
Hi Armin,

No, the original file was written on a different platform.
Unfortunately I don't know enough about this other platform, so I
thought that "Default" would be a safe bet. And as long as VB doesn't
rewrite the file, all German characters are correct. They get corrupted at
the moment I use the StreamWriter to rewrite the file.

I checked both files with a binary comparison; here are the results:

Byte value 252 (0xFC, u-umlaut) in the original file
Byte value 63 (0x3F, question mark) in the VB-rewritten file

Thank you for your effort,
Martin
 
Hi Martin,

No, the original file was written on a different platform.
Unfortunately I don't know enough about this other platform, so I
thought that "Default" would be a safe bet. And as long as VB doesn't
rewrite the file, all German characters are correct. They get corrupted at
the moment I use the StreamWriter to rewrite the file.

I checked both files with a binary comparison; here are the results:

Byte value 252 (0xFC, u-umlaut) in the original file
Byte value 63 (0x3F, question mark) in the VB-rewritten file

Thank you for your effort,
Martin
 
Well, you will need to find out which code page/encoding the file is
written with on the one platform and then use the same code page/encoding on
the other platform to read the file.
Encoding.Default takes the "operating system's current ANSI code page",
and that can differ between systems.
 
I also don't know which character encoding is used in the file.
If I convert 252 using the default code page, which is "Westeuropäisch
(Windows)" (code page 1252) on my system, it correctly results in "ü".
In several code pages, byte 252 is "ü" or "Ü". This program prints how
every available encoding decodes it:

Imports System.Diagnostics
Imports System.Text

Public Class Main
    Shared Sub Main()
        ' The single byte to decode: 252 (0xFC)
        Dim b() As Byte = {252}

        ' Decode that byte with every encoding the framework knows
        For Each encinfo As EncodingInfo In Encoding.GetEncodings()
            Dim enc As Encoding = Encoding.GetEncoding(encinfo.CodePage)
            Debug.Print(enc.GetString(b) & " " & _
                        enc.EncodingName & " " & enc.CodePage)
        Next
    End Sub
End Class


I cannot imagine that the character is read or written incorrectly if
you always specify the default code page. Which one is your default code page?

Debug.Print(System.Text.Encoding.Default.EncodingName)
 
Martin said:
Ascii 252 (U-umlaut) in the original file

The capital letter Ü or lowercase ü?

If it's the former (Ü), it is very probably an IBM EBCDIC or IBM Latin
encoding. If you can't find any documentation about the code page used,
you will have to try different code pages until all characters are read
correctly.

However, you're saying that everything is OK after reading the file the
first time, right? Then again, an Ü cannot possibly be converted to a
question mark if the destination code page contains this character.
But I'll wait until you tell me your default code page and confirm that
it is _always_ used when writing the file.


Armin
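
To illustrate the point about the destination code page, here is a small sketch for the desktop .NET Framework (GetEncoding(1252) is assumed to be available there; the Compact Framework may behave differently). ASCII is used only as a stand-in for "an encoding that does not contain ü"; nothing in this thread establishes which encoding the device actually uses. The module and function names are made up.

```vbnet
Imports System
Imports System.Text

Module QuestionMarkSketch
    ' Encodes the text and returns the value of the first resulting byte.
    Function FirstByte(ByVal enc As Encoding, ByVal text As String) As Integer
        Return enc.GetBytes(text)(0)
    End Function

    Sub Main()
        Dim umlaut As String = ChrW(252)   ' "ü", U+00FC

        ' Code page 1252 contains ü, so it is stored as the single byte 252.
        Console.WriteLine(FirstByte(Encoding.GetEncoding(1252), umlaut))

        ' ASCII has no ü, so the encoder substitutes "?" (byte 63).
        Console.WriteLine(FirstByte(Encoding.ASCII, umlaut))
    End Sub
End Module
```

On the desktop framework this prints 252 and then 63, the same pair of values reported from the binary comparison. That is consistent with the string being fine in memory and the loss happening only when the writer's encoding cannot represent the character.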
 
Hi Armin,

I meant a lowercase u-umlaut. But I think I am beginning to understand this
whole code page stuff (sorry, I have never used them before).
The target device runs Windows Mobile 5.0 German. But when I look at the
settings, I see that 'locale' is 'None'. Does this mean that I need to
specify a German code page in the StreamWriter command?

Could you tell me how to do that in this statement?
Dim sw As System.IO.StreamWriter = New System.IO.StreamWriter(FileName, False, System.Text.Encoding.?????)

I appreciate your help,
Martin
 
Martin said:
Hi Armin,

I meant a lowercase u-umlaut. But I think I am beginning to understand this
whole code page stuff (sorry, I have never used them before).
The target device runs Windows Mobile 5.0 German. But when I look at the
settings, I see that 'locale' is 'None'.

Again, what's the name or number of your default code page?
(System.Text.Encoding.Default.EncodingName or .CodePage)

Martin said:
Does this mean that I need to
specify a German code page in the StreamWriter command?

Could you tell me how to do that in this statement?
Dim sw As System.IO.StreamWriter = New System.IO.StreamWriter(FileName, False, System.Text.Encoding.?????)

It depends on which code page you want to (or have to) use. I don't know
that. You must choose one that contains the umlauts, for example code page
1252. To create it, call System.Text.Encoding.GetEncoding(1252). Then
you can pass it to the StreamWriter. Another option is Unicode, which
also contains these characters. For storage, use the UTF-8 encoding
(System.Text.Encoding.UTF8); StreamWriter also uses UTF-8 by default if
you don't pass an Encoding object.

In any case, every application that has to read the file must also
know the encoding and code page used.
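
Put together, a rewrite that uses one explicitly chosen code page for both reading and writing might look like the sketch below. The module and procedure names are hypothetical, and 1252 is only the example value from above; whether it matches the original file, and whether the device can create it, still has to be confirmed.

```vbnet
Imports System.IO
Imports System.Text

Module ExplicitEncodingSketch
    ' Copies a text file line by line, decoding and encoding with the same
    ' explicitly chosen code page instead of Encoding.Default.
    Sub RewriteFile(ByVal sourcePath As String, ByVal targetPath As String, _
                    ByVal codePage As Integer)
        Dim enc As Encoding = Encoding.GetEncoding(codePage)

        Using sr As New StreamReader(sourcePath, enc)
            Using sw As New StreamWriter(targetPath, False, enc)
                Dim line As String = sr.ReadLine()
                While line IsNot Nothing
                    sw.WriteLine(line)
                    line = sr.ReadLine()
                End While
            End Using
        End Using
    End Sub
End Module
```

A call such as RewriteFile(DataPath & "mdeusr.dat", DataPath & FileName, 1252) would then leave byte 252 as 252 in the output. Note that WriteLine writes the platform's line terminator, so line endings may differ from the source file even when the characters are preserved.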
 
Hi Armin,

Looks like the .NET Compact Framework has some limitations here. I cannot
run your code:
GetEncodings() is not a member of System.Text.Encoding

Martin
 
Damn! Right, it's not available. And what does
System.Text.Encoding.Default.CodePage say?
Have you tried Unicode (with a UTF encoding such as UTF-8)? Can you
successfully create one of the other code pages that contain the umlauts?
Which code page and encoding do you have to use to write the file? I
still don't know.
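
Since GetEncodings() is not available, the questions above can be answered by probing a few code pages one at a time. This is a sketch with made-up names; the candidate list (1252, ISO 8859-1 as 28591, UTF-8 as 65001) is only a guess at what is worth trying, and which of them the device supports is exactly what the probe is meant to find out.

```vbnet
Imports System
Imports System.Text

Module CodePageProbeSketch
    ' Returns the encoding for a code page, or Nothing if this platform
    ' cannot create it.
    Function TryGetEncoding(ByVal codePage As Integer) As Encoding
        Try
            Return Encoding.GetEncoding(codePage)
        Catch ex As ArgumentException
            Return Nothing
        Catch ex As NotSupportedException
            Return Nothing
        End Try
    End Function

    ' True if the encoding turns ü into bytes and back without changing it.
    Function RoundTripsUmlaut(ByVal enc As Encoding) As Boolean
        Dim umlaut As String = ChrW(252)
        Dim bytes() As Byte = enc.GetBytes(umlaut)
        Return enc.GetString(bytes, 0, bytes.Length) = umlaut
    End Function

    ' Builds a short report: the default code page, then each candidate.
    Function Report() As String
        Dim sb As New StringBuilder()
        sb.AppendLine("Default: " & Encoding.Default.CodePage.ToString())
        For Each cp As Integer In New Integer() {1252, 28591, 65001}
            Dim enc As Encoding = TryGetEncoding(cp)
            If enc Is Nothing Then
                sb.AppendLine(cp.ToString() & ": not available")
            Else
                sb.AppendLine(cp.ToString() & ": keeps umlaut = " & _
                              RoundTripsUmlaut(enc).ToString())
            End If
        Next
        Return sb.ToString()
    End Function
End Module
```

Showing the result of Report() in a message box or log on the device would tell us the default code page and which of the candidates can safely be passed to the StreamReader and StreamWriter.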
 