"Dir >myfile.txt" on command prompt: Umlaute and special chars are NOT written correctly to file?


Wolfgang Hercker

When I type in a command prompt on a Win2000/WinXP the well known "dir" command
and redirect the output to a file I got a problem.

I have files in the directory with german umlaute (öäü). When the output is shown at the
command prompt (stdout) they are displayed correctly. But when I open the created text file with
the file listing they are not displayed correctly (e.g. in Notepad or other editors).

Why ?

Do I have to switch an option ?

Is there 3rd party tool which is able to write the filenames correctly to the file ?

Wolfgang
 
Wolfgang said:
When I type in a command prompt on a Win2000/WinXP the well known "dir" command
and redirect the output to a file I got a problem.

I have files in the directory with german umlaute (öäü). When the output is shown at the
command prompt (stdout) they are displayed correctly. But when I open the created text file with
the file listing they are not displayed correctly (e.g. in Notepad or other editors).

(Apologies, this has got long and complicated, but I'll post it anyway in case
it is of use)

Long file names in VFAT filesystems are stored in 16-bit Unicode.
File names in NTFS are stored in 16-bit Unicode.

Data output by DIR is 8-bit text: ASCII plus the extended characters of the
console's (MS-DOS style) code page.

Windows uses a different character set mapping from DOS. The MS-DOS EDIT command
will show the accented/umlauted characters correctly, because EDIT is an MS-DOS
based editor, while NOTEPAD is a Windows application. MS-DOS applications use the
CP437 character set (derived from the 1981 IBM PC spec), or CP850 on many Western
European systems. Windows uses (IIRC) Windows-1252, which is commonly confused
with ISO-8859-1 because the two differ only in the range 128-159.

These two conflicting standards rarely agree for any character above character
126 (~).

In CP437, an umlauted lower-case a is character 132 (which appears as a lower
double quote in Windows-1252). In Windows-1252, the same character is represented
by character 228 (which appears as a Greek upper-case sigma in CP437, a.k.a. PC-8).

Both are correct, for their respective environments.

Consider yourself lucky that MS-DOS didn't use EBCDIC

To get around this problem, considering that windows knows the names of the file
and will do translation where necessary (but dos doesn't which is the problem),
you need something that uses the windows API to get file names. The functions
you need are FindFirstFile and FindNextFile, something like (in C):

#include <stdio.h>
#include <windows.h>

int main(void)
{
    WIN32_FIND_DATAA found;  /* filled in by the ANSI ("A") API calls */
    HANDLE hSearch;
    FILE *out = fopen("SomeOutputFile.txt", "w");
    if (out == NULL)
        return 1;
    hSearch = FindFirstFileA("*.txt", &found);
    if (hSearch != INVALID_HANDLE_VALUE) {
        do {
            fprintf(out, "%s\n", found.cFileName);  /* one name per line */
        } while (FindNextFileA(hSearch, &found));
        FindClose(hSearch);
    }
    fclose(out);
    return 0;
}

(although I have not even thought about compiling that to see if it actually works)
There's probably something similar available in visual basic, but I have no
plans to be found anywhere near that particular language, thanks.
 
When I type in a command prompt on a Win2000/WinXP the well known "dir" command
and redirect the output to a file I got a problem.

I have files in the directory with german umlaute (öäü). When the output is shown at the
command prompt (stdout) they are displayed correctly. But when I open the created text file with
the file listing they are not displayed correctly (e.g. in Notepad or other editors).

Why ?

Do I have to switch an option ?

Is there 3rd party tool which is able to write the filenames correctly to the file ?

Have you tried to start CMD.EXE with /U? See cmd /? for details.
 
Wolfgang Hercker said:
When I type in a command prompt on a Win2000/WinXP the well known "dir"
command
and redirect the output to a file I got a problem.

I have files in the directory with german umlaute (öäü). When the output
is shown at the
command prompt (stdout) they are displayed correctly. But when I open the
created text file with
the file listing they are not displayed correctly (e.g. in Notepad or
other editors).

Why ?

Do I have to switch an option ?

Is there 3rd party tool which is able to write the filenames correctly to
the file ?

Wolfgang

You need to select a different font.

The three characters you mention would be displayed correctly in COURIER for
instance. In Lucida (which I use for DOS) the characters are shown as three
completely different characters.

Try different fonts in Start>Programs>Accessories>System Tools>Character Map
and look at characters F6,E4 and FC

Select courier and all will be fine. Boring, but fine.
 
Have you tried to start CMD.EXE with /U? See cmd /? for details.

/U Causes the output of internal commands to a pipe or file to be
Unicode

Egad. Genius! (Kicks self for not thinking of that)

Breaks DOS EDIT though, because it cannot cope with 16-bit characters;
support in other MS-DOS style applications may be patchy or nonexistent.

I tend to use vi, rather than EDIT, and that's broken too. Notepad seems to
understand both. Programs expecting ANSI/ASCII/PC-8/etc. will undoubtedly get
extremely confused.
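
For tools that only understand 8-bit text, the Unicode listing can be squeezed back down afterwards. The following is only a rough sketch: it assumes the redirected output consists of little-endian 16-bit units (which is my understanding of what CMD /U writes), and the function name utf16le_to_latin1 is made up for illustration. Anything that does not fit into a single Latin-1 byte is replaced by '?'. The umlauts sit at the same positions in Latin-1 and Windows-1252, so Notepad should read the result.

```c
#include <stddef.h>

/* Convert little-endian UTF-16 bytes to single-byte Latin-1.
   "out" must have room for in_len / 2 bytes.
   Returns the number of bytes written to "out".
   Every 16-bit unit above 0xFF becomes '?' (a surrogate pair
   therefore yields two question marks). */
size_t utf16le_to_latin1(const unsigned char *in, size_t in_len,
                         unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i + 1 < in_len; i += 2) {
        /* low byte comes first in little-endian order */
        unsigned int unit = in[i] | ((unsigned int)in[i + 1] << 8);
        if (i == 0 && unit == 0xFEFF)
            continue;  /* skip a byte order mark, if one is present */
        out[n++] = (unit <= 0xFF) ? (unsigned char)unit : '?';
    }
    return n;
}
```

Read the listing into a buffer in binary mode, run it through the function, and write the returned number of bytes to a new file.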
 
Wolfgang said:
I have files in the directory with german umlaute (öäü). When the
output is shown at the command prompt (stdout) they are displayed
correctly. But when I open the created text file with the file
listing they are not displayed correctly (e.g. in Notepad or other
editors).

You are mixing the MS-DOS and the Latin1 character sets. Use an
editor that applies the former, like the SemWare editor. Notepad
applies the latter. Better still, limit yourself to the base ASCII
33-126 set in your file names.

All the best, Timo
 
Wolfgang,

The DIR command output and the command prompt window use either
Codepage 437 or 850 depending on your country (MS-DOS character sets).

For example:
a-umlaut in these character sets is codepoint 132 decimal (hex '84').

Notepad and other Windows editors use Windows-1252, a superset of
ISO-8859-1 (Windows character set).

For example:
a-umlaut in this character set is codepoint 228 decimal (hex 'E4').

Some editors (e.g. Textpad) allow you to display a page using the MS-DOS
character set, which would allow you to display correctly the output of
the DIR command.

I am not aware of an option to make the DIR command put out the Windows
character set.

I assume there must be third-party filters to convert text files between
character sets (I will probably write one myself soon because I too had
this same problem).
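
Such a filter could look roughly like the sketch below. It is an illustration only: the function names oem_to_ansi_byte and oem_to_ansi are invented here, and the table covers just the German letters, which happen to sit at the same codepoints in Codepage 437 and 850. A real filter would need a full 128-entry table per codepage.

```c
#include <stddef.h>

/* Map one MS-DOS (Codepage 437/850) byte to the Windows character set.
   Illustrative subset: only the German letters are translated,
   every other byte is passed through unchanged. */
unsigned char oem_to_ansi_byte(unsigned char c)
{
    switch (c) {
    case 0x84: return 0xE4;  /* a-umlaut */
    case 0x94: return 0xF6;  /* o-umlaut */
    case 0x81: return 0xFC;  /* u-umlaut */
    case 0x8E: return 0xC4;  /* A-umlaut */
    case 0x99: return 0xD6;  /* O-umlaut */
    case 0x9A: return 0xDC;  /* U-umlaut */
    case 0xE1: return 0xDF;  /* sharp s */
    default:   return c;
    }
}

/* Convert a whole buffer in place. */
void oem_to_ansi(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = oem_to_ansi_byte(buf[i]);
}
```

Applied to the bytes of the redirected DIR output, this turns codepoint 132 (hex '84') into 228 (hex 'E4'), so Notepad shows the a-umlaut as expected.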

- Rich
 
Type CMD/U and press [Enter]

Now you will be able to output Unicode characters.

Austin M. Horst
 