iostream question: How to open a file with a Unicode file name.

  • Thread starter: Charles F McDevitt

Charles F McDevitt

I'm trying to upgrade some old code that used old iostreams.

At one place in the code, I have a path/filename in a wchar_t string
(unicode utf-16).

I need to open an ifstream to that file. But the open() on ifstream only
takes char * strings (mbcs?).

In old iostreams, I could _wopen() the file, get the filedesc, and call
attach() on the ifstream.
But filedescs and attach() don't exist in standard iostreams.

I can't just convert the unicode string to a mbcs string, because the user's
default codepage might not allow for all the characters that are in the
unicode string.

I can of course convert the unicode string to utf-8, but I can't find any
way to get the iostream open() to believe that the string is utf-8.

Trying to imbue() a stream with a custom locale that has a custom codecvt
seems to only affect data written to the stream, not how file names passed
to open are interpreted.

Does anyone know how I can accomplish opening an ifstream to a unicode named
file?

Why doesn't ifstream have a wopen() ?
 
Charles said:
I'm trying to upgrade some old code that used old iostreams.

At one place in the code, I have a path/filename in a wchar_t string
(unicode utf-16).

I need to open an ifstream to that file. But the open() on ifstream
only takes char * strings (mbcs?).

In old iostreams, I could _wopen() the file, get the filedesc, and
call attach() on the ifstream.
But filedescs and attach() don't exist in standard iostreams.

I can't just convert the unicode string to a mbcs string, because the
user's default codepage might not allow for all the characters that
are in the unicode string.

I can of course convert the unicode string to utf-8, but I can't find
any way to get the iostream open() to believe that the string is
utf-8.

Trying to imbue() a stream with a custom locale that has a custom
codecvt seems to only affect data written to the stream, not how file
names passed to open are interpreted.

Does anyone know how I can accomplish opening an ifstream to a
unicode named file?

Why doesn't ifstream have a wopen() ?

Because the C++ community can't agree on how such a function should work.
For a system such as Windows that has a Unicode file system, the desired
behavior seems obvious: just pass the string to the filesystem. But since
C++ tries to address a much wider range of systems, many of which don't
support unicode filesystems, we're left with a C++ standard in which there
is no standard compliant, portable way to open a file given a unicode file
name.

The library implementation supplied with VC does provide a way to do it,
however. Open the file using the C runtime library _wfopen, then create the
ifstream by passing the FILE* that _wfopen() returns to the ifstream
constructor.

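#include <cstdio>   // _wfopen (Microsoft CRT extension)
#include <fstream>

void foo(const wchar_t* pwsz)
{
    // Non-standard: relies on the FILE* constructor in the VC library's ifstream.
    std::ifstream stm(_wfopen(pwsz, L"rb"));
    if (!stm.is_open())
        return; // _wfopen failed, so there is nothing to read

    // Do stuff with the stream
}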

-cd
 
Carl Daniel said:
Because the C++ community can't agree on how such a function should work.
For a system such as Windows that has a Unicode file system, the desired
behavior seems obvious: just pass the string to the filesystem. But since
C++ tries to address a much wider range of systems, many of which don't
support unicode filesystems, we're left with a C++ standard in which there
is no standard compliant, portable way to open a file given a unicode file
name.

The library implementation supplied with VC does provide a way to do it,
however. Open the file using the C runtime library _wfopen, then create the
ifstream by passing the FILE* that _wfopen() returns to the ifstream
constructor.

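#include <cstdio>   // _wfopen (Microsoft CRT extension)
#include <fstream>

void foo(const wchar_t* pwsz)
{
    // Non-standard: relies on the FILE* constructor in the VC library's ifstream.
    std::ifstream stm(_wfopen(pwsz, L"rb"));
    if (!stm.is_open())
        return; // _wfopen failed, so there is nothing to read

    // Do stuff with the stream
}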

-cd

Thanks.. I guess that's my only choice.
 
Carl Daniel said:
[...]
Why doesn't ifstream have a wopen() ?

Because the C++ community can't agree on how such a function should work.
For a system such as Windows that has a Unicode file system, the desired
behavior seems obvious: just pass the string to the filesystem. But since
C++ tries to address a much wider range of systems, many of which don't
support unicode filesystems, we're left with a C++ standard in which there
is no standard compliant, portable way to open a file given a unicode file
name.

I consider that a weak argument. After all,
C++ also tries to address systems where there's
no file system at all.
(Carl, I know you're not the one to argue with
about that. However, I just couldn't let that
pass uncommented. <g>)
[...]
-cd

Schobi

--
(e-mail address removed) is never read
I'm Schobi at suespammers org

"And why should I know better by now/When I'm old enough not to?"
Beth Orton
 
Hendrik said:
Carl Daniel said:
[...]
Why doesn't ifstream have a wopen() ?

Because the C++ community can't agree on how such a function should
work. For a system such as Windows that has a Unicode file system,
the desired behavior seems obvious: just pass the string to the
filesystem. But since C++ tries to address a much wider range of
systems, many of which don't support unicode filesystems, we're left
with a C++ standard in which there is no standard compliant,
portable way to open a file given a unicode file name.

I consider that a weak argument. After all,
C++ also tries to address systems where there's
no file system at all.
(Carl, I know you're not the one to argue with
about that. However, I just couldn't let that
pass uncommented. <g>)

Quite so - I consider it a very weak argument as well, but that's the state
of affairs, unfortunately. At least there is a workaround, however ugly and
non-portable it might be.

-cd
 
Hendrik Schober said:
Carl Daniel said:
[...]
Why doesn't ifstream have a wopen() ?

Because the C++ community can't agree on how such a function should work.
For a system such as Windows that has a Unicode file system, the desired
behavior seems obvious: just pass the string to the filesystem. But since
C++ tries to address a much wider range of systems, many of which don't
support unicode filesystems, we're left with a C++ standard in which there
is no standard compliant, portable way to open a file given a unicode file
name.

I consider that a weak argument. After all,
C++ also tries to address systems where there's
no file system at all.
(Carl, I know you're not the one to argue with
about that. However, I just couldn't let that
pass uncommented. <g>)

One of the reasons more and more people are moving to Java and C#.... At
least, those languages work in International environments... Unlike C++,
where it is "implementation defined" if things will work sensibly.
 
Charles F McDevitt said:
[...]
One of the reasons more and more people are moving to Java and C#.... At
least, those languages work in International environments... Unlike C++,
where it is "implementation defined" if things will work sensibly.


Well, I don't know much about Java or C#.
But AFAIK, in Java, the Unicode char is fixed
to 16 bits. While that certainly is platform
independent, it isn't very good either.

Schobi

--
(e-mail address removed) is never read
I'm Schobi at suespammers org

"And why should I know better by now/When I'm old enough not to?"
Beth Orton
 
Hendrik Schober said:
Charles F McDevitt said:
[...]
One of the reasons more and more people are moving to Java and C#.... At
least, those languages work in International environments... Unlike C++,
where it is "implementation defined" if things will work sensibly.


Well, I don't know much about Java or C#.
But AFAIK, in Java, the Unicode char is fixed
to 16 bits. While that certainly is platform
independent, it isn't very good either.

It's only fixed at 16-bit UTF-16 inside your Java code.
You can write or read it from there any way you want,
although the default is to keep it 16-bit UTF-16.
Java has built-in conversion routines for most character sets you'd want.

In C++, if your characters are wchar_t, it's implementation
defined if they are 16-bit, 32-bit, or some other size,
and implementation defined if they are Unicode or not.
They could be any conceivable character set that fits in the size.

When writing out, say with a wofstream, C++ mandates that the
default behaviour is to convert to narrow characters.

But what the conversion is, is implementation defined.
Even if you happen to have unicode in your wchar_t string,
the default conversion via wofstream could convert to any
character set, and again, it's implementation defined.

Microsoft's choice is to convert to the local code page
(which makes sense on Windows, except when the local code page can't
represent the Unicode characters),
Linux seems to convert to UTF-8,
and different UNIXes do whatever they think is sensible,
but you can't count on any consistent behaviour.


All of this makes it a pain to write portable C++ code.
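
One way to sidestep the implementation-defined default conversion is to convert explicitly and write the resulting bytes through a narrow stream. Here is a minimal sketch, assuming the wchar_t string holds UTF-16 when wchar_t is 16 bits wide and UTF-32 otherwise (which, as noted above, the standard does not guarantee). The names append_utf8 and to_utf8 are hypothetical, and the sketch does not validate its input.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Append one Unicode code point to out, encoded as UTF-8.
void append_utf8(std::string& out, std::uint32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert a wide string to UTF-8.
// Assumption: UTF-16 if wchar_t is 16 bits wide, UTF-32 otherwise.
std::string to_utf8(const std::wstring& ws)
{
    std::string out;
    for (std::size_t i = 0; i < ws.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(ws[i]);
        // Combine a UTF-16 surrogate pair into a single code point.
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF && i + 1 < ws.size()) {
            const std::uint32_t lo = static_cast<std::uint32_t>(ws[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        append_utf8(out, cp);
    }
    return out;
}
```

The result can then be written with a plain std::ofstream opened in binary mode, so the bytes on disk do not depend on the platform's default wide-to-narrow conversion.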
 
Carl Daniel said:
Because the C++ community can't agree on how such a function should work.
For a system such as Windows that has a Unicode file system, the desired
behavior seems obvious: just pass the string to the filesystem. But since
C++ tries to address a much wider range of systems, many of which don't
support unicode filesystems, we're left with a C++ standard in which there
is no standard compliant, portable way to open a file given a unicode file
name.

The library implementation supplied with VC does provide a way to do it,
however. Open the file using the C runtime library _wfopen, then create the
ifstream by passing the FILE* that _wfopen() returns to the ifstream
constructor.

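#include <cstdio>   // _wfopen (Microsoft CRT extension)
#include <fstream>

void foo(const wchar_t* pwsz)
{
    // Non-standard: relies on the FILE* constructor in the VC library's ifstream.
    std::ifstream stm(_wfopen(pwsz, L"rb"));
    if (!stm.is_open())
        return; // _wfopen failed, so there is nothing to read

    // Do stuff with the stream
}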

-cd

One issue with this approach (other than the need to rewrite a lot of code):

Is it OK to call .imbue() after the constructor opens the file?

I need a custom locale (well, a custom codecvt facet). Normally, I would
construct the stream, call imbue(), and then call open().

But if the only way to open a Unicode-named file is by opening it in the
constructor, I need to imbue() my locale after the open has happened.
Is that legal?
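
As far as I can tell from the standard's description of basic_filebuf::imbue, changing the locale on an open file is only restricted when the file is no longer positioned at its beginning and the codecvt encoding is state-dependent, so imbuing right after the constructor, before any read, should be fine. Below is a minimal sketch of that ordering. latin1_codecvt and read_first_line_latin1 are hypothetical names, the facet only implements the input direction, and the file is opened by a narrow name here just to keep the sketch portable (on VC the stream would come from the _wfopen route instead).

```cpp
#include <cstddef>
#include <cwchar>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical facet: treats every byte of the file as one Latin-1 character.
// Only the input side (do_in) is implemented for this sketch.
class latin1_codecvt : public std::codecvt<wchar_t, char, std::mbstate_t> {
protected:
    result do_in(std::mbstate_t&, const char* from, const char* from_end,
                 const char*& from_next, wchar_t* to, wchar_t* to_end,
                 wchar_t*& to_next) const override
    {
        while (from != from_end && to != to_end)
            *to++ = static_cast<wchar_t>(static_cast<unsigned char>(*from++));
        from_next = from;
        to_next = to;
        return from == from_end ? ok : partial;
    }
    int do_length(std::mbstate_t&, const char* from, const char* from_end,
                  std::size_t max) const override
    {
        const std::size_t avail = static_cast<std::size_t>(from_end - from);
        return static_cast<int>(avail < max ? avail : max);
    }
    int do_encoding() const noexcept override { return 1; }      // fixed width, 1 byte
    int do_max_length() const noexcept override { return 1; }
    bool do_always_noconv() const noexcept override { return false; }
};

// Open in the constructor, then imbue before the first read.
std::wstring read_first_line_latin1(const char* path)
{
    std::wifstream in(path, std::ios::binary);
    in.imbue(std::locale(in.getloc(), new latin1_codecvt)); // locale owns the facet
    std::wstring line;
    std::getline(in, line);
    return line;
}
```

The important part is the order: nothing is read between the constructor and imbue(), so the stream buffer has not yet converted any data with the old facet.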
 
Charles F McDevitt said:
[...]
It's only fixed at 16-bit UTF-16 inside your Java code.

If it really is UTF-16, it's all right.
I was told it only takes Unicode < 2^16
(as Windows does).
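
For what it's worth, real UTF-16 is not limited to code points below 2^16: anything above U+FFFF is stored as a pair of 16-bit surrogate code units. A minimal sketch of that encoding step, with to_utf16 as a hypothetical name and no validation of the input code point:

```cpp
#include <string>

// Encode one Unicode code point as UTF-16 code units.
// Sketch only: does not reject surrogate values or code points above U+10FFFF.
std::u16string to_utf16(char32_t cp)
{
    std::u16string out;
    if (cp < 0x10000) {
        out += static_cast<char16_t>(cp);                    // one code unit
    } else {
        cp -= 0x10000;                                       // 20 bits remain
        out += static_cast<char16_t>(0xD800 + (cp >> 10));   // high surrogate
        out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF)); // low surrogate
    }
    return out;
}
```

So a 16-bit char type can still carry the full Unicode range, as long as the code treats a string as a sequence of code units rather than assuming one unit per character.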

Schobi

--
(e-mail address removed) is never read
I'm Schobi at suespammers org

"And why should I know better by now/When I'm old enough not to?"
Beth Orton
 