Can fopen tell me the coding of a file?

PLS · Mar 28, 2008

When I use fopen for reading with the CCS options, the library will look
at the byte order mark to determine the file type and read accordingly.

I have a more complicated case. I want to open a file for appending, and
I want the appended data to always be in UTF-8. If the existing file is in
another encoding I will have to copy and convert it before appending. In
the interest of speed I would prefer not to open the file separately
just to determine what the BOM is.

Is it possible to open for appending with CCS=UTF-8 and then determine
what the existing encoding is? Then if the existing encoding is wrong I
can close the file and convert it. This way I'm only doing the extra work
if the file actually needs conversion.

I think the existing encoding is stored in the FILE structure. Is this
documented anywhere?

Thanks,
++PLS

Jeroen Mostert · Mar 29, 2008

PLS said:
When I use fopen for reading with the CCS options, the library will look
at the byte order mark to determine the file type and read accordingly.

I have a more complicated case. I want to open a file for appending, and
I want the appended data to always be in UTF-8. If the existing file is in
another encoding I will have to copy and convert it before appending. In
the interest of speed I would prefer not to open the file separately
just to determine what the BOM is.

How is opening a file and reading the first few bytes going to slow anything
down? You're going to be appending a whole lot more.

That said, not opening a file twice has other benefits which are more
important than any putative speed gain (such as not being caught by surprise
if the file changes during calls).

Note that the behavior you're trying to avoid (reopening) is exactly what
the CRT will do anyway if you use mode "a" -- it will first open the file
for reading to determine the BOM and then reopen it for writing.

Is it possible to open for appending with CCS=UTF-8 and then determine
what the existing encoding is?

Sure. If you open the file with "a+", you can both read and append. Rewind
the file to the beginning and read the BOM.
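
A minimal sketch of that approach, under some assumptions: the names
detect_bom, bom_kind and append_utf8 are hypothetical and not part of the
CRT, and the file is opened in plain binary mode ("a+b") without the ccs
flag so that the raw leading bytes are visible. How the CRT itself treats
the BOM bytes when ccs is set is not covered here. The sketch also treats
a file with no BOM as safe to append to, which may be wrong for files in a
legacy code page, and it does not distinguish UTF-32 from UTF-16.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical names for this sketch; none of them come from the CRT. */
typedef enum { BOM_NONE, BOM_UTF8, BOM_UTF16_LE, BOM_UTF16_BE } bom_kind;

/* Rewinds fp, inspects up to three leading bytes, and reports which BOM
   (if any) they form. fp must be open for reading, e.g. with "a+b". */
bom_kind detect_bom(FILE *fp)
{
    unsigned char b[3];
    size_t n;

    assert(fp != NULL);
    rewind(fp);
    n = fread(b, 1, sizeof b, fp);

    /* A positioning call is required between a read and a later write on
       an update stream; in append mode writes go to the end regardless. */
    fseek(fp, 0, SEEK_END);

    if (n >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF) return BOM_UTF8;
    if (n >= 2 && b[0] == 0xFF && b[1] == 0xFE) return BOM_UTF16_LE;
    if (n >= 2 && b[0] == 0xFE && b[1] == 0xFF) return BOM_UTF16_BE;
    return BOM_NONE;
}

/* Opens path once for read+append and appends text (already UTF-8 bytes).
   Returns 0 on success, 1 if the file carries a UTF-16 BOM and must be
   converted first (nothing is written), -1 on I/O error. */
int append_utf8(const char *path, const char *text)
{
    FILE *fp = fopen(path, "a+b");
    bom_kind kind;
    int rc = 0;

    if (fp == NULL) return -1;

    kind = detect_bom(fp);
    if (kind == BOM_UTF16_LE || kind == BOM_UTF16_BE) {
        rc = 1;                      /* caller converts, then retries */
    } else if (fputs(text, fp) == EOF) {
        rc = -1;
    }

    if (fclose(fp) != 0 && rc == 0) rc = -1;
    return rc;
}
```

With this shape the file is opened only once in the common case; the
conversion path is taken only when append_utf8 returns 1.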

Then if the existing encoding is wrong I can close the file and convert
it. This way I'm only doing the extra work if the file actually needs
conversion.

I think the existing encoding is stored in the FILE structure.

It's not, at least not directly. The file structures are internal to the CRT.

Is this documented anywhere?

No, and to the best of my knowledge there's no documented way of getting the
encoding used to open the file (or the actual encoding). The CRT support
isolates you from encoding issues. If you want to handle encoding issues
explicitly, you're going to have to handle them explicitly.
