How to get the Encoding of a FileStream or MemoryStream?

  • Thread starter: darin dimitrov

darin dimitrov

I have a Stream object created after parsing some
"multipart/form-data" HTTP content in a server application and I need
to obtain its encoding. When I save the Stream to a file I can see its
encoding: in most cases it is UTF-8, but it can also be ISO-8859-1
(Western European) or something else. How can I obtain the Encoding of
this Stream? Is this possible?
 
You can't. For instance, every possible UTF-8 file is also an ANSI code
page 1252 file.

There are ways you can *guess*, but you can't know for sure. The server
should tell you the encoding in the headers.
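
To make the guessing concrete, here is a minimal sketch in Python (the thread is about .NET streams, so treat the language as illustration only). It assumes the only candidates are UTF-8 and ISO-8859-1, as in the question, and `guess_decode` is a hypothetical helper name. It tries strict UTF-8 first, since ISO-8859-1 text with accented characters is rarely valid UTF-8, then falls back to ISO-8859-1, which accepts any byte sequence. The result is still only a guess.

```python
from typing import Tuple


def guess_decode(data: bytes) -> Tuple[str, str]:
    """Return (text, encoding_name) using a strict-UTF-8-first guess.

    Assumption: the payload is either UTF-8 or ISO-8859-1. Any other
    encoding will be silently misread as ISO-8859-1.
    """
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        return data.decode("utf-8", errors="strict"), "utf-8"
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte value to a character, so this never fails.
        return data.decode("iso-8859-1"), "iso-8859-1"


# The ambiguity itself: valid UTF-8 bytes are also decodable as ISO-8859-1,
# they just produce different (usually wrong-looking) text.
utf8_bytes = "é".encode("utf-8")               # two bytes: 0xC3 0xA9
misread = utf8_bytes.decode("iso-8859-1")      # two characters instead of one
```

Because pure ASCII input is valid in both encodings, the sketch reports it as UTF-8; that choice is harmless, since the decoded text is identical either way.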
 
You can't. For instance, every possible UTF-8 file is also an ANSI code
page 1252 file.

OK, I understand this.
There are ways you can *guess*, but you can't know for sure.
On what heuristic is this *guessing* based?
The server should tell you the encoding in the headers.
You mean the client?

In fact I am developing a Microsoft BizTalk pipeline component which
receives an input stream from an HTTP adapter, and the worst thing is
that this adapter strips the headers sent by the client and passes only
the body of the request, so I cannot read the encoding from the
headers :(
 
darin dimitrov said:
OK, I understand this.

On what heuristic is this *guessing* based?

Well, if the first 3 bytes of the stream are 0xEF 0xBB 0xBF, for
example, that's likely to be a UTF-8 stream - that's the BOM for UTF-8.
It *might* not be a UTF-8 stream though. Those could just be the first
three characters of a CP1252-encoded stream. Do you see what I mean?
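
A BOM check along these lines could be sketched as follows, again in Python purely for illustration. `sniff_bom` is a hypothetical name, and the UTF-16/UTF-32 entries go beyond the UTF-8 case described above. A match is only a hint, for the reason just given: the same three bytes are also valid CP1252 text.

```python
import codecs
from typing import Optional

# UTF-32 must be tested before UTF-16: the UTF-32 LE BOM (FF FE 00 00)
# begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Return a codec name suggested by a leading BOM, or None if absent."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


# The same three bytes read as CP1252 are three ordinary characters,
# which is why a BOM match is a strong hint rather than proof.
bom_as_cp1252 = codecs.BOM_UTF8.decode("cp1252")
```

Only the first few bytes need inspecting, so with a seekable stream one can read them, run the check, and rewind before handing the stream on.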
You mean the client?

In fact I am developing a Microsoft BizTalk pipeline component which
receives an input stream from an HTTP adapter, and the worst thing is
that this adapter strips the headers sent by the client and passes only
the body of the request, so I cannot read the encoding from the
headers :(

Yuk. Does it genuinely give you different encodings at different times,
too?
 