strange character when receiving mail

  • Thread starter: Tony Johansson

Tony Johansson

Hi!

Once I received a mail that had a lot of strange characters, and once I looked
at a web page that also had a lot of strange characters. By strange
characters I mean characters that are not readable; they could be some kind of
graphical characters.
I know that this somehow depends on encodings.

So my question is whether someone could describe a scenario where this
could actually happen.

//Tony
 
You can get "strange characters" for two reasons:

• someone sent you strange characters
• someone sent you familiar characters, but you are interpreting them
using the wrong encoding

That's _two_ scenarios where this actually could appear.

Make sure that your C# program properly uses whatever encoding
description/specification is found in your data, and offers the user the
opportunity to specify an encoding explicitly for situations where the
encoding is not specified in the data itself, and you should be fine.
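
As a rough illustration of that advice, here is a minimal sketch in Python (the same idea carries over to C#). The function names `declared_charset` and `decode_body`, and the `user_encoding` parameter, are made up for this example; only the standard-library `email.message.Message` class is real. It prefers the charset declared in the data and falls back to an encoding chosen by the user.

```python
from email.message import Message


def declared_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value and returns None if absent.
    return msg.get_content_charset()


def decode_body(raw, content_type=None, user_encoding=None):
    """Decode raw bytes, preferring the declared charset over the user's choice."""
    encoding = declared_charset(content_type) if content_type else None
    if encoding is None:
        # Nothing declared in the data: use what the user picked, if anything.
        encoding = user_encoding
    if encoding is None:
        raise ValueError("no charset declared; ask the user to pick one")
    return raw.decode(encoding)
```

A caller would typically catch the `ValueError`, prompt the user for an encoding, and call `decode_body` again with `user_encoding` set.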

Hope that helps.

Pete
 
Both emails and web pages use MIME types.

This means that a message or page consists of:
- a content-type specifying the charset
- the actual data

The typical reason for seeing garbage is that those two
are inconsistent.

If the content-type says "text/plain; charset=ISO-8859-1" but
the content is really UTF-8, then you will see two letters for
each character in the 0x80-0xFF range (for example, "é" shows up as "Ã©").
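
This first case is easy to reproduce. The sketch below uses Python, and the helper name `misread_utf8_as_latin1` is invented for the example. It encodes text as UTF-8 and then decodes the bytes as ISO-8859-1, which is what a reader does when the declared charset is wrong.

```python
def misread_utf8_as_latin1(text):
    """Encode as UTF-8, then decode the same bytes as ISO-8859-1."""
    raw = text.encode("utf-8")       # U+00E9 becomes two bytes: 0xC3 0xA9
    return raw.decode("iso-8859-1")  # each byte becomes its own character


original = "\u00e9"                  # e with acute accent, one character
garbled = misread_utf8_as_latin1(original)
print(len(original), len(garbled))   # prints: 1 2
print(garbled == "\u00c3\u00a9")     # prints: True (A-tilde, copyright sign)
```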

If the content-type says "text/plain; charset=UTF-8" but
the content is really ISO-8859-1, then you will typically see an
error marker such as a question mark for each 0x80-0xFF character.
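
The opposite mismatch can be sketched the same way in Python. The helper name `misread_latin1_as_utf8` is again invented for the example, and it assumes the reader substitutes a marker for invalid bytes, which is what `errors="replace"` does here (the marker is U+FFFD, often drawn as a question mark).

```python
def misread_latin1_as_utf8(text):
    """Encode as ISO-8859-1, then decode the same bytes as UTF-8."""
    raw = text.encode("iso-8859-1")  # U+00E9 becomes the single byte 0xE9
    # 0xE9 on its own is not valid UTF-8, so it is replaced with U+FFFD.
    return raw.decode("utf-8", errors="replace")


garbled = misread_latin1_as_utf8("caf\u00e9")
print(garbled == "caf\ufffd")        # prints: True
```

Without `errors="replace"` the decode raises `UnicodeDecodeError` instead, so a strict reader would report an error rather than show a marker.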

If we go to more exotic stuff like Chinese and Japanese,
then you can really see something weird.
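
For a feel of the Japanese case, here is a small Python sketch. The helper name `mojibake` and the choice of Shift_JIS as the real encoding are assumptions made for this example, not something from the thread. Three kanji sent as Shift_JIS and read as ISO-8859-1 turn into six unrelated characters.

```python
def mojibake(text, actual, assumed):
    """Encode with the real encoding, decode with the wrongly assumed one."""
    return text.encode(actual).decode(assumed, errors="replace")


japanese = "\u65e5\u672c\u8a9e"      # three kanji meaning "Japanese language"
garbled = mojibake(japanese, "shift_jis", "iso-8859-1")
print(len(japanese), len(garbled))   # prints: 3 6
```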

Arne
 