OK, first of all, thanks for reading the spec for me
*a little ashamed*
I'll just recap and give you my new status / interpretation of my
restrictions.
First of all, I've been using the wrong file extension for vCalendar files:
I've been using the extension for iCalendar files, which can be encoded with
UTF-8. Now that I've changed to the vCalendar format, only the various ANSI
codepages are accepted, but they are apparently read as ASCII. (Am I at least
getting better at this?)
Anyway, using iso-8859-1 encoding with codepage 1252 (the encoding/codepage
my Outlook uses when exporting to .vcf files), and setting the encoding
parameter to quoted-printable for the summary/description property of the
vCalendar object, I'm able to use =0D, =3A etc. for special characters, but
=E5 (å) is stripped when I try to open the file in Outlook.
There's no apparent relevant difference between a file outlook exports and
reads perfectly with æøå in it, and the files I generate. Both use
iso-8859-1 encoding with codepage 1252. This is how they look:
Outlook's (displayed correctly):
BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook 11.0 MIMEDIR//EN
VERSION:1.0
BEGIN:VEVENT
DTSTART:20040310T070000Z
DTEND:20040310T080000Z
UID:
[email protected]
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:M=E5l- og resultatsamtale mellom Lars=
-Erik Aabech og Lars-Erik Aabech=0D=0ATid: 10.03.2004 08:00=0D=0ASam=
taletype: Resultatsamtale=0D=0AKanskje dette funker..=0D=0A
SUMMARY;ENCODING=QUOTED-PRINTABLE:Invitasjon til m=E5l- og resultatsamtale
PRIORITY:3
END:VEVENT
END:VCALENDAR
Mine:
BEGIN:VCALENDAR
VERSION:2.0
METHOD:PUBLISH
BEGIN:VEVENT
UID:
[email protected]
LOCATION:
DTSTART:20040310T070000Z
DTEND:20040310T080000Z
DTSTAMP:20040303T134702Z
SUMMARY;ENCODING=QUOTED-PRINTABLE:Invitasjon til m=E5l- og resultatsamtale
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:M=E5l- og resultatsamtale mellom
Lars-Erik Aabech og Lars-Erik Aabech=0DTid: 10.03.2004 08:00=0DSamtaletype:
Resultatsamtale=0D=0DKanskje dette funker..
CLASS:PUBLIC
END:VEVENT
END:VCALENDAR
So, I'm at a complete loss as far as vCalendar files go.
But I found out that iCalendar and vCalendar files use approximately the same
syntax (although I have to admit I haven't read the specs well enough to
describe the differences), and if I export an iCalendar file from Outlook it
is encoded with UTF-8 using codepage 65001: æøå is saved as plain text, and
line breaks are saved as \n in plain text.
So, for now I'm gonna change encoding, codepage, syntax & file extension and
pray iCalendar is easier than vCalendar.
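
For anyone else generating the iCalendar flavour: a minimal sketch of how
the \n escaping seen in Outlook's export could be produced in C#. The class
and method names (ICalendarText.Escape) are made up for illustration. The
\n for line breaks is what the export shows; the backslash, semicolon and
comma escapes are my reading of the iCalendar text rules (RFC 2445), not
something visible in the exported file. Line folding is not handled here.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of any library.
public static class ICalendarText
{
    // Escapes a text value for an iCalendar property such as DESCRIPTION.
    public static string Escape(string value)
    {
        StringBuilder sb = new StringBuilder(value.Length + 16);
        for (int i = 0; i < value.Length; i++)
        {
            char c = value[i];
            switch (c)
            {
                case '\\': sb.Append("\\\\"); break;   // backslash
                case ';':  sb.Append("\\;");  break;   // semicolon
                case ',':  sb.Append("\\,");  break;   // comma
                case '\r':
                    // Treat CRLF as one line break; a lone CR also counts.
                    if (i + 1 < value.Length && value[i + 1] == '\n') i++;
                    sb.Append("\\n");
                    break;
                case '\n': sb.Append("\\n");  break;   // bare LF
                default:   sb.Append(c);      break;
            }
        }
        return sb.ToString();
    }
}
```

The result would then be written after "DESCRIPTION:" with the file saved as
UTF-8, so æøå can stay as plain text.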
Thanks a lot for your help, Jon! I've learnt a lot today, although I'm
partly giving up
Lars-Erik
-
BTW.. here's the file from outlook in iCalendar format
BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook 11.0 MIMEDIR//EN
VERSION:2.0
METHOD:PUBLISH
BEGIN:VEVENT
DTSTART:20040310T070000Z
DTEND:20040310T080000Z
TRANSP:OPAQUE
SEQUENCE:0
UID:
[email protected]
DTSTAMP:20040303T134702Z
DESCRIPTION:Mål- og resultatsamtale mellom Lars-Erik Aabech og Lars-Erik
Aabech\nTid: 10.03.2004 08:00\nSamtaletype: Resultatsamtale\n\nKanskje
dette funker..\n
SUMMARY:Invitasjon til mål- og resultatsamtale
PRIORITY:5
X-MICROSOFT-CDO-IMPORTANCE:1
CLASS:PUBLIC
END:VEVENT
END:VCALENDAR
Lars-Erik Aabech said:
There's obviously a lot I don't know about encoding (although I'd like to).
Do you know some place on the net (not too academic) where I can learn
more?
See http://www.pobox.com/~skeet/csharp/unicode.html - that's my best
explanation, and it's got some other things in as well.
I thought ANSI and ASCII were more or less the same 256-byte set, except the
last x bytes represent different special characters depending on the
codepage specified.
ASCII is only 7-bit to start with.
Different ANSI code pages tend to share the first 128 values with
ASCII, and then have different values for the last 128 values. That's
what I mean when I say there's no such thing as "the ANSI encoding".
Anyway - I'm creating a vCalendar file (http://www.imc.org/pdi/) which will
be mailed as an attachment to Outlook users (hopefully it will work with
other apps too). Outlook complains if the file isn't encoded correctly. So I
tried to open one of the generated files with Notepad, saved it as ANSI
instead of UTF-8, and then it works. These are the facts I based my
statements on (works, doesn't work, ANSI etc).
Looking at the specification, it seems Outlook is being a little too
generous, but that there's a way you can get round it anyway. From the
spec, section 2.1.5:
<quote>
The default character set is ASCII. The default character set can be
overridden for an individual property value by using the "CHARSET"
property parameter. This property parameter may be used on any
property. However, the use of this parameter on some properties may not
make sense.
Any character set registered with the Internet Assigned Numbers
Authority (IANA) can be specified by this property parameter. For
example, ISO 8859-8 or the Latin/Hebrew character set is specified by:
DESCRIPTION;CHARSET=ISO-8859-8:...
Some transports (e.g., MIME based electronic mail) may also provide a
character set property at the transport wrapper level. This property
can be used in these cases for transporting a vCalendar data stream
that has been defined using a default character set other than ASCII
(e.g., UTF-8).
</quote>
I would suggest that you should output ASCII without any CHARSET= tag
where there are no non-ASCII characters, and use UTF-8 otherwise,
specifying CHARSET=UTF-8.
I would certainly *hope* that would work.
Note section 2.1.4, however, which specifies the encoding for the whole
object - it defaults to only 7 bit.
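
A rough sketch of that suggestion in C#, in case it helps: only add
CHARSET=UTF-8 when the value actually contains non-ASCII characters. The
class and method names are hypothetical, and whether Outlook honours the
parameter is untested. Because of the 7-bit default in section 2.1.4, the
UTF-8 bytes of the value would presumably still need quoted-printable
encoding; this helper only builds the part before the colon.

```csharp
using System;

// Hypothetical helper, not part of any library.
public static class VCalendarProperty
{
    // True when every character fits in 7-bit ASCII.
    public static bool IsAscii(string value)
    {
        foreach (char c in value)
        {
            if (c > 127) return false;
        }
        return true;
    }

    // Returns the property name, with ";CHARSET=UTF-8" appended only
    // when the value contains at least one non-ASCII character.
    public static string NameWithCharset(string name, string value)
    {
        return IsAscii(value) ? name : name + ";CHARSET=UTF-8";
    }
}
```

With this, NameWithCharset("SUMMARY", "Invitasjon") gives "SUMMARY", while a
value containing å gives "SUMMARY;CHARSET=UTF-8".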
I'm getting closer at least..
I've tried the following, and all of them were accepted by Outlook 2003,
with assorted presentations of the Norwegian characters (?, +, empty, etc.):
System.Text.Encoding enc = System.Text.Encoding.GetEncoding(865);
System.Text.Encoding enc = System.Text.Encoding.GetEncoding(1252);
System.Text.Encoding enc = System.Text.Encoding.GetEncoding(20127);
System.Text.Encoding enc = System.Text.Encoding.GetEncoding("iso-8859-1");
etc. etc.
Which means that Outlook doesn't give a **** what codepage I use.
It must, because you're *potentially* creating different data. What you
might have seen is either Outlook guessing (which means it might guess
it wrong) or you picking encodings which use the same mappings for
those particular characters.
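
To make the "different data" point concrete, here is a small sketch that
dumps the bytes an encoding produces for a string. The class and method
names are made up for illustration. It sticks to "us-ascii", "iso-8859-1"
and "utf-8" because code pages such as 865 or 1252 may need extra setup on
some .NET runtimes.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of any library.
public static class EncodingDemo
{
    // Returns the bytes of `text` in the named encoding as hex,
    // with a '-' between bytes (the BitConverter.ToString format).
    public static string HexBytes(string text, string encodingName)
    {
        Encoding enc = Encoding.GetEncoding(encodingName);
        return BitConverter.ToString(enc.GetBytes(text));
    }
}
```

HexBytes("A", ...) is 41 in all three encodings, since they agree on the
ASCII range. For "\u00E5" (å), iso-8859-1 gives E5, utf-8 gives C3-A5, and
us-ascii gives 3F, which is a plain '?' because ASCII has no å at all.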
I've exported a calendar element from outlook with special characters (while
writing this post) and it appears I have to replace the special characters
with '=E6' etc. and insert some more parameters in the vCalendar file.
Example:
SUMMARY;ENCODING=QUOTED-PRINTABLE:V=E6ret er r=F8tent i =E5r =C6=D8=C5
instead of
SUMMARY:Været er røtent i år ÆØÅ
That would be due to using quoted printable, as specified in section
2.1.4.
So, the last question I have would be... Anyone got a magic way to do this
or do I have to do string.replace("æ", "=E6").replace("ø", "=xx").... ???
(Maybe a loop using String.charCodeAt or such, but still....)
Basically you'd want to look through the created byte array, and any
byte greater than 127 should be quoted - along with '=' presumably (I
haven't checked the quoted printable spec for a while).
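
A minimal sketch of that loop in C#, assuming the caller picks the byte
encoding (iso-8859-1 here, to match the Outlook export). The class and
method names are hypothetical. It quotes '=', control bytes below 32 and
everything above 126, which is slightly stricter than "greater than 127"
and is my assumption about what is safe. It does not insert the soft line
breaks (a trailing "=") that Outlook's own file uses for long lines, and it
does not treat trailing spaces specially.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of any library.
public static class QuotedPrintable
{
    // Encodes `text` to bytes with `encoding`, then writes each byte either
    // as itself or as "=XX" (two uppercase hex digits).
    public static string Encode(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        StringBuilder sb = new StringBuilder(bytes.Length * 3);
        foreach (byte b in bytes)
        {
            // Quote non-ASCII bytes, DEL, control bytes and '=' itself.
            bool mustQuote = b > 126 || b < 32 || b == (byte)'=';
            if (mustQuote)
            {
                sb.Append('=').Append(b.ToString("X2"));
            }
            else
            {
                sb.Append((char)b);
            }
        }
        return sb.ToString();
    }
}
```

Called as QuotedPrintable.Encode("Været er røtent i år ÆØÅ",
Encoding.GetEncoding("iso-8859-1")), this yields
"V=E6ret er r=F8tent i =E5r =C6=D8=C5", the same value as in the Outlook
export quoted above, and a CRLF in the text comes out as =0D=0A.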