Spamcop and How Outlook Handles Standards

McWebber

There's an ongoing discussion in the news://news.spamcop.net/spamcop
newsgroup about how Outlook, because it is stripping certain elements from
email, is breaking standards, which Spamcop is now using to eliminate
errors. An example of an email is below.

The headers contain:
Content-Type: multipart/alternative;
boundary="0.1.2159DC2"

Yet the boundaries are being stripped out by Outlook XP. A test was posted
and when the identical mail was retrieved with Netscape, the boundaries were
intact.

In Outlook, the message source (from the end of the headers to the
beginning of the HTML source) appeared as:

Content-Type: multipart/alternative;
boundary="0.1.2159DC2"
X-Priority: 3
X-MSMail-Priority: Normal

<HTML><HEAD>
<BODY bgColor=#ffffff>


In Netscape, the same message appeared as:

Content-Type: multipart/alternative;
boundary="0.1.2159DC2"
X-Priority: 3
X-MSMail-Priority: Normal


--0.1.2159DC2
Content-Type: text/html;
Content-Transfer-Encoding: quoted-printable

<HTML><HEAD>
<BODY bgColor=3D#ffffff>

and the end of the message appeared as:


<P></P></A></BODY></HTML>

--0.1.2159DC2--

Whereas in Outlook, it simply ended with </HTML>

Does anyone have any answer as to why Outlook XP does this? If the headers
state:
Content-Type: multipart/alternative;
boundary="0.1.2159DC2"

That boundary should be there in the source.
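
For anyone who wants to see what the boundary actually does, here is a minimal sketch using Python's standard email package. The sample text is reconstructed from the Netscape view above (the text/html part header is simplified, and the variable and function names are mine, not from any mail client). It shows that a parser finds the HTML part by looking for the declared boundary, and that the "=3D" in the Netscape view is just quoted-printable for "=".

```python
from email import message_from_string

# Sample rebuilt from the Netscape view quoted above (part header simplified).
RAW = (
    "Content-Type: multipart/alternative;\n"
    "\tboundary=\"0.1.2159DC2\"\n"
    "X-Priority: 3\n"
    "\n"
    "--0.1.2159DC2\n"
    "Content-Type: text/html\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "<HTML><HEAD>\n"
    "<BODY bgColor=3D#ffffff>\n"
    "<P></P></BODY></HTML>\n"
    "\n"
    "--0.1.2159DC2--\n"
)


def describe(raw):
    """Return the declared boundary and a list of (content type, decoded body)."""
    msg = message_from_string(raw)
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # skip the container, keep only the leaf parts
        # decode=True undoes the quoted-printable transfer encoding
        body = part.get_payload(decode=True).decode("ascii", errors="replace")
        parts.append((part.get_content_type(), body))
    return msg.get_boundary(), parts
```

Calling describe(RAW) gives the boundary string from the header plus one text/html part whose body reads bgColor=#ffffff, which matches what Outlook displays after it has decoded the message.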
 
I know of no stripping of boundaries in Outlook. What I'm wondering is how
whoever is posting these supposed snippets of the message source actually
got them. Outlook does *not* store the MIME for a message (other than the
headers) - it converts the MIME message to a MAPI message and stores it that
way, so you're not going to be able to see the source. What is the symptom
that folks are complaining about? There may be another cause that they're
overlooking.
 
Jeff Stephenson said:
I know of no stripping of boundaries in Outlook. What I'm wondering is how
whoever is posting these supposed snippets of the message source actually
got them. Outlook does *not* store the MIME for a message (other than the
headers) - it converts the MIME message to a MAPI message and stores it that
way, so you're not going to be able to see the source. What is the symptom
that folks are complaining about? There may be another cause that they're
overlooking.

They got them by using an email client that doesn't strip them out, such as
Outlook Express or Netscape.

The problem being complained about is that when the headers of the message
say:

Content-Type: multipart/alternative;
boundary="0.1.2159DC2"

or if it was Base64, those boundaries should be there in the message
source. How do you see them in Outlook, or forward the message from Outlook
with those boundary points intact? Once Outlook gets the message, they
disappear. (Even forwarding a message as an attachment in Outlook changes
the original message.) Apparently they're kept intact in Outlook Express and
in other mail programs. If the header indicates Base64, then without the
boundary another program can't accurately parse the data to find what it's
supposed to.

As posted by someone in the news.spamcop.net newsgroup:
"Spam comes in with

Content-Type: multipart/mixed;
boundary="----=_NextPart_000_3076_00001D63.00007B64"

in the headers. The original message as received contains:

------=_NextPart_000_3076_00001D63.00007B64
Content-Type: text/html;
charset="iso-8859-1"

in it which tells SC that the email is HTML.
The original message also contains the following:

<a href="http://www.herbaltrials.com/609.html">
http://www.<someinnocentdomain>.com/</a>

If the message is forwarded/pasted into SC intact (i.e. it contains
all of its boundaries and formatting) SC is going to pick up the
real URL *http://www.herbaltrials.com/609.html* and will
completely ignore *http://www.<someinnocentdomain>.com*.

However, if a *broken* email client strips out the

------=_NextPart_000_3076_00001D63.00007B64
Content-Type: text/html;
charset="iso-8859-1"

boundary, SC has absolutely no way to know it should ignore the
*http://www.<someinnocentdomain>.com* link."

Now, the question is, if Outlook Express and other mail programs are leaving
those boundaries intact, what do you have to do with Outlook 2002/XP to get
them as part of the message source?
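
To make the quoted point concrete: once a parser knows a part is HTML, it can report the href target and treat the visible link text as decoration. The following is only a sketch with Python's html.parser, not SpamCop's actual code. The class and function names are made up, and innocent.example is a placeholder for the "someinnocentdomain" in the quote.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href targets and visible text separately."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.visible_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.visible_text.append(text)


def link_targets(html):
    """Return (href targets, visible text) for an HTML fragment."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs, collector.visible_text


# Placeholder domain stands in for the innocent one in the quoted post.
SAMPLE = (
    '<a href="http://www.herbaltrials.com/609.html">\n'
    "http://www.innocent.example/</a>"
)
```

With SAMPLE, link_targets returns the herbaltrials URL as the only href and the placeholder URL only as visible text. A parser that is told the body is plain text has no such distinction to work with and sees both URLs as equals.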
 
I'm still not clear on what the problem is. Yes, it is true that Outlook
will not generate exactly the same MIME body for a message as it received.
As I said in my original reply, Outlook does not actually store the MIME
that it receives, it converts the MIME body to MAPI properties and stores
them in a MAPI store (Outlook is based on MAPI). This differs from most
other mail programs (like OE and Netscape), which store messages in their
originally received MIME format because that's all they have to support.
Outlook, however, supports many more types of mail servers than OE or
Netscape - POP3, IMAP4, Exchange, cc:Mail, Lotus Notes, Hotmail, you can
even write third-party mail providers that plug into it (people have) - and
its common-denominator storage format is MAPI, not MIME.

When Outlook is asked for a message in MIME format, it will re-generate MIME
from the MAPI, but that MIME will not be exactly the MIME that was
received. If asked to generate the message in Plain Text format, it will
not have a text/html body part, just text/plain, even if the original
message was text/html. If asked for HTML format, it will generate both
text/plain and text/html body parts in a multipart/alternative message.
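
As an illustration of why regenerated MIME cannot match the original, here is a small sketch with Python's email package (not Outlook's code; the function names are hypothetical). It builds a message from already-extracted body text, the way a MAPI-based client would have to, and the structure depends only on the format asked for.

```python
from email.message import EmailMessage


def regenerate_as_plain(plain_text):
    """Plain Text format: a single text/plain body, no HTML part."""
    msg = EmailMessage()
    msg.set_content(plain_text)
    return msg


def regenerate_as_html(plain_text, html):
    """HTML format: text/plain and text/html inside multipart/alternative."""
    msg = EmailMessage()
    msg.set_content(plain_text)
    msg.add_alternative(html, subtype="html")
    return msg
```

Serializing the result with as_string() makes the library pick a fresh boundary string, so even when the part layout happens to match the original, the boundary named in the original Internet headers will not be the one in the regenerated body.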

It sounds like the SpamCop discussion has a fixation on the boundaries used
between body parts and is missing some real issue - Outlook does not
generate bad MIME, which is what the original message with the boundaries
stripped would be. So what *is* the real issue here? Get past the fact
that the MIME is not exactly the same as what was received - it won't be,
and that's that. I'd like that to be different, as I work on the Internet
protocols for Outlook, but that's the way it is. So the problem is that
Outlook receives a piece of MIME mail, and then somebody does something
(what?) in order to accomplish something (what?) and gets a bad result
(what?). It sounds like people need to step back and look at the big
picture, rather than focussing on the brush-strokes in the painting...
 
Jeff Stephenson said:
It sounds like people need to step back and look at the big
picture, rather than focussing on the brush-strokes in the painting...

Thanks Jeff. That's basically the point I was making in the Spamcop group.
Spamcop does work fine if you remove the content type from the header, so
it's not as if Outlook was producing some HTML that is causing false reports
to be generated. I think it is mostly an anti-MS issue.
 
What I understood from this (and another thread in microsoft.public.outlook)
was that SpamCop had changed something and suddenly wasn't working against
the same version of Outlook that it used to work with, and folks were
blaming Microsoft. Is that accurate? If so, it wouldn't be the first time,
sadly...
 
Jeff said:
What I understood from this (and another thread in
microsoft.public.outlook) was that SpamCop had changed something and
suddenly wasn't working against the same version of Outlook that it
used to work with, and folks were blaming Microsoft. Is that
accurate? If so, it wouldn't be the first time, sadly...

While it's true that these battles often form along those lines, here's
the problem:

In the spamfighting world, "Everyone" traditionally submits to the
appropriate desk the "exact" spam /as received/: standard email-client
SMTP headers followed directly by the MIME body, in the format described
in the RFCs [e.g. 1521]. Those RFCs also have strict rules for the meaning
of the various Content-Type: header lines, e.g. multipart/alternative, and
for boundary definitions.

"No one" submits to those desks RFC compliant headers, including a
MIME-type Content-Type line, attached to a MAPI re-rendering or facsimile
of part, but not all, of the original MIME. For example, that
compliant Content-Type header line may be describing boundaries that
no longer exist in the body.

"Everyone" and "no one" here really means except for Outlook submitters.

But, you are correct that SpamCop made a change in its algorithm that
aggravated a problem that had been sailing by "without notice" before.
I think the spam submission should just include the necessary language
about how the submitted spam differs from the MIME original, or at
least state that it does ("handled by Outlook"). The debate is about how
the notification is handled on its way from SpamCop to the desk, overseen
by the spam recipient. Spams are sorta like evidence in some dicey issues:
contract battles, money, stuff like that. This algorithm change now
requires the user to change the header that describes the MIME, when the
body has already been changed by OL.
 
Jeff said:
So the problem is that if the user sends both the Internet headers
from the View->Options for the message *and* the message itself, the
boundaries/content-type of the message are not the same as in the
original Internet headers?

If it is useful to you to conceptualize the process, imagine how it
happens for the OE user. S/he accesses the message properties,
consisting of continuous headers and mime body. That is "fed" into
SpamCop, which is a clever little algorithm which parses the header and
determines the source while comparing information obtained during the
parse with databases to determine listings of open proxies, MX servers
for ips, correlation of from and by in the headers, open relay
databases, etc. So, open relay databases are notified of relays, and
appropriate addresses are posited for the user to potentially notify
about the source. Currently it is also taking note of information from
the Content-Type line to help it with the body part.

Now to the body, remember we're thinking OE here. SC is able to peer
into base64 encoding and de-obfuscate html obfuscation and mild
redirection, yahoo type. It wants to provide another set of notify
addresses for the various spamvertiser providers. When it has gathered
up all of the addresses it wants to notify, it presents them to the spam
recipient to approve, because nothing is going to happen without that
part.

Jeff said:
The headers, content-type, and boundaries
in the message itself should be entirely consistent internally,
though not the same as the originally received message, so I assume
that complaints about boundaries differing must come from trying to
correlate two different renderings.

The result of the OL re-rendering that we are seeing is that a message
with a multipart/alternative Content-Type, which would appear to the OE
recipient as [borrowed from Michael Lefevre's post]

message header (multipart/mixed)
  part header (multipart/alternative)
    part header (text/plain)
    (plain text body here)
    part header (text/html)
    (html body here)
  part header (image/jpg)
  (encoded image here)

appears to the reconstructed OL recipient as

message header (multipart/mixed - from the original)
(html body here)

The good news is that all of those urls are in that html body. The bad
news is that SpamCop is influenced in how it handles the body by the
information in the header about multipart mixed or alternative which
involves the boundaries described above in the "message header" ie
Content-Type line. Recently this conflict between the header and the OL
body has bolloxed the parsing of the body. It can be mitigated by changing
the Content-Type to text/html [Richard Nixon, "But, that would be wrong"
;-) ] So, even tho' the OL spam recipient doesn't /really/ get the old
mime body back, the really essential information is in the re-rendered
mapi => new mime.
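
The mismatch described here can be checked mechanically. Below is a rough sketch in Python, assuming a submission is just the Internet headers pasted on top of whatever body the client hands back. The function names and sample constants are mine, and this is not how SpamCop does it. One function reports whether the header declares a boundary that never shows up in the body; the other applies the "change the Content-Type to text/html" workaround.

```python
from email import message_from_string

HEADERS = (
    "Content-Type: multipart/alternative;\n"
    "\tboundary=\"0.1.2159DC2\"\n"
    "X-Priority: 3\n"
    "\n"
)
# Body as a MIME-storing client would show it, boundaries present.
INTACT = HEADERS + (
    "--0.1.2159DC2\nContent-Type: text/html\n\n<HTML></HTML>\n--0.1.2159DC2--\n"
)
# Original headers glued onto a re-rendered body, boundaries gone.
STRIPPED = HEADERS + "<HTML><HEAD>\n<BODY bgColor=#ffffff>\n</BODY></HTML>\n"


def declared_boundary_missing(raw):
    """True if the header declares multipart but the boundary is not in the body."""
    text = raw.replace("\r\n", "\n")
    msg = message_from_string(text)
    if msg.get_content_maintype() != "multipart":
        return False
    boundary = msg.get_boundary()
    if boundary is None:
        return True
    _, _, body = text.partition("\n\n")
    marker = "--" + boundary
    return not any(line.startswith(marker) for line in body.splitlines())


def relabel_as_html(raw):
    """Workaround from the thread: replace the Content-Type header with text/html."""
    text = raw.replace("\r\n", "\n")
    head, _, body = text.partition("\n\n")
    msg = message_from_string(head + "\n\n")  # parse the header block only
    del msg["Content-Type"]
    msg["Content-Type"] = "text/html"
    header_block = "".join(f"{name}: {value}\n" for name, value in msg.items())
    return header_block + "\n" + body
```

declared_boundary_missing(STRIPPED) is true and declared_boundary_missing(INTACT) is false. After relabel_as_html the header and body agree again, at the price of the submission no longer carrying the header as received.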

Jeff said:
In Outlook 2003, we have added a registry setting that will stream the
entire message, as received, into the Internet headers property of the
message (the stuff shown in the View->Options pane), but that value
is off by default because it would pretty much double the size of the
user's message store (both MIME body and MAPI body would be stored).
It would be relatively easy for an anti-spam company to provide a
program that (1) set the registry value to do this and (2) pulled the
entire received body of subsequent messages from the
PR_INTERNET_HEADERS MAPI property.

That is terrific! Everyone is trying to figure out how to get to the
original. The assumption is that it is long gone once OL converts to
mapi. But if it can be diverted before it is lost that would be the
most wonderful result of all. Simply changing a registry value could
divert the /original/ mime, before anything is lost? Does that mean the
various attachments and all? The same stuff which is to be found in
OE's message source?

Thank you very much for your involvement in this almost purely SpamCop
and spamreporting angle, which isn't /really/ an OL "bug" but a MAPI way
of doing things which happens to be in a MIME world.
 
Jeff Stephenson said:
If asked to generate the message in
Plain Text format, it will not have a text/html body part, just
text/plain, even if the original message was text/html. If asked for
HTML format, it will generate both text/plain and text/html body
parts in a multipart/alternative message.

This is another real pain: since there is no way to know in advance what
the message format is, the only way to do this is to ask for HTMLBody and
search for:

<!-- Converted from text/plain format -->

If it is found, ask for Body... In case you have not noticed, BodyFormat
does not work unless the message has previously been opened (and given the
known security limitations of Outlook, we want to avoid that, don't we?).
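
The check described above boils down to a substring test. Here is a minimal sketch in Python that takes the two body strings as plain arguments (fetching HTMLBody and Body from Outlook is left out, and the function name is hypothetical):

```python
# Marker quoted in the post above.
CONVERTED_MARKER = "<!-- Converted from text/plain format -->"


def choose_body(html_body, plain_body):
    """Return ("plain", body) if the HTML was converted from text, else ("html", body)."""
    if CONVERTED_MARKER in html_body:
        return "plain", plain_body
    return "html", html_body
```

This relies entirely on the marker being present in converted messages, as the post describes. A message whose original HTML happened to contain the same comment would be misclassified.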

Best regards
José Rui


--
========================================================================
"All sensible men are of the same religion. But what religion that is,
no sensible man will ever say". Benjamin Disraeli
========================================================================
mailto://[email protected]/ Crawler baith. UnCaps me to reply.
Contact information: http://homepage.esoterica.pt/~jrfsousa/contact.html
========================================================================
 
Jeff Stephenson said:
In Outlook 2003, we have added a registry setting that will stream the
entire message, as received

Finally!!!

And now that you have your hands in it, add a way to get the message UIDL,
which would provide a nice way around all those storage problems.

Best regards
José Rui


 
Jeff Stephenson said:
...
In Outlook 2003, we have added a registry setting that will stream the
entire message, as received, into the Internet headers property of the
message (the stuff shown in the View->Options pane), but that value is off
by default ...

This is very good news! Could you give us some hints where this information
can be found (which registry entry), or even give us the information
directly here?

Thanks,
Pat Willener
 
In Outlook 2003 (only), you can stream all MIME to the Internet headers
property by creating the DWORD registry value SaveAllMimeNotJustHeaders
under the key HKCU\SOFTWARE\Microsoft\Office\11.0\Outlook\Options\Mail and
setting it to a non-zero value. You can then get this body by opening
PR_TRANSPORT_MESSAGE_HEADERS (property 0x007d) as a stream.
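
One way to apply the setting without clicking through regedit is to import a .reg file. The sketch below only generates the text of such a file in Python, using the key and value name given above. The function name is made up, and saving the file (and choosing its encoding) is left to the reader. Reading PR_TRANSPORT_MESSAGE_HEADERS afterwards needs MAPI and is not shown.

```python
KEY = r"HKEY_CURRENT_USER\SOFTWARE\Microsoft\Office\11.0\Outlook\Options\Mail"
VALUE_NAME = "SaveAllMimeNotJustHeaders"


def build_reg_file(enabled=True):
    """Return .reg file text that sets the DWORD value to 1 (on) or 0 (off)."""
    value = 1 if enabled else 0
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{KEY}]",
        f'"{VALUE_NAME}"=dword:{value:08x}',
        "",
    ]
    return "\r\n".join(lines)
```

As noted earlier in the thread, turning this on roughly doubles the size of the message store, and it only affects messages received after the value is set.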
 
Jeff Stephenson said:
In Outlook 2003 (only), you can stream all MIME to the Internet headers
property by creating the DWORD registry value SaveAllMimeNotJustHeaders
under the key HKCU\SOFTWARE\Microsoft\Office\11.0\Outlook\Options\Mail and
setting it to a non-zero value. You can then get this body by opening
PR_TRANSPORT_MESSAGE_HEADERS (property 0x007d) as a stream.

Thanks!
 
Jeff Stephenson [MSFT] wrote in message news:#[email protected]...
In Outlook 2003 (only), you can stream all MIME to the Internet headers
property by creating the DWORD registry value SaveAllMimeNotJustHeaders
under the key HKCU\SOFTWARE\Microsoft\Office\11.0\Outlook\Options\Mail and
setting it to a non-zero value. You can then get this body by opening
PR_TRANSPORT_MESSAGE_HEADERS (property 0x007d) as a stream.

Is the registry entry case sensitive? I've seen SaveAllMIMENotJustHeaders,
where ...MIME... is upper case:
http://www.poremsky.com/view_source.htm

Just curious
 
As far as I know, registry value names are case insensitive (I've always
assumed they were, but don't see any discussion of that issue in the
documentation of the functions). In any case, the value I gave is copied
straight from the code, so if names are case-sensitive it is the correct one
to use.

--
Jeff Stephenson
Outlook Development
This posting is provided "AS IS" with no warranties, and confers no rights


 