duplicate emails

  • Thread starter: Martin

Martin

Hi all,

I noticed some relevant posts in this group, so thought I'd post my
query here.

I have two MSG files, both exported from PST files. They are the same
message, same sent time, same sent date, To, From, CC, BCC, Importance
etc. In fact all the fields are the same.

However, I am trying to de-duplicate these two messages using an MD5 hash
based partially on the body of the email.

Viewing the source of the messages (the examples are HTML messages) and
comparing them with a diffing tool reveals no differences, but when viewing
the text version of the message, one of them has an extra carriage
return.

I think this could be due to different versions of Outlook, or to
something that has happened to the messages along the way, but I am
unsure how to tell. Does anyone have any experience of this particular
problem?

Many thanks,

Martin.
 
Don't use any kind of hash to compare MSG files. Would you consider two
messages different if one MSG file stores the sender name first, then the
subject, and the second MSG file stores the subject first, then the sender
name? To an end user, the order in which the properties are stored is
irrelevant, but changing the order throws any hashing off.
You need to define "sameness" of two MSG files. If I were you, I'd extract the
properties that you care about (minus trailing carriage returns, etc.), then
compare them either separately or as a concatenated string.
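
A minimal sketch of that idea in Python, assuming the properties have already been read out of each MSG file into a plain dictionary. The field names, the `props` dictionaries, and the choice to collapse blank lines are all illustrative assumptions, not part of any Outlook or MAPI API. The MD5 step is optional: comparing the canonical strings directly works just as well.

```python
import hashlib
import re

# Hypothetical field names; pick whichever properties define "sameness" for you.
FIELDS = ("sender", "to", "cc", "bcc", "subject", "sent", "body")


def normalize_text(value):
    """Remove line-ending and blank-line differences a reader would not notice."""
    text = (value or "").replace("\r\n", "\n").replace("\r", "\n")
    # Drop trailing whitespace on every line, then leading/trailing blank lines.
    text = "\n".join(line.rstrip() for line in text.split("\n")).strip()
    # Assumption: runs of blank lines are not significant, so collapse them.
    return re.sub(r"\n{2,}", "\n", text)


def canonical_string(props):
    """Concatenate the chosen fields in a fixed order, independent of storage order."""
    # The unit separator keeps adjacent fields from running into each other.
    return "\x1f".join(normalize_text(props.get(name)) for name in FIELDS)


def same_message(props_a, props_b):
    """Compare two messages on their normalized fields only."""
    return canonical_string(props_a) == canonical_string(props_b)


def fingerprint(props):
    """Optional: MD5 of the canonical string, handy as a de-duplication key."""
    return hashlib.md5(canonical_string(props).encode("utf-8")).hexdigest()
```

With this, two copies of a message whose text bodies differ only by an extra carriage return produce the same canonical string, while a change in any chosen field still shows up as a difference.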

Dmitry Streblechenko (MVP)
http://www.dimastr.com/
OutlookSpy - Outlook, CDO
and MAPI Developer Tool
 
Martin said:
However, I am trying to de-duplicate these two messages using an MD5 hash
based partially on the body of the email.

Viewing the source of the messages (the examples are HTML messages) and
comparing them with a diffing tool reveals no differences, but when viewing
the text version of the message, one of them has an extra carriage
return.

I think this could be due to different versions of Outlook, or to
something that has happened to the messages along the way, but I am
unsure how to tell. Does anyone have any experience of this particular
problem?

Hello:

Try to compare just the message-ids, if present.
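
A rough sketch of the Message-ID comparison in Python, assuming the internet headers of each message are already available as a string (for example, read from the message's transport headers property with whatever tool you use to open the MSG files). The function names are hypothetical. Messages that never went through SMTP may have no Message-ID at all, which is why the comparison returns `None` when either side is missing.

```python
from email.parser import HeaderParser


def extract_message_id(raw_headers):
    """Return the Message-ID without angle brackets, or None if absent."""
    if not raw_headers:
        return None
    headers = HeaderParser().parsestr(raw_headers)
    value = headers.get("Message-ID")  # header lookup is case-insensitive
    if value is None:
        return None
    value = value.strip().strip("<>").strip()
    return value or None


def same_message_id(headers_a, headers_b):
    """True/False when both IDs exist; None when the question cannot be decided."""
    id_a = extract_message_id(headers_a)
    id_b = extract_message_id(headers_b)
    if id_a is None or id_b is None:
        return None
    return id_a == id_b
```

When the result is `None`, falling back to a field-by-field comparison is the safer choice.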

HTH
 