MB_ said:
From time to time, I get a bunch of emails with garbage phrases, such
as:
might ultimately make it worth while to his colleagues to purchase his
Vargrave upon the subject which had made some sensation abroad as well
estate was long since too deeply mortgaged to afford new security
respectably on a hundred and eighty pounds a year in a cottage with one
cannot feel at ease with him a coldness a hauteur a measured
What are these???
MB
Do you get one good message from the sender followed by one or more of
these garbage messages? If so, my guess is that the sender's computer is
infected and the trojan on it is spewing out this crap in an attempt to
poison Bayesian filters (OL2003, SpamPal, SpamBayes, etc.). The idea is
to get your Bayesian filter to classify these messages as non-spam while
they sneak in words like "ultimately", "purchase", "Vargrave",
"sensation", "mortgaged", and so on. Later, when they send spam that
contains these same words, they hope the "good" weighting on the poison
words will offset the weighting of any "bad" words in the spam. In
short, they try to poison your Bayesian filter now so they can sneak
their crap past it later.
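To put rough numbers on that hope, here is a minimal Python sketch of the
naive-Bayes combining rule popularised by Paul Graham's "A Plan for Spam".
The per-word probabilities are made up, and the filters named above may not
combine scores this way (SpamBayes, for one, uses a chi-squared method).

```python
from math import prod

def combined_spam_probability(word_probs):
    """Combine per-word spam probabilities (each 0 < p < 1) with the
    naive-Bayes rule: P = prod(p) / (prod(p) + prod(1 - p))."""
    spam = prod(word_probs)
    ham = prod(1.0 - p for p in word_probs)
    return spam / (spam + ham)

# Hypothetical: two strongly "bad" words from a piece of spam.
bad_words = [0.99, 0.95]
# The same spam carrying three poison words the filter was tricked
# into weighting as "good" (low spam probability).
poisoned = bad_words + [0.05, 0.05, 0.05]

print(round(combined_spam_probability(bad_words), 4))  # 0.9995
print(round(combined_spam_probability(poisoned), 4))   # 0.2152
```

With only the bad words the message scores about 0.9995; three words
wrongly weighted as good drag it down to about 0.22, well below the 0.9
cutoff Graham used for calling a message spam.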
It might work on new setups where the Bayesian filter has had no
training yet (training means feeding it a set of known good messages so
their words get weighted as good, and a set of known bad messages so
their words get weighted as bad). Some Bayesian filters also learn more
slowly than others.
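A rough sketch of what that training amounts to, again assuming
Graham-style word statistics: count how often each word appears in the
known good and known bad messages, then turn the counts into a per-word
spam probability. The doubled good counts, the 0.01/0.99 clamp, the
minimum of five occurrences and the 0.4 default for unseen words follow
Graham's essay; the function names are hypothetical, and a real filter
tokenizes far more carefully than split().

```python
from collections import Counter

def train(good_messages, bad_messages):
    """Count how often each word occurs in known good and known bad mail."""
    good, bad = Counter(), Counter()
    for text in good_messages:
        good.update(text.lower().split())
    for text in bad_messages:
        bad.update(text.lower().split())
    return good, bad, len(good_messages), len(bad_messages)

def word_spam_probability(word, good, bad, n_good, n_bad, min_count=5):
    """Graham-style spam probability for one word; rare/unseen words get 0.4."""
    word = word.lower()
    g = 2 * good[word]  # doubled to bias the filter against false positives
    b = bad[word]
    if g + b < min_count:
        return 0.4
    good_freq = min(1.0, g / max(n_good, 1))
    bad_freq = min(1.0, b / max(n_bad, 1))
    return max(0.01, min(0.99, bad_freq / (good_freq + bad_freq)))

# Tiny hypothetical training sets.
good, bad, n_good, n_bad = train(["meeting at noon about the budget"] * 5,
                                 ["buy cheap pills now"] * 5)
print(word_spam_probability("pills", good, bad, n_good, n_bad))     # 0.99
print(word_spam_probability("budget", good, bad, n_good, n_bad))    # 0.01
print(word_spam_probability("vargrave", good, bad, n_good, n_bad))  # 0.4
```

An untrained filter returns 0.4 for every word, so it has no basis yet for
telling the garbage messages apart from real mail.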
If the Bayesian filter can also learn from other sources that detect
spam, it learns faster what is and is not spam; i.e., it doesn't rely on
itself alone to detect bad e-mails. Some Bayesian filters don't let you
do any initial training: they come with a pre-configured word-weighting
database (which won't match YOUR experience with e-mail content) and
only train on e-mails received after that (OL2003 is like this).

Since a Bayesian filter [should] only pick out a select few keywords from
a message, maybe 15 of them, chosen by how strongly they are already
weighted in your Bayesian database, it is unlikely that these garbage
messages will poison your Bayesian filter. Words like "Vargrave" or
"mortgaged" that rarely show up in your real mail carry a near-neutral
weighting, so they seldom make that short list.

It helps if the Bayesian filter lets the user change the status of a
received e-mail from good to bad, so the weighting can be updated
correctly to remove any poisoning or to catch false negatives. If the
Bayesian filter has word expiry, which raises the noise floor by cleaning
out old, weakly weighted words that haven't appeared in your e-mails for
a long time, then the poisoning is removed that way too.
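The two mechanisms described above, scoring only the handful of most
significant words and expiring stale ones, might look roughly like this in
Python. This is a sketch under assumptions: the "farthest from 0.5"
ranking is Graham's, while the expiry rule, its thresholds and the
(count, last-seen) layout of the database are hypothetical and not taken
from OL2003, SpamPal or SpamBayes.

```python
import time

def most_interesting(word_probs, n=15):
    """Keep only the n words whose spam probability is farthest from 0.5.

    word_probs maps each distinct word in a message to its spam probability;
    unknown words are assumed to have been given a near-neutral 0.4 already.
    """
    ranked = sorted(word_probs.items(), key=lambda kv: abs(kv[1] - 0.5),
                    reverse=True)
    return dict(ranked[:n])

def expire_words(db, now=None, max_age_days=180, min_count=2):
    """Drop words that are both rarely seen and unused for max_age_days.

    db maps word -> (times_seen, last_seen_unix_time). The thresholds are
    hypothetical; real filters use their own expiry rules.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return {w: (count, seen) for w, (count, seen) in db.items()
            if count >= min_count or seen >= cutoff}

# Hypothetical probabilities for one incoming spam; n=4 keeps the demo small.
message = {"pills": 0.99, "cheap": 0.97, "unsubscribe": 0.93, "meeting": 0.05,
           "vargrave": 0.4, "mortgaged": 0.45, "sensation": 0.48}
print(sorted(most_interesting(message, n=4)))
# ['cheap', 'meeting', 'pills', 'unsubscribe']

now = 200 * 86400
db = {"vargrave": (1, 0), "pills": (40, 0), "meeting": (1, now)}
print(sorted(expire_words(db, now=now)))
# ['meeting', 'pills']
```

In the demo the near-neutral words ("vargrave", "mortgaged", "sensation")
never make the cut because strongly weighted words crowd them out, and the
rarely seen, long-unused "vargrave" is the one entry expiry throws away.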
They try, it fails. Diligence doesn't require intelligence.