Bayesian Filter Destroyer?

  • Thread starter: Bob Adkins

Bob Adkins

Would nonsensical e-mails like this one ruin your Bayesian filter in
SpamPal or Spamihilator?

mathias praiseworthy pollard vest wilson knott wolfish lackadaisic keys
precocious vibrato vignette wallace preempt wuhan portia banter lampoon
verbose wooster marcus maurine preach potable poor polyhedra prank
masonite avarice populous precept woebegone bandit woolgather w willard
koch backscatter barnacle labia bambi pray waldo kinematic baronial
lampoon barrington virile kumquat bandy premeditate barry laity voiceband

I've been avoiding training my filter on such e-mails because I'm afraid of
botching it. I only get 2-3 a week of this type, but I would think that
would be enough to ruin my filters. Is that the specific purpose of these
e-mails, or just to tick you off and tempt you to reply?

Many thanks,

Bob
 
Bob Adkins wrote:
I've been avoiding training my filter on such e-mails because I'm afraid of
botching it. I only get 2-3 a week of this type, but I would think that
would be enough to ruin my filters. Is that the specific purpose of these
e-mails, or just to tick you off and tempt you to reply?

These words will bloat your database and make it less useful, but they
should not botch it. The purpose of these useless words is to get past your
filter by using as many words as possible that are seldom used in spam
mails.
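To make the "get past your filter" point concrete, here is a toy sketch of naive-Bayes scoring in the style of Paul Graham's "A Plan for Spam": per-token spam probabilities are combined into one score, and padding a sales pitch with words the filter considers hammy drags that score down. The token probabilities are invented for illustration, and the sketch skips the "most interesting tokens" selection that Graham's scheme uses (it looks only at the 15 tokens furthest from neutral); it is not what SpamPal's or Spamihilator's Bayesian modules do internally.

```python
import math

def combine(probs):
    """Combine per-token spam probabilities naive-Bayes style:
    P = prod(p) / (prod(p) + prod(1 - p)), computed in log space."""
    log_spam = sum(math.log(p) for p in probs)
    log_ham = sum(math.log(1.0 - p) for p in probs)
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))

# Invented per-token spam probabilities (0 = hammy, 1 = spammy).
token_prob = {
    "viagra": 0.99, "cheap": 0.90, "click": 0.85,
    # Word-salad tokens that this user's legitimate mail happens to contain.
    "vignette": 0.20, "polyhedra": 0.20, "kinematic": 0.20,
    "precocious": 0.20, "vibrato": 0.20, "portia": 0.20,
}

pitch = ["viagra", "cheap", "click"]
salad = ["vignette", "polyhedra", "kinematic", "precocious", "vibrato", "portia"]

print(f"pitch alone:      {combine([token_prob[t] for t in pitch]):.4f}")
print(f"pitch plus salad: {combine([token_prob[t] for t in pitch + salad]):.4f}")
```

With these made-up numbers the score falls from about 0.9998 to about 0.55, i.e. from a confident spam verdict to a coin flip.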
 
Would nonsensical e-mails like this one ruin your Bayesian
filter in SpamPal or Spamihilator?

Yes, they would.

[snip bayes poison]
I've been avoiding training my filter on such e-mails because I'm
afraid of botching it. I only get 2-3 a week of this type, but I
would think that would be enough to ruin my filters.

It doesn't take a lot of bayes poison to break Bayesian filtering
badly. It's better to let those emails get through than to fill
your filter's word lists with words you really don't want in them.
Is that the specific purpose of these e-mails, or just to tick you
off and tempt you to reply?

The content of such emails is not called bayes poison for nothing.

Tone
 
Would nonsensical e-mails like this one ruin your Bayesian filter in
SpamPal or Spamihilator?

[snip bayes poison]

I've been avoiding training my filter on such e-mails because I'm afraid of
botching it. I only get 2-3 a week of this type, but I would think that
would be enough to ruin my filters. Is that the specific purpose of these
e-mails, or just to tick you off and tempt you to reply?

One of the points (there is another) is to "poison" your Bayesian filter,
yes. I was going to write a lengthy post about how it does it, but
what's the point? You can google it all up.

I think you can just train on it. It won't ruin your filters unless you
are getting hundreds of these coming through, in which case you have
bigger worries.
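Whether training on such messages "ruins" a filter mostly comes down to which token counts change. Below is a toy sketch (not POPFile's, SpamPal's or Spamihilator's actual code) that keeps per-token message counts and a simple per-token spam ratio. The ham and spam messages are invented; the three poison messages are lines from the word salad in the first post. Training on them leaves words that never appear in the poison alone, but a poison word that also shows up in your real mail (here the name "marcus") gets pulled toward spam, which is the concern raised earlier in the thread.

```python
from collections import Counter

class TokenCounts:
    """Toy per-token counts of the kind a Bayesian filter keeps."""

    def __init__(self):
        self.spam, self.ham = Counter(), Counter()
        self.n_spam = self.n_ham = 0

    def train(self, text, is_spam):
        tokens = set(text.lower().split())  # count each token once per message
        if is_spam:
            self.spam.update(tokens)
            self.n_spam += 1
        else:
            self.ham.update(tokens)
            self.n_ham += 1

    def spamminess(self, token):
        """Share of the token's relative frequency that comes from spam; 0.5 if never seen."""
        s = self.spam[token] / max(self.n_spam, 1)
        h = self.ham[token] / max(self.n_ham, 1)
        return 0.5 if s + h == 0 else s / (s + h)

f = TokenCounts()
for msg in ["lunch with marcus on friday", "budget meeting moved to monday",
            "meeting notes attached"]:
    f.train(msg, is_spam=False)
for msg in ["cheap pills click here", "click here for cheap loans"]:
    f.train(msg, is_spam=True)

watch = ("meeting", "marcus")
before = {w: f.spamminess(w) for w in watch}

poison = [
    "mathias praiseworthy pollard vest wilson knott wolfish lackadaisic keys",
    "precocious vibrato vignette wallace preempt wuhan portia banter lampoon",
    "verbose wooster marcus maurine preach potable poor polyhedra prank",
]
for msg in poison:
    f.train(msg, is_spam=True)

after = {w: f.spamminess(w) for w in watch}
print(before, after)
```

Here "meeting" stays at 0.0, while "marcus" moves from 0.0 to 0.375 after three poison messages.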
 
Bob said:
Would nonsensical e-mails like this one ruin your Bayesian filter in
SpamPal or Spamihilator?

[snip bayes poison]

I've been avoiding training my filter on such e-mails because I'm afraid of
botching it. I only get 2-3 a week of this type, but I would think that
would be enough to ruin my filters. Is that the specific purpose of these
e-mails, or just to tick you off and tempt you to reply?

Bob,
Why deal with filters and the like to begin with? If you follow
these steps in the sequence I describe, you'll stop getting spam
pretty much entirely.

1. Stop posting to Usenet using your real email address. Period.
That's about the fastest way to start getting spam, so MUNGE THAT
EMAIL ADDRESS!!!

2. It's a very bad idea to set up a webpage on the server space
your ISP provides for that purpose. Contained within your website's URL
will usually be your user ID and the ISP's domain. Put the two
together and there's your private email address! Web crawling spambots
are mighty fast and once the cat's out of the bag it's too late. I
highly recommend taking such webpages down completely.

3. Contact your ISP and have them change your user ID to a totally
random character sequence of your choice. Make sure it's the maximum
number of characters allowed.

4. Set up a Yahoo account (or some other such free online email
account if you don't like Yahoo) for those occasions when you have to
provide an email address to a company, somebody you don't know or an
idiot who likes to send e-cards to you OR when you have to register
software.

5. Configure your mail reader so that it doesn't preview HTML spam
automatically. Also, make sure that it doesn't respond to requests for
verification that the email was received and/or read. That way
if you do happen to get spam even after following the previous steps,
you'll be able to delete it unread and they won't know that you ever
got it.
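A few of the steps above can be sketched in code. This is a minimal illustration under stated assumptions, not a recipe for any particular ISP or mail reader, and all function names are hypothetical. `address_from_homepage` shows how a harvester could turn a `http://home.<isp>/~<user>/` homepage URL into an address (step 2; the URL shape and the "strip home./www." rule are assumptions). `random_user_id` picks a random user ID with Python's `secrets` module (step 3; the 16-character length is a placeholder for whatever maximum your ISP allows). `receipt_requests` reports which receipt-request headers a message carries (step 5; `Disposition-Notification-To` is the standard read-receipt request header, `Return-Receipt-To` an older non-standard one).

```python
import secrets
import string
from email import message_from_string
from urllib.parse import urlparse

def address_from_homepage(url):
    """Step 2: guess user@domain from a ~user homepage URL, as a harvester might.
    Assumes the mail domain is the web host minus a leading 'home.' or 'www.'."""
    parts = urlparse(url)
    first = parts.path.strip("/").split("/")[0]
    if not first.startswith("~"):
        return None
    host = parts.hostname or ""
    for prefix in ("home.", "www."):
        if host.startswith(prefix):
            host = host[len(prefix):]
            break
    return f"{first[1:]}@{host}"

def random_user_id(length=16):
    """Step 3: a random user ID from lowercase letters and digits.
    16 is a placeholder; use the maximum length your ISP allows."""
    alphabet = string.ascii_lowercase + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))

RECEIPT_HEADERS = ("Disposition-Notification-To", "Return-Receipt-To")

def receipt_requests(raw_message):
    """Step 5: list the receipt-request headers a message carries, so a
    reader (or a filter in front of it) can decline to answer them."""
    msg = message_from_string(raw_message)
    return [h for h in RECEIPT_HEADERS if msg.get(h) is not None]

print(address_from_homepage("http://home.example.net/~jdoe/page2.html"))
print(random_user_id())
print(receipt_requests("From: a@example.com\n"
                       "Disposition-Notification-To: a@example.com\n"
                       "Subject: hi\n\nbody\n"))
```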

These are the steps that I've taken, and I get *no* (read that ZERO)
spam. Not only that, it didn't cost me a dime!!!
 
Bob said:
Would nonsensical e-mails like this one ruin your Bayesian filter
in SpamPal or Spamihilator?

[snip bayes poison]

I've been avoiding training my filter on such e-mails because I'm
afraid of botching it. I only get 2-3 a week of this type, but I
would think that would be enough to ruin my filters. Is that the
specific purpose of these e-mails, or just to tick you off and tempt
you to reply?

I use POPFile to classify mail, and I've always made sure to accurately
mark an email for what it is.
If it's spam I mark it as such, including those that use random words or
sentences unrelated to the actual content of the email.
Currently, POPFile shows that it has accurately classified 98.45% of all
the email I've received since the beginning of the year. I don't use any
other special filters or "magnets". I average about 200-300 spam
messages per day.
While this is by no means a scientific study, it doesn't seem to have
hurt the classification process in any way. It may even have helped.
 
Perhaps you should consider using DNSBL-based spam blockers such as
SpamPal. They filter less spam than Bayesian filters (about 90% in my case), but they do
not require training and there are no false positives.
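For readers wondering what a DNSBL check actually involves: the filter reverses the four octets of the connecting server's IPv4 address, appends the list's DNS zone, and does an ordinary DNS lookup. An answer (conventionally an address in 127.0.0.0/8) means "listed"; no such name means "not listed". The sketch below assumes a hypothetical zone name `dnsbl.example.org`; it is not SpamPal's code, and each real list documents its own zone and return codes.

```python
import ipaddress
import socket

def dnsbl_query_name(ip, zone):
    """Reverse the IPv4 octets and append the DNSBL zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ip, zone):
    """True if the zone answers with a 127.x.x.x address for this IP.
    A failed lookup (no such name, or a resolver error) counts as not listed."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
    return answer.startswith("127.")

print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

Treating lookup errors as "not listed" errs toward delivering the mail rather than blocking it.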
 
Perhaps you should consider using DNSBL-based spam blockers such as
SpamPal. They filter less spam than Bayesian filters (about 90% in my case), but they do
not require training and there are no false positives.

Yes, SpamPal is my favorite at the moment. I'm just trying out Spamihilator.
I like Spamihilator because DNSBL lookups tend to be slow, and it
doesn't send marked spam to your e-mail program. I'll probably go back to
SpamPal, but Spamihilator is working ~95% with <2 weeks of training.
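On the point that DNSBL lookups tend to be slow: a filter that asks several lists one after another pays each list's delay in turn. One common mitigation, sketched here as an assumption rather than a description of how SpamPal or Spamihilator work, is to query all lists in parallel and give up on any that have not answered within a deadline. The `resolve` parameter lets the example run with a fake resolver, and the zone names are hypothetical. It needs Python 3.9+ for `cancel_futures`.

```python
import socket
from concurrent.futures import ThreadPoolExecutor, wait

def query_name(ip, zone):
    """Reversed IPv4 octets plus the DNSBL zone, e.g. 99.2.0.192.<zone>."""
    return ".".join(reversed(ip.split("."))) + "." + zone

def lookup(name, resolve):
    try:
        return resolve(name).startswith("127.")
    except OSError:  # covers socket.gaierror (no such name)
        return False

def check_lists(ip, zones, resolve=socket.gethostbyname, timeout=2.0):
    """Query every zone at once; None marks a list that missed the deadline."""
    pool = ThreadPoolExecutor(max_workers=max(len(zones), 1))
    futures = {z: pool.submit(lookup, query_name(ip, z), resolve) for z in zones}
    wait(futures.values(), timeout=timeout)
    # Don't block on stragglers; their threads finish in the background.
    pool.shutdown(wait=False, cancel_futures=True)
    return {z: f.result() if f.done() and not f.cancelled() else None
            for z, f in futures.items()}

def fake_resolve(name):
    """Stand-in resolver: only the hypothetical list-a.example lists anything."""
    if name.endswith(".list-a.example"):
        return "127.0.0.2"
    raise socket.gaierror("no such name")

print(check_lists("192.0.2.99", ["list-a.example", "list-b.example"],
                  resolve=fake_resolve))
```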

Bob
 
Bob Adkins said:
Yes, SpamPal is my favorite at the moment. I'm just trying out
Spamihilator. I like Spamihilator because DNSBL lookups tend to be
slow, and it doesn't send marked spam to your e-mail program. I'll
probably go back to SpamPal, but Spamihilator is working ~95% with <2
weeks of training.

Many of these DNSBLs list every mail server/IP range that has drawn at least 2 or
3 spam reports (they don't even check whether it's real spam) during the
last couple of days. That means every mid- to large-size ISP is listed on one
or more of these blacklists almost every other day. Many people have lost
important mail because they or their provider use these blacklists...

I would never want to use one of these, except maybe as part of a score-
based system (where they are not the only parameter to classify something
as spam).
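A score-based setup of the kind described here might look like the sketch below: DNSBL listings add points, the Bayesian probability adds or subtracts points, and only the total decides. The weights and the threshold are invented for illustration (SpamAssassin uses the same general idea of summed rule scores against a threshold, with its own tuned values). The point is that one or two listings alone cannot condemn a message the Bayesian filter considers legitimate.

```python
from dataclasses import dataclass

# Invented weights and threshold, for illustration only.
POINTS_PER_DNSBL_HIT = 1.5
BAYES_WEIGHT = 12.0   # maps a Bayesian probability of 0..1 to -6..+6 points
THRESHOLD = 5.0

@dataclass
class Evidence:
    dnsbl_hits: int    # how many blacklists list the sending server
    bayes_prob: float  # Bayesian spam probability, 0.0 (ham) to 1.0 (spam)

def spam_score(e: Evidence) -> float:
    return POINTS_PER_DNSBL_HIT * e.dnsbl_hits + BAYES_WEIGHT * (e.bayes_prob - 0.5)

def is_spam(e: Evidence) -> bool:
    return spam_score(e) >= THRESHOLD

for e in [Evidence(1, 0.50),   # listed once, Bayes undecided
          Evidence(3, 0.05),   # big ISP on three lists, but the mail looks legitimate
          Evidence(2, 0.90),   # listed twice and looks like spam
          Evidence(0, 0.99)]:  # not listed, but Bayes is confident
    print(e, round(spam_score(e), 2), is_spam(e))
```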
 
Ben Cooper said:
I use POPFile to classify mail, and I've always made sure to accurately
mark an email for what it is.
If it's spam I mark it as such, including those that use random words or
sentences unrelated to the actual content of the email.
Currently, POPFile shows that it has accurately classified 98.45% of all
the email I've received since the beginning of the year. I don't use any
other special filters or "magnets". I average about 200-300 spam
messages per day.
While this is by no means a scientific study, it doesn't seem to have
hurt the classification process in any way. It may even have helped.

These random words are not a usual part of regular email, so so-called
"Bayesian" filters can learn to catch them.
 
I would never want to use one of these, except maybe as part of a score-
based system (where they are not the only parameter to classify something
as spam).

SpamPal produces no false positives for me - all mails marked as spam
are spam - so I fail to see how it is unfair in marking some servers
as spam servers without reason.

I would have thought that if it were too ruthless, I would have had false
positives - but I have had none - ever.

Apologies - there is one false positive: my router log, which is sent to me
via email. Because it has the same From and To address, SpamPal marks it as
spam - if I turn off this option, all emails with spoofed addresses the same
as mine get caught.

PS - JC - your advice is great. I have been having an argument with
my ISP about letting me change my root email address - it got caught in the spam
databases because they trawled an open list of ISP homepages, where
the directory shows the part before the @ - and the rest is easy to
work out. They won't change it without charging me - and I am loath to
change ISPs because, other than this, they are really good.

Any advice?
 
Bob said:
Would nonsensical e-mails like this one ruin your Bayesian filter in
SpamPal or Spamihilator?

[snip bayes poison]

I've been avoiding training my filter on such e-mails because I'm afraid
of botching it. I only get 2-3 a week of this type, but I would think that
would be enough to ruin my filters. Is that the specific purpose of these
e-mails, or just to tick you off and tempt you to reply?

Many thanks,

Bob

I was getting up to ten a day. I bounced one back and it came back at me.
(BTW, they all have a link to yahoo.com in the body of the mail.)
Good-bye, Yahoo.
So... I have trained my filters on the sender's ISP, leaving in the '@'
character, e.g. 'Match all of the following': If 'From' '@icq.com',
'File into folder' 'trash'.
After about 2 weeks of training, they all go into the trash now.
I have just counted, and there are only 9 or 10 to filter.
As each day passes I'm getting less of that crap.
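The mail-client rule described here (match 'From' against '@icq.com', file into 'trash') can be expressed outside any particular client too. A minimal sketch, assuming the raw message is available as text; the rule list and `folder_for` are hypothetical examples built from the post. Keeping the '@' in the pattern means '@icq.com' will not match an address such as 'someone@myicq.com'. Bear in mind that the From header is easy to forge.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical rules: (text the From address must contain, destination folder).
RULES = [("@icq.com", "trash")]

def folder_for(raw_message, default="inbox"):
    """Return the folder the first matching rule picks, else `default`."""
    msg = message_from_string(raw_message)
    _, address = parseaddr(msg.get("From", ""))
    address = address.lower()
    for pattern, folder in RULES:
        # The leading '@' anchors the match at the start of the domain,
        # so '@icq.com' does not match 'someone@myicq.com'.
        if pattern in address:
            return folder
    return default

print(folder_for("From: Spammer <x@icq.com>\nSubject: hi\n\nbody\n"))
print(folder_for("From: friend@myicq.com\nSubject: hi\n\nbody\n"))
```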
 
One of the points (there is another) is to "poison" your Bayesian filter,
yes. I was going to write a lengthy post about how it does it, but
what's the point? You can google it all up.
I think you can just train on it. It won't ruin your filters unless you
are getting hundreds of these coming through, in which case you have
bigger worries.

I get these sorts of mails every day. Very handy. Any email with this
garbage in it is spam. When using MailWasher, this makes life so much
easier. :-)

As for Bayes poisoning, it certainly hasn't happened on my system. In
any case, if it did, I would just delete my database and start again
from scratch. No big deal.

Regards, John.

--
****************************************************
,-._|\ (A.C.F FAQ) http://clients.net2000.com.au/~johnf/faq.html
/ Oz \ John Fitzsimons - Melbourne, Australia.
\_,--.x/ http://www.vicnet.net.au/~johnf/welcome.htm
v http://clients.net2000.com.au/~johnf/
 
Would nonsensical e-mails like this one ruin your Bayesian filter in
SpamPal or Spamihilator?

[snip bayes poison]

I've been avoiding training my filter on such e-mails because I'm afraid of
botching it. I only get 2-3 a week of this type, but I would think that
would be enough to ruin my filters. Is that the specific purpose of these
e-mails, or just to tick you off and tempt you to reply?

They are perhaps trying to get past a Bayesian filter. As these words are
quite uncommon, training on them wouldn't do much harm: the filter will just
gain some extra words with fairly neutral scores. If it's not much work, you
can avoid training on these e-mails.
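The "rather neutral score" for rare words is what you get from the smoothing step many Bayesian filters apply. The sketch below uses Gary Robinson's adjusted token probability, f(w) = (s·x + n·p(w)) / (s + n), where p(w) is the raw spam probability of word w, n is the number of messages it has been seen in, x is the assumed probability for an unseen word (0.5), and s is how strongly to trust x (1 here). Whether SpamPal's or Spamihilator's Bayesian modules use exactly this formula is an assumption.

```python
def robinson_f(p_w, n, s=1.0, x=0.5):
    """Robinson's smoothed spam probability for one token.

    p_w: raw spam probability from counts; n: messages containing the token;
    x: assumed probability for unseen tokens; s: strength of that assumption.
    """
    return (s * x + n * p_w) / (s + n)

# A word-salad token seen only in spam (raw p = 1.0) after n messages:
for n in (0, 1, 3, 20):
    print(n, round(robinson_f(1.0, n), 3))
```

A token seen once in spam scores 0.75 and one seen three times 0.875, well short of the raw 1.0; only tokens seen many times approach the extreme.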
Many thanks,

Bob

Alain
 
Perhaps you should consider using DNSBL-based spam blockers such as
SpamPal. They filter less spam than Bayesian filters (about 90% in my case), but they do
not require training and there are no false positives.

SpamPal has a Bayesian plugin and also quite good automatic whitelisting
so that false positives are less likely. Combining the two techniques
is better.
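Combining the techniques usually means checking the whitelist first and only then running the other tests. The sketch below shows one common form of automatic whitelisting, trusting any address you have sent mail to; this is an assumption for illustration, not necessarily how SpamPal's whitelisting works. The `checks` callables stand in for a DNSBL lookup or a Bayesian score. Because From addresses can be forged, a spammer who spoofs a whitelisted sender gets through; that is the price of fewer false positives.

```python
from email.utils import getaddresses

class AutoWhitelist:
    """Trust every address you have written to (one common approach)."""

    def __init__(self):
        self.trusted = set()

    def learn_from_outgoing(self, *recipient_headers):
        # e.g. the To: and Cc: header values of a message you sent
        for _, addr in getaddresses(list(recipient_headers)):
            if addr:
                self.trusted.add(addr.lower())

    def is_trusted(self, from_address):
        return from_address.lower() in self.trusted

def classify(from_address, whitelist, checks):
    """Whitelisted senders skip the other checks; otherwise any check can flag spam."""
    if whitelist.is_trusted(from_address):
        return "ham"
    return "spam" if any(check(from_address) for check in checks) else "ham"

wl = AutoWhitelist()
wl.learn_from_outgoing("Alain <alain@example.org>", "bob@example.net")
always_flag = [lambda addr: True]  # stand-in for a DNSBL hit or high Bayes score
print(classify("Alain@example.org", wl, always_flag))
print(classify("stranger@example.com", wl, always_flag))
```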

Alain
 
Many of these DNSBLs list every mail server/IP range that has drawn at least 2 or
3 spam reports (they don't even check whether it's real spam) during the
last couple of days. That means every mid- to large-size ISP is listed on one
or more of these blacklists almost every other day. Many people have lost
important mail because they or their provider use these blacklists...

I would never want to use one of these, except maybe as part of a score-
based system (where they are not the only parameter to classify something
as spam).

The available DNSBLs tend to fall into several categories, varying in
their severity - some of the alternative ones ARE unusable, as they
are essentially private lists with whole ISPs marked if they are
considered not to respond adequately to spam complaints.

Some of the lists have very LOW false positives, and will take you to
95% coverage or better - note also that some available lists are
essentially duplicates or combinations of others, so don't waste
resources by hitting too many.

Country blocking is also highly recommended - unless you actually
EXPECT to get any legitimate mail from China, Korea, Brazil or
Argentina ... that only leaves the US cable ISPs, which are probably
as big a source again.
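Country blocking comes down to mapping the connecting server's IP address to a country and refusing the countries on your list. Real setups typically get that mapping from a GeoIP database; the tiny prefix table below is invented, using the RFC 5737 documentation ranges as placeholders, purely to show the lookup. Note that this refuses every sender in those ranges, legitimate or not.

```python
import ipaddress

# Invented prefix-to-country table (RFC 5737 documentation ranges as placeholders).
PREFIX_COUNTRY = {
    ipaddress.ip_network("192.0.2.0/24"): "CN",
    ipaddress.ip_network("198.51.100.0/24"): "BR",
    ipaddress.ip_network("203.0.113.0/24"): "US",
}

# ISO 3166 codes for China, (South) Korea, Brazil and Argentina.
BLOCKED = {"CN", "KR", "BR", "AR"}

def country_of(ip):
    """Country code for an IP, or None if no prefix in the table covers it."""
    addr = ipaddress.ip_address(ip)
    for net, cc in PREFIX_COUNTRY.items():
        if addr in net:
            return cc
    return None

def blocked(ip):
    return country_of(ip) in BLOCKED

for ip in ("192.0.2.7", "198.51.100.20", "203.0.113.9", "10.0.0.1"):
    print(ip, country_of(ip), blocked(ip))
```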
 
2. It's a very bad idea to set up a webpage on the server
space your ISP provides for that purpose. Contained within
your website's URL will usually be your user ID and the
ISP's domain. Put the two together and there's your private
email address! Web crawling spambots are mighty fast and
once the cat's out of the bag it's too late. I highly
recommend taking such webpages down completely.

I was worried about that, too. But for some reason, I never
get spam at that account - which I never use anyway. My ISP
gave me 8 addresses, each associated with webspace, and I can
change them easily as long as it is not the main one.



--
RL
Unofficial Adaware Updater; Little (File) Backer Upper; Uptime
Quickie; Tray Quickie; Google Quickie; Lefty Animated
Cursors;
http://home.earthlink.net/~ringomei/page2.html
*******************************************
Places that host a list of the Pricelessware annual voting
results and information:
http://www.pricelessware.org,
http://www.pricelesswarehome.org,
http://www.earths-ocular.com/mirror/www.pricelesswarehome.org/
 
Country blocking is also highly recommended - unless you actually
EXPECT to get any legitimate mail from China, Korea, Brazil or
Argentina ...

I don't think you can say for sure that you won't get legitimate mail from
China or Korea or anywhere else, given that the internet is a global network.
Except perhaps for very restricted email accounts designed for, say, work
only, you never know when someone from China, Korea or elsewhere in Asia
might have an interesting comment on, say, a posting you made here.


that only leaves the US cable ISPs, which are probably
as big a source again.

Very true.



Aaron (my email is not munged!)
 