public inbox for speakup@linux-speakup.org
* ot, spamassassin question
@  Gregory Nowak
   ` trev.saunders
   ` John G. Heim
  0 siblings, 2 replies; 16+ messages in thread
From: Gregory Nowak @  UTC (permalink / raw)
  To: speakup

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello all.

As you may recall, a while back I posted an SSL cert question, in
which I described that I'm setting up e-mail for my mom on my system,
and I needed some SSL opinions. Thanks again to those who responded.

Now, however, I have a new question, this time regarding
spamassassin. Let me provide some background for my question. My mom's
e-mail is piped through spamassassin, and then piped through
maildrop. If the message is not spam, it is delivered to maildir. If
"X-Spam-Flag" or "X-Spam-Status" are YES, the message is delivered to
maildir/.Spam. When my mom reads her mail, anything in the inbox that
is spam, she moves to Spam (which puts it into maildir/.Spam/cur), and
anything in the Spam folder which is ham, she moves to her inbox
(which puts it into maildir/cur). Every 24 hours, I run a script which
I lifted from the web, and modified for my needs, which runs sa-learn
on maildir/cur, and maildir/.Spam/cur, and tags them accordingly.
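
A daily job of that kind could look roughly like the sketch below. The real script is not shown in this thread, so the `learn_maildir` name and the `SA_LEARN` override are assumptions made for illustration; only sa-learn's `--ham` and `--spam` options and the two folder paths come from the description above.

```shell
#!/bin/sh
# Hypothetical sketch of the daily learning job described above.
# SA_LEARN can be overridden (for example SA_LEARN=echo) for a dry run.
SA_LEARN=${SA_LEARN:-sa-learn}

learn_maildir() {
    maildir=$1
    # Messages left in (or moved back to) the inbox are learned as ham.
    "$SA_LEARN" --ham "$maildir/cur"
    # Messages sorted into the Spam folder are learned as spam.
    "$SA_LEARN" --spam "$maildir/.Spam/cur"
}
```

From cron it would be called as something like `learn_maildir "$HOME/maildir"`, run as the user whose Bayes database should be updated.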

Ok, here's my problem/question. She's complaining that she keeps
placing messages from certain senders into the Spam folder, but new
messages from those senders still get marked as ham. So, what I need
is a sender blacklist approach, where if a message from sender x is
learned as spam, every other message from sender x will always be
marked as spam in the future, and I need this to be done in a way
transparent to the user (i.e. my mom keeps sorting inbox and Spam
like she has up until now). I had a look at the SpamAssassin docs, but
there's no facility that does exactly what I need as far as I can
tell. The closest thing I found are lines like
blacklist_from someone@example.com
which one can place in the prefs file. So, the idea I have is to
include a file called blacklist from the prefs file. The blacklist
file would have lines of the form I just described. The script I run
every day to tag messages (or a different script), would grep through
all files in maildir/.Spam/cur/, and extract the from address,
appending a line like "blacklist_from from@addr.ess" to the blacklist
file.

Here is where my problems are. The first problem is that echoing
anything to a file overwrites everything in that file. So, what I'm
thinking of is something like:

mv blacklist blacklist.old
echo -e "blacklist_from " >blacklist.tmp
echo output_from_grep>blacklist.tmp1
cat blacklist.tmp blacklist.tmp1 blacklist.old >blacklist
rm blacklist.tmp*
rm blacklist.old

This should work in theory, but it's cumbersome, and I was wondering if
someone had a better approach.

My second problem is with grep. If I invoke grep as:

grep -i "from" maildir/.Spam/cur/* |grep -o "@" 

I get a bunch of @ signs, but I want the e-mail address from the From:
line of each message. I suspect this could be done with a regexp, but
regexps aren't one of my strengths. If someone could please also
explain how I'd invoke grep to get the desired output, I'd really
appreciate it.

If you want to see the script I run every day to tag ham and spam, so
as to get an idea of how I could integrate into it the blacklisting
functionality, let me know, and I'll send it to you, so as not to
clutter the list. Thanks very much in advance for any
help/suggestions.

Greg


- -- 
web site: http://www.romuald.net.eu.org
gpg public key: http://www.romuald.net.eu.org/pubkey.asc
skype: gregn1
(authorization required, add me to your contacts list first)

- --
Free domains: http://www.eu.org/ or mail dns-manager@EU.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAktU3bYACgkQ7s9z/XlyUyDtvACguj2D/XUevj9bcDIPej5RjtU9
LLwAoJ/LamItbImDiRNLbR2C8AavbRHJ
=sjGu
-----END PGP SIGNATURE-----


* Re: ot, spamassassin question
   ot, spamassassin question Gregory Nowak
@  ` trev.saunders
   ` John G. Heim
  1 sibling, 0 replies; 16+ messages in thread
From: trev.saunders @  UTC (permalink / raw)
  To: Speakup is a screen review system for Linux., Gregory Nowak

Hi,

First part: you probably want >>, which makes the shell append instead of overwrite.
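
A minimal sketch of what that looks like for the blacklist file, assuming one `blacklist_from` entry per line. The `add_to_blacklist` helper is a made-up name, and the duplicate check is an addition that nobody in the thread asked for.

```shell
#!/bin/sh
# Append a blacklist_from entry for an address unless the file already has it.
add_to_blacklist() {
    addr=$1
    file=$2
    line="blacklist_from $addr"
    # -x: match whole lines only, -F: fixed string (no regex), -q: quiet.
    if ! grep -qxF "$line" "$file" 2>/dev/null; then
        # ">>" appends (creating the file if needed); ">" would truncate it.
        printf '%s\n' "$line" >> "$file"
    fi
}
```

With this, the mv/cat/rm sequence from the original message is not needed: `add_to_blacklist someone@example.com blacklist` leaves the existing entries alone and adds the new one at the end.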


Trev


* Re: ot, spamassassin question
   ot, spamassassin question Gregory Nowak
   ` trev.saunders
@  ` John G. Heim
     ` Sina Bahram
                     ` (2 more replies)
  1 sibling, 3 replies; 16+ messages in thread
From: John G. Heim @  UTC (permalink / raw)
  To: Speakup is a screen review system for Linux.

One thing you should check is to see if bayesian filtering is even working. 
Maybe the sa-learn command is having no effect for one reason or another. 
You can configure spamassassin to put a verbose log in the message header. 
If you do that and if bayesian filtering is working, you should see lines 
like the following in the message headers:

 *  3.5 BAYES_99 BODY: Bayesian spam probability is 99 to 100%

That line is good. It means spamassassin used bayesian logic to figure out 
that this message was spam and added 3.5 to the spam score.  If it was only 
50% certain, it would have added less. If the probability is less than that, it 
subtracts from the score.

You can check to see if bayesian filtering is working by looking for lines 
like the one above in your message headers and seeing if spamassassin seems 
to be learning to identify spam. If those lines do not appear or if the 
probabilities don't seem to be increasing as you'd expect, you will have to 
investigate further.

Things to check:
Is bayesian filtering turned on in your spamassassin local.cf file?
Does the end user have write access to her bayesian rules database file?
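
To look for those rule names across a whole folder instead of opening messages one at a time, something like the following sketch might help. It assumes each message is a plain file in a maildir folder and that the rules SpamAssassin hit show up somewhere in the header block (for instance on the X-Spam-Status line); `count_bayes_hits` is a made-up name.

```shell
#!/bin/sh
# Count how many messages in a folder mention a BAYES_NN rule in their headers.
count_bayes_hits() {
    dir=$1
    total=0
    hits=0
    for msg in "$dir"/*; do
        [ -f "$msg" ] || continue
        total=$((total + 1))
        # Only look at the header block: sed stops at the first empty line.
        if sed '/^$/q' "$msg" | grep -q 'BAYES_[0-9][0-9]'; then
            hits=$((hits + 1))
        fi
    done
    echo "$hits of $total messages have a BAYES_NN rule in their headers"
}
```

Running `count_bayes_hits maildir/.Spam/cur` and getting zero hits on recently scanned mail would suggest the bayesian rules are not firing at all.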

----- Original Message ----- 
From: "Gregory Nowak" <greg@romuald.net.eu.org>
To: <speakup@braille.uwo.ca>
Sent: Monday, January 18, 2010 4:16 PM
Subject: ot, spamassassin question





* RE: ot, spamassassin question
   ` John G. Heim
@    ` Sina Bahram
     ` Gregory Nowak
     ` Gregory Nowak
  2 siblings, 0 replies; 16+ messages in thread
From: Sina Bahram @  UTC (permalink / raw)
  To: 'Speakup is a screen review system for Linux.'

Just a quick addition. As far as that script goes, simply echo using the >>
operator which appends instead of overwrites.
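
For the grep half of that script, here is a rough sketch of pulling the sender address out of each message's From: header. It assumes GNU or BSD grep (for `-o`), one message per file, and an unfolded From: header; the `blacklist_lines` name and the deliberately simple address pattern are assumptions, not anything from the SpamAssassin docs.

```shell
#!/bin/sh
# Print one "blacklist_from ADDRESS" line per distinct sender found in the
# From: headers of the messages in a folder.
blacklist_lines() {
    dir=$1
    for msg in "$dir"/*; do
        [ -f "$msg" ] || continue
        # Stop at the first empty line (end of headers) so that "From:" text
        # in the body is ignored; print only the first From: header line.
        sed -n -e '/^$/q' -e '/^[Ff][Rr][Oo][Mm]:/{p;q;}' "$msg"
    done |
        # Keep only the part that looks like user@host.
        grep -o -i '[a-z0-9._%+-]\{1,\}@[a-z0-9.-]\{1,\}' |
        tr 'A-Z' 'a-z' |
        sort -u |
        sed 's/^/blacklist_from /'
}
```

The output can then be appended with `blacklist_lines maildir/.Spam/cur >> blacklist`. A display name that itself contains an @ sign would produce an extra entry, so the result is worth eyeballing before trusting it.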

But, I agree with John, try solving the actual problem first, which means
checking to make sure all components are working.  I bet you that's where
the real problem is.

Take care,
Sina






* Re: ot, spamassassin question
   ` John G. Heim
     ` Sina Bahram
@    ` Gregory Nowak
       ` John G. Heim
     ` Gregory Nowak
  2 siblings, 1 reply; 16+ messages in thread
From: Gregory Nowak @  UTC (permalink / raw)
  To: Speakup is a screen review system for Linux.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi John and all,

According to the comments in local.cf, the option use_bayes was set to
1 by default, so I didn't uncomment it when initially setting things
up. I've uncommented it now though, and sent a test message, to see if
there is any change in the headers, but they are as they've always
been:

"X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
        linserver.romuald.net.eu.org
X-Spam-Level:
X-Spam-Status: No, score=0.0 required=5.0 tests=AWL,NO_RELAYS
        autolearn=ham
        version=3.2.5".

These are the headers from my test message, and the scores are higher
than 0 on other messages of course, though I'm not sure how much of
what has come in was marked higher than 5, if any of it.

I searched the SpamAssassin docs for a verbose logging option like you
mentioned, but don't see any such option. Can you please be more
specific on what I should be looking for, or what to place into
local.cf to get the desired behavior? Thanks.

Greg


On Tue, Jan 19, 2010 at 02:43:38PM -0600, John G. Heim wrote:
> One thing you should check is to see if bayesian filtering is even 
> working. Maybe the sa-learn command is having no effect for one reason or 
> another. You can configure spamassassin to put a verbose log in the 
> message header. If you do that and if bayesian filtering is working, you 
> should see lines like the following in the message headers:
>
> *  3.5 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
>
> That line is good. It means spamassassin used bayesian logic to figure 
> out that this message was spam and added 3.5 to the spam score.  If it 
> was only 50% certain, it would have added less. If the probability is less 
> than that, it subtracts from the score.
>
> You can check to see if bayesian filtering is working by looking for 
> lines like the one above in your message headers and seeing if 
> spamassassin seems to be learning to identify spam. If those lines do not 
> appear or if the probabilities don't seem to be increasing as you'd 
> expect, you will have to investigate further.
>
> Things to check:
> Is bayesian filtering turned on in your spamassassin local.cf file?
> Does the end user have write access to her bayesian rules database file?
>


- -- 
web site: http://www.romuald.net.eu.org
gpg public key: http://www.romuald.net.eu.org/pubkey.asc
skype: gregn1
(authorization required, add me to your contacts list first)

- --
Free domains: http://www.eu.org/ or mail dns-manager@EU.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAktWO3sACgkQ7s9z/XlyUyCAIACdFuP6nXSH1+HMN92d754KAlh3
d0cAni5pEqXdonbLwn4iamFk1kQZdnpw
=KdvF
-----END PGP SIGNATURE-----


* Re: ot, spamassassin question
     ` Gregory Nowak
@      ` John G. Heim
         ` Gregory Nowak
  0 siblings, 1 reply; 16+ messages in thread
From: John G. Heim @  UTC (permalink / raw)
  To: Speakup is a screen review system for Linux.

First about verbose headers... By that I just meant that if you looked at 
the message headers and didn't see any info from spamassassin, you would 
have to change a flag. That flag turns out to be report_safe. But I think 
you must already have it set correctly or you wouldn't see the headers 
you've reproduced below.

One thought that occurs to me is that sa-learn has to have had a certain 
number of messages passed to it before it starts to apply bayesian 
filtering. IIRC, the number is 200. There's a quick way to get it to reach 
that number which is to turn on autolearning.

bayes_auto_learn 1

As you probably know, a list of rules that were used to calculate the spam 
score is listed on the line that starts with X-Spam-Status. There should be 
a bayesian rule listed there in the form BAYES_XX, where XX is the bayesian 
probability that the message is spam. There should be something listed from 
BAYES_00 to BAYES_99 or somewhere in between.

So for whatever reason, I think bayesian logic is not being applied. I 
double-checked and I see that bayesian filtering is supposed to be on by 
default. All my spamassassin config files have it explicitly turned on and 
I'm not sure what happens if you use the defaults.

use_bayes 1
use_bayes_rules 1

The only other thing I can think of is that if you are running spamassassin 
as a daemon, you need to restart it after changing the config.





----- Original Message ----- 
From: "Gregory Nowak" <greg@romuald.net.eu.org>
To: "Speakup is a screen review system for Linux." <speakup@braille.uwo.ca>
Sent: Tuesday, January 19, 2010 5:08 PM
Subject: Re: ot, spamassassin question


> 



* Re: ot, spamassassin question
       ` John G. Heim
@        ` Gregory Nowak
           ` John G. Heim
  0 siblings, 1 reply; 16+ messages in thread
From: Gregory Nowak @  UTC (permalink / raw)
  To: Speakup is a screen review system for Linux.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John and all,

On Wed, Jan 20, 2010 at 11:12:07AM -0600, John G. Heim wrote:
> That flag turns out to be report_safe. But I think 
> you must already have it set correctly or you wouldn't see the headers  
> you've reproduced below.

Hmmm, report_safe was set to 0. I changed it to 1, and did an
invoke-rc.d spamassassin reload, with no difference. That would
suggest that changes in local.cf aren't being applied, except I know
that they are being applied. I know this, because I had to put
report_hostname linserver.romuald.net.eu.org
into local.cf to show the correct host name on which the scanning took
place. Before I added that line, the host name displayed in the
headers was localhost, so local.cf is being read, and applied it seems.

>
> One thought that occurs to me is that sa-learn has to have had a certain  
> number of messages passed to it before it starts to apply bayesian  
> filtering. IIRC, the number is 200. There's a quick way to get it to 
> reach that number which is to turn on autolearning.

It has had a lot more than 200 messages pass through it, I'd say
something on the order of a couple thousand by now, ham and spam. Yes,
autolearning is enabled too, and the headers show that it is taking
place when the score is low enough.

>
> bayes_auto_learn 1
>
> As you probably know, a list of rules that were used to calculate the 
> spam score is listed on the line that starts with X-Spam-Status. There 
> should be a bayesian rule listed there in the form BAYES_XX, where XX is 
> the bayesian probability that the message is spam. There should be 
> something listed from BAYES_00 to BAYES_99 or somewhere in between.
> So for whatever reason, I think bayesian logic is not being applied. I  
> double-checked and I see that bayesian filtering is supposed to be on 
> by default. All my spamassassin config files have it explicitly turned on 
> and I'm not sure what happens if you use the defaults.
>
> use_bayes 1
> use_bayes_rules 1

Yeah, I'd agree that it looks like the bayesian rules aren't being
applied. After reading the above, I explicitly put
use_bayes 1
and
use_bayes_rules 1
into local.cf, did invoke-rc.d spamassassin reload, and sent another
test message, but still no joy.

>
> The only other thing I can think of is that if you are running 
> spamassassin as a daemon, you need to restart it after changing the 
> config.

Yup, did a reload as shown above. Besides that, we had a short power
outage during the night, and the server machine restarted
completely. That means that if a reload wasn't good enough to read the
config changes, I should have seen different header info this morning
after the full restart, and that's not the case.

Looks like I'm going to have lots of fun cracking this one, and once I
figure it out, I'll probably slap myself, and say "oh, what a moron I
am, that was so obvious." If you have any other ideas on what to check
for, please share, and thank you very much for your troubleshooting
help so far.

Greg


- -- 
web site: http://www.romuald.net.eu.org
gpg public key: http://www.romuald.net.eu.org/pubkey.asc
skype: gregn1
(authorization required, add me to your contacts list first)

- --
Free domains: http://www.eu.org/ or mail dns-manager@EU.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAktXaPUACgkQ7s9z/XlyUyDHCACgisQPFmO1kUkIsUz3migWXJWk
TqkAnRbeaor3UXwW8EfOOxlwpGGJf7iV
=RRxW
-----END PGP SIGNATURE-----


* Re: ot, spamassassin question
   ` John G. Heim
     ` Sina Bahram
     ` Gregory Nowak
@    ` Gregory Nowak
       ` John G. Heim
  2 siblings, 1 reply; 16+ messages in thread
From: Gregory Nowak @  UTC (permalink / raw)
  To: Speakup is a screen review system for Linux.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi again.

One more thing that occurred to me is that the
Mail::SpamAssassin::Plugin::Bayes plugin might not be installed, is
that possible? There is a line in one of the pre files in
/etc/spamassassin that loads it, and it's not commented. I did try
perldoc Mail::SpamAssassin::Plugin::Bayes
but it tells me there's no documentation. Perl is not one of my strong
points, so I'm not sure how to check if the plugin is installed, or
how I'd install it if it's not there. If someone could please point me
in the right direction there, I'd appreciate it. Thanks again.

Greg

P.S. I don't get any errors about missing plugins during spamd
startup, or in the mail logs when spamd is scanning messages.


- -- 
web site: http://www.romuald.net.eu.org
gpg public key: http://www.romuald.net.eu.org/pubkey.asc
skype: gregn1
(authorization required, add me to your contacts list first)

- --
Free domains: http://www.eu.org/ or mail dns-manager@EU.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAktXdhIACgkQ7s9z/XlyUyCYnQCghdV30yH278bhTeXQa5aFfHVQ
ygIAoIoVRFNb9AUBiyhshB4X46vgqzAd
=97+p
-----END PGP SIGNATURE-----


* Re: ot, spamassassin question
     ` Gregory Nowak
@      ` John G. Heim
         ` Gregory Nowak
  0 siblings, 1 reply; 16+ messages in thread
From: John G. Heim @  UTC (permalink / raw)
  To: Speakup is a screen review system for Linux.

Well, I'm not sure about the bayesian plugin module but I know spamassassin 
checks for some modules before trying to load them and if they aren't 
available the features they are used for just don't work. So I suppose it's 
possible that spamassassin would just go on merrily checking messages even 
if the bayesian plugin is missing.

Man, now you're really testing my memory. There is a way to run the 
spamassassin service in the foreground so you can see its output. It 
displays a list of perl modules it looked for and whether it found them.  I 
think if you just say, 'man spamassassin' you can find that.

But otherwise you can also check if the module is there in the perl 
interpreter itself. Just type 'perl' at the command line, then 'use 
Mail::SpamAssassin::Plugin::Bayes;' (lowercase 'use'), and press Ctrl-D to 
run it. Or you could write a quick script to do it:

#!/usr/bin/perl
use Mail::SpamAssassin::Plugin::Bayes;

If the above 2 line script works, then the plugin is installed.   If you 
installed spamassassin via a package manager, I would think all necessary 
perl modules would be installed as dependencies.   It doesn't seem likely to 
me that the plugin would be missing.

----- Original Message ----- 
From: "Gregory Nowak" <greg@romuald.net.eu.org>
To: "Speakup is a screen review system for Linux." <speakup@braille.uwo.ca>
Sent: Wednesday, January 20, 2010 3:30 PM
Subject: Re: ot, spamassassin question


> 



* Re: ot, spamassassin question
         ` Gregory Nowak
@          ` John G. Heim
  0 siblings, 0 replies; 16+ messages in thread
From: John G. Heim @  UTC (permalink / raw)
  To: Speakup is a screen review system for Linux.

A couple of notes:

1. You can dump the contents of the bayesian database:

sa-learn --dump

Among other things, that shows you how many messages you've fed to the learn 
function.

2. You can run sa-learn in debug mode with the -D option.
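
To pull just the message counts out of the dump, something like the 
following might do. It is a sketch that assumes the usual "sa-learn --dump 
magic" layout, where the count is the third column of the lines ending in 
"non-token data: nspam" and "non-token data: nham". The function name 
bayes_ready is invented here, and 200 is the default threshold discussed in 
this thread.

```shell
#!/bin/sh
# Read "sa-learn --dump magic" output on stdin and report whether the
# learned spam and ham counts both reach the threshold (default 200).
# bayes_ready is a hypothetical helper name.
bayes_ready() {
    min="${1:-200}"
    awk -v min="$min" '
        /non-token data: nspam$/ { nspam = $3 }
        /non-token data: nham$/  { nham  = $3 }
        END {
            printf "nspam=%d nham=%d\n", nspam, nham
            if (nspam + 0 >= min + 0 && nham + 0 >= min + 0)
                print "bayes active"
            else
                print "bayes not active yet"
        }'
}
```

Usage would be along the lines of: sa-learn --dump magic | bayes_ready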


----- Original Message ----- 
From: "Gregory Nowak" <greg@romuald.net.eu.org>
To: "Speakup is a screen review system for Linux." <speakup@braille.uwo.ca>
Sent: Wednesday, January 20, 2010 2:35 PM
Subject: Re: ot, spamassassin question


> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> John and all,
>
> On Wed, Jan 20, 2010 at 11:12:07AM -0600, John G. Heim wrote:
>> That flag turns out to be report_safe. But I think
>> you must already have it set correctly or you wouldn't see the headers
>> you've reproduced below.
>
> Hmmm, report_safe was set to 0. I changed it to 1, and did a
> invoke-rc.d spamassassin reload, with no difference. That would
> suggest that changes in local.cf aren't being applied, except I know
> that they are being applied. I know this, because I had to put
> report_hostname linserver.romuald.net.eu.org
> into local.cf to show the correct host name on which the scanning took
> place. Before I added that line, the host name displayed in the
> headers was localhost, so local.cf is being read, and applied it seems.
>
>>
>> One thought that occurs to me is that sa-learn has to have had a certain
>> number of messages passed to it before it starts to apply bayesian
>> filtering. IIRC, the number is 200. There's a quick way to get it to
>> reach that number which is to turn on autolearning.
>
> It has had a lot more than 200 messages pass through it, I'd say
> something on the order of a couple thousand by now, ham and spam. Yes,
> autolearning is enabled too, and the headers show that it is taking
> place when the score is low enough.
>
>>
>> bayes_auto_learn 1
>>
>> As you probably know, a list of rules that were used to calculate the
>> spam score is listed on the line that starts with X-Spam-Status. There
>> should be a bayesian rule listed  there in the form, bayes_XX where XX is
>> the bayesian probability that the message is spam. There should be
>> something listed like bayes_00 to bayes_99 or somewhere in between.
>> So for whatever reason, I think bayesian logic is not being applied. I
>> double-checked and I see that bayesian filtering is supposed to be on
>> by default. All my spamassassin config files have it explicitly turned on
>> and I'm not sure what happens if you use the defaults.
>>
>> use_bayes 1
>> use_bayes_rules 1
>
> Yeah, I'd agree that it looks like the bayesian rules aren't being
> applied. After reading the above, I explicitly put
> use_bayes 1
> and
> use_bayes_rules 1
> into local.cf, did invoke-rc.d spamassassin reload, and sent another
> test message, but still no joy.
>
>>
>> The only other thing I can think of is that if you are running
>> spamassassin as a daemon, you need to restart it after changing the
>> config.
>
> Yup, did a reload as shown above. Besides that, we had a short power
> outage during the night, and the server machine restarted
> completely. That means that if a reload wasn't good enough to read the
> config changes, I should have seen different header info this morning
> after the full restart, and that's not the case.
>
> Looks like I'm going to have lots of fun cracking this one, and once I
> figure it out, I'll probably slap myself, and say "oh, what a moron I
> am, that was so obvious." If you have any other ideas on what to check
> for, please share, and thank you very much for your troubleshooting
> help so far.
>
> Greg
>
>
> - -- 
> web site: http://www.romuald.net.eu.org
> gpg public key: http://www.romuald.net.eu.org/pubkey.asc
> skype: gregn1
> (authorization required, add me to your contacts list first)
>
> - --
> Free domains: http://www.eu.org/ or mail dns-manager@EU.org
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (GNU/Linux)
>
> iEYEARECAAYFAktXaPUACgkQ7s9z/XlyUyDHCACgisQPFmO1kUkIsUz3migWXJWk
> TqkAnRbeaor3UXwW8EfOOxlwpGGJf7iV
> =RRxW
> -----END PGP SIGNATURE-----
>
> 



* Re: ot, spamassassin question
       ` John G. Heim
@        ` Gregory Nowak
           ` Gregory Nowak
           ` John G. Heim
  0 siblings, 2 replies; 16+ messages in thread
From: Gregory Nowak @  UTC (permalink / raw)
  To: Speakup is a screen review system for Linux.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ok, I think I figured it out, thanks John for reminding me of
sa-learn's --dump flag. It's a combination of 2 problems. The first
problem is that yes, I really am a moron. When I was running the
spam/ham learning script out of /etc/cron.daily, I was searching
through my mom's saved ham, and spam. However, I forgot to add the
--dbpath option to sa-learn when calling it in the script. The result
is that I happen to have a /root/.spamassassin directory now, with the
files bayes_seen  bayes_toks in it. I've had a look at the sa-learn
man page, but don't see a way to merge that data with the data in my
mom's .spamassassin directory, that was collected through
autolearning. I can backup the data from /root/.spamassassin/*, but I
don't see a way to append it to the existing data in my mom's
directory. If someone knows of a way to do that, without destroying
the data already there, please let me know.

The second problem is that my mom's database contains fewer than 200
messages of either ham or spam. When I said she's had more than a
couple thousand messages come through, I really did think that. So,
either I'm wrong on that figure and that many didn't come in, or not
enough of them were autolearned from. If I were somehow to merge the 2
sets of data, she still wouldn't have 200 of either one, but she'd be
a lot closer as far as ham is concerned.

BTW John, thanks for that small perl script. Yes, the Bayes plugin is
installed, or at least I assume it is, since I just got the shell
prompt back after running it. Thanks a lot again John for your help in
troubleshooting this, I really appreciate it.

Greg


On Wed, Jan 20, 2010 at 04:33:05PM -0600, John G. Heim wrote:
> Well, I'm not sure about the bayesian plugin module but I know 
> spamassassin checks for some modules before trying to load them and if 
> they aren't available the features they are used for just don't work. So 
> I suppose it's possible that spamassassin would just go on merrily 
> checking messages even if the bayesian plugin is missing.
>
> Man, now you're really testing my memory. There is a way to run the  
> spamassassin service in the foreground so you can see its output. It  
> displays a list of perl modules it looked for and whether it found them.  
> I think if you just say, 'man spamassassin' you can find that.
>
> But otherwise you can also check if the module is there in the perl  
> interpreter itself. Just type 'perl' at the command line and then 'Use  
> Mail::SpamAssassin::Plugin::Bayes ;' Or you could write a quick script to 
> do it:
>
> #!/usr/bin/perl
> use Mail::SpamAssassin::Plugin::Bayes;
>
> If the above 2 line script works, then the plugin is installed.   If you  
> installed spamassassin via a package manager, I would think all necessary 
> perl modules would be installed as dependencies.   It doesn't seem likely 
> to me that the plugin would be missing.


- -- 
web site: http://www.romuald.net.eu.org
gpg public key: http://www.romuald.net.eu.org/pubkey.asc
skype: gregn1
(authorization required, add me to your contacts list first)

- --
Free domains: http://www.eu.org/ or mail dns-manager@EU.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAktXoLsACgkQ7s9z/XlyUyCKnwCdEcp39RE/vo8KI0/Kfa642OFA
WRMAn2Cqz7lx4Tt7Ikg5JA5SL6nEck5a
=iLIm
-----END PGP SIGNATURE-----


* Re: ot, spamassassin question
         ` Gregory Nowak
@          ` Gregory Nowak
           ` John G. Heim
  1 sibling, 0 replies; 16+ messages in thread
From: Gregory Nowak @  UTC (permalink / raw)
  To: Speakup is a screen review system for Linux.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ok, for the record, looks like I managed to merge the 2 sets of
data. I ran sa-learn --backup on both sets. I then adjusted the spam
and non-spam counts to be the totals from both sets, and pasted the
rest of one of the files onto the end of the other. I then ran sa-learn
--restore with the updated file to import it into my mom's directory,
and it looks to have gone fine. sa-learn --dump is now reporting the
correct numbers for spam, ham, and seen tokens on her set of data.
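
For anyone who wants to script the same merge, here is a rough sketch.
It assumes the tab-separated layout that sa-learn --backup appears to
write: "v" lines for db_version, num_spam and num_nonspam, "t" lines
holding spam count, ham count, atime and token, and "s" lines holding
a flag and a message ID. Check that against your own backup files
first. The function name merge_bayes_backups is made up. Unlike the
plain paste described above, it adds up the counts for tokens that
occur in both files, since I am not sure how --restore treats
duplicate token lines.

```shell
#!/bin/sh
# Merge two "sa-learn --backup" text files and write the result to stdout.
# ASSUMPTION: tab-separated lines of the form
#   v <value> <name>            t <spam> <ham> <atime> <token>
#   s <flag> <msgid>
# merge_bayes_backups is a hypothetical helper name.
merge_bayes_backups() {
    awk -F'\t' -v OFS='\t' '
        # Keep the first db_version line verbatim; it must come first.
        $1 == "v" && $3 ~ /^db_version/  { if (!ver) ver = $0; next }
        $1 == "v" && $3 == "num_spam"    { nspam += $2; next }
        $1 == "v" && $3 == "num_nonspam" { nham  += $2; next }
        $1 == "t" {
            # Sum per-token counts and keep the most recent access time.
            spam[$5] += $2
            ham[$5]  += $3
            if ($4 + 0 > atime[$5] + 0) atime[$5] = $4
            next
        }
        # Seen message IDs are de-duplicated; the later file wins.
        $1 == "s" { seen[$3] = $2; next }
        END {
            print ver
            print "v", nspam + 0, "num_spam"
            print "v", nham + 0, "num_nonspam"
            for (tok in spam) print "t", spam[tok], ham[tok], atime[tok], tok
            for (id in seen)  print "s", seen[id], id
        }' "$1" "$2"
}
```

The output would then be fed to sa-learn --restore, after taking a
backup of the target database so the step can be undone.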

Greg


On Wed, Jan 20, 2010 at 05:33:00PM -0700, Gregory Nowak wrote:
> Ok, I think I figured it out, thanks John for reminding me of
> sa-learn's --dump flag. It's a combination of 2 problems. The first
> problem is that yes, I really am a moron. When I was running the
> spam/ham learning script out of /etc/cron.daily, I was searching
> through my mom's saved ham, and spam. However, I forgot to add the
> --dbpath option to sa-learn when calling it in the script. The result
> is that I happen to have a /root/.spamassassin directory now, with the
> files bayes_seen  bayes_toks in it. I've had a look at the sa-learn
> man page, but don't see a way to merge that data with the data in my
> mom's .spamassassin directory, that was collected through
> autolearning. I can backup the data from /root/.spamassassin/*, but I
> don't see a way to append it to the existing data in my mom's
> directory. If someone knows of a way to do that, without destroying
> the data already there, please let me know.
> 
> The second problem is that my mom's database contains less than 200
> messages of either ham, or spam. When I said she's had more than a
> couple thousand messages come through, I really did think that. So,
> I'm either wrong on that figure, and that many didn't come in, or not
> enough of them were autolearned from. If I were to merge somehow the 2
> sets of data, she still wouldn't have 200 of either one, but she'd be
> a lot closer there as far as ham is concerned.
> 
> BTW John, thanks for that small perl script. Yes, the bayse plugin is
> installed, or at least I assume it is, since I just got the shell
> prompt back after running it. Thanks a lot again John for your help in
> troubleshooting this, I really appreciate it.
> 
> Greg
> 
> 


- -- 
web site: http://www.romuald.net.eu.org
gpg public key: http://www.romuald.net.eu.org/pubkey.asc
skype: gregn1
(authorization required, add me to your contacts list first)

- --
Free domains: http://www.eu.org/ or mail dns-manager@EU.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAktXsQIACgkQ7s9z/XlyUyBMLACeMbd7LBnBGDDws6cANYSZuvb5
7D8AnjLGsjVcbCmzc+H7IUlkJbhyHAxu
=WRv6
-----END PGP SIGNATURE-----


* Re: ot, spamassassin question
         ` Gregory Nowak
           ` Gregory Nowak
@          ` John G. Heim
             ` Gregory Nowak
  1 sibling, 1 reply; 16+ messages in thread
From: John G. Heim @  UTC (permalink / raw)
  To: Speakup is a screen review system for Linux.

You're welcome. I'm just glad I could help for once instead of being helped.

One additional note... To get over the 200 message threshold, you might try 
running sa-learn on the messages in the spam folder. I'm guessing you have 
procmail putting messages marked as spam into a spam folder. Make sure they 
really are all spam and then feed those messages to sa-learn.  Messages 
marked as spam won't be used by the auto learn feature unless their spam 
score is really high in the first place.
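
A sketch of what such a training run could look like, with the database 
location passed explicitly so the learned data ends up in the user's 
database rather than in the home directory of whoever runs the script. The 
function name train_bayes, the DRY_RUN switch and the example paths are all 
placeholders of mine; the folder layout (maildir/cur for ham, 
maildir/.Spam/cur for spam) is the one described earlier in this thread.

```shell
#!/bin/sh
# Train one user's Bayes database from a maildir.
# train_bayes, DRY_RUN and the example paths are hypothetical.
# Set DRY_RUN=1 to print the sa-learn commands instead of running them.
train_bayes() {
    maildir="$1"   # e.g. /home/someuser/maildir
    dbpath="$2"    # e.g. /home/someuser/.spamassassin
    run=""
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run="echo"
    fi
    # Messages the user filed under Spam are learned as spam.
    $run sa-learn --dbpath "$dbpath" --spam "$maildir/.Spam/cur"
    # Messages left in (or moved back to) the inbox are learned as ham.
    $run sa-learn --dbpath "$dbpath" --ham "$maildir/cur"
}
```

Running it first with DRY_RUN=1 shows exactly which directories and which 
database path would be used. If the job runs as root, it is probably worth 
checking afterwards that the database files are still owned by the user.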

----- Original Message ----- 
From: "Gregory Nowak" <greg@romuald.net.eu.org>
To: "Speakup is a screen review system for Linux." <speakup@braille.uwo.ca>
Sent: Wednesday, January 20, 2010 6:33 PM
Subject: Re: ot, spamassassin question


> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Ok, I think I figured it out, thanks John for reminding me of
> sa-learn's --dump flag. It's a combination of 2 problems. The first
> problem is that yes, I really am a moron. When I was running the
> spam/ham learning script out of /etc/cron.daily, I was searching
> through my mom's saved ham, and spam. However, I forgot to add the
> --dbpath option to sa-learn when calling it in the script. The result
> is that I happen to have a /root/.spamassassin directory now, with the
> files bayes_seen  bayes_toks in it. I've had a look at the sa-learn
> man page, but don't see a way to merge that data with the data in my
> mom's .spamassassin directory, that was collected through
> autolearning. I can backup the data from /root/.spamassassin/*, but I
> don't see a way to append it to the existing data in my mom's
> directory. If someone knows of a way to do that, without destroying
> the data already there, please let me know.
>
> The second problem is that my mom's database contains less than 200
> messages of either ham, or spam. When I said she's had more than a
> couple thousand messages come through, I really did think that. So,
> I'm either wrong on that figure, and that many didn't come in, or not
> enough of them were autolearned from. If I were to merge somehow the 2
> sets of data, she still wouldn't have 200 of either one, but she'd be
> a lot closer there as far as ham is concerned.
>
> BTW John, thanks for that small perl script. Yes, the bayse plugin is
> installed, or at least I assume it is, since I just got the shell
> prompt back after running it. Thanks a lot again John for your help in
> troubleshooting this, I really appreciate it.
>
> Greg
>
>
> On Wed, Jan 20, 2010 at 04:33:05PM -0600, John G. Heim wrote:
>> Well, I'm not sure about the bayesian plugin module but I know
>> spamassassin checks for some modules before trying to load them and if
>> they aren't available the features they are used for just don't work. So
>> I suppose it's possible that spamassassin would just go on merrily
>> checking messages even if the bayesian plugin is missing.
>>
>> Man, now you're really testing my memory. There is a way to run the
>> spamassassin service in the foreground so you can see its output. It
>> displays a list of perl modules it looked for and whether it found them.
>> I think if you just say, 'man spamassassin' you can find that.
>>
>> But otherwise you can also check if the module is there in the perl
>> interpreter itself. Just type 'perl' at the command line and then 'Use
>> Mail::SpamAssassin::Plugin::Bayes ;' Or you could write a quick script to
>> do it:
>>
>> #!/usr/bin/perl
>> use Mail::SpamAssassin::Plugin::Bayes;
>>
>> If the above 2 line script works, then the plugin is installed.   If you
>> installed spamassassin via a package manager, I would think all necessary
>> perl modules would be installed as dependencies.   It doesn't seem likely
>> to me that the plugin would be missing.
>
>
> - -- 
> web site: http://www.romuald.net.eu.org
> gpg public key: http://www.romuald.net.eu.org/pubkey.asc
> skype: gregn1
> (authorization required, add me to your contacts list first)
>
> - --
> Free domains: http://www.eu.org/ or mail dns-manager@EU.org
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (GNU/Linux)
>
> iEYEARECAAYFAktXoLsACgkQ7s9z/XlyUyCKnwCdEcp39RE/vo8KI0/Kfa642OFA
> WRMAn2Cqz7lx4Tt7Ikg5JA5SL6nEck5a
> =iLIm
> -----END PGP SIGNATURE-----
>
> 



* Re: ot, spamassassin question
           ` John G. Heim
@            ` Gregory Nowak
               ` Spam protection strategies, was:x " Chuck Hallenbeck
  0 siblings, 1 reply; 16+ messages in thread
From: Gregory Nowak @  UTC (permalink / raw)
  To: Speakup is a screen review system for Linux.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John and all,

yes, like I said before, I have my script run every 24 hours on the
messages my mom left as misidentified spam, or ham, so sa-learn is
being trained that way. Also, yes, the messages are being segregated
based on the x-spam-status header using maildrop actually, since I
personally prefer that to procmail. Thanks again.

Greg


On Thu, Jan 21, 2010 at 10:32:35AM -0600, John G. Heim wrote:
> You're welcome. I'm just glad I could help for once instead of being helped.
>
> One additional note... To get over the 200 message threshold, you might 
> try running sa-learn on the messages in the spam folder. I'm guessing you 
> have procmail putting messages marked as spam into a spam folder. Make 
> sure they really are all spam and then feed those messages to sa-learn.  
> Messages marked as spam won't be used by the auto learn feature unless 
> their spam score is really high in the first place.
>


- -- 
web site: http://www.romuald.net.eu.org
gpg public key: http://www.romuald.net.eu.org/pubkey.asc
skype: gregn1
(authorization required, add me to your contacts list first)

- --
Free domains: http://www.eu.org/ or mail dns-manager@EU.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAktYsVIACgkQ7s9z/XlyUyBQmACfR5LhnEWaYHw6l8wPvACvnpKW
G7QAoI86inHuoktSYqwkpSkLyq9EZXve
=AuSZ
-----END PGP SIGNATURE-----


* Spam protection strategies, was:x ot, spamassassin question
             ` Gregory Nowak
@              ` Chuck Hallenbeck
                 ` John G. Heim
  0 siblings, 1 reply; 16+ messages in thread
From: Chuck Hallenbeck @  UTC (permalink / raw)
  To: Speakup is a screen review system for Linux.

Greg, John, and all

On Thu, Jan 21, 2010 at 12:56:02PM -0700, Gregory Nowak wrote:
> 
> yes, like I said before, I have my script run every 24 hours on the
> messages my mom left as misidentified spam, or ham, so sa-learn is
> being trained that way. Also, yes, the messages are being segregated
> based on the x-spam-status header using maildrop actually, since I
> personally prefer that to procmail. Thanks again.
> 
> Greg

Congratulations on taming your spamassassin configuration. I just
finished installing a different spam protection here, much easier, and
with a different strategy. It uses vipul's "razor" and integrates
nicely with procmailrc and mutt. Razor is a collaborative spam
protection scheme, and works a lot like "denyhosts" in that false
positives and false negatives are reported to a distributed set of
hosts one by one as they are encountered, and each email the user
checks for spam is tested against the collective data base. It is the
consensus of razor users that distinguishes between ham and spam, and
that consensus is constantly being updated as the FP's and FN's are
reported. Also, each user acquires a "trust level" as his reports are
made, based on the soundness of his judgment in making the report.

Advantages: It's a smart filter out of the box, already trained by
other users.

Disadvantages: It doesn't work too well if an individual user has
idiosyncratic criteria for distinguishing ham/spam, since it stresses
group consensus on that question.

I divert incoming mail that is flagged by razor to a spam folder, and
occasionally scan it for false positives, reporting them when I find
them, by a keypress in mutt. As I examine mail that has not been
flagged and diverted to my spam folder, I occasionally find a false
negative, and simply report that to the network by another mutt
keypress.

I wonder what your thoughts are about such a collaborative data base
for spam protection?  It seems to me that it has some strengths that
"going it alone" with the bayesian model for isolated individuals may
lack.


Chuck



-- 
The Moon is Waxing Crescent (35% of Full)
      Either of these web addresses will take you to my web site:
          www.mhcable.com/~chuckh, or www.hallenbeck.ftml.net
                Audio editor weblog: edway.wordpress.com
               Or jabber 1on1 with me, chuckh1@jabber.org
                                --------
 People in general do not willingly read if they have anything else to
amuse them. -- S. Johnson
 


* Re: Spam protection strategies, was:x ot, spamassassin question
               ` Spam protection strategies, was:x " Chuck Hallenbeck
@                ` John G. Heim
  0 siblings, 0 replies; 16+ messages in thread
From: John G. Heim @  UTC (permalink / raw)
  To: Speakup is a screen review system for Linux.

From: "Chuck Hallenbeck" <chuckh@ftml.net>
> Congratulations on taming your spamassassin configuration. I just
> finished installing a different spam protection here, much easier, and
> with a different strategy. It uses vipul's "razor" and integrates
> nicely with procmailrc and mutt. Razor is a collaborative spam
> protection scheme, and works a lot like "denyhosts" in that false
> positives and false negatives are reported to a distributed set of
> hosts one by one as they are encountered, and each email the user
> checks for spam is tested against the collective data base. It is the
> consensus of razor users that distinguishes between ham and spam, and
> that consensus is constantly being updated as the FP's and FN's are
> reported. Also, each user acquires a "trust level" as his reports are
> made, based on the soundness of his judgment in making the report.

Spamassassin can be configured to use razor. We use bayesian rules, some 
downloaded rules, my own custom rules, razor and dcc (Distributed Checksum 
Clearinghouse) in determining the spam score for a message.

I have not seen a false positive in my own mail and no user has reported a 
false positive for about 2 years. And I probably catch about 98% of spam.

