The views expressed in articles on this site are those of their authors.



Copyright 1998-2024, Southern California OS/2 User Group. ALL RIGHTS RESERVED.

SCOUG, Warp Expo West, and Warpfest are trademarks of the Southern California OS/2 User Group. OS/2, Workplace Shell, and IBM are registered trademarks of International Business Machines Corporation. All other trademarks remain the property of their respective owners.

The Southern California OS/2 User Group
USA

SCOUG-HELP Mailing List Archives



Date: Sun, 23 Mar 2003 10:41:19 PST8
From: "Steven Levine" <steve53@earthlink.net>
Reply-To: scoug-help@scoug.com
To: scoug-help@scoug.com
Subject: SCOUG-Help: Junk Spy filtering (was: Mozilla profiles)

=====================================================
If you are responding to someone asking for help who
may not be a member of this list, be sure to use the
REPLY TO ALL feature of your email program.
=====================================================

In <3E7D5F39.6631@peterskye.com>, on 03/22/03
at 11:15 PM, Peter Skye said:

>to write several hundred of their own filters and then have to worry
>about maintaining them as the spammers get smarter. (On one of my email
>accounts Junk Spy typically catches 100% of the spam; on my others which
>I combine Junk Spy catches over 90%.)

Currently, it appears I need 7 filters for spam. I could have fewer, but I
have to work around a couple of ICE filter expression defects. Those
spammers are incredibly consistent. I clear the Review for Delete folder
about twice a day. I just cleared it to see what the count was and there
were approximately 25 spams. Any spam that gets left in the Inbox gets
moved to the Unfiltered Spam folder. There are 17 messages in there since
the 1st of March. I'll let you do the math. You are a much better
mathematician than I.
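
For what it's worth, the math can be sketched as below. It assumes that
about 25 spams per clearing is typical, that the folder really is cleared
twice a day, and that the 17 misses cover March 1 through March 23. Those
are rough assumptions read off the figures above, not measurements.

```python
# Rough catch-rate estimate from the figures in the message.
# Assumptions (not measurements): ~25 spams per clearing is typical,
# two clearings per day, and the 17 misses cover March 1-23.
spams_per_clearing = 25
clearings_per_day = 2
days = 23
missed = 17

caught = spams_per_clearing * clearings_per_day * days
catch_rate = caught / (caught + missed)

print(f"caught ~{caught}, missed {missed}, catch rate ~{catch_rate:.1%}")
# prints: caught ~1150, missed 17, catch rate ~98.5%
```

Under those assumptions the filters miss well under one message a day.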

>they create by reviewing spam messages. There's a Bayesian movement to
>have a massive group of volunteers send in their results, but this
>becomes an incredible mess when the spammers of the world surreptitiously
>join that same movement and start supplying their own results to the

It's also, IMNSHO, the worst way to use a Bayesian filter. To be
effective, a Bayesian filter must be trained for the spam you receive and
the corpus should be as small as possible.

I use Polarbar for the Mr. KIA list because the spam level is extremely
high these days. The corpus is 22 bad messages and 4 good messages. I'm
actually a bit short on good messages and might decide to train the filter
with a few more known good messages. What I'm doing now is a bit of an
experiment. I wanted to see how long it would take to achieve high
accuracy starting from an empty corpus and just marking misidentified
messages. The marked messages were by definition not identified
correctly.
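
The train-only-on-mistakes approach described above can be sketched
roughly as follows. This is a toy word-presence filter with add-one
smoothing, written only to illustrate the idea. It is not Polarbar's
implementation, and the class name, method names, and sample messages are
all hypothetical.

```python
import math
import re
from collections import Counter


class TinyBayes:
    """Toy word-presence Bayesian filter (hypothetical, not Polarbar's code)."""

    def __init__(self):
        self.words = {"spam": Counter(), "good": Counter()}
        self.messages = {"spam": 0, "good": 0}

    @staticmethod
    def tokenize(text):
        # Each distinct lowercase word counts once per message.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def train(self, text, label):
        self.messages[label] += 1
        self.words[label].update(self.tokenize(text))

    def classify(self, text):
        n_spam, n_good = self.messages["spam"], self.messages["good"]
        # Log-odds of spam with add-one smoothing; an empty corpus scores 0.
        score = math.log((n_spam + 1) / (n_good + 1))
        for word in self.tokenize(text):
            p_spam = (self.words["spam"][word] + 1) / (n_spam + 2)
            p_good = (self.words["good"][word] + 1) / (n_good + 2)
            score += math.log(p_spam / p_good)
        return "spam" if score > 0 else "good"

    def train_on_error(self, text, actual):
        """Add a message to the corpus only when the filter got it wrong."""
        if self.classify(text) == actual:
            return False
        self.train(text, actual)
        return True


bayes = TinyBayes()
bayes.train_on_error("cheap pills buy now", "spam")  # misidentified, so learned
print(bayes.classify("buy cheap pills"))
```

Because correctly identified messages are never added, the corpus stays
small and reflects only the mail this one mailbox actually receives.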

Currently, the list receives about 10 spams per day and maybe 1 good
message a week. The list does need to be promoted. :-) I estimate the
current accuracy is in the 80% range, but that will get better as good
messages show up.

Several folks on the Polarbar list tried to train their corpus with spam
archives with, as expected, not-so-good results. I suspect they all
started over and are now achieving better results.

One thing Polarbar needs, and will probably get eventually, is a tool to
thin the corpus of words that are not contributing to the final result.
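
One possible thinning criterion, as a minimal sketch: drop any word whose
smoothed spam probability sits close to 0.5, since such a word barely
moves the result either way. The function name, the `min_distance`
threshold, and the sample counts are assumptions made for illustration.
Whatever tool Polarbar eventually gets may work quite differently.

```python
def thin_corpus(spam_counts, good_counts, n_spam, n_good, min_distance=0.1):
    """Keep only words whose smoothed spam probability is far from 0.5.

    spam_counts / good_counts map word -> number of messages containing it;
    n_spam / n_good are the message totals. All names are hypothetical.
    """
    kept = {}
    for word in set(spam_counts) | set(good_counts):
        p_spam = (spam_counts.get(word, 0) + 1) / (n_spam + 2)
        p_good = (good_counts.get(word, 0) + 1) / (n_good + 2)
        spamminess = p_spam / (p_spam + p_good)
        if abs(spamminess - 0.5) >= min_distance:
            kept[word] = spamminess
    return kept


# "the" appears equally often in both kinds of mail, so it is dropped.
spam_counts = {"the": 10, "viagra": 8}
good_counts = {"the": 10, "meeting": 6}
print(sorted(thin_corpus(spam_counts, good_counts, n_spam=10, n_good=10)))
# prints: ['meeting', 'viagra']
```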

>And I'm not knocking your custom filters, Steven. But you should add
>Junk Spy to your email chain and put its massive spam filter after your
>own.

I'll consider this if my filters start missing more than 1 message a day.

>That's 200 junk mails a week I don't have to see -- or 10,000 a year.
>The time savings is incredible and it requires _no_ work on my part.

Oh, I agree. The only maintenance my folders need at the moment, if you
want to call it that, is adding new addresses to my address book when the
filters misidentify a message.

>Care to share your MR/2 filters?

Sure. You are not the first to ask. There's always a relatively
up-to-date copy at:

http://home.earthlink.net/~steve53/mr2i/MyICEFilters.txt

IAC, thanks. You reminded me it was time to refresh the copy I maintain
there.

Look for the enabled filters that are tagged as spam filters.

To give you an idea of how much maintenance this takes, these are the
filter control files as of today:

mr2i          .flt   5,831  .a..   2-28-03  14:49:58
OKFromFields  .txt     429  .a..   2-22-03  14:53:02
OKToFields    .txt     288  .a..   3-05-03  16:03:50

As you can see, I don't need to maintain them very often and they are not
all that large. If Nick would fix a few defects in the filter
expressions, I could lose the whitelist files.

IIRC, all I did the last time I changed mr2i.flt was delete a bunch of
disabled filters. The last major update to the spam filters was probably
about 6 months ago when I completed the migration to my current approach.
This was when I added the spam logging. This made it rather easy to
figure out which filters were doing all the work. Over time, I was able
to delete several filters that were not paying their way. Even so, I
never had more than 15 spam filters. They were just not as effective.
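
To illustrate how a spam log makes it easy to see which filters do the
work, here is a minimal sketch. The log format (filter name, a tab, then
the subject) and the filter names are hypothetical; the message does not
show what the actual MR/2 ICE log looks like.

```python
from collections import Counter


def hits_per_filter(log_lines):
    """Count hits per filter from lines like 'FilterName<TAB>subject'.

    This log layout is an assumption made for illustration.
    """
    hits = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        name = line.split("\t", 1)[0]
        hits[name] += 1
    return hits


def idle_filters(all_filters, hits):
    """Filters that never fired: candidates for deletion."""
    return [name for name in all_filters if hits[name] == 0]


log = [
    "SpamNotToMe\tMake money fast",
    "SpamNotToMe\tLose weight now",
    "SpamBadCharset\tUnreadable subject",
]
hits = hits_per_filter(log)
print(hits.most_common())
print(idle_filters(["SpamNotToMe", "SpamBadCharset", "SpamOldRule"], hits))
```

A filter that shows up in the idle list over a long enough stretch of log
is not paying its way.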

Steven

--
---------------------------------------------------------------------
"Steven Levine" MR2/ICE 2.35 #10183 Warp4/FP15/14.085_W4
www.scoug.com irc.webbnet.org #scoug (Wed 7pm PST)
---------------------------------------------------------------------

=====================================================

To unsubscribe from this list, send an email message
to "steward@scoug.com". In the body of the message,
put the command "unsubscribe scoug-help".

For problems, contact the list owner at
"rollin@scoug.com".

=====================================================





The Southern California OS/2 User Group
P.O. Box 26904
Santa Ana, CA 92799-6904, USA

Copyright 2001 the Southern California OS/2 User Group. ALL RIGHTS RESERVED.
