Best way to build Paranoid dictionary

Spam-detecting plug-in
Junior Member
Posts: 53
Joined: Sun Aug 24, 2008 12:10 am

Post by secsol » Tue Jul 21, 2009 6:47 am

Hi,

Currently I just have a folder with two subfolders, Ham and Spam.
Each folder has exactly 17,012 emails.

My dictionary seems to work okay now, but it has a lot of problems with blocking Russian emails that should not have been blocked. I am working on that.

Do you have any tips on how I can improve a new dictionary?
Is this too many emails to base the dictionary on?
SecSol Security Solutions
Please visit us @ http://www.secsol.dk

Official reseller of Eset NOD32 Antivirus and Simple DNS Plus

Developer
Posts: 4431
Joined: Tue Apr 20, 2004 3:43 pm

Post by Alexander Telegin » Thu Jul 23, 2009 10:25 am

Hm... I have never tried to build a dictionary from such a large corpus of emails, but it should be OK.

Post by secsol » Mon Jul 27, 2009 12:44 pm

How many emails do you normally use?

Post by Alexander Telegin » Mon Jul 27, 2009 6:19 pm

I usually build a dictionary from 2000-3000 emails.

Post by secsol » Thu Jul 30, 2009 9:35 am

How do you select which messages are best from a large corpus?

Post by Alexander Telegin » Thu Jul 30, 2009 9:33 pm

I exclude spam duplicates and sort the rest by appearance time in descending order (newest first), then take about 2000-3000, depending on how many fresh legit emails I have.
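
For anyone who wants to script this selection, here is a rough sketch of the steps described above. It is not Paranoid's own tooling. The `Message` class, `body_key`, `select_spam`, and the 3000 cap are hypothetical names and values chosen for illustration. It also assumes that "duplicate" means the same body text after collapsing whitespace and case, and that the amount of spam kept should not exceed the number of fresh legit emails.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    name: str        # hypothetical identifier, e.g. a file name
    received: float  # appearance time as a Unix timestamp
    body: str        # message text used for duplicate detection


def body_key(body: str) -> str:
    """Hash the body after collapsing whitespace and case.

    Assumption: copies that differ only in spacing or letter case
    count as duplicates.
    """
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def select_spam(spam, fresh_ham_count, cap=3000):
    """Drop duplicate spam and return the most recent unique messages."""
    # Newest first, so the copy kept for each duplicate group is the latest one.
    newest_first = sorted(spam, key=lambda m: m.received, reverse=True)
    seen = set()
    unique = []
    for msg in newest_first:
        key = body_key(msg.body)
        if key not in seen:
            seen.add(key)
            unique.append(msg)
    # Assumption: keep no more spam than there are fresh legit emails, up to the cap.
    return unique[: min(cap, fresh_ham_count)]


spam = [
    Message("a.eml", 100.0, "Buy now"),
    Message("b.eml", 300.0, "buy   NOW"),
    Message("c.eml", 200.0, "Cheap watches"),
]
picked = select_spam(spam, fresh_ham_count=2)
print([m.name for m in picked])  # ['b.eml', 'c.eml']
```

Hashing only catches exact copies after normalization. Spam that varies a few words per message would need some fuzzier comparison, which this sketch does not attempt.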
