2017-03-17 00:00:00 +0000
We have released the new stable version of Rspamd today. It includes a couple of important fixes and improvements. Here is the list of the most important ones.
Base64 decoding fix
We have found and resolved a serious flaw in the base64 decoder in Rspamd. It could corrupt the output if the decoder encountered non-base64 characters, for example spaces or newlines. This bug could affect statistics, fuzzy checks and a couple of other areas in Rspamd. Hence, we recommend updating to 1.5.3 as soon as possible.
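Here is a minimal pure-Lua sketch of the behaviour the fix restores: a base64 decoder that skips characters outside the base64 alphabet (spaces, CR, LF) instead of producing corrupted output. It is only an illustration. Rspamd's decoder is written in C and works differently. The function name `b64_decode` is hypothetical.

```lua
-- Build a lookup table: byte value -> 6-bit value for the base64 alphabet.
local ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
local DECODE = {}
for i = 1, #ALPHABET do
  DECODE[ALPHABET:byte(i)] = i - 1
end

-- Decode base64, ignoring whitespace and other non-alphabet characters.
local function b64_decode(input)
  local out = {}
  local acc, nbits = 0, 0 -- bit accumulator and number of bits it holds
  for i = 1, #input do
    local v = DECODE[input:byte(i)]
    if v then
      acc = acc * 64 + v
      nbits = nbits + 6
      if nbits >= 8 then
        -- emit the top 8 bits and keep the remainder
        nbits = nbits - 8
        out[#out + 1] = string.char(math.floor(acc / 2 ^ nbits))
        acc = acc % 2 ^ nbits
      end
    elseif input:byte(i) == 61 then -- '=' padding ends the data
      break
    end
    -- anything else (space, CR, LF, ...) is skipped rather than decoded
  end
  return table.concat(out)
end

print(b64_decode("aGVs\nbG8=")) -- hello
```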
Redis history
This release includes an experimental feature that allows saving history in Redis. There is initial WebUI support for this feature; however, it is not currently enabled by default. In the future, we plan to enable it and to enhance history with a set of new options:
- display of senders and recipients in the history table;
- support for symbol options;
- clustered history;
- dynamic loading of history rows;
- compressed history.
All these features are implemented in the backend (namely, the Rspamd controller), but the web interface itself still requires major rework, so this work has been postponed until the next version.
DKIM plugin improvements
The DKIM signing module now supports several types of private keys: in addition to PEM keys stored in a file, DKIM signing now supports raw keys, base64-encoded keys and PEM keys passed as raw strings.
DKIM signing now also supports maps for selecting domains to sign.
Other plugin improvements
- greylist plugin now supports excluding low-scoring messages from greylisting
- whitelist plugin can now load a list of maps
- ratelimit plugin now excludes greylisted messages
- metadata exporter uses rule-specific settings for emails
- metadata exporter can now use non-ASCII characters in reports
Rules update
Here is the list of rules that are fixed or reworked:
- URI_COUNT_ODD rule now excludes visual URLs, which reduces its FP rate
- RCPT_COUNT* and HAS_X_PRIO* rules are reworked to follow the normal Rspamd symbol conventions
- misc.lua has been split into multiple modules that share common rules
Other bug fixes
- imported important fixes for the ac-trie module
- fixed local networks proxying
- fixed memory corruption in periodic tasks during the worker cleanup phase
- fixed subject rewriting
- improved the zstd Lua API to avoid extra reallocations
2017-03-01 00:00:00 +0000
We are pleased to announce the new major Rspamd release 1.5 today. This release includes a lot of major reworks, cool new features and a significant number of bug fixes. The update from previous versions shouldn’t be hard; however, please check the migration document to be sure that the new version will not break your existing configuration.
Here is a list of major changes for this version.
New MIME parser
Rspamd has used the GMime library for a very long time, but we decided to switch away from it for several reasons.
The main problem is that Rspamd requires very precise control over MIME parsing. It has to deal with broken messages, and it parses them to extract data rather than to display them. Compared to a generic MIME parser such as GMime, this task has some simplifications and some complications. For example, we do not need to support streaming mode. However, we have to deal with many non-standard messages that adversaries, e.g. spammers, deliberately craft so that they are parsed incorrectly. The current architecture is described here: https://gist.github.com/vstakhov/937f253d5935ee4158688932589b1dcc
Thanks to the new parser, Rspamd can now deal with the following messages:
- Messages with redundant Content-Type headers:
    Content-Type: text/plain
    Content-Type: multipart/alternative
  Currently, Rspamd always prefers multipart types over plain and text types unless there is no specific binary type (e.g. if there are both text/plain and application/octet-stream)
- Messages with broken multipart structures:
  - new parts after the closing boundary (e.g. an attachment in multipart/mixed after the closing part)
  - incorrect inheritance
  - incorrect multipart type (Rspamd now just ignores the exact multipart/* type)
- Badly encoded (non-UTF-8) filenames
- Incorrect Content-Transfer-Encoding (now heuristic-based):
  - 8bit when the content is Base64
  - base64 or qp when the content is 8bit
- Bad Content-Type, e.g. text
- Messages with no headers or with no body
- Messages with mixed newlines in headers and/or body
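The heuristic Content-Transfer-Encoding handling can be illustrated with a toy Lua sketch. This is not Rspamd's parser, and the thresholds and function names are hypothetical. A body is treated as base64 if, once line breaks are removed, it is at least 16 characters long, its length is a multiple of 4, and it uses only the base64 alphabet. A body declared as base64 or quoted-printable that contains raw 8-bit bytes is treated as 8bit.

```lua
-- Hypothetical threshold: very short bodies are never reinterpreted.
local MIN_B64_LEN = 16

-- Does the body look like base64 once line breaks are removed?
local function looks_like_base64(body)
  local stripped = body:gsub("[\r\n]", "")
  if #stripped < MIN_B64_LEN or #stripped % 4 ~= 0 then
    return false
  end
  -- only the base64 alphabet, with at most two '=' padding characters at the end
  return stripped:find("^[A-Za-z0-9%+/]+=?=?$") ~= nil
end

-- Pick the encoding to decode with, given the declared CTE and the body.
local function effective_cte(declared, body)
  if declared == "8bit" and looks_like_base64(body) then
    return "base64"
  end
  if (declared == "base64" or declared == "quoted-printable")
      and body:find("[\128-\255]") then
    -- raw 8-bit bytes cannot appear in valid base64 or QP data
    return "8bit"
  end
  return declared
end

print(effective_cte("8bit", "aGVsbG8gd29ybGQh\r\nSGVsbG8h\r\n")) -- base64
```

A real parser would combine more signals. The point of the sketch is only that the declared header is treated as a hint and not as the truth.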
Switching from libiconv to libicu
Rspamd has switched charset conversion from libiconv to libicu. This has sped up conversion, since libicu is much faster (converting ~100MB of text from windows-1251 to utf-8):
    0,83s user 0,08s system 98% cpu 0,921 total - iconv
    0,36s user 0,07s system 95% cpu 0,450 total - libicu
Furthermore, switching to libicu made it possible to implement many useful features:
- heuristic charset detection (n-grams for single-byte charsets);
- visual obfuscation detection (e.g. google.com -> gооgle.com);
- better IDNA processing;
- better Unicode manipulation.
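A common way to spot visual obfuscation, such as google.com spelled with Cyrillic letters, is to flag names that mix characters from different scripts. The sketch below is a simplified pure-Lua illustration and not Rspamd's libicu-based detector. It distinguishes only Latin and Cyrillic letters and assumes valid UTF-8 input. All function names are hypothetical.

```lua
-- Decode a UTF-8 string into a list of code points (assumes valid UTF-8).
local function utf8_codepoints(s)
  local cps, i = {}, 1
  while i <= #s do
    local c = s:byte(i)
    local cp, len
    if c < 0x80 then
      cp, len = c, 1
    elseif c < 0xE0 then
      cp, len = c % 0x20, 2
    elseif c < 0xF0 then
      cp, len = c % 0x10, 3
    else
      cp, len = c % 0x08, 4
    end
    for j = 1, len - 1 do
      -- each continuation byte contributes its low 6 bits
      cp = cp * 64 + s:byte(i + j) % 64
    end
    cps[#cps + 1] = cp
    i = i + len
  end
  return cps
end

-- Very coarse script classification; digits, dots, hyphens are neutral.
local function script_of(cp)
  if (cp >= 0x41 and cp <= 0x5A) or (cp >= 0x61 and cp <= 0x7A) then
    return "latin"
  elseif cp >= 0x0400 and cp <= 0x04FF then
    return "cyrillic"
  end
  return nil
end

-- True if the string contains letters from more than one script.
local function is_mixed_script(name)
  local seen, count = {}, 0
  for _, cp in ipairs(utf8_codepoints(name)) do
    local s = script_of(cp)
    if s and not seen[s] then
      seen[s] = true
      count = count + 1
    end
  end
  return count > 1
end

print(is_mixed_script("google.com"))               -- false
print(is_mixed_script("g\208\190\208\190gle.com")) -- true (Cyrillic U+043E)
```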
WebUI rework
The Web interface has been reworked for better representation and configuration:
- The web interface now supports displaying and aggregating statistics from a cluster of Rspamd machines
- The internal structure of the Web Interface has been changed to a set of modules, so that new features can be implemented without touching the overall logic
- The throughput graph has been improved and now displays a small pie chart for the specified time range
Lua TCP module rework
In Rspamd 1.5, the Lua TCP module supports complex protocols with dialogs and states, similar to the AnyEvent module in Perl. For example, it is now possible to set a reaction for each communication stage and perform a full SMTP or IMAP dialog.
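A dialog with a reaction at each stage can be illustrated as a small state machine. The sketch below does not use the Rspamd TCP API. It assumes a hypothetical `io_fn` callback that writes a command (or nothing, for the greeting) and returns the server's reply line, so the dialog can run against a fake server.

```lua
-- Each stage: the command to send (nil for the greeting) and the expected reply code.
local stages = {
  { send = nil,                              expect = "220" }, -- server greeting
  { send = "HELO example.org\r\n",           expect = "250" },
  { send = "MAIL FROM:<a@example.org>\r\n",  expect = "250" },
  { send = "RCPT TO:<b@example.org>\r\n",    expect = "250" },
  { send = "QUIT\r\n",                       expect = "221" },
}

-- Walk through the stages, stopping at the first unexpected reply.
local function run_dialog(io_fn)
  for i, stage in ipairs(stages) do
    local reply = io_fn(stage.send) or ""
    if reply:sub(1, 3) ~= stage.expect then
      return false, string.format("stage %d: unexpected reply %q", i, reply)
    end
  end
  return true
end

-- Fake server: replies keyed by the first four characters of the command.
local function fake_server(rcpt_reply)
  local replies = {
    HELO = "250 hello",
    MAIL = "250 ok",
    RCPT = rcpt_reply,
    QUIT = "221 bye",
  }
  return function(data)
    if data == nil then
      return "220 ready"
    end
    return replies[data:sub(1, 4)]
  end
end

print(run_dialog(fake_server("250 ok"))) -- true
local ok, err = run_dialog(fake_server("550 no such user"))
print(ok, err) -- ok is false, err names stage 4
```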
URL redirector module
URL shorteners and redirectors are part of the modern email ecosystem and are widely used in many emails, both legitimate and not (e.g. in Spam and Phishing). Rspamd has an old and outdated utility service called redirector.pl that is intended to resolve such redirects. It is written in Perl and hasn’t been updated for a long time. It has a long list of dependencies and performs a lot of unnecessary tasks. In Rspamd 1.5, there is a new lightweight Lua redirector module, which is intended to resolve URL redirects in a more efficient and simple way. Dereferenced links are processed by the SURBL module and added as tags for other modules. Redis is used for caching. This module is not enabled by default so far, but it can easily be enabled by placing redirector_hosts_map = "/etc/rspamd/redirectors.inc"; in /etc/rspamd/local.d/surbl.conf.
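Redirect resolution comes down to following redirects up to a hop limit and caching the final target. The following Lua sketch is only an illustration. `fetch_location` is a hypothetical callback that stands in for an HTTP request returning the Location target. A plain Lua table replaces the Redis cache used by the real module.

```lua
-- Follow redirects up to `max_hops`, caching the final target per URL.
-- `fetch_location(url)` (hypothetical) returns the redirect target or nil.
local function make_resolver(fetch_location, max_hops)
  local cache = {} -- a plain table standing in for the Redis cache
  return function(url)
    local cached = cache[url]
    if cached then
      return cached
    end
    local current, seen = url, { [url] = true }
    for _ = 1, max_hops do
      local target = fetch_location(current)
      if not target or seen[target] then
        break -- no more redirects, or a redirect loop
      end
      seen[target] = true
      current = target
    end
    cache[url] = current
    return current
  end
end

-- Usage with a fake redirect table and a counter of "network" requests.
local redirects = {
  ["http://short.example/a"] = "http://track.example/b",
  ["http://track.example/b"] = "http://landing.example/",
}
local requests = 0
local resolve = make_resolver(function(url)
  requests = requests + 1
  return redirects[url]
end, 5)

print(resolve("http://short.example/a")) -- http://landing.example/
print(requests)                          -- 3
print(resolve("http://short.example/a")) -- served from the cache
print(requests)                          -- still 3
```

The hop limit and loop check matter because a hostile redirector could otherwise keep the resolver busy indefinitely.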
Rmilter headers module
The Rmilter headers module provides an easy way to add common headers. Supported headers include Authentication-Results, SpamAssassin-compatible headers and user-defined headers, among others.
DKIM signing module
The DKIM signing module provides a simple policy-based approach to DKIM signing, similar to Rmilter. It supports multiple cool features; for example, you can now store your DKIM keys in Redis.
Force actions module
The Force actions module provides a way to force actions for messages based on flexible conditions. A condition is an expression that checks for the presence or absence of symbols and for the action already assigned to a message. The module can optionally set SMTP messages and rewritten subjects.
Configuration of this module has been reworked to provide more flexible operation. Library functions have been added to provide JSON-formatted general message metadata, email alerts and more, which makes this module readily useful for quarantines, logging and alerting.
URL tags plugin
URLs can now be assigned tags, and the URL tags plugin persists these tags in Redis for a period of time, which can be used to avoid redundant checks.
URL reputation plugin
The URL reputation plugin filters URLs for relevance and assigns dynamic reputation to selected TLDs which is persisted in Redis.
Multimap ‘received’ maps
Now multimap can be used to match information extracted from Received headers (which can be filtered based on their position in the message). It is also possible to use SMTP HELO messages in maps for this module. There are also new URL filters, the ability to set SMTP messages depending on map data, and the ability to skip archive checks for certain file types or maps.
Changes in RBL module
Support has been added for using hashes in email and helo RBLs, so that information that cannot be represented in a DNS record can still be queried.
Support for Avira SAVAPI in antivirus module
The Rspamd antivirus module now supports Avira antivirus (via SAVAPI). This code has been contributed by Christian Rößner.
Neural net plugin improvements
We have fixed a couple of issues in the neural network plugin, allowing multiple configurations in the cluster. We have also fixed a couple of issues with storing and loading learning vectors, especially in error handling paths. New metatokens have been added to improve neural network classification quality.
Fuzzy matching for images
Rspamd fuzzy hashing now supports matching images attached to checked emails. To enable this feature, Rspamd must be built with libgd support (provided by the pre-built packages). However, this feature is not currently enabled by default. It seems to be too aggressive when used with large fuzzy storages, producing a lot of false positive hits.
New rules
There are a couple of new rules added to Rspamd 1.5:
- OMOGRAPH_URL: detects visually confusable URLs
- FROM_NAME_HAS_TITLE: fixed title match
- REPLYTO_EMAIL_HAS_TITLE: new rule
- FROM_NAME_EXCESS_SPACE: new rule
Rspamadm grep
A grep-like tool inspired by exigrep has been added to rspamadm; see rspamadm grep --help for usage information. It provides a convenient way to produce logically collated logs based on search strings or regular expressions.
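Collating logs means grouping all lines that belong to the same task and printing the whole group when any of its lines matches. Below is a minimal Lua sketch of that idea. It is not rspamadm's implementation, and `collate` is a hypothetical name. For illustration only, it assumes that each relevant log line carries a task tag as its first `<...>` token. The sample log lines are invented.

```lua
-- Group log lines by task tag; keep only groups where some line matches.
local function collate(lines, pattern)
  local groups, order, matched = {}, {}, {}
  for _, line in ipairs(lines) do
    local tag = line:match("<(%w+)>") -- assumed tag format, e.g. <a1b2c3>
    if tag then
      if not groups[tag] then
        groups[tag] = {}
        order[#order + 1] = tag -- remember first-seen order of tasks
      end
      table.insert(groups[tag], line)
      if line:find(pattern) then
        matched[tag] = true
      end
    end
  end
  local result = {}
  for _, tag in ipairs(order) do
    if matched[tag] then
      result[#result + 1] = groups[tag]
    end
  end
  return result
end

-- Invented sample log lines for illustration.
local logs = {
  "10:00:01 #100(normal) <aaa111>; task; accepting connection",
  "10:00:01 #101(normal) <bbb222>; task; accepting connection",
  "10:00:02 #100(normal) <aaa111>; task; symbol FUZZY_DENIED found",
  "10:00:03 #101(normal) <bbb222>; task; finished",
}

for _, group in ipairs(collate(logs, "FUZZY_DENIED")) do
  print(table.concat(group, "\n"))
  print("--")
end
```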
There are a number of processing performance improvements:
- The Base64 decoder now has sse4.2 and avx backends
- Better internal caching of various ‘heavy’ objects
- Switched to a faster hash function, t1ha
- Enabled link-time optimizations for the pre-built packages
- Bundled LuaJIT 2.1, which offers significant performance improvements over the version provided in Debian packages
Stability improvements and bug fixes
We constantly improve the stability of Rspamd, and in this version we have fixed a number of issues related to graceful reload. Historically, this command has been poorly supported, and memory leaks and corruptions could occur during reload. In this release, we have fixed many such issues, so you can now use reload more safely. We have also eliminated various issues related to Unicode processing, the Lua API, signal race conditions and other important problems found by Rspamd users.
2016-11-30 00:00:00 +0000
The next stable release of Rspamd, 1.4.1, is available to download. This release includes various bugfixes and a couple of cool new features. The most notable new feature is the ClickHouse plugin.
ClickHouse plugin
This plugin is intended to export scan data to the ClickHouse column-oriented database. This feature allows very deep analysis of the data, using advanced statistical tools to examine your mail flows and the efficiency of Rspamd. For example, you can find the most abused domains, the largest spam senders, attachment statistics, URL statistics and so on. The module documentation includes some samples of what you can do with this tool.
Universal maps
It was not very convenient that maps could only contain references to external resources. Starting from version 1.4.1, you can also embed maps into the configuration to simplify small map definitions:
map = ["elt1", "elt2" ...]; # Embedded map
map = "/some/file.map"; # External map
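Conceptually, both forms end up as the same lookup set. The Lua sketch below is a hypothetical illustration and not Rspamd's map loader. An embedded map is a Lua table of elements, and an external map is a file with one element per line. The sketch assumes that `#` starts a comment and that surrounding whitespace is ignored.

```lua
-- Turn either an embedded list or a file path into a lookup set.
local function load_map(def)
  local set = {}
  if type(def) == "table" then
    -- embedded map: elements are given directly in the configuration
    for _, elt in ipairs(def) do
      set[elt] = true
    end
  else
    -- external map: one element per line in a file
    local f = assert(io.open(def, "r"))
    for line in f:lines() do
      -- assumed syntax: strip '#' comments and surrounding whitespace
      line = line:gsub("#.*$", ""):match("^%s*(.-)%s*$")
      if line ~= "" then
        set[line] = true
      end
    end
    f:close()
  end
  return set
end

local embedded = load_map({ "elt1", "elt2" })
print(embedded.elt1, embedded.elt3) -- true nil
```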
Lua modules debugging improvements
You can now specify Lua modules in debug_modules to investigate a specific module without enabling global debug.
New rules
Steve Freegard has added a bunch of new rules that target current spam trends, including:
- Freemail and disposable email addresses
- Common Message-ID abuse
- Compromised hosts rules
- Rules for upstream services that have already run spam checks
- Commonly abused patterns in From, To and other headers
- Suspicious subjects
- MIME misuse
Multiple fixes to the ANN module
Neural networks have been fixed to work in a distributed environment. A couple of consistency bugs in Redis operations have been found and eliminated.
Other bugfixes
A couple of other bugs and memory leaks were also fixed in this release. Please check the full release notes for details.
2016-11-21 00:00:00 +0000
Today, after 4 months of development, we’ve released major updates for both Rspamd and Rmilter: Rspamd is updated to version 1.4 and Rmilter to version 1.10. These updates bring many new features, including Redis pool support, new modules, improved neural network support, zstd compression for the protocol and many other important improvements.
Redis pool support
Rspamd now connects to Redis using a pool of persistent connections. This feature does not require any special setup. It allows existing connections to be reused, which improves the load profile of Redis instances. This feature also allowed Rspamd to use Redis more extensively for different tasks.
New neural nets plugin
The neural nets plugin has been reworked to store both training vectors and neural nets in Redis. This change allows a single neural network to be used by the whole cluster of Rspamd scanners, improving both the quality of classification and the speed of training.
Bayes improvements
Some work has been done to improve the Bayesian statistical classifier. Rspamd now uses more metadata to estimate ham/spam probability. You can read more about the Bayes classifier in Rspamd compared to other spam filters here: https://rspamd.com/misc/2016/10/14/bayes-performance.html.
New Antivirus plugin
Rspamd can now check messages for viruses using the Antivirus plugin. This module provides multiple features, including:
- support for different antivirus types: ClamAV, Sophos and F-Prot
- support for custom patterns (e.g. experimental databases for ClamAV)
- caching of check results
- an attachments-only mode to save AV resources
- whitelists, size limits and custom condition scripts
New MX check plugin
Rspamd can now verify MX validity for scanned messages using the new MX check plugin. This plugin is useful for protecting against messages with invalid return paths.
Compression support in the protocol
Rmilter and Rspamd now support zstd compression. This algorithm is fast and efficient, reducing network and CPU load when transferring data over the network. Zstd is also used to store large chunks of data in Redis (e.g. serialized neural nets).
Reworked model for DNS failures in SPF, DKIM and DMARC
Rspamd now has a better understanding of temporary failures when performing DNS-related checks, e.g. DKIM, DMARC or SPF. There are special symbols to represent both temporary and permanent errors for these plugins.
Adaptive & user-defined ratelimits
The ratelimit module now supports adaptive ratelimits. Limits can be made stricter for new senders and for senders with a bad reputation, and more lenient for senders with a good reputation. Furthermore, ratelimits can now be composed from keywords, which provides greater flexibility. User-defined keywords can be created with Lua functions to support custom requirements.
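One simple way to make a limit adaptive is to interpolate between a strict and a lenient multiplier based on sender reputation. The function below is a hypothetical sketch of that idea and not the ratelimit module's actual formula. It assumes reputation is normalised to [0, 1], and unknown (new) senders get the strictest limit.

```lua
-- Scale a base limit by sender reputation (hypothetical formula).
-- reputation: number in [0, 1], or nil for a new/unknown sender.
-- min_mult / max_mult: multipliers for the worst and best reputation.
local function adaptive_limit(base, reputation, min_mult, max_mult)
  if reputation == nil then
    return base * min_mult -- new senders get the strictest limit
  end
  local r = math.max(0, math.min(1, reputation)) -- clamp to [0, 1]
  return base * (min_mult + (max_mult - min_mult) * r)
end

print(adaptive_limit(100, nil, 0.5, 2)) -- 50
print(adaptive_limit(100, 1, 0.5, 2))   -- 200
```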
Monitored objects
There is a new concept in Rspamd: monitored resources. Rspamd periodically checks whether a resource is still available and healthy. For example, this feature is enabled for RBLs and URIBLs: Rspamd checks that the DNSBL is available and that it does not blacklist the world. If these checks fail, the monitored resource is ignored for further checks.
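A health check in the spirit of monitored resources might look like the sketch below. It relies on the RFC 5782 convention that an IPv4 DNSBL must not list 127.0.0.1. A positive answer for `1.0.0.127.<zone>` therefore suggests that the list is blacklisting the world. `lookup` is a hypothetical resolver callback. The failure threshold in `make_monitor` is an assumption and not Rspamd's actual policy.

```lua
-- Check one DNSBL zone. `lookup(name)` (hypothetical) returns an address
-- string, nil for NXDOMAIN, or nil plus an error string if the query failed.
local function check_dnsbl(zone, lookup)
  -- RFC 5782: 127.0.0.1 (reversed as 1.0.0.127) must NOT be listed
  local addr, err = lookup("1.0.0.127." .. zone)
  if err then
    return false, "unavailable: " .. err
  end
  if addr then
    return false, "lists everything (127.0.0.1 is listed)"
  end
  return true
end

-- Mark the zone as dead after `max_failures` consecutive failed checks.
local function make_monitor(zone, lookup, max_failures)
  local failures, alive = 0, true
  return function()
    if check_dnsbl(zone, lookup) then
      failures, alive = 0, true
    else
      failures = failures + 1
      if failures >= max_failures then
        alive = false
      end
    end
    return alive
  end
end

local healthy = make_monitor("bl.example", function() return nil end, 2)
print(healthy()) -- true
```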
Redis backend for fuzzy storage
It is now possible to store fuzzy hashes in Redis. This storage is faster, more scalable and more featureful than SQLite. The rspamadm utility can convert fuzzy hashes from SQLite storage to Redis using the fuzzy_convert tool.
Delhash support for fuzzy storage
You can now remove a specific hash from fuzzy storage without the original message: just find the hash in the logs and call rspamc fuzzy_delhash <hex>. Multiple hashes can be specified for this command.
Metric and metadata exporters
The metric exporter periodically pushes Rspamd’s internal statistics to an external monitoring system (currently only Graphite is supported). The metadata exporter is a flexible mechanism for conditionally pushing user-defined message metadata to an external system (current backends are Redis Pub/Sub and HTTP).
Dynamic configuration in Redis
This feature is useful when you want to manage multiple instances of Rspamd centrally. Currently, dynamic configuration is limited to symbol scores, actions and global enable/disable definitions for symbols. In the future, this functionality is planned to be extended.
User settings in Redis
The user settings module now supports loading user settings from a Redis server. This is useful for dynamically configuring users’ preferences without reloading the whole set of settings.
Errors ring buffer
The Rspamd logger now stores errors in a central ring buffer containing the most recent errors that occurred in all Rspamd processes. The controller worker can return this buffer as JSON when the /errors path is requested (this requires enable_password).
Messages rework
It is now possible to return multiple messages in an Rspamd reply, e.g.
    {"messages": {"smtp_message": "Try again later"}}
Rmilter 1.10 also supports this to pass a specific error message to the MTA (e.g. for ratelimit or greylisting).
Multiple updates to Rspamd Lua API
There are many new features in the Rspamd Lua API:
- periodic scripts (registered here from an on_load handler, so that ev_base is available):

    local logger = require "rspamd_logger"
    local i = 0 -- counter shared between invocations of the periodic function

    rspamd_config:add_on_load(function(cfg, ev_base, worker)
      rspamd_config:add_periodic(ev_base, 1.0, function(cfg, ev_base)
        i = i + 1
        logger.infox(cfg, "periodic function, %s", i)
        return false -- returning false removes the periodic event
      end, true)
    end)

- on_load and on_terminate scripts:

    rspamd_config:add_on_load(function(cfg, ev_base, worker)
      if worker:get_name() == 'normal' then
        -- Do something in the normal (scanner) workers only
      end
    end)

- specific hash algorithms via rspamd_cryptobox_hash:

    local hash = require "rspamd_cryptobox_hash"
    hash.create_specific('md5', 'string'):hex()
    -- b45cffe084dd3d20d928bee85e7b0f21

- HTTPS support in lua_http
- many improvements in the ANN module, including batch training and threaded training
- zstd compression and decompression support has been added to rspamd_util
Rules improvements
Various new rules have been added to detect suspicious patterns, and existing rules have been fixed to improve accuracy. HTML rules are better, and various bugs in DNS-related services have been fixed. Namely, a couple of untrusted DNSBLs (SORBS and UCEPROTECT) have been removed.
WebUI improvements
There are many major improvements to the Rspamd Web Interface including the following:
- new symbol scores configuration tab
- new last errors table in the history tab
- WebUI is now loaded on demand for each tab
- updated d3 graph scripts
- default passwords are now BANNED from use in the WebUI
- a read-only mode has been added to the interface
Conclusions
Rspamd 1.4 and Rmilter 1.10 are the current stable branches, and all users are recommended to upgrade. Please read the migration guide if you are unsure about the upgrade process.
2016-10-14 00:00:00 +0000
I recently decided to compare the Bayes classifier in Rspamd with its closest analogues. I have tried 3 competitors:
- Rspamd (version 1.4, git master)
- Bogofilter: a classical Bayesian filter
- Dspam: the most advanced Bayesian filter, used by many projects and people
For Dspam, I have tested both chain and osb tokenization modes. I also tried to test the chi-square probability combiner (since the same algorithm is used in Rspamd); however, I could not make it work.
Testing methodology
First of all, I collected a corpus of about 1k spam messages and 1k ham messages. All messages were carefully selected and manually checked. Then I wrote a small script that performs the following steps:
- Split the corpus randomly into two equal parts, each with about 500 Ham and 500 Spam messages.
- Train the Bayes classifier of the desired spam filtering engine (-d for Dspam, -b for Bogofilter).
- Use the remaining messages to test the classifier after training.
- Use a 95% confidence factor for Rspamd and Dspam: when the probability of spam is less than 95%, consider the classifier to be in an undefined state. Bogofilter, in turn, automatically provides 3 results: spam, ham and undefined.
This script collects 6 main values for each classifier:
- Spam/Ham detection rate: the number of messages correctly recognized as spam and ham
- Spam FP rate: the number of false positives for Spam, i.e. HAM messages recognized as SPAM
- Ham FP rate: the number of false positives for Ham, i.e. SPAM messages recognized as HAM
- Ham and Spam FN rate: the number of messages not recognized as Ham or Spam (but not classified as the opposite class either, meaning the classifier is uncertain)
The worst error for a classifier is a Spam False Positive, since it marks an innocent message as Spam. Ham FPs and false negatives are more tolerable: they just mean that you receive more spam than you want.
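The counting step of the methodology can be sketched in Lua. This is a reconstruction for illustration and not the original benchmark script. Each result pairs the true class with the classifier's spam probability. The 95% confidence factor is assumed to apply symmetrically: a probability of at least 0.95 means spam, at most 0.05 means ham, and anything in between is undefined.

```lua
-- Map a spam probability to a verdict using a symmetric confidence factor.
local function verdict(prob, confidence)
  if prob >= confidence then return "spam" end
  if prob <= 1 - confidence then return "ham" end
  return "undefined"
end

-- Count the six values for a list of { truth = "spam"|"ham", prob = P(spam) }.
local function evaluate(results, confidence)
  local s = { spam_ok = 0, ham_ok = 0, spam_fp = 0, ham_fp = 0,
              spam_fn = 0, ham_fn = 0 }
  for _, r in ipairs(results) do
    local v = verdict(r.prob, confidence)
    if r.truth == "spam" then
      if v == "spam" then s.spam_ok = s.spam_ok + 1
      elseif v == "ham" then s.ham_fp = s.ham_fp + 1 -- spam recognized as ham
      else s.spam_fn = s.spam_fn + 1 end             -- spam left undefined
    else
      if v == "ham" then s.ham_ok = s.ham_ok + 1
      elseif v == "spam" then s.spam_fp = s.spam_fp + 1 -- ham recognized as spam
      else s.ham_fn = s.ham_fn + 1 end                   -- ham left undefined
    end
  end
  return s
end

local stats = evaluate({
  { truth = "spam", prob = 0.99 }, { truth = "spam", prob = 0.5 },
  { truth = "ham", prob = 0.01 },  { truth = "ham", prob = 0.97 },
}, 0.95)
print(stats.spam_ok, stats.spam_fn, stats.ham_ok, stats.spam_fp) -- 1 1 1 1
```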
Results
The raw results are pasted at the following gist.
Here are the corresponding graphs for detection rate and errors for the competitors.
Conclusions
Rspamd Bayes performs very well compared to its competitors. It provides a higher spam detection rate than both Dspam and Bogofilter. All competitors demonstrated a comparable spam false positive rate. However, Dspam is more aggressive in marking messages as Ham (which is not bad, because Bayes is the only check Dspam provides).
Rspamd is also much faster at learning and testing. With the Redis backend, it learns 1k messages in less than 5 seconds. Dspam and Bogofilter both require about 30 seconds to learn.
I have not included SpamAssassin in the comparison, since it uses a naive Bayes classifier similar to Bogofilter's. Hence, its quality is very close to Bogofilter's.
Furthermore, unlike its competitors, Rspamd provides a lot of other checks and features. The goal of this particular benchmark was to compare only the Bayesian engines of different spam filters. To summarise, the quality of the Bayes classifier in Rspamd is high enough to recommend it for production environments, or as a replacement for Dspam or Bogofilter in your email system.