Rspamd

Fast, free and open-source spam filtering system.

Rspamd 1.5 has been released

01 Mar 2017

We are pleased to announce the new major Rspamd release 1.5 today. This release incudes a lot of major reworks, new cool features and a significant number of bugs being fixed. The update from the previous versions shouldn’t be hard, however, please check the migration document to be sure that the new version will not break the existing configuration.

Here is a list of major changes for this version.

New MIME parser

Rspamd has used the GMime library for a very long time but we decided to switch to from it for several reasons.

The main problem is that Rspamd requires very precise control of MIME parsing as it has to deal with broken messages not for displaying purposes but for extracting data from them. This procedure has some simplifications and some complications comparing to a generic MIME parser, such as GMime: for example, we do not need to support streaming mode but we have to deal with many non-standard messages that are intended to be parsed incorrectly by some adversary side, e.g. spammers. The current architecture is described here: https://gist.github.com/vstakhov/937f253d5935ee4158688932589b1dcc

Through use of the new parser, Rspamd can now deal with the following messages:

  1. Messages with redundant Content-Type headers:
Content-Type: text/plain
Content-Type: multipart/alternative

Currently, Rspamd always prefers multipart types over plain types and text types unless there is not specific binary type (e.g. if there is text/plain and application/octet-stream)

  1. Messages with broken multipart structures:
    • new parts after closing boundary (e.g. attachment in multipart/mixed after the closing part)
    • incorrect inheritance
    • incorrect multipart type (now Rspamd just ignores the exact multipart/* type)
  2. Filenames that are badly encoded (non-utf8)
  3. Incorrect Content-Transfer-Encoding (now heuristic based):
    • 8bit when content is Base64
    • base64 or qp when content is 8bit
  4. Bad Content-Type, e.g. text
  5. Messages with no headers or messages with no body
  6. Messages with mixed newlines in headers and/or body

Switching from libiconv to libicu

Rspamd has switched charset conversion from libiconv to libicu. This allowed to speed up the conversion time since libicu is much faster (~100MB of text from windows-1251 to utf-8):

0,83s user 0,08s system 98% cpu 0,921 total - iconv
0,36s user 0,07s system 95% cpu 0,450 total - libicu

Furthermore, switching to libicu allowed for implementation of many useful features:

  • heuristic charset detection (NGramms for 1byte charsets);
  • visual obfuscation detector (e.g. google.com -> gооgle.com)
  • better IDNA processing
  • better unicode manipulation

WebUI rework

The Web interface has been reworked for better representation and configuration:

  • The web interface now supports displaying & aggregating statistics from a cluster of Rspamd machines
  • The internal structure of the Web Interface has changed to a set of modules so that new features could be implemented without touching the overall logic
  • The throughput graph has been improved and now displays a small pie chart for the specified time range

Lua TCP module rework

In Rspamd 1.5 Lua TCP module now supports complex protocols with dialogs and states similar to AnyEvent module in Perl. For example, it is now possible to set a reaction for each communication stage and perform full SMTP or IMAP dialog.

URL redirector module

URL shorteners and redirectors are part of the modern email ecosystem and they are widely used in many emails, both legitimate and not (e.g. in Spam and Phishing). Rspamd has an old and outdated utility service that is intended to resolve such redirects called redirector.pl. It is written in Perl and hasn’t been updated for a long time. It has a long dependencies list and performs a lot of unnecessary tasks. In Rspamd 1.5, there is a new lightweight lua redirector module which is intented to resolve URLs redirect in a more efficient and simple way. Dereferenced links are processed by SURBL module and added as tags for other modules. Redis is used for caching. This module is not enabled by default so far, but it can easily be enabled by placing redirector_hosts_map = "/etc/rspamd/redirectors.inc"; in /etc/rspamd/local.d/surbl.conf.

Rmilter headers module

The Rmilter headers module provides an easy way to add common headers; support is available for Authentication-Results, SpamAssassin-compatible headers and user-defined headers among others.

DKIM signing module

The DKIM signing module provides a simple policy-based approach to DKIM signing similar to Rmilter. It supports multiple cool features, for example, you can now store your DKIM keys in Redis.

Force actions module

The Force actions module provides a way to force actions for messages based on flexible conditions (an expression consisting of symbols to verify presence/absence of & the already-assigned action of a message), optionally setting SMTP messages & rewritten subjects.

Reworked & improved metadata exporter

Configuration of this module has been reworked to provide more flexible operation & library functions have been added to provide JSON-formatted general message metadata, e-Mail alerts and more - making this module readily useful for quarantines, logging & alerting.

URL tags plugin

URLs can now be assigned tags and it is the job of the URL tags plugin to persist these in Redis for a period of time; which could be used to avoid redundant checks.

URL reputation plugin

The URL reputation plugin filters URLs for relevance and assigns dynamic reputation to selected TLDs which is persisted in Redis.

Multimap ‘received’ maps

Now multimap can be used to match information extracted from Received headers (which could be filtered based on their position in the message). It is also possible to use SMTP HELO messages in maps for this module. There are also new URL filters, SMTP message setup depending on map data and the ability to skip archives checks for certain filetypes or maps.

Changes in RBL module

Support has been added for using hashes in email and helo RBLs (so that information which can’t be represented in a DNS record could be queried).

Support for Avira SAVAPI in antivirus module

Rspamd antivirus module now supports AVIRA antivirus. This code has been contributed by Christian Rößner.

Neural net plugin improvements

We have fixed couple of issues in the neural network plugin allowing to have multiple configurations in the cluster. We have also fixed couple of issues with storing and loading of learning vectors especially in errors handling paths. New metatokens have been added to improve neural network classification quality.

Fuzzy matching for images

Rspamd fuzzy hashing now support matching of the images attached to emails checked. To enable this feature, Rspamd should be built with libgd support (provided by the pre-built packages). However, this feature is not currently enabled by default as it seems to be too aggressive when used in conjunction with large fuzzy storages producing a lot of false positive hits.

New rules

There are couple of new rules added to Rspamd 1.5:

  • OMOGRAPH_URL: detects visually confusable URLs
  • FROM_NAME_HAS_TITLE: fixed title match
  • Add REPLYTO_EMAIL_HAS_TITLE rule
  • Add FROM_NAME_EXCESS_SPACE rule

Rspamadm grep

A grep-like tool inspired by exigrep has been added to rspamadm- see rspamadm grep --help for usage information: this provides a convenient way to produce logically collated logs based on search strings/regular expressions.

Performance improvements

There are number of improvements regarding the performance of processing:

  • Base64 decoder now has sse4.2 and avx backends
  • Better internal caching of various ‘heavy’ objects
  • Switching to a faster hash function t1ha
  • Enabled link time optimizations for the pre-built packages
  • Bundled luajit 2.1 which has significant performance improvements to the provided Debian packages

Stability improvements and bug fixes

We constantly improve the stability of Rspamd and in this version we have fixed number of issues related to the graceful reload. Historically, this command has very poor support and there were a number of issues related to memory leaks and corruptions that could occur during reload. In this release, we have fixed a lot of such issues, therefore, you can use reload more safely now. We have also eliminated various issues related to unicode processing, Lua API, signals race conditions and other important problems found by Rspamd users.