Rspamd Features

About Rspamd

Rspamd is an advanced spam filtering system that allows evaluation of messages by a number of rules including regular expressions, statistical analysis and custom services such as URL black lists. Each message is analysed by Rspamd and given a spam score.

According to this spam score and the user’s settings Rspamd recommends an action for the MTA to apply to the message: for example, to pass, to reject or to add a header. Rspamd is designed to process hundreds of messages per second simultaneously and has a number of features available.

You can watch the following introduction video from the FOSDEM-2016 where I describe the main features of Rspamd and explain why Rspamd runs so fast.

Unique features

  • Web interface. Rspamd is shipped with the fully functional Ajax-based web interface that allows to monitor and configure Rspamd rules, scores, dynamic lists, to scan and learn messages and to view the history of scans. The web interface is self-hosted, requires zero configuration and follows the recent web applications standards. You don’t need a web server or applications server to run web UI - you just need to run Rspamd itself and a web browser.

  • Integration with MTA. Rspamd can work with the most popular mail transfer systems, such as Postfix, Exim or Sendmail. For Postfix and Sendmail, there is an Rmilter project, whilst there are several solutions for Exim and OpenSMTPD to scan mail on Rspamd.

  • Extensive Lua API. Rspamd ships with hundreds of Lua functions that are helpful to create your own rules for efficient and targeted spam filtering.

  • Dynamic tables - it is possible to specify bulk lists as dynamic maps that are checked in runtime with updating data only when they are changed. Rspamd supports file, HTTP and HTTPS maps.

Content scan features

Content scan features are used to find certain patterns in messages, including text parts, headers and raw content. Content scan technologies are intended to filter the most common cases of spam messages and offer the static part of spam filtering. Rspamd supports various types of content scanning checks, such as:

  • Regular expression filtering offers basic processing of messages, their textual parts, MIME headers and SMTP data received by MTA against a set of expressions that includes both normal regular expressions and message processing functions. Rspamd regular expressions are a powerful tool that allows to filter messages based on some pre-defined rules. Rspamd can also use SpamAssassin regular expressions via plugin.

  • Fuzzy hashes are used by Rspamd to find similar messages. Unlike normal hashes, these structures are targeted to hide small differences between text patterns allowing to find common messages quickly. Rspamd has internal storage of such hashes and allows to block spam mass mails based on user’s feedback that specifies message reputation. Moreover, fuzzy storage allows to feed Rspamd with data from honeypots without polluting the statistical module. You can read more about it in the following document.

  • DCC is quite similar to the previous one but it uses the external service DCC to check if a message is a bulk message (that is sent to many recipients simultaneously).

  • Chartable module helps to find specially crafted messages that are intended to cheat spam filtering systems by switching the language of text and replacing letters with their analogues. Rspamd uses UTF-8 normalization to detect and filter such techniques commonly used by many spammers.

Policy check features

There are many resources that define policies for different objects in email transfer: for sender’s IP address, for URLs in a message and even for a message itself. For example, a message could be signed by sender using DKIM technology. Another example could be URL filtering: phishing checks or URL DNS blacklists - SURBL. Rspamd supports various policy checks:

  • SPF checks allow to validate a message’s sender using the policy defined in the DNS record of sender’s domain. You can read about SPF policies here. A number of mail systems support SPF, such as Gmail or Yahoo Mail.

  • DKIM policy validates a message’s cryptographic signature against a public key placed in the DNS record of sender’s domain. This method allows to ensure that a message has been received from the specified domain without altering on the path.

  • DMARC combines DKIM and SPF techniques to define more or less restrictive policies for certain domains. Rspamd can also store data for DMARC reports in Redis database.

  • Whitelists are used to avoid false positive hits for trusted domains that pass other checks, such as DKIM, SPF or DMARC. For example, we should not filter messages from PayPal if they are correctly signed with PayPal domain signature. On the other hand, if they are not signed and DMARC policy defines restrictive rules for DKIM, we should mark this message as spam as it is potentially phishing. Whitelist module provides different modes to perform policy matching and whitelisting or blacklisting of certain combinations of verification results.

  • DNS lists allows to estimate reputation of sender’s IP address or network. Rspamd uses a number of DNS lists including such lists as SORBS or SpamHaus. However, Rspamd doesn’t trust ultimately any specific DNS list and does not reject mail based just on this factor. Rspamd also uses white and grey DNS lists to avoid false positive spam hits.

  • URL lists are rather similar to DNS black lists but uses URLs in a message to fight spam and phishing. Rspamd has full embedded support of the most popular SURBL lists, such as URIBL and SURBL from SpamHaus.

  • Phishing checks are extremely useful to filter phishing messages and protect users from cyber attacks. Rspamd uses sophisticated algorithms to find phished URLs and supports the popular URL redirectors (for example, http://t.co) to avoid false positive hits. Popular phishing databases, such as OpenPhish and PhishTank are also supported.

  • Rate limits allow to prevent mass mails to be sent from your own hacked users. This is an extremely useful feature to protect both inbound and outbound mail flows.

  • Greylisting is a common method to introduce delay for suspicious messages, as many spammers do not use the fully functional SMTP servers that allow to queue delayed messages. Rspamd implements greylisting internally and can delay messages that has a score higher than certain threshold.

  • Replies module is intended to whitelist messages that are reply to our own messages as these messages are likely important for users and false positives are highly undesirable for them.

  • Maps module provides a Swiss Knife alike tool that could filter messages based on different attributes: headers, envelope data, sender’s IP and so on. This module is very useful for building custom rules.

Statistical tools

Statistical approach includes many useful spam recognition techniques that can learn dynamically from messages being scanned. Rspamd provides different tools that could be learned either manually or automatically and adopt for the actual mail flow.

  • Bayes classifier is a tool to classify spam and ham messages. Rspamd uses an advanced algorithm of statistical tokens generation that achieves better results than traditionally used ones (e.g. in SpamAssassin) that is described in details in the following paper.

  • Neural network learns from scan results and allows to improve the final score by finding some common patterns of rules that are typical for either spam or ham messages. This module is especially useful for large email systems as it can learn from your own rules and adopt quickly for spam mass mailings.