Today I have released the next major version of rspamd - 0.8.0. The main difference from the 0.7 branch is a completely reworked fuzzy storage.
I have switched the storage itself from its own in-memory hash structure to an sqlite3 database and redesigned the protocol
for future extensions and new features. At the same time, I have preserved backward compatibility with previous rspamd versions, so
no special upgrade steps are needed. Moreover, the old database format is converted automatically and does not require
any special attention either.
The fuzzy check plugin has been reworked accordingly. First of all, I have changed the fuzzy hash algorithm to the
probabilistic shingles algorithm. It is very fast and still accurate enough to find similar texts in the database. Secondly,
I have added normalization for the target language using the snowball stemmer. It strips grammatical forms and compares only
the base forms of the words in a text, which improves fuzzy matching quality by removing meaningless parts.
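To give a feel for how shingle-based fuzzy matching works, here is a minimal Python sketch. It is not rspamd's implementation: the window size, the number of hash functions, the choice of `blake2b` and the helper names `normalize`, `shingles`, `signature` and `similarity` are all assumptions made for illustration, and a crude lowercasing step stands in for real stemming.

```python
import hashlib

WINDOW = 3     # words per shingle (assumed value)
NHASHES = 32   # signature length (assumed value)

def normalize(text):
    # Stand-in for real normalization: lowercase, keep alphanumeric characters.
    # Stemming is deliberately omitted in this sketch.
    words = ("".join(c for c in w if c.isalnum()) for w in text.lower().split())
    return [w for w in words if w]

def shingles(words, window=WINDOW):
    # Overlapping windows of consecutive words.
    if len(words) < window:
        return [" ".join(words)]
    return [" ".join(words[i:i + window]) for i in range(len(words) - window + 1)]

def signature(text, nhashes=NHASHES):
    # For each keyed hash function keep the minimum hash over all shingles
    # (the classic min-hash construction).
    parts = shingles(normalize(text))
    sig = []
    for k in range(nhashes):
        key = k.to_bytes(4, "little")
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(p.encode(), digest_size=8, key=key).digest(),
                "little")
            for p in parts))
    return sig

def similarity(a, b):
    # Fraction of signature positions on which the two texts agree.
    sa, sb = signature(a), signature(b)
    return sum(x == y for x, y in zip(sa, sb)) / len(sa)
```

Two texts that share most of their shingles agree on most signature positions, so near-duplicates score close to 1 while unrelated texts score close to 0. Only the fixed-size signature has to be stored and compared, which is what makes this family of algorithms fast.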
Rspamd 0.8 has been heavily tested in production environments and I consider it a production-ready release. No manual migration is required for rspamd 0.7 users; however, if you use
an older version of rspamd, you should check the migration guide.
As usual, please feel free to ask any questions on the rspamd mailing list or the IRC discussion channel (#rspamd at freenode).
After a year of development I’m proud to present the new major release of rspamd - 0.7.0. This is the first release of the 0.7 branch and it includes a lot of improvements and reorganization.
I have added a document that describes migration from rspamd 0.6 to rspamd 0.7: https://rspamd.com/doc/migration.html
Unfortunately, due to the poor design of the Lua API used in old rspamd versions, several incompatibilities have been introduced. Please consult the migration document, which describes how to deal
with those incompatibilities.
The rspamd web interface is finally a part of the rspamd package. Moreover, you no longer need an HTTP server to serve its files - rspamd can do it natively. Of course, it is not a good idea to expose
the web UI to the Internet, as this UI is designed to manage rspamd from a protected or internal network. However, this could change in future versions of rspamd.
Rspamd 0.7 contains a lot of improvements in terms of performance and the quality of spam filtering. The internal structure of the rspamd project has changed a lot. Nevertheless, I have tried to
keep backward compatibility as much as possible. For example, despite the migration to HTTP for all communications, rspamd still supports the legacy rspamc protocol.
The rspamd CLI client, rspamc, has been improved as well. It now uses the HTTP protocol and works in non-blocking mode, allowing multiple simultaneous connections. Moreover, it can now produce machine
readable output with the flags --ucl or --json for UCL and JSON output respectively.
After living with the XML format for a long time, I’ve finally decided to improve the configuration
system to avoid various issues related to extending the configuration and to its readability.
In this post I try to describe the main features and principles of the configuration
language, which I’ve called RCL - rspamd configuration language.
RCL is heavily influenced by the nginx configuration as an example of a convenient configuration
system. However, RCL is fully compatible with the JSON format and is able to parse json files.
For example, you can write the same configuration in the following ways:
in nginx like:
or in JSON:
Improvements to the json notation.
Several things make the json notation more convenient for editing:
Braces are not necessary to enclose the top-level object: it is automatically treated as an object:
is the equivalent to:
Quotes are not required for strings and keys; moreover, the : sign may be replaced with the = sign or even omitted for objects:
is the equivalent to:
No comma mess: you can safely place a comma or semicolon after the last element in an array or object:
Non-unique keys in an object are allowed and are automatically converted to arrays internally:
is converted to:
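The duplicate-key rule can be imitated with Python's standard `json` module, which hands every object to a hook as a list of key/value pairs. This is only a sketch of the idea and not RCL's own code; the function names `merge_duplicates` and `loads_merging`, as well as the `host` and `port` keys in the usage note, are made up for the example.

```python
import json

def merge_duplicates(pairs):
    # Called by the json module with the (key, value) pairs of one object,
    # in document order. Values of a repeated key are collected into a list.
    result = {}
    merged = set()  # keys that have already been turned into a list
    for key, value in pairs:
        if key not in result:
            result[key] = value
        elif key in merged:
            result[key].append(value)
        else:
            result[key] = [result[key], value]
            merged.add(key)
    return result

def loads_merging(text):
    # Parse JSON text, folding repeated keys into arrays instead of
    # silently keeping only the last value.
    return json.loads(text, object_pairs_hook=merge_duplicates)
```

With this hook, an input such as `{"host": "a", "host": "b", "port": 1}` is loaded with `host` holding a list of both values, while `port` stays a plain number.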
Numbers can have suffixes to specify standard multipliers:
[kKmMgG] - standard base-10 multipliers (so 1k is translated to 1000)
[kKmMgG]b - power-of-2 multipliers (so 1kb is translated to 1024)
[ms|s|min|d|w|y] - time multipliers; all time values are translated to a floating-point number of seconds, for example 10min is translated to 600.0 and 10ms is translated to 0.01
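The suffix rules above can be sketched as a small Python function. This is an illustration under stated assumptions and not the parser RCL actually uses: the name `parse_number` is hypothetical, a year is assumed to be 365 days, and time suffixes are matched case-sensitively.

```python
import re

DECIMAL = {"k": 1000, "m": 1000 ** 2, "g": 1000 ** 3}
BINARY = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
# Seconds per unit; the length of a year (365 days) is an assumption.
SECONDS = {"ms": 0.001, "s": 1.0, "min": 60.0,
           "d": 86400.0, "w": 604800.0, "y": 31536000.0}

_NUMBER = re.compile(r"([+-]?\d+(?:\.\d+)?)([A-Za-z]*)")

def parse_number(token):
    match = _NUMBER.fullmatch(token)
    if match is None:
        raise ValueError(f"not a number: {token!r}")
    digits, suffix = match.groups()
    value = float(digits) if "." in digits else int(digits)
    if suffix == "":
        return value
    if suffix in SECONDS:
        # Time values always become a float number of seconds.
        return float(value) * SECONDS[suffix]
    if len(suffix) == 2 and suffix[1] == "b" and suffix[0].lower() in BINARY:
        return value * BINARY[suffix[0].lower()]
    if suffix.lower() in DECIMAL:
        return value * DECIMAL[suffix.lower()]
    raise ValueError(f"unknown suffix: {suffix!r}")
```

Time suffixes are checked first so that `min` and `ms` are never confused with the `m` (mega) multiplier.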
Booleans can be specified as true or yes or on and false or no or off.
It is still possible to treat numbers and booleans as strings by enclosing them in double quotes.
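A short Python sketch of how a parser might decide the type of a bare token under these rules. The name `parse_scalar` is hypothetical, keyword matching is assumed to be lowercase-only, and numeric suffixes and escape sequences inside quotes are ignored to keep the example small.

```python
import re

TRUE_WORDS = {"true", "yes", "on"}
FALSE_WORDS = {"false", "no", "off"}

def parse_scalar(token):
    # A double-quoted token is always a string, whatever it contains.
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return token[1:-1]
    if token in TRUE_WORDS:
        return True
    if token in FALSE_WORDS:
        return False
    if re.fullmatch(r"[+-]?\d+", token):
        return int(token)
    if re.fullmatch(r"[+-]?\d+\.\d+", token):
        return float(token)
    # Anything else is kept as an unquoted string.
    return token
```

The quoting check comes first, which is why `"yes"` and `"10"` in double quotes stay strings while the bare tokens `yes` and `10` become a boolean and an integer.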
RCL supports different styles of comments:
single line: # or //
multiline: /* ... */
Multiline comments may be nested:
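Nesting means the parser has to count how many comments are currently open instead of stopping at the first `*/`. A minimal Python sketch of that idea follows; the function name `strip_block_comments` is hypothetical, and the sketch ignores single-line comments and string literals, which a real parser would have to respect.

```python
def strip_block_comments(text):
    out = []
    depth = 0  # number of /* ... */ comments currently open
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "/*":
            depth += 1
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1
            i += 2
        else:
            # Keep characters only when we are outside every comment.
            if depth == 0:
                out.append(text[i])
            i += 1
    if depth != 0:
        raise ValueError("unterminated multiline comment")
    return "".join(out)
```

Because of the depth counter, a whole block that already contains comments can be disabled by wrapping it in one more `/* ... */` pair.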
RCL supports external macros, both multiline and single-line ones:
There are two internal macros provided by RCL:
include - read a file /path/to/file or a URL http://example.com/file and include it at the current place
includes - read a file or a URL like the previous macro, but also fetch and check the signature file (whose name is obtained
by appending the .sig suffix).
The public key (or keys) used by the latter macro are specified by the particular RCL user (rspamd, for example).
Each RCL object can be serialized to one of the three supported formats:
JSON - canonical json notation (with spaces and an indented structure);
Compacted JSON - compact json notation (without spaces or newlines);
Configuration - nginx-like notation.
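The difference between the compact and the nginx-like output can be illustrated in Python on a plain dictionary. This is a rough sketch and not RCL's emitter: `to_compact_json`, `to_config` and `_emit` are hypothetical names, the exact layout of the configuration style (`key = value;`, four-space indentation) is an assumption, and empty or nested arrays are not handled.

```python
import json

def to_compact_json(obj):
    # No spaces after separators and no newlines.
    return json.dumps(obj, separators=(",", ":"))

def _emit(key, value, indent, lines):
    pad = " " * indent
    if isinstance(value, dict):
        lines.append(f"{pad}{key} {{")
        for k, v in value.items():
            _emit(k, v, indent + 4, lines)
        lines.append(f"{pad}}}")
    elif isinstance(value, list):
        # An array is written as the same key repeated once per element.
        for item in value:
            _emit(key, item, indent, lines)
    else:
        lines.append(f"{pad}{key} = {json.dumps(value)};")

def to_config(obj):
    lines = []
    for key, value in obj.items():
        _emit(key, value, 0, lines)
    return "\n".join(lines)
```

Writing arrays as repeated keys mirrors the parsing rule that non-unique keys are collected into arrays, so under these assumptions the two directions stay symmetric.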
RCL has a clear design that should be very convenient for reading and writing. At the same time, it is compatible with
the JSON language and can therefore be used as a simple JSON parser. The macro logic makes it possible to extend the configuration
language (for example, by including some Lua code), and comments make it possible to disable or enable parts of a configuration
quickly. Rspamd 0.6.0 will be the first version with RCL configuration. Rspamd itself will be able to convert an existing XML configuration
to RCL.