2015-05-15 00:00:00 +0000
After almost half a year of development we are introducing rspamd 0.9, the next major version of rspamd. You can view the full list of changes in the
ChangeLog file. Here are the most notable changes introduced in this version:
- Improved optimizations via an abstract syntax tree for all expressions (my presentation describes some basic principles of these optimizations).
- Switched to LuaJIT and PCRE JIT by default. JIT compilation improved performance in the bottlenecks, so rspamd is now significantly faster than the 0.8 branch.
- Added SpamAssassin rules support: you can now use most of your SpamAssassin rules in rspamd natively. Of course, they are optimized with JIT and AST techniques.
- Added encryption support: rspamd can now encrypt all traffic with extremely fast, low-latency encryption based on public-key cryptography and the cryptobox construction.
- A new Aho-Corasick implementation has been imported. Now rspamd can search for hundreds of thousands of patterns in almost linear time.
- New statistics architecture:
  - advanced tokenization techniques (secure or fast hashes);
  - improved UTF-8 tokenization;
  - a learning cache that prevents learning the same message multiple times;
  - improved feature normalization to reduce the false positive rate.
Moreover, this release contains a lot of other improvements to plugins, the Lua API, the rspamd core and the build system. In fact, most of the rspamd codebase has been either reworked or completely rewritten
to improve the architecture, performance and stability.
2015-03-04 00:00:00 +0000
We are proud to announce that rspamd has been accepted into the Google Summer of Code program.
The list of ideas, possible mentors and other useful information is available on the ideas page. We encourage prospective students to apply and help us make rspamd better with funding generously provided by Google.
2015-01-02 00:00:00 +0000
Today I have released the next major version of rspamd - 0.8.0. The main difference from the 0.7 branch is the completely reworked fuzzy storage.
I have switched the storage itself from its own memory-based hash structure to an sqlite3 database and redesigned the protocol
to allow future extensions and new features. At the same time, I have preserved backward compatibility with previous rspamd versions, so
no specific upgrade steps are needed. Moreover, an old database is converted to the new format automatically, which does not require
any special attention either.
The fuzzy check plugin has been reworked accordingly. First of all, I have changed the fuzzy hash algorithm to the
probabilistic shingles algorithm. It is very fast and still accurate enough to find similar texts in the database. Secondly,
I have added normalization for the target language using the Snowball stemmer. It strips grammatical forms so that only
the base forms of the words in a text are checked, which improves fuzzy matching quality by removing meaningless parts.
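To illustrate the general idea of shingles, here is a minimal Python sketch of min-hash over word shingles. It is not the actual rspamd implementation: the window width, the number of hashes, the use of BLAKE2b and all function names are assumptions made for this example only.

```python
import hashlib


def shingles(text, width=3):
    """Return the set of overlapping word windows ("shingles") of a text."""
    words = text.lower().split()
    if len(words) < width:
        return {" ".join(words)}
    return {" ".join(words[i:i + width]) for i in range(len(words) - width + 1)}


def signature(text, num_hashes=32, width=3):
    """Min-hash signature: for every keyed hash keep the smallest shingle digest."""
    parts = shingles(text, width)
    sig = []
    for seed in range(num_hashes):
        key = seed.to_bytes(4, "little")  # a different key gives a different hash function
        sig.append(min(
            hashlib.blake2b(p.encode("utf-8"), digest_size=8, key=key).digest()
            for p in parts
        ))
    return sig


def similarity(sig_a, sig_b):
    """Fraction of matching signature slots; estimates the overlap of shingle sets."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)
```

Two texts that differ in a few words share most of their shingles, so most slots of their signatures match, while unrelated texts share almost none. Only the fixed-size signatures need to be stored and compared, which is what makes this kind of check fast.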
Rspamd 0.8 has been heavily tested in production environments and I consider it a production-ready release. No manual migration is required for rspamd 0.7 users; however, if you use
older versions of rspamd, you should check the migration guide.
As usual, please feel free to ask any questions on the rspamd mailing list or in the IRC discussion channel (#rspamd at OFTC).
2014-09-11 00:00:00 +0000
After a year of development I’m proud to present the new major release of rspamd - 0.7.0. This is the first release of the 0.7
branch and it includes a lot of improvements and reorganization.
I have added a document that describes migration from Rspamd 0.6 to Rspamd 0.7.
Unfortunately, due to the poor design of the Lua API used in old rspamd versions, several incompatibilities have been introduced. Please consult the migration document, which describes how to deal
with those incompatibilities.
The rspamd web interface is finally a part of the rspamd package. Moreover, you no longer need an HTTP server to serve its files - rspamd can do it natively. Of course, it is not a good idea to expose
the web UI to the Internet, as it is designed to manage rspamd from a protected or internal network. However, this could change in future versions of rspamd.
Rspamd 0.7 contains a lot of improvements in terms of performance and the quality of spam filtering. The internal structure of the rspamd project has changed a lot. Nevertheless, I tried to
keep backward compatibility as much as possible. For example, despite the migration to HTTP for all communications, rspamd still supports the legacy rspamc
protocol.
The rspamd CLI client, rspamc, has been improved as well. It now uses the HTTP protocol and works in non-blocking mode, allowing multiple simultaneous connections. Moreover, it can now produce machine-readable
output with the --ucl or --json flags, for UCL and JSON output respectively.
Please feel free to ask any questions on the rspamd mailing list or in the IRC discussion channel. You can find the details of both at https://rspamd.com/support.html.
2013-09-07 00:00:00 +0000
After living with the XML format for a long time, I’ve finally decided to improve the configuration
system to avoid various issues related to extending the configuration and to its readability.
In this post I try to describe the main features and principles of the configuration
language, which I’ve called RCL - rspamd configuration language.
Basic structure
RCL is heavily influenced by the nginx configuration format as an example of a convenient configuration
system. However, RCL is fully compatible with the JSON format and is able to parse JSON files.
For example, you can write a configuration in either of the following ways:
param = value;
section {
    param = value;
    param1 = value1;
    flag = true;
    number = 10k;
    time = 0.2s;
    string = "something";
    subsection {
        host = {
            host = "hostname";
            port = 900;
        }
        host = {
            host = "hostname";
            port = 901;
        }
    }
}
{
    "param": "value",
    "param1": "value1",
    "flag": true,
    "subsection": {
        "host": [
            {
                "host": "hostname",
                "port": 900
            },
            {
                "host": "hostname",
                "port": 901
            }
        ]
    }
}
Improvements to the json notation.
There are several changes that make the JSON notation more convenient to edit:
- Braces are not necessary to enclose the top-level object; it is automatically treated as an object:
"key": "value"
is equivalent to:
{"key": "value"}
- Quotes are not required for strings and keys; moreover, the : sign may be replaced with the = sign or even omitted for objects:
key = value;
section {
    key = value;
}
is equivalent to:
{
    "key": "value",
    "section": {
        "key": "value"
    }
}
- No comma mess: you can safely place a comma or semicolon after the last element in an array or object:
{
    "key1": "value",
    "key2": "value",
}
- Non-unique keys in an object are allowed and are automatically converted to arrays internally:
{
    "key": "value1",
    "key": "value2"
}
is converted to:
{
    "key": ["value1", "value2"]
}
- Numbers can have suffixes to specify standard multipliers:
  - [kKmMgG] - standard base-10 multipliers (so 1k is translated to 1000)
  - [kKmMgG]b - power-of-2 multipliers (so 1kb is translated to 1024)
  - [ms|s|min|d|w|y] - time multipliers; all time values are translated to a floating-point number of seconds, for example 10min is translated to 600.0 and 10ms is translated to 0.01
- Booleans can be specified as true, yes or on, and as false, no or off.
- It is still possible to treat numbers and booleans as strings by enclosing them in double quotes.
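The conventions above for repeated keys, number suffixes and booleans can be sketched in a few lines of Python. This is only an illustration of the rules as described in this post, not the RCL parser itself: the function names are hypothetical, and the lengths of a day, week and year (24 hours, 7 days, 365 days) are my assumptions.

```python
import json
import re

DECIMAL = {"k": 1000, "m": 1000 ** 2, "g": 1000 ** 3}
BINARY = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
# Seconds per unit; day/week/year lengths are assumptions of this sketch.
TIME = {"ms": 0.001, "s": 1.0, "min": 60.0,
        "d": 86400.0, "w": 604800.0, "y": 31536000.0}
NUMBER = re.compile(r"^(-?\d+(?:\.\d+)?)([a-zA-Z]*)$")


def merge_duplicates(pairs):
    """Hook for json.loads: values of a repeated key are collected into a list."""
    merged, repeated = {}, set()
    for key, value in pairs:
        if key not in merged:
            merged[key] = value
        elif key in repeated:
            merged[key].append(value)
        else:
            merged[key] = [merged[key], value]
            repeated.add(key)
    return merged


def parse_scalar(token):
    """Convert one unquoted token to a bool, a number or a string."""
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return token[1:-1]              # quoted values always stay strings
    if token in ("true", "yes", "on"):
        return True
    if token in ("false", "no", "off"):
        return False
    match = NUMBER.match(token)
    if not match:
        return token
    digits, suffix = match.groups()
    if not suffix:
        return float(digits) if "." in digits else int(digits)
    if suffix in TIME:                  # time values become float seconds
        return float(digits) * TIME[suffix]
    low = suffix.lower()
    if len(low) == 2 and low[1] == "b" and low[0] in BINARY:
        return int(float(digits) * BINARY[low[0]])
    if low in DECIMAL:
        return int(float(digits) * DECIMAL[low])
    return token                        # unknown suffix: keep as a string
```

For example, `json.loads('{"key": "value1", "key": "value2"}', object_pairs_hook=merge_duplicates)` yields a single `key` holding both values, and `parse_scalar("1kb")` gives 1024 while `parse_scalar('"1kb"')` stays a string.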
General improvements
RCL supports different styles of comments:
- single line: # or //
- multiline: /* ... */
Multiline comments may be nested:
# Sample single line comment
/*
some comment
/* nested comment */
end of comment
*/
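Nested comments cannot be matched by a simple search for the first closing marker; the parser has to count nesting depth. A minimal Python sketch of that idea follows. The function name is hypothetical, and for brevity it ignores string literals, so a # or // inside a quoted value would be treated as a comment here.

```python
def strip_comments(text):
    """Remove #, // and nested /* ... */ comments (string literals are not handled)."""
    out = []
    depth = 0  # current nesting level of /* ... */ comments
    i = 0
    n = len(text)
    while i < n:
        two = text[i:i + 2]
        if two == "/*":
            depth += 1                  # entering a (possibly nested) comment
            i += 2
        elif two == "*/" and depth > 0:
            depth -= 1                  # leaving one nesting level
            i += 2
        elif depth > 0:
            i += 1                      # inside a multiline comment: skip
        elif text[i] == "#" or two == "//":
            while i < n and text[i] != "\n":
                i += 1                  # skip to the end of line, keep the newline
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

The text after the inner `*/` is still skipped because the depth counter only returns to zero at the outer closing marker.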
RCL supports external macros, both single-line and multiline:
.macro "sometext";
.macro {
    Some long text
    ....
};
There are two internal macros provided by RCL:
- include - reads a file (/path/to/file) or a URL (http://example.com/file) and includes it at the current place in the RCL configuration;
- includes - reads a file or a URL like the previous macro, but also fetches and checks the signature file (whose name is obtained by appending the .sig suffix).
The public key (or keys) used by the latter macro are specified by the concrete RCL user (for example, by rspamd).
Emitter
Each RCL object can be serialized to one of the three supported formats:
- JSON - canonical JSON notation (structure indented with spaces);
- Compacted JSON - compact JSON notation (without spaces or newlines);
- Configuration - nginx-like notation.
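To show what the three output styles look like, here is a small Python sketch. The first two use the standard json module; emit_config is a hypothetical, simplified emitter written for this example and is not the RCL emitter itself.

```python
import json


def emit_config(obj, indent=0):
    """Render a dict in an nginx-like style (hypothetical, simplified emitter)."""
    pad = "    " * indent
    lines = []
    for key, value in obj.items():
        # An array is written as the same key repeated once per element.
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict):
                lines.append(f"{pad}{key} {{")
                lines.append(emit_config(item, indent + 1))
                lines.append(f"{pad}}}")
            else:
                lines.append(f"{pad}{key} = {json.dumps(item)};")
    return "\n".join(lines)


conf = {"key": "value", "section": {"port": [900, 901], "flag": True}}

pretty = json.dumps(conf, indent=4)                 # indented JSON
compact = json.dumps(conf, separators=(",", ":"))   # no spaces or newlines
config = emit_config(conf)                          # nginx-like notation
```

Note how the array under `port` is emitted as two `port = ...;` lines, which mirrors the rule that repeated keys are converted to arrays when parsing.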
Conclusion
RCL has a clear design that should be very convenient for reading and writing. At the same time it is compatible with
JSON and can therefore be used as a simple JSON parser. The macro logic makes it possible to extend the configuration
language (for example by including some Lua code), and comments make it easy to disable or enable parts of a configuration
quickly. Rspamd 0.6.0 will be the first version with RCL configuration. Rspamd itself will be able to convert an existing XML configuration
to RCL.