Rspamd 0.9 has been released

2015-05-15 00:00:00 +0000

After almost half a year of development we are introducing rspamd 0.9, the next major version of rspamd. The full list of changes is in the ChangeLog file; the most notable changes in this version are:

  • Improved optimizations via an abstract syntax tree for all expressions (my presentation describes some basic principles of these optimizations).
  • Switched to LuaJIT and PCRE JIT by default. JIT compilation improves performance in the bottlenecks, so rspamd is now significantly faster than the 0.8 branch.
  • Added SpamAssassin rules support: you can now use most of your SpamAssassin rules in rspamd natively. Of course, they are optimized with the JIT and AST techniques.
  • Added encryption support: rspamd can now encrypt all traffic with extremely fast, low-latency encryption based on public key cryptography and the cryptobox construction.
  • Imported a new Aho-Corasick implementation. Rspamd can now search for hundreds of thousands of patterns in almost linear time.
  • New statistics architecture:
    • advanced tokenization techniques (secure or fast hashes);
    • improved UTF-8 tokenization;
    • a learning cache that prevents the same message from being learned more than once;
    • improved feature normalization to reduce the false positive rate.

Moreover, this release contains a lot of other improvements to the plugins, the Lua API, the rspamd core and the build system. In fact, most of the rspamd codebase has been either reworked or completely rewritten to improve its architecture, performance and stability.

Rspamd is accepted for the Google Summer of Code 2015

2015-03-04 00:00:00 +0000

We are proud to announce that rspamd has been accepted into the Google Summer of Code program. The list of ideas, possible mentors and other useful information is available on the ideas page. We encourage prospective students to apply and help us make rspamd better with funding generously provided by Google.

Rspamd 0.8.0 released

2015-01-02 00:00:00 +0000

Today I have released the next major version of rspamd, 0.8.0. The main difference from the 0.7 branch is a completely reworked fuzzy storage. I have switched the storage itself from its own in-memory hash structure to an sqlite3 database and redesigned the protocol to allow future extensions and new features. At the same time, I have preserved backward compatibility with previous rspamd versions, so no specific upgrade steps are needed. Moreover, an old database is converted to the new format automatically and requires no special attention either.

The fuzzy check plugin has been reworked accordingly. First of all, I have changed the fuzzy hash algorithm to the probabilistic shingles algorithm. It is blazingly fast and still accurate enough to find similar texts in the database. Secondly, I have added normalization for the target language using the snowball lemmatizer. It strips grammatical forms and compares only the base forms of the words in a text, which improves fuzzy matching quality by removing meaningless parts.
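The general idea of shingle-based matching can be sketched in a few lines of Python. This is a generic min-hash illustration, not rspamd's actual implementation: the window size, the number of hash functions, the choice of blake2b and the function names are all assumptions made for this example, and the normalization step is left out.

```python
import hashlib
from typing import List

WINDOW = 3        # words per shingle (assumed value)
NUM_HASHES = 32   # number of hash functions in a signature (assumed value)


def _hash(text: str, seed: int) -> int:
    """64-bit hash of text; the seed is used as a salt to emulate
    a family of independent hash functions."""
    digest = hashlib.blake2b(
        text.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def signature(text: str) -> List[int]:
    """Min-hash signature built from overlapping word windows (shingles)."""
    words = text.lower().split()
    # Sliding windows of WINDOW consecutive words; short texts give one shingle.
    shingles = [
        " ".join(words[i:i + WINDOW])
        for i in range(max(1, len(words) - WINDOW + 1))
    ]
    # For every hash function keep only the minimum over all shingles.
    return [min(_hash(s, seed) for s in shingles) for seed in range(NUM_HASHES)]


def similarity(a: str, b: str) -> float:
    """Fraction of signature positions on which the two texts agree."""
    sig_a, sig_b = signature(a), signature(b)
    return sum(x == y for x, y in zip(sig_a, sig_b)) / NUM_HASHES
```

Two texts that share most of their shingles agree on most signature positions, so a fixed-size signature is enough to estimate how close they are without comparing the texts themselves.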

Rspamd 0.8 has been heavily tested in production environments and I consider it a production-ready release. No manual migration is required for rspamd 0.7 users; however, if you use older versions of rspamd, you should check the migration guide.

As usual, please feel free to ask any questions on the rspamd mailing list or the IRC discussion channel (#rspamd at OFTC).

Rspamd 0.7.0 released

2014-09-11 00:00:00 +0000

After a year of development I’m proud to present the new major release of rspamd, 0.7.0. This is the first release of the 0.7 branch and it includes a lot of improvements and reorganization. I have added a document that describes migration from Rspamd 0.6 to Rspamd 0.7. Unfortunately, due to the poor design of the Lua API used in old rspamd versions, several incompatibilities have been introduced. Please consult the migration document, which describes how to deal with them.

The rspamd web interface is finally a part of the rspamd package. Moreover, you no longer need an HTTP server to serve its files: rspamd can do it natively. Of course, it is not a good idea to expose the web UI to the Internet, as it is designed to manage rspamd from a protected or internal network. However, this could change in future versions of rspamd.

Rspamd 0.7 contains a lot of improvements in terms of performance and the quality of spam filtering. The internal structure of the rspamd project has changed a lot. Nevertheless, I have tried to keep backward compatibility as much as possible. For example, despite the migration to HTTP for all communications, rspamd still supports the legacy rspamc protocol.

The rspamd CLI client, rspamc, has been improved as well. It now uses the HTTP protocol and works in non-blocking mode, allowing multiple simultaneous connections. Moreover, it can now produce machine-readable output with the --ucl or --json flags, for UCL and JSON output respectively.

Please feel free to ask any questions on the rspamd mailing list or the IRC discussion channel. You can find their details at https://rspamd.com/support.html.

New configuration format for rspamd

2013-09-07 00:00:00 +0000

After living with the XML format for a long time, I’ve finally decided to improve the configuration system to avoid various issues with extending the configuration and with its readability. In this post I try to describe the main features and principles of the configuration language, which I’ve called RCL (rspamd configuration language).

Basic structure

RCL is heavily influenced by the nginx configuration format, which is an example of a convenient configuration system. However, RCL is also fully compatible with the JSON format and is able to parse JSON files. For example, you can write a configuration in either of the following ways:

  • in nginx-like form:
param = value;
section {
	param = value;
	param1 = value1;
	flag = true;
	number = 10k;
	time = 0.2s;
	string = "something";
	subsection {
		host = {
			host = "hostname"; 
			port = 900;
		}	
		host = {
			host = "hostname";
			port = 901;
		}	
	}
}
  • or in JSON:
{
	"param": "value",
	"param1": "value1",
	"flag": true,
	"subsection": {
		"host": [
			{	
				"host": "hostname",
				"port": 900
			},
			{
				"host": "hostname",
				"port": 901
			}
		]
	}
}
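Note that the nginx-like form repeats the host key twice, while the JSON form holds the same two entries in a single array. A minimal Python sketch of that folding, using the standard json module rather than the RCL parser itself (merge_duplicates is a hypothetical helper written for this example):

```python
import json


def merge_duplicates(pairs):
    """Build a dict from (key, value) pairs, folding repeated keys into a list."""
    result = {}
    folded = set()  # keys that have already been turned into a list
    for key, value in pairs:
        if key not in result:
            result[key] = value
        elif key in folded:
            result[key].append(value)
        else:
            result[key] = [result[key], value]
            folded.add(key)
    return result


text = """
{
    "subsection": {
        "host": {"host": "hostname", "port": 900},
        "host": {"host": "hostname", "port": 901}
    }
}
"""

# object_pairs_hook receives every object as a list of (key, value) pairs,
# so duplicate keys are still visible at this point.
config = json.loads(text, object_pairs_hook=merge_duplicates)
```

After this call, config["subsection"]["host"] is a list with two entries, which matches the array in the JSON form above.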

Improvements to the JSON notation

Several things make the JSON notation more convenient to edit:

  • Braces are not necessary to enclose the top object: it is automatically treated as an object:
"key": "value"

is equivalent to:

{"key": "value"}
  • Quotes are not required for strings and keys; moreover, the : sign may be replaced with the = sign, or even omitted for objects:
key = value;
section {
	key = value;
}

is equivalent to:

{
	"key": "value",
	"section": {
		"key": "value"
	}
}
  • No comma mess: you can safely place a comma or semicolon after the last element in an array or object:
{
	"key1": "value",
	"key2": "value",
}
  • Non-unique keys in an object are allowed and are automatically converted to arrays internally:
{
	"key": "value1",
	"key": "value2"
}

is converted to:

{
	"key": ["value1", "value2"]
}
  • Numbers can have suffixes to specify standard multipliers:
    • [kKmMgG] - standard base-10 multipliers (so 1k is translated to 1000)
    • [kKmMgG]b - power-of-2 multipliers (so 1kb is translated to 1024)
    • [ms|s|min|d|w|y] - time multipliers; all time values are translated to a floating-point number of seconds, for example 10min is translated to 600.0 and 10ms is translated to 0.01
  • Booleans can be specified as true, yes or on, and as false, no or off.
  • It is still possible to treat numbers and booleans as strings by enclosing them in double quotes.
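The suffix rules above can be illustrated with a short Python sketch. It is not the RCL parser: parse_number and the lookup tables are hypothetical names for this example, and the length of a year (365 days) is an assumption, since the text does not define it.

```python
import re

# Multiplier tables following the list above (names are assumptions).
DECIMAL = {"k": 10**3, "m": 10**6, "g": 10**9}
BINARY = {"k": 2**10, "m": 2**20, "g": 2**30}
TIME = {
    "ms": 0.001, "s": 1.0, "min": 60.0, "d": 86400.0,
    "w": 7 * 86400.0,
    "y": 365 * 86400.0,  # assumed: a year is taken as 365 days
}

_NUMBER = re.compile(r"^([+-]?\d+(?:\.\d+)?)([a-zA-Z]*)$")


def parse_number(text):
    """Parse '10k', '1kb', '0.2s' and similar into a plain number."""
    match = _NUMBER.match(text)
    if match is None:
        raise ValueError(f"not a number: {text!r}")
    digits, suffix = match.groups()
    value = float(digits) if "." in digits else int(digits)
    if not suffix:
        return value
    # Time suffixes are checked first; the result is always float seconds.
    if suffix in TIME:
        return float(value) * TIME[suffix]
    # Two-letter suffix ending in 'b': power-of-2 multiplier.
    if len(suffix) == 2 and suffix[1] == "b" and suffix[0].lower() in BINARY:
        return value * BINARY[suffix[0].lower()]
    # Single letter: base-10 multiplier.
    if suffix.lower() in DECIMAL:
        return value * DECIMAL[suffix.lower()]
    raise ValueError(f"unknown suffix: {suffix!r}")


print(parse_number("1k"))     # 1000
print(parse_number("1kb"))    # 1024
print(parse_number("10min"))  # 600.0
```

Checking the time table before the size tables keeps s and d from being mistaken for size multipliers, while a bare m is still read as the base-10 mega multiplier.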

General improvements

RCL supports several styles of comments:

  • single line: # or //
  • multiline: /* ... */

Multiline comments may be nested:

# Sample single line comment
/* 
 some comment
 /* nested comment */
 end of comment
*/
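Nesting means a parser cannot simply stop at the first */ it meets; it has to count how many comments are currently open. A minimal Python sketch of that idea follows. The strip_block_comments function is a hypothetical helper, and for brevity it ignores quoted strings and single-line comments, which a real parser would have to handle.

```python
def strip_block_comments(text):
    """Remove /* ... */ comments, tracking nesting with a depth counter."""
    out = []
    depth = 0  # number of comments currently open
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "/*":
            depth += 1
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1
            i += 2
        else:
            # Keep the character only when we are outside all comments.
            if depth == 0:
                out.append(text[i])
            i += 1
    if depth != 0:
        raise ValueError("unterminated multiline comment")
    return "".join(out)


sample = "key = value; /* outer /* nested */ still a comment */ other = 1;"
print(strip_block_comments(sample))
```

With a non-nesting rule, the text "still a comment */" would leak into the configuration; the depth counter is what prevents that.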

RCL supports external macros, both multiline and single-line:

.macro "sometext";
.macro {
	Some long text
	....
};

RCL provides two internal macros:

  • include - read a file /path/to/file or a URL http://example.com/file and include its contents at the current place in the RCL configuration;
  • includes - read a file or a URL like the previous macro, but also fetch and check the signature file (whose name is obtained by appending the .sig suffix).

The public key (or keys) used by the latter macro are supplied by the application that embeds RCL (rspamd, for example).

Emitter

Each RCL object can be serialized to one of the three supported formats:

  • JSON - canonical JSON notation (with the structure indented by spaces);
  • Compacted JSON - compact JSON notation (without spaces or newlines);
  • Configuration - nginx like notation.
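As a rough illustration of the three output styles, here is a Python sketch that emits a plain dict in each of them. It relies on the standard json module for the two JSON styles; the emit function, the style names and the exact layout of the nginx-like output are assumptions for this example and may differ from what the RCL emitter actually produces.

```python
import json


def _config_lines(obj, depth):
    """Yield nginx-like lines for a dict: 'key = value;' or 'key { ... }'."""
    pad = "\t" * depth
    for key, value in obj.items():
        # A list is written as a repeated key, mirroring the duplicate-key rule.
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                yield f"{pad}{key} {{"
                yield from _config_lines(item, depth + 1)
                yield f"{pad}}}"
            else:
                # json.dumps quotes strings and writes true/false for booleans.
                yield f"{pad}{key} = {json.dumps(item)};"


def emit(obj, style):
    """Serialize obj as 'json', 'compact' or 'config' (assumed style names)."""
    if style == "json":
        return json.dumps(obj, indent=4)
    if style == "compact":
        return json.dumps(obj, separators=(",", ":"))
    if style == "config":
        return "\n".join(_config_lines(obj, 0))
    raise ValueError(f"unknown style: {style!r}")


config = {"flag": True, "subsection": {"host": [{"port": 900}, {"port": 901}]}}
for style in ("json", "compact", "config"):
    print(emit(config, style))
```

All three outputs describe the same object, so a configuration can be read in one notation and written back in another.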

Conclusion

RCL has a clear design that should be convenient to read and write. At the same time, it is compatible with JSON and can therefore be used as a simple JSON parser. The macro logic makes it possible to extend the configuration language (for example, by including some Lua code), and comments make it quick to disable or enable parts of a configuration. Rspamd 0.6.0 will be the first version with RCL configuration, and rspamd itself will be able to convert an existing XML configuration to RCL.