Rspamd 2.5 has been released

2020-04-01 00:00:00 +0000

We have released Rspamd 2.5 today.

You may find that the first start of Rspamd takes longer than usual (around 10-30 seconds). This is intended, as Rspamd has to recompile some of the URL pattern matchers. It will not happen on subsequent restarts or after subsequent updates. It should also not happen if you have used the release candidates for this version.

There are 3 major projects in this release:

  • URL extraction rework: the URL extraction logic in Rspamd has been significantly reworked to provide better DoS resistance, better matching and lower false positives.
  • URL structure update: the URL storage structures have been reworked to occupy less memory and to use more efficient storage.
  • Hyperscan early load: since PCRE is a backtracking regular expression engine, it is not very safe to use with complicated rules and untrusted data. In this release, Rspamd loads the Hyperscan database at an early stage to avoid falling back to PCRE where possible.
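
The risk with backtracking engines can be illustrated with a toy example. The sketch below is neither PCRE nor Hyperscan code: `backtrack_match` is a hypothetical hand-written backtracking matcher for the single pattern `(a+)+b`, and `linear_match` is an equivalent single-pass scan of the kind an automaton-based engine can perform. On input that almost matches, the number of backtracking attempts doubles with every extra character.

```python
def backtrack_match(s, pos=0, stats=None):
    """Naive backtracking match of the pattern (a+)+b against all of s.

    stats, if given, is a one-element list used as an attempt counter.
    """
    if stats is not None:
        stats[0] += 1
    # Find how far a run of 'a' extends from pos.
    end = pos
    while end < len(s) and s[end] == "a":
        end += 1
    # Try every possible length for one (a+) group, longest first.
    for stop in range(end, pos, -1):
        # Either the final 'b' follows this group...
        if s[stop:] == "b":
            return True
        # ...or another (a+) group has to start here.
        if backtrack_match(s, stop, stats):
            return True
    return False


def linear_match(s):
    """Same language in one pass: one or more 'a', then a single 'b'."""
    i = 0
    while i < len(s) and s[i] == "a":
        i += 1
    return i >= 1 and s[i:] == "b"


def attempts(n):
    """Count backtracking attempts on n 'a' characters with no final 'b'."""
    stats = [0]
    backtrack_match("a" * n, 0, stats)
    return stats[0]


for n in (4, 8, 12, 16):
    print(n, attempts(n))
```

In this toy matcher `attempts(n)` equals `2**n`, while `linear_match` looks at each character once. That difference is the reason to prefer an automaton-based engine for untrusted input.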

Several major fixes:

  • Base64 detection has been fixed and improved to reduce FP rate
  • Query urls are now fully processed
  • Bundled libev has been updated to 4.33 (fixing many issues with FD closing race conditions)
  • Fixed ANN normalisation
  • Fixed redis backend leaks

Useful features:

  • Added whitelisted_signers_map in ARC module
  • Implemented /etc/hosts files processing
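
To show what processing a hosts file involves, here is a minimal sketch of a parser for the usual hosts format (an address followed by one or more names, with `#` comments). It is not Rspamd's implementation, which is not described in these notes. The function name `parse_hosts` and the sample data are hypothetical.

```python
import ipaddress


def parse_hosts(text):
    """Parse hosts-style text into {hostname: [address, ...]}."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # an address with no names attached
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # skip lines with a malformed address
        for name in fields[1:]:
            # Host names are case-insensitive, so store them lowercased.
            table.setdefault(name.lower(), []).append(str(addr))
    return table


SAMPLE = """
127.0.0.1   localhost
::1         localhost ip6-localhost  # loopback
10.0.0.5    Mail.Example.test mail
not-an-ip   broken
"""

hosts = parse_hosts(SAMPLE)
print(hosts["localhost"])
```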

Here is the list of the most important changes:

  • [Conf] Mark Rspamd emailbl as ignore whitelist
  • [Conf] RBL: Add missing emails = true option
  • [Feature] Add support for scripts in fuzzy storage
  • [Feature] Arc: Add whitelisted_signers_map option
  • [Feature] Implement hosts file processing
  • [Feature] Neural: Introduce classes bias that allows non-equal classes learning
  • [Feature] Update libev to 4.33
  • [Fix] Another brain damage html standard adoptions
  • [Fix] Another fix for brain damaged obs-fws state
  • [Fix] Fix flags that caused force_actions failure
  • [Fix] Fix logging issue
  • [Fix] Fix lua symbols scores registration when config does not define scores
  • [Fix] Fix opaque maps logic
  • [Fix] Fix parsing of the html tags with no spaces after attributes
  • [Fix] Fix some corner cases in urls parsing, add limits
  • [Fix] Fix tlds extraction if custom composition rules are used
  • [Fix] Fix variables replacement in mempool
  • [Fix] Improve base64 detection
  • [Fix] Normalize dynamic scores in ANN correctly
  • [Fix] Plug memory leak introduced by #3153
  • [Fix] Stat_redis_backend: Fix memory leak and simplify learn path
  • [Fix] Try hard to deal with ghost workers
  • [Fix] metadata_exporter default formatter
  • [Rework] Change the way to extract URLs when dealing with alternative parts
  • [Rework] Fix various url extraction issues
  • [Rework] Re cache: Load compiled hyperscan in the main process as well
  • [Rework] Re cache: Load hyperscan early
  • [Rework] Rework URL structure: adjust tld part
  • [Rework] Rework URL structure: host field
  • [Rework] Rework URL structure: more structure optimisations
  • [Rework] Rework URL structure: user field
  • [Rework] URL: Another update for urls extraction logic
  • [Rework] Urls: Improve query urls handling
  • [Rework] Urls: adopt html related stuff
  • [Rework] Urls: more rework of the urls sets
  • [Rework] Urls: process query urls in HTML urls correctly
  • [Rework] Urls: rework urls hash structure
  • [Rework] Urls: update lua libraries
  • [Rework] Use multiple search tries for different url extraction types

Rspamd 2.4 has been released

2020-02-26 00:00:00 +0000

We have released Rspamd 2.4 today.

This is mainly a bug fix release.

There are 3 major projects in this release:

  • Logger system rework: fixed syslog logging, improved architecture, improved logging reload
  • URL composition library (similar to the old 2tld map for the surbl module): the RBL module now uses this library for all URL-like objects, such as URLs, emails and DKIM domains
  • SSL client caching: it should improve client SSL connections (Clickhouse, SMTPS, map checks and so on) for both the client and the server

Several major fixes:

  • Parsing of the content type attributes
  • Avoid collisions in mempool variables
  • Fixed Redis Sentinel support
  • Fixed IPv6 listening
  • Fixed mime modifications for 7bit parts
  • Fixed passthrough result and smtp message
  • Important eSLD url composition fixes
  • Various neural network plugin fixes

Useful features:

  • Custom additional columns in Clickhouse plugin
  • Support of CDB maps everywhere to share huge maps across workers with no extra cost

Here is the list of the most important changes:

  • [CritFix] Fix parsing of the content type attributes
  • [Feature] Clickhouse: Add extra columns support
  • [Feature] Rbl: Add url_compose_map option for RBL rules
  • [Fix] ‘R’ flag is for all headers regexp
  • [Fix] Allow to reset settings id from Lua (e.g. because of the priority)
  • [Fix] Avoid collisions in mempool variables by changing fuzzy caching logic
  • [Fix] Avoid strdup usage for symbols options
  • [Fix] Do not trust stat(2) it lies
  • [Fix] Filter all options for symbols to have sane characters
  • [Fix] Fix all headers iteration
  • [Fix] Fix allowed_settings for neural
  • [Fix] Fix listen socket parsing
  • [Fix] Fix maps expressions evaluation
  • [Fix] Fix sentinel connections leak by using async connections
  • [Fix] Fix smtp message on passthrough result
  • [Fix] Fix tld composition rules
  • [Fix] Fuzzy_storage: Do not check for shingles if a direct hash has been found
  • [Fix] Lua_mime: Do not perform QP encoding for 7bit parts
  • [Fix] Neural: Distinguish missing symbols from symbols with low scores
  • [Fix] Support listening on systemd sockets by name
  • [Project] Add lua_urls_compose library
  • [Project] Allow to set a custom log function to the logger
  • [Project] CDB maps: Start making cdb a first class citizen
  • [Project] Clickhouse: Add extra columns concept
  • [Project] Fix urls composition rules, add unit tests
  • [Project] Unify cdb maps
  • [Rework] Logger infrastructure rework
  • [Rework] Refactor libraries structure
  • [Rework] Rework SSL caching
  • [Rework] Update snowball stemmer to 2.0 and remove all crap aside of UTF8

Rspamd 2.3 has been released

2020-02-04 00:00:00 +0000

We have released Rspamd 2.3 today.

This release brings various improvements, PDF parsing support and a new SPF plugin. Numerous bug fixes, including some critical ones, have also been applied during this release cycle.

Here is the list of the most important changes.

Lua content library

This library is part of the content scanning project, which enables Rspamd to process content formats commonly found in email. In this release, we have added preliminary support for PDF files.

Rspamd supports the following features so far:

  • URLs extraction from PDF files
  • JavaScript extraction from actions
  • Use of the fuzzy hash storage to store suspicious JavaScript extracted from PDF files
  • A couple of rules related to bad PDF files (e.g. PDF_ENCRYPTED)

This library is written in pure Lua, using LPEG and Hyperscan under the hood to provide memory safety and high processing speed.

SPF plugin revamped

The SPF plugin has been one of the oldest plugins in Rspamd. Unfortunately, because it was written in C, it was very hard to add new functions to this plugin or to maintain it in a consistent state. In this release, the SPF plugin has been rewritten in Lua, providing a configuration-compatible mode along with some new features, such as support for external relay handling.

Slow rules protection

Rspamd will now try to check async timers and events when it notices that a rule is taking too much time. This makes it possible to interrupt stuck rules that would otherwise hold up processing for longer than the task timeout. In the future, we plan to tie all internal rule timers to the task timeout.

Other features

Here is a list of other important but not categorised features:

  • Allow milter code to deal with multiple headers
  • Antivirus: Add Avast support
  • Dkim_signing: Allow to sign via milter_headers
  • Send quit command to Redis
  • Speed up is_ascii function
  • Improve memory pool allocation routines and add memory debugging

Important bug fixes

  • Critical fix: fix html entities decoding. This bug could cause incorrect URLs to be extracted from messages.
  • Critical fix: fix re cache when a mix of pcre and hyperscan is used. This bug could cause random failures when scanning regular expressions that could be neither evaluated by hyperscan nor approximated for it.
  • Fix arc seal validation
  • Fix base tag processing according to stupid HTML renderer behaviour
  • Fix dealing with \0 in ucl strings and JSON
  • Fix gpg parts misdetection
  • Fix ignored symbols exporting (e.g. to ClickHouse)
  • Fix processing of numeric URLs
  • Fix processing of the closed tcp connections
  • Fix mixed charset rule for some languages (e.g. Czech)
  • Fix mixing of the IPv4 and IPv6 addresses in Radix maps
  • Fix soft hyphen processing
  • Fix O(N^2) algorithm when comparing recipients
  • Various stability improvements when dealing with large or specially crafted messages
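
One of the fixes above concerns mixing IPv4 and IPv6 addresses in radix maps. The full change list below describes it as using IPv4-mapped IPv6 addresses in the radix trie. The sketch below shows that general idea with Python's standard `ipaddress` module: every IPv4 address is embedded under `::ffff:0:0/96`, so all keys are 128 bits long and an IPv4 prefix of length N becomes an IPv6 prefix of length 96 + N. The helper names are hypothetical, and this is not Rspamd's code.

```python
import ipaddress

# The ::ffff:0:0/96 prefix: ten zero bytes followed by two 0xff bytes.
V4_MAPPED_PREFIX = b"\x00" * 10 + b"\xff\xff"


def to_v6_address(addr):
    """Return an IPv6Address, embedding IPv4 input as ::ffff:a.b.c.d."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        return ipaddress.IPv6Address(V4_MAPPED_PREFIX + ip.packed)
    return ip


def to_v6_network(cidr):
    """Return an IPv6Network; an IPv4 /N becomes a mapped /(96 + N)."""
    net = ipaddress.ip_network(cidr)
    if net.version == 4:
        base = ipaddress.IPv6Address(V4_MAPPED_PREFIX + net.network_address.packed)
        return ipaddress.IPv6Network((base, 96 + net.prefixlen))
    return net


# A mixed map can now be searched with one key format.
rules = [to_v6_network("10.0.0.0/8"), to_v6_network("2001:db8::/32")]


def lookup(addr):
    key = to_v6_address(addr)
    return any(key in net for net in rules)


print(lookup("10.1.2.3"), lookup("192.0.2.1"), lookup("2001:db8::1"))
```

With a single key width, an IPv4 entry cannot be confused with an IPv6 entry that happens to share its leading bits.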

Full list of the meaningful changes

  • [Conf] SPF is no longer a C module
  • [Conf] Update spamtrap map path example
  • [CritFix] Fix html entities decoding
  • [CritFix] Fix re cache when mix of pcre and hyperscan is used
  • [Feature] Allow milter code to deal with multiple headers
  • [Feature] Antivirus: Add avast support
  • [Feature] Dkim_signing: Allow to sign via milter_headers
  • [Feature] Implement content hashes
  • [Feature] Lua_text: Add regexp split iterator method
  • [Feature] Lua_text: Implement flattening of the input tables
  • [Feature] Send quit command to Redis
  • [Feature] Speed up is_ascii function
  • [Feature] Spf: Add external_relay option
  • [Fix] Avoid double escaping
  • [Fix] Fix O(N^2) algorithm
  • [Fix] Fix arc seal validation
  • [Fix] Fix base tag processing according to stupid HTML renderer behaviour
  • [Fix] Fix dealing with \0 in ucl strings and JSON
  • [Fix] Fix gpg parts misdetection
  • [Fix] Fix ignored symbols exporting
  • [Fix] Fix processing of numeric urls
  • [Fix] Fix processing of the closed tcp connections
  • [Fix] Fix regexp type check for pcre2
  • [Fix] Fix urls encode function
  • [Fix] Fix urls shifting when doing decode to include separators
  • [Fix] Fix white on white rule and add is_leaf flag
  • [Fix] Further fixes in charset detection
  • [Fix] Ignore diacritics in chartable module for specific languages
  • [Fix] Limit size of symbols options by max_opts_len option
  • [Fix] More fixes in html tag content calculations
  • [Fix] Plug memory leak in fuzzy storage
  • [Fix] Process high priority settings even if settings/id has been specified
  • [Fix] Select a different upstream on last retransmit
  • [Fix] Treat soft hyphen as zero width space
  • [Fix] Try harder to watch the lifetime of the key_stat
  • [Fix] Use ipv6-mapped-ipv4 addresses in radix trie
  • [Project] Add logic to break execution when processing symbols*
  • [Project] Add methods to set specific content for mime parts from Lua
  • [Project] Lua_content: support PDF files
  • [Project] Move dns_tool to using of the rspamd_spf from FFI module
  • [Project] Preliminary SPF plugin in Lua
  • [Project] Show debug stat for memory pool
  • [Project] Some rework about specific data that is now tagged
  • [Project] Start reworking of the mempool structure
  • [Rework] Allow to add userdata as symbols options
  • [Rework] Change mime part specifics handling
  • [Rework] Move LRU SPF cache from spf plugin
  • [Rework] Rework HTML tags content attachment
  • [Rework] Rework options hash structure
  • [Rework] Start lua_content library
  • [Rework] Stop using of uthash for http headers
  • [Rework] Use faster hashing approach for memory pools variables
  • [Rules] Add PDF related rules

Rspamd 2.2 has been released

2019-11-19 00:00:00 +0000

We have released Rspamd 2.2 today.

This release contains some new features and many bug fixes. To the best of our knowledge, it introduces no incompatible changes.

This release includes the following features and important fixes.

Added VirusTotal support

Rspamd now supports VirusTotal as an antivirus plugin. You need to obtain an API key to use this service. All normal antivirus module operations are applicable to this plugin.

Clickhouse collection rework

Rspamd now performs Clickhouse data collection in a separate periodic event. This allows collection to be triggered by time, by number of rows (as before) or by the amount of memory used. More details are in the GitHub issue.
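
The three triggers can be pictured with a small sketch. It is an illustration only, not the plugin's code: the class `CollectionBuffer`, its parameters and the default limits are all hypothetical. Rows are only accumulated by `add`, and a periodic `tick` decides whether the row count, the buffered size or the age of the buffer warrants a flush.

```python
import time


class CollectionBuffer:
    """Accumulates rows; a periodic tick decides when to flush them."""

    def __init__(self, send, max_rows=1000, max_bytes=1 << 20,
                 max_age=60.0, clock=time.monotonic):
        self.send = send            # callable that ships a list of rows
        self.max_rows = max_rows
        self.max_bytes = max_bytes
        self.max_age = max_age      # seconds since the last flush
        self.clock = clock
        self.rows = []
        self.size = 0
        self.last_flush = clock()

    def add(self, row):
        # Adding never flushes; only the periodic tick does.
        self.rows.append(row)
        self.size += len(row.encode("utf-8"))

    def flush_reason(self):
        if len(self.rows) >= self.max_rows:
            return "rows"
        if self.size >= self.max_bytes:
            return "memory"
        if self.clock() - self.last_flush >= self.max_age:
            return "time"
        return None

    def tick(self):
        """Called from a periodic timer; returns the trigger or None."""
        if not self.rows:
            self.last_flush = self.clock()  # nothing pending, restart the age timer
            return None
        reason = self.flush_reason()
        if reason is not None:
            self.send(self.rows)
            self.rows, self.size = [], 0
            self.last_flush = self.clock()
        return reason
```

Passing the clock in as a parameter keeps the sketch testable: a fake clock can be advanced by hand to exercise the time trigger.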

ASAN builds

Rspamd packages now have ASAN branches to help debug issues with Rspamd and to provide better feedback to the developers. The details about ASAN builds are covered in this FAQ section.

Faster base64 decoding

We have applied a number of optimizations to improve the performance of base64 decoding on modern hardware (especially with AVX2 and/or SSE4.2 support).

Fast unicode validation library

Rspamd now uses a number of techniques to speed up UTF-8 validation by utilising modern CPU instructions, such as AVX2 and SSE4. This code is based on the work of Yibo Cai and achieves a speed of around 0.5 CPU cycles per byte when using the AVX2 codec.

Upstreams fixes

There are a number of significant improvements in the upstreams library of Rspamd. Specifically, these include better consistent hashing, better upstream marking logic and improved logging.
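
The property that matters for consistent hashing with inactive upstreams can be shown with a short sketch. It uses rendezvous (highest-random-weight) hashing, which is one simple way to get this behaviour and is not necessarily the algorithm Rspamd uses. The function names and addresses are hypothetical.

```python
import hashlib


def _score(key, upstream):
    """Deterministic pseudo-random weight for a (key, upstream) pair."""
    digest = hashlib.sha256(f"{upstream}|{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_upstream(key, upstreams, inactive=()):
    """Pick the active upstream with the highest weight for this key."""
    alive = [u for u in upstreams if u not in inactive]
    if not alive:
        return None
    return max(alive, key=lambda u: _score(key, u))


upstreams = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
keys = [f"key-{i}" for i in range(200)]

before = {k: pick_upstream(k, upstreams) for k in keys}
after = {k: pick_upstream(k, upstreams, inactive={"10.0.0.2"}) for k in keys}

# Only keys that lived on the inactive upstream are allowed to move.
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), all(before[k] == "10.0.0.2" for k in moved))
```

Keys assigned to the remaining upstreams keep their assignment, so marking one upstream inactive does not reshuffle the whole key space.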

Build system rework

The CMake-based build system has been reworked to use more modern design practices provided by newer CMake versions (Rspamd now requires CMake 3.9 as a minimum). The new build system should improve support for multiple configurations and simplify the CMake build files.

Full list of the meaningful changes

  • [Conf] Antivirus: Fix the default config
  • [Feature] Add verdict library in lua
  • [Feature] Allow exception when choosing upstream
  • [Feature] Allow to disable symbols from the metric config
  • [Feature] Allow to limit maps per specific worker
  • [Feature] Always validate Rspamd protocol output
  • [Feature] Antivirus: Add preliminary virustotal support
  • [Feature] Clickhouse: Rework Clickhouse collection logic
  • [Feature] Improve base64 usage
  • [Feature] Shutdown timeout is now associated with task timeout
  • [Fix] #3129 Multiple classifiers on redis working incorrectly
  • [Fix] Allow real upstreams configuration
  • [Fix] Another try to fix slow callbacks and timers
  • [Fix] Check results of write message as SSL can bork them
  • [Fix] Clickhouse: Avoid potential races in collection
  • [Fix] Clickhouse: Fix periodic script
  • [Fix] Fail DNS upstream on each retransmit attempt
  • [Fix] Fix consistent hashing when upstreams are marked inactive
  • [Fix] Fix issues found
  • [Fix] Fix off-by-one in retries for the proxy
  • [Fix] Fix termination
  • [Fix] Fix upstreams exclusion logic
  • [Fix] Fix utf8 validation for symbols options and empty strings
  • [Fix] Oops, fix maps reload
  • [Fix] Rbl: Allow utf8 lookups for IDN domains
  • [Fix] Sigh, another try to fix brain-damaged openssl
  • [Project] Add fast utf8 validation library
  • [Project] Use own utf8 validation instead of glib
  • [Rework] Another phase of finish actions rework
  • [Rework] Further cmake system rework
  • [Rework] Further isolation of the controller’s functions
  • [Rework] Make cmake structure more modular
  • [Rework] Move cmake modules to a dedicated path
  • [Rework] Replace controller functions by any scanner worker if needed
  • [Rework] Rework final scripts logic
  • [Rework] Rewrite rspamd_str_make_utf_valid function

Rspamd 2.1 has been released

2019-10-28 00:00:00 +0000

We have released Rspamd 2.1 today.

This release contains some new features and many bug fixes. To the best of our knowledge, it introduces no incompatible changes.

This release includes the following features and important fixes.

Add uuencode support

Despite being a very old standard, uuencoded parts are still quite common in the email traffic we observe. Starting from this version, Rspamd supports uuencoded parts (both the normal and the base64 versions).
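
For readers who have not met the format, the following sketch decodes a classic uuencoded block (a `begin <mode> <name>` line, data lines and an `end` line) with Python's standard `binascii` module. It only illustrates the format and is not Rspamd's parser. The helper names are hypothetical, and the base64 variant is not covered.

```python
import binascii


def uudecode(text):
    """Decode a classic uuencoded block; returns (file name, payload)."""
    lines = iter(text.splitlines())
    for line in lines:
        if line.startswith("begin "):
            _, _mode, name = line.split(" ", 2)
            break
    else:
        raise ValueError("no 'begin' line found")
    out = bytearray()
    for line in lines:
        if line == "end":
            return name, bytes(out)
        if line in ("`", ""):
            continue  # zero-length terminator line or stray blank line
        out += binascii.a2b_uu(line)
    raise ValueError("missing 'end' line")


def uuencode(data, name, mode="644"):
    """Encode bytes as a uuencoded block, 45 input bytes per line."""
    lines = [f"begin {mode} {name}"]
    for i in range(0, len(data), 45):
        chunk = binascii.b2a_uu(data[i:i + 45]).decode("ascii")
        lines.append(chunk.rstrip("\n"))
    lines += ["`", "end"]
    return "\n".join(lines) + "\n"


name, payload = uudecode("begin 644 cat.txt\n#0V%T\n`\nend\n")
print(name, payload)
```

Each data line starts with a length character followed by groups of four printable characters that carry three bytes each, which is why the three-byte payload above fits in the five characters `#0V%T`.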

Critical issue found in dkim verification

A critical regression in the 2.0 DKIM verification code caused verification failures for some valid DKIM signatures. More details are in the GitHub issue.

Improved neural training

There are a number of fixes and improvements in the Neural module. All training samples are now balanced using random sampling, which allows a smoother selection of training vectors. A number of bugs have been fixed, and setting scores to select training vectors is no longer recommended: Rspamd automatically applies a heuristic to select messages for learning. Some issues around infinities and the learning threads count have also been addressed.
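
As an illustration of balancing by random sampling, the sketch below keeps a fixed-size uniform random sample per class using reservoir sampling (Algorithm R). This is an assumption made for the example: the notes do not say which sampling scheme the Neural module uses, and the class name, capacity and vector labels are hypothetical.

```python
import random


class Reservoir:
    """Keeps a uniform random sample of at most `capacity` items."""

    def __init__(self, capacity, rng=None):
        self.capacity = capacity
        self.seen = 0
        self.items = []
        self.rng = rng or random.Random()

    def offer(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
            return
        # Replace a stored item with probability capacity / seen.
        j = self.rng.randrange(self.seen)
        if j < self.capacity:
            self.items[j] = item


rng = random.Random(42)
spam = Reservoir(20, rng)
ham = Reservoir(20, rng)

# An unbalanced stream: far more spam vectors than ham vectors.
for i in range(1000):
    spam.offer(("spam", i))
for i in range(50):
    ham.offer(("ham", i))

print(len(spam.items), len(ham.items))
```

Both classes end up with the same number of stored vectors, and older and newer messages have an equal chance of being kept.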

Maps fixes

There are a number of fixes and improvements around the maps handling logic. These include fixes for both HTTP and file maps, as well as better handling of timeouts and caches.

Event loop fixes

Rspamd could previously select an inefficient backend on some operating systems, notably on BSD and OS X. This version should fix that. The ability to configure the events backend manually via the configuration file has also been added to Rspamd.

Full list of the meaningful changes

  • [Conf] Update neural.conf
  • [CritFix] Fix dkim verification for multiple headers listed
  • [Feature] Add support of uudecode
  • [Feature] Allow to explicitly set events backend
  • [Feature] Implement configurable limits for SPF lookups
  • [Feature] Lua_scanners: Use lua magic for inclusion/exclusion logic
  • [Feature] Multimap: Do not check files in office archives
  • [Feature] Neural: Add sampling when storing training vectors
  • [Feature] SPF: Allow to disable AAAA checks in configuration
  • [Feature] Spf: Add limits configuration support
  • [Feature] Store etag in cached HTTP maps + better logging
  • [Feature] Support segwit BTC addresses, fix LTC verification
  • [Feature] Support uuencoding
  • [Fix] Add configurable number of threads for OpenBLAS
  • [Fix] Add workaround for ragel 7 in hyperscan related maps code
  • [Fix] Another fix for numeric urls parsing
  • [Fix] Correct EMA time calculations
  • [Fix] Do not treat archives as text
  • [Fix] Do not use strdup on data extracted from lua
  • [Fix] Fix a failure calculating URL reputation.
  • [Fix] Fix crash due to constructors init order
  • [Fix] Fix crash on parts with no cd
  • [Fix] Fix empty prefilters that require mime structures
  • [Fix] Fix event loop creation
  • [Fix] Fix issues sending DMARC reports.
  • [Fix] Fix misprint
  • [Fix] Fix saving of the file maps
  • [Fix] Fix size calculations when converting from utf16
  • [Fix] Fix support of disable_monitoring in rbl
  • [Fix] Fix use-after-free
  • [Fix] Fix zip files check to relax requirements
  • [Fix] Important hiredis fixes
  • [Fix] Lots of fixes in maps check logic
  • [Fix] Lua_tcp: Deal with temporary fails on write
  • [Fix] Lua_tcp: Make write errors fatal and rework error handlers
  • [Fix] Meta: Filter some more values
  • [Fix] Neural: Add protection against infinities
  • [Fix] Oops, fix math.huge invocation
  • [Fix] Plug memory leak
  • [Fix] Sigh, another email to string fix
  • [Fix] Try to fix another ownership race in ssl connection
  • [Fix] Uuencode: Fix parsing of corrupted uuencode
  • [Fix] lua_scanners - razor rename need_check function
  • [Rework] Require CMake 3.9 to work, remove manual lto crap