Rspamd 2.7 has been released

2021-01-08 00:00:00 +0000

We have released Rspamd 2.7 today.

This is mostly a bug-fix release with no incompatible changes.

Here is a list of the major projects and serious bugfixes where applicable.

Fixed issues with DKIM and ARC verification

Some DKIM checks could fail when a message carried multiple signatures, due to a canonicalisation bug. This issue has now been fixed. The ARC plugin has also been fixed to support certain CV values.
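
For readers unfamiliar with the term, the following is a minimal sketch of the "simple" body canonicalisation defined in RFC 6376, written in plain Lua. It is not Rspamd's implementation (which lives in C); the function name is hypothetical and the sketch assumes the body already uses CRLF line endings.

```lua
-- Sketch of RFC 6376 "simple" body canonicalisation; not Rspamd's actual code.
-- Assumption: the body already uses CRLF line endings.
local function simple_body_canon(body)
  local stripped = body
  -- Drop every trailing CRLF, which removes all empty lines at the end.
  while stripped:sub(-2) == '\r\n' do
    stripped = stripped:sub(1, -3)
  end
  -- Terminate the body with exactly one CRLF.
  -- An empty body therefore canonicalises to a single CRLF.
  return stripped .. '\r\n'
end

local canon = simple_body_canon('Hello\r\n\r\n\r\n')
```

Each signature names its own canonicalisation mode, so a verifier has to apply the right mode per signature rather than once per message.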

Added support for S/MIME containers

Starting with this version, Rspamd supports .p7 containers and extracts signed parts during checks. For details see the following issue.

Several important rules rework

Anton Yuzhaninov has reworked many old rules in Rspamd improving their quality and has removed several outdated rules as well.

Support of caching for regexp multimaps

Regexp maps can now be cached on disk, which should improve the loading speed of large maps on reload or restart of Rspamd if they are unchanged.

Neural plugin offline learning

In this mode, Rspamd can train neural networks from Clickhouse, which makes it possible to define better training conditions and to manage learning on large systems with finer-grained control. Please refer to the corresponding documentation section for more details. Thanks to Andrew Lewis for implementing this functionality.

Other changes

Here is the list of the important changes:

  • [Conf] Add R_DKIM_PERMFAIL to the metric
  • [CritFix] Dkim: Fix simple canonicalisation if multiple signatures are presented
  • [CritFix] Fix controller paths normalisation
  • [Feature] Add INVALID_DATE rule
  • [Feature] Add controller endpoint for training neural
  • [Feature] Add sanity checks for actions thresholds
  • [Feature] Add support of ‘==’ and ‘!=’ in Rspamd expressions
  • [Feature] Composites: Improve composite atoms parser
  • [Feature] Docker: use Debian slim variant
  • [Feature] Elastic: Add some missing fields
  • [Feature] Extract text from img alt attributes
  • [Feature] Improve charset detection logic
  • [Feature] Lua_clickhouse: Add optional row callback for large selections
  • [Feature] Lua_dns_resolver: Add idna_convert_utf8 method
  • [Feature] Lua_mime: Add ability to do multipattern replacement
  • [Feature] Lua_trie: Allow to report start of the match
  • [Feature] Multimap: support adding map values as extra options
  • [Feature] Neural: Move PCA learning to a subprocess
  • [Feature] RBL: support matching content/image URLs only
  • [Feature] RBL: support use of multiple selectors
  • [Feature] Reputation: Allow to specify ip masks
  • [Feature] Support SMIME signed messages container
  • [Feature] Support multiple conditions for symbols
  • [Feature] Support ping in milter mode
  • [Feature] Support rspamd_text in selector regexps
  • [Feature] Use own daemonization routine
  • [Feature] Vadesecure: Implement settings_outbound feature as recommended by Vade
  • [Feature] rspamadm clickhouse command
  • [Feature] allow hyperscan for aarch64
  • [Fix] Allow to set priorities between post init scripts
  • [Fix] Allow to use maps for strings that are not zero terminated
  • [Fix] Apply max_lua_urls limit for emails as well
  • [Fix] Arc: Fix CV check on signing
  • [Fix] Arc: Fix signing of the broken ARC chains
  • [Fix] Clickhouse: escape carriage return
  • [Fix] Composites: Allow partial match
  • [Fix] Deduct type of a table methods
  • [Fix] Do not load errored hyperscan database
  • [Fix] Do not process links in ignored html tags
  • [Fix] Fix ClamAV result for cached encrypted file (#3395)
  • [Fix] Fix canonicalisation when l= tag is presented
  • [Fix] Fix flag shift
  • [Fix] Fix handling of skip/skip_process http flags
  • [Fix] Fix html attachments checks
  • [Fix] Fix issue with pushing binary formats to Lua strings
  • [Fix] Fix logging for rspamadm
  • [Fix] Fix off-by-one with init check
  • [Fix] Fix parsing of escape characters in quoted pairs
  • [Fix] Fix pushing ucl strings with \0 inside
  • [Fix] Fix quoted-printable soft newlines bugged case
  • [Fix] Fix settings in case actions are set to null (#3415)
  • [Fix] Fix several issues with auth results producing
  • [Fix] Fix smtp comments exclusion
  • [Fix] Fix smtp date syntax definition
  • [Fix] Fix substring search in case if srchlen == inlen
  • [Fix] Fix text selectors
  • [Fix] Honour systemd setting when logging to console (#3514)
  • [Fix] Html: Add entities collisions prevention logic (e.g. for mathml entities)
  • [Fix] Lua_auth_results: Quote potentially bad values in AR header
  • [Fix] Multimap: Fix flags usage
  • [Fix] Multimap: Fix scoring for combined maps
  • [Fix] Plug GList * leak in redis pool
  • [Fix] RBL: allow for multiple matches of the same label if types are different
  • [Fix] Rely on libev checks for file maps
  • [Fix] Restore simple dkim canonicalisation mode
  • [Fix] Return MimeCharset as we work with emails…
  • [Fix] Spamassassin: Fix pcre_only flags
  • [Fix] Spamassassin: Preserve ‘pcre_only’ flag when dealing with regexp replacements
  • [Fix] Try to fix GError leak
  • [Fix] Try to fix a mess with settings loading by adding priorities
  • [Fix] Try to move settings initialisation to a later stage
  • [Fix] Use dup fd in milter handler to avoid races with the proxy
  • [Fix] Use message pointer to avoid obsolete data to be cached
  • [Project] Rbl: Migrate to checks
  • [Project] Rbl: Move config code outside of the plugin
  • [Project] Resurrect empty prefilters as connection filters
  • [Project] Support connection filters registration from Lua
  • [Rework] Add final cleanup logic
  • [Rework] Add preliminary support of hyperscan caching for re maps
  • [Rework] Add stale cache removal
  • [Rework] Clickhouse: Improve performance
  • [Rework] Distinguish between strict config test mode
  • [Rework] Further logging improvements
  • [Rework] Milter_headers: improve extended_headers_rcpt support
  • [Rework] Move parsers to a separate lua library
  • [Rework] Neural: Skip composite symbols
  • [Rework] Rbl: Rework defaults logic
  • [Rework] Some tunes to cache saving
  • [Rework] Track maps origins
  • [Rework] Use full crypto hash for regexp maps
  • [Rules] Remove broken rule

Rspamd 2.6 has been released

2020-09-30 00:00:00 +0000

We have released Rspamd 2.6 today.

There are several major projects in this release: various improvements to the neural network plugin, better bitcoin scam detection, conditional regular expressions, and other code reworks such as shadow results support. Numerous bug fixes, including some critical ones, have also been applied during this release cycle.

Here is a list of the major projects and serious bugfixes where applicable.

Neural network plugin rework

Rspamd now includes a PCA method to reduce the input space dimensionality in heavily customised environments with many rules. This method transforms the whole rule set into a fixed number of neural network inputs using a linear transformation. Other improvements to the neural network plugin in this release include the following:

  • A probabilistic learning method in which spam and ham samples need not be balanced (useful where spam and ham volumes differ significantly)
  • The ability to set a maximum number of inputs for the ANN (via PCA prefiltering)
  • A reworked internal ANN structure (more hidden layers and a corrected output function)
  • A low-level tensor library to speed up matrix operations
  • BLIS algebra library support
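
To illustrate the PCA idea mentioned above, here is a minimal plain-Lua sketch of projecting a vector of rule scores onto a fixed number of network inputs with a linear transformation. The `project` helper and the weights are hypothetical and chosen for illustration; in practice the matrix would come from PCA training, and Rspamd performs this work with its own tensor code rather than Lua tables.

```lua
-- Hypothetical helper: y = W * x, where W has k rows of length n.
-- k is the fixed number of ANN inputs, n is the number of rule scores.
local function project(w, x)
  local y = {}
  for i = 1, #w do
    local acc = 0
    for j = 1, #x do
      acc = acc + w[i][j] * x[j]
    end
    y[i] = acc
  end
  return y
end

-- Four rule scores reduced to two network inputs (made-up weights).
local w = {
  { 0.5, 0.5, 0.0, 0.0 },
  { 0.0, 0.0, 0.5, 0.5 },
}
local inputs = project(w, { 1, 3, 2, 6 })
```

However many rules are enabled, the network always sees `#w` inputs, which is what keeps its shape stable across differently customised installations.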

Reworked bitcoin detection library

Rspamd now supports Lua filters for regular expressions. The idea is to combine a fast regular expression pre-filter with slower Lua postprocessing for the cases where such processing is needed. Here is how it is used in the bitcoin library:

config.regexp['RE_POSTPROCESS'] = {
  description = 'Example of postprocessing for regular expressions',
  -- re1 and re2 are regexp strings; check_re1 and check_re2 are Lua checks defined elsewhere
  re = string.format('(%s) || (%s)', re1, re2),
  re_conditions = {
    [re1] = function(task, txt, s, e)
      -- Reject very short matches before running the expensive check
      if e - s <= 2 then
        return false
      end

      if check_re1(task, txt:sub(s + 1, e)) then
        return true
      end
    end,
    [re2] = function(task, txt, s, e)
      if e - s <= 2 then
        return false
      end

      if check_re2(task, txt:sub(s + 1, e)) then
        return true
      end
    end,
  },
}

This makes it possible to add accelerated rules whose expensive checks run only if some relatively rare regular expression matches. In this particular case, the feature is used for BTC wallet verification and validation.
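
The pre-filter and postprocessing flow can be tried outside Rspamd with a small plain-Lua emulation. Everything here is a stand-in: `match_with_condition` plays the role of the matching engine using Lua patterns instead of real regular expressions, and `digits_sum_ok` is a toy check, not the BTC validation. The only convention borrowed from the example above is that the condition receives offsets such that `txt:sub(s + 1, e)` yields the matched text.

```lua
-- Hypothetical stand-in for the matching engine: find each match of a cheap
-- Lua pattern, then keep only matches accepted by the slower condition.
local function match_with_condition(text, pattern, condition)
  local accepted = {}
  local init = 1
  while true do
    local s, e = text:find(pattern, init)
    if not s then
      break
    end
    -- Pass a zero-based start so that txt:sub(s + 1, e) is the match.
    if condition(text, s - 1, e) then
      accepted[#accepted + 1] = text:sub(s, e)
    end
    init = e + 1
  end
  return accepted
end

-- Toy condition: accept a candidate only if its digits sum to a multiple of 10.
local function digits_sum_ok(txt, s, e)
  if e - s <= 2 then
    return false
  end
  local sum = 0
  for d in txt:sub(s + 1, e):gmatch('%d') do
    sum = sum + tonumber(d)
  end
  return sum % 10 == 0
end

local found = match_with_condition('ids: 1234 and 1901 and 12', '%d+', digits_sum_ok)
```

Here the pattern finds three candidates, but only `1234` survives: `1901` fails the digit check and `12` is rejected as too short before the check runs.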

IDNA bugs are fixed

Dr. Hajime Shimada and Mr. Shirakura from Nagoya University have discovered that it is possible to bypass Rspamd URL detection by using special Unicode characters. We have changed this behaviour, so full IDNA validation/normalisation is now performed. I would like to thank the researchers for sharing that with us.

Fuzzy module telemetry

Rspamd will now send more data when checking for fuzzy hashes: it will send the source IP address of the email being scanned and the domain name of the sender. This data is end-to-end encrypted between you and the Rspamd public fuzzy storage, and I plan to use it for better spam detection. If you don’t want this data to be shared, please stop using the public fuzzy storage or set the no_share flag to true.

Other major improvements

  • Use google-ced instead of libicu character detection
  • Rework and refactor forged recipients plugin
  • Added SO_REUSEPORT support for UDP sockets on Linux
  • Better Spamhaus DQS service support (e.g. hashbl)
  • Added secretbox Lua API for symmetric encryption (AEAD)
  • More bitcoin addresses support (Bitcoincash, new BTC addresses etc)
  • Timeouts for PDF processing
  • Many improvements to the tests and build systems

Critical/important fixes

  • Arc: Fix ARC validation for chains of signatures
  • Fix IDNA dots parsing
  • Fix usage of crypto_sign it should be crypto_sign_detached!

Here is the list of the important changes:

  • [Conf] Add missing symbols
  • [Conf] Add missing symbols
  • [Conf] Fix fat-fingers typo
  • [Conf] Fix wrong comment in
  • [Conf] Neural: Fix the default name for max_trains
  • [Conf] Register a known symbol
  • [Conf] Spf: Add R_SPF_PERMFAIL symbol
  • [CritFix] Arc: Fix ARC validation for chains of signatures
  • [CritFix] Distinguish socketpairs between different fuzzy workers
  • [CritFix] Fix IDNA dots parsing
  • [CritFix] Fix test assertion method
  • [CritFix] Fix usage of crypto_sign it should be crypto_sign_detached!
  • [Feature] Add BOUNCE rule
  • [Feature] Add controller plugins support and selectors plugin
  • [Feature] Add maps query method
  • [Feature] Add minimal delay to fuzzy storage
  • [Feature] Add multiple base32 alphabets for decoding
  • [Feature] Add preliminary support of BCH addresses
  • [Feature] Add query_specific endpoint
  • [Feature] Allow multiple base32 encodings in Lua API
  • [Feature] Allow to specify nonces manually
  • [Feature] Controller: Allow to pass query arguments to the lua webui plugins
  • [Feature] Fuzzy_check: Add gen_hashes command
  • [Feature] Fuzzy_check: Add weight_threshold option for fuzzy rules
  • [Feature] Implement address retry on connection failure
  • [Feature] Improve limits in pdf scanning
  • [Feature] Initial support of subscribe command in lua_redis
  • [Feature] Lua_cryptobox: Add secretbox API
  • [Feature] Lua_text: Add encoding methods
  • [Feature] Milter_headers: Allow to activate routines via users settings
  • [Feature] PDF: Add timeouts for expensive operations
  • [Feature] Preliminary maps addon for controller
  • [Feature] Split pdf processing object and output object to allow GC
  • [Feature] Support BLIS blas library
  • [Feature] Support input vectorisation by recvmmsg call
  • [Feature] Support multiple base32 alphabets
  • [Feature] add queueid, uid, messageid and specific symbols to selectors
  • [Minor] use only selectors to fill vars in force_actions message
  • [Feature] allow variables in force_actions messages
  • [Feature] extend lua api
  • [Fix] #3249
  • [Fix] Allow to adjust neurons in the hidden layer
  • [Fix] Another try to fix email names parsing
  • [Fix] Arc: Allow to reuse authentication results when doing multi-stage signing
  • [Fix] Arc: Fix bug with arc chains verification where i>1
  • [Fix] Arc: Sort headers by their i= value
  • [Fix] Change neural plugin’s loss function
  • [Fix] Deal with double eqsigns when decoding headers
  • [Fix] Default ANN names in clickhouse
  • [Fix] Disable reuseport for TCP sockets as it causes too many troubles
  • [Fix] Disable text detection heuristics for encrypted parts
  • [Fix] Distinguish DKIM keys by md5
  • [Fix] Distinguish type from flags in register_symbol
  • [Fix] Dmarc: Unbreak reporting after cf2ae3292ac93da8b6e0624b48a62828a51803c9
  • [Fix] Do not flag pre-result of virus scanners as least if action is reject
  • [Fix] Do not use GC64 workaround on 32bit platforms, omg
  • [Fix] Exclude damaged urls from html parser
  • [Fix] Fix FWD_GOOGLE rule (#1815)
  • [Fix] Fix adding of the empty archive file for gzip
  • [Fix] Fix aliases in forged recipients and limit number of iterations
  • [Fix] Fix authentication results insertion
  • [Fix] Fix calling of methods in selectors
  • [Fix] Fix clen length for hiredis…
  • [Fix] Fix endless loop if broken arc chain has been found
  • [Fix] Fix false - operation
  • [Fix] Fix get_urls table invocation
  • [Fix] Fix group based composites
  • [Fix] Fix headers passing in rspamd_proxy
  • [Fix] Fix incomplete utf8 sequences handling
  • [Fix] Fix lua_next invocation
  • [Fix] Fix lua_parse_symbol_type function logic
  • [Fix] Fix multiple listen configuration
  • [Fix] Fix occasional encryption of the cached data
  • [Fix] Fix parsing boundaries with spaces
  • [Fix] Fix passing of methods arguments
  • [Fix] Fix poor man allocator algorithm
  • [Fix] Fix regexp selector and add flattening
  • [Fix] Fix rfc base32 encode ordering (skip inverse bits)
  • [Fix] Fix rfc based base32 decoding
  • [Fix] Fix sockets leak in the client
  • [Fix] Fix storing of the original smtp from
  • [Fix] Fix types check and types usage in lua_cryptobox
  • [Fix] Fix unused results
  • [Fix] Fuzzy_check: Disable shingles for short texts (really)
  • [Fix] Ical: Fix indentation grammar
  • [Fix] Improve part:is_attachment logic
  • [Fix] Mmap return value must be checked versus MAP_FAILED
  • [Fix] One more fix to skip images that are not urls
  • [Fix] Pdf: Support some weird objects with no newline before endobj
  • [Fix] Rbl: Fix ignore_defaults in conjunction with ignore_whitelists
  • [Fix] Restore support for for and id parts in received headers
  • [Fix] Segmentation fault in contrib/lua-lpeg/lpvm.c on ppc64el
  • [Fix] Skip spaces at the boundary end
  • [Fix] Slashing fix: fix captures matching API
  • [Fix] Spamassassin: Rework metas processing
  • [Fix] Store reference of upstream list in upstreams objects
  • [Fix] Understand utf8 in content-disposition parser
  • [Fix] Unify selectors digest functions
  • [Fix] Use abs value when checking composites
  • [Fix] Use strict IDNA for utf8 DNS names + add sanity checks for DNS names
  • [Fix] Use unsigned char and better support of utf8 in ragel parser
  • [Fix] add missing selector_cache declaration
  • [Project] Add L flag for regexps to save start of the match in Hyperscan
  • [Project] Add lower method to lua_text
  • [Project] Add a simple matrix Lua library
  • [Project] Add implicit bitcoincash prefix
  • [Project] Add linalg ffi library for prototyping
  • [Project] Add methods to append data to fuzzy requests
  • [Project] Add routine to call a generic lua function
  • [Project] Add ssyev method interface
  • [Project] Add tensors index method
  • [Project] Add text:sub method
  • [Project] Allow rspamd_text based selectors
  • [Project] Allow to specify re_conditions for regular expressions
  • [Project] Attach extensions to the binary fuzzy commands
  • [Project] Bitcoin: BTC cash addresses needs some checksum validation
  • [Project] Cleanup the redis script
  • [Project] Convert bitcoin rules to the new regexp conditions feature
  • [Project] Detect memrchr in systems that supports it
  • [Project] Do not listen sockets in the main process
  • [Project] Implement ‘probabilistic’ learn mode for ANN
  • [Project] Implement BTC polymod in C as it requires 64 bit ops
  • [Project] Implement bitcoin cash validation in a proper way
  • [Project] Implement extensions logic for fuzzy storage
  • [Project] Implement symbols insertion in multiple results mode
  • [Project] Lua_text: Add method memchr
  • [Project] Neural: Add PCA loading logic
  • [Project] Neural: Fix PCA based learning
  • [Project] Neural: Fix matrix gemm
  • [Project] Neural: Further PCA fixes
  • [Project] Neural: Implement PCA in learning
  • [Project] Neural: Implement PCA learning
  • [Project] Neural: Implement PCA on ANN forward
  • [Project] Neural: Implement PCA serialisation
  • [Project] Neural: Start PCA implementation
  • [Project] Neural: Use C version of scatter matrix producing
  • [Project] Preliminary support of lua conditions for regexps
  • [Project] Preliminary usage of the reuseport
  • [Project] Process composites separately for each shadow result
  • [Project] Remove old code
  • [Project] Rework scan result functions to support shadow results
  • [Project] Rework some more functions to work with shadow results
  • [Project] Some more fixes
  • [Project] Start results chain implementation
  • [Project] Support fun iterators on rspamd_text objects
  • [Project] Support multiply, minus and divide operators in expressions
  • [Project] Tensor: Move scatter matrix calculation to C
  • [Rework] Allow to specify exact metric result when adding a symbol
  • [Rework] Change and improve openblas detection and usage
  • [Rework] Close listen sockets in main after fork
  • [Rework] Further rework of lua urls extraction API
  • [Rework] Lua_cryptobox: Allow to store output of the hash function
  • [Rework] Lua_task: Add more methods to deal with shadow results
  • [Rework] Modernize logging for expressions
  • [Rework] Remove empty prefilters feature - we are not prepared…
  • [Rework] Remove old FindLua module, disable lua fallback when LuaJIT is enabled
  • [Rework] Rework and refactor forged recipients plugin
  • [Rework] Rework expressions processing
  • [Rework] Rework fuzzy commands processing
  • [Rework] Rework url flags handling API
  • [Rework] Rework urls extraction
  • [Rework] Split operations processing and add more debug logs
  • [Rework] Update zstd to 1.4.5
  • [Rework] Use google-ced instead of libicu chardet as the former sucks
  • [Rework] add alias util:parse_addr for util:parse_mail_address
  • [Rework] get rid of util:parse_addr duplicating the util:parse_mail_address, replace where used
  • [Rules] Allow prefix for bitcoin cash addresses
  • [Rules] More fixes for bitcoin cash addresses decoding
  • [Rules] Refactor bleach32 addresses handling

Rspamd 2.5 has been released

2020-04-01 00:00:00 +0000

We have released Rspamd 2.5 today.

You may find that the first start of Rspamd takes longer than usual (around 10-30 seconds). This is intended, as Rspamd has to recompile some of the URL pattern matchers. It will not happen on subsequent restarts or subsequent updates. It should also not happen if you have used Release Candidates for this version.

There are 3 major projects in this release:

  • URL extraction rework: the URL extraction logic in Rspamd has been significantly reworked to provide better DoS resistance, better matching and lower false positives.
  • URL structure update: the URL storage structures have been reworked to occupy less memory and use more efficient storage.
  • Hyperscan early load: since PCRE is a backtracking RE engine, it is not very safe to use with complicated rules and untrusted data. In this release, Rspamd loads the hyperscan database at an early stage to avoid the PCRE fallback where possible.

Several major fixes:

  • Base64 detection has been fixed and improved to reduce FP rate
  • Query urls are now fully processed
  • Bundled libev has been updated to 4.33 (fixing many issues with FD closing race conditions)
  • Fixed ANN normalisation
  • Fixed redis backend leaks

Useful features:

  • Added whitelisted_signers_map in ARC module
  • Implemented /etc/hosts files processing

Here is the list of the most important changes:

  • [Conf] Mark Rspamd emailbl as ignore whitelist
  • [Conf] RBL: Add missing emails = true option
  • [Feature] Add support for scripts in fuzzy storage
  • [Feature] Arc: Add whitelisted_signers_map option
  • [Feature] Implement hosts file processing
  • [Feature] Neural: Introduce classes bias that allows non-equal classes learning
  • [Feature] Update libev to 4.33
  • [Fix] Another brain damage html standard adoptions
  • [Fix] Another fix for brain damaged obs-fws state
  • [Fix] Fix flags that caused force_actions failure
  • [Fix] Fix logging issue
  • [Fix] Fix lua symbols scores registration when config does not define scores
  • [Fix] Fix opaque maps logic
  • [Fix] Fix parsing of the html tags with no spaces after attributes
  • [Fix] Fix some corner cases in urls parsing, add limits
  • [Fix] Fix tlds extraction if custom composition rules are used
  • [Fix] Fix variables replacement in mempool
  • [Fix] Improve base64 detection
  • [Fix] Normalize dynamic scores in ANN correctly
  • [Fix] Plug memory leak introduced by #3153
  • [Fix] Stat_redis_backend: Fix memory leak and simplify learn path
  • [Fix] Try hard to deal with ghost workers
  • [Fix] metadata_exporter default formatter
  • [Rework] Change the way to extract URLs when dealing with alternative parts
  • [Rework] Fix various url extraction issues
  • [Rework] Re cache: Load compiled hyperscan in the main process as well
  • [Rework] Re cache: Load hyperscan early
  • [Rework] Rework URL structure: adjust tld part
  • [Rework] Rework URL structure: host field
  • [Rework] Rework URL structure: more structure optimisations
  • [Rework] Rework URL structure: user field
  • [Rework] URL: Another update for urls extraction logic
  • [Rework] Urls: Improve query urls handling
  • [Rework] Urls: adopt html related stuff
  • [Rework] Urls: more rework of the urls sets
  • [Rework] Urls: process query urls in HTML urls correctly
  • [Rework] Urls: rework urls hash structure
  • [Rework] Urls: update lua libraries
  • [Rework] Use multiple search tries for different url extraction types

Rspamd 2.4 has been released

2020-02-26 00:00:00 +0000

We have released Rspamd 2.4 today.

This is mainly a bug-fix release.

There are 3 major projects in this release:

  • Logger system rework: fixed syslog logging, improved architecture, improved logging reload
  • URL composition library (similar to the old 2tld map for the surbl module); the RBL module now uses this library for all URL-like objects: URLs, emails, DKIM domains
  • Implemented SSL client caching: it should improve client SSL connections (Clickhouse, SMTPS, map checks and so on) on both the client and the server side.
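
As a rough illustration of what URL composition means, the plain-Lua sketch below picks the effective second-level domain (eSLD) of a hostname from a small set of two-label suffixes. The `esld` function and the suffix table are hypothetical and only show the idea; they are not the lua_urls_compose API or its rule format.

```lua
-- Illustrative suffix set only; real composition rules come from a map.
local multi_label_suffixes = {
  ['co.uk'] = true,
  ['com.au'] = true,
}

-- Hypothetical helper: return the eSLD for a hostname.
local function esld(host)
  local labels = {}
  for label in host:lower():gmatch('[^.]+') do
    labels[#labels + 1] = label
  end
  local n = #labels
  if n <= 2 then
    return table.concat(labels, '.')
  end
  local last_two = labels[n - 1] .. '.' .. labels[n]
  if multi_label_suffixes[last_two] then
    -- The suffix spans two labels, so keep three labels in total.
    return labels[n - 2] .. '.' .. last_two
  end
  return last_two
end

local domain = esld('mail.example.co.uk')
```

The same reduction can be applied to any URL-like object (a URL host, an email domain, a DKIM domain) before it is looked up in a blocklist.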

Several major fixes:

  • Parsing of the content type attributes
  • Avoid collisions in mempool variables
  • Fixed Redis Sentinel support
  • Fixed IPv6 listening
  • Fixed mime modifications for 7bit parts
  • Fixed passthrough result and smtp message
  • Important eSLD url composition fixes
  • Various neural network plugin fixes

Useful features:

  • Custom additional columns in Clickhouse plugin
  • Support of CDB maps everywhere to share huge maps across workers with no extra cost

Here is the list of the most important changes:

  • [CritFix] Fix parsing of the content type attributes
  • [Feature] Clickhouse: Add extra columns support
  • [Feature] Rbl: Add url_compose_map option for RBL rules
  • [Fix] ‘R’ flag is for all headers regexp
  • [Fix] Allow to reset settings id from Lua (e.g. because of the priority)
  • [Fix] Avoid collisions in mempool variables by changing fuzzy caching logic
  • [Fix] Avoid strdup usage for symbols options
  • [Fix] Do not trust stat(2) it lies
  • [Fix] Filter all options for symbols to have sane characters
  • [Fix] Fix all headers iteration
  • [Fix] Fix allowed_settings for neural
  • [Fix] Fix listen socket parsing
  • [Fix] Fix maps expressions evaluation
  • [Fix] Fix sentinel connections leak by using async connections
  • [Fix] Fix smtp message on passthrough result
  • [Fix] Fix tld composition rules
  • [Fix] Fuzzy_storage: Do not check for shingles if a direct hash has been found
  • [Fix] Lua_mime: Do not perform QP encoding for 7bit parts
  • [Fix] Neural: Distinguish missing symbols from symbols with low scores
  • [Fix] Support listening on systemd sockets by name
  • [Project] Add lua_urls_compose library
  • [Project] Allow to set a custom log function to the logger
  • [Project] CDB maps: Start making cdb a first class citizen
  • [Project] Clickhouse: Add extra columns concept
  • [Project] Fix urls composition rules, add unit tests
  • [Project] Unify cdb maps
  • [Rework] Logger infrastructure rework
  • [Rework] Refactor libraries structure
  • [Rework] Rework SSL caching
  • [Rework] Update snowball stemmer to 2.0 and remove all crap aside of UTF8

Rspamd 2.3 has been released

2020-02-04 00:00:00 +0000

We have released Rspamd 2.3 today.

This release includes various improvements, PDF parsing support and a new SPF plugin. Numerous bug fixes, including some critical ones, have also been applied during this release cycle.

Here is the list of the most important changes.

Lua content library

This library is part of the content scanning project, which enables Rspamd to process content formats that are commonly encountered in email. In this release, we have added preliminary support for PDF files.

Rspamd supports the following features so far:

  • URL extraction from PDF files
  • JavaScript extraction from actions
  • Use of the fuzzy hash storage to store suspicious JavaScript extracted from PDF files
  • A couple of rules related to bad PDF files (e.g. PDF_ENCRYPTED)

This library is written in pure Lua, using LPEG and Hyperscan under the hood to provide memory safety and high processing speed.

SPF plugin revamped

The SPF plugin has been one of the oldest plugins in Rspamd. Unfortunately, because it was written in C, it was very hard to add new functions to this plugin or to maintain it in a consistent state. In this release, the SPF plugin has been rewritten in Lua, providing a configuration-compatible mode along with some new features, such as support for external relay handling.

Slow rules protection

Rspamd will now try to check async timers and events when it notices that some rule is taking too much time. This prevents dead rules from holding up processing for longer than the task timeout. In the future, we plan to extend the task timeout relation to all internal rule timers.

Other features

Here is a list of other important but not categorised features:

  • Allow milter code to deal with multiple headers
  • Antivirus: Add Avast support
  • Dkim_signing: Allow to sign via milter_headers
  • Send quit command to Redis
  • Speed up is_ascii function
  • Improve memory pool allocation routines and add memory debugging

Important bug fixes

  • Critical fix: fix html entities decoding. This bug could cause incorrect URLs being extracted from messages.
  • Critical fix: fix re cache when mix of pcre and hyperscan is used. This bug could cause random failures when scanning regular expressions that could be neither evaluated nor approximated by hyperscan.
  • Fix arc seal validation
  • Fix base tag processing according to stupid HTML renderer behaviour
  • Fix dealing with \0 in ucl strings and JSON
  • Fix gpg parts misdetection
  • Fix ignored symbols exporting (e.g. to ClickHouse)
  • Fix processing of numeric URLs
  • Fix processing of the closed tcp connections
  • Fix mixed charset rule for some languages (e.g. Czech)
  • Fix mixing of the IPv4 and IPv6 addresses in Radix maps
  • Fix soft hyphen processing
  • Fix O(N^2) algorithm when comparing recipients
  • Various stability improvements when dealing with large or specially crafted messages
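
One of the fixes above concerns mixing IPv4 and IPv6 addresses in radix maps; the changelog below notes that IPv4-mapped IPv6 addresses are now used in the radix trie. The plain-Lua sketch below shows what that mapping looks like in text form. The `ipv4_to_mapped` helper is hypothetical and for illustration only; Rspamd does this internally on binary addresses.

```lua
-- Hypothetical helper: render a dotted-quad IPv4 address as an
-- IPv4-mapped IPv6 address (::ffff:a.b.c.d written in hex groups).
local function ipv4_to_mapped(ip)
  local a, b, c, d = ip:match('^(%d+)%.(%d+)%.(%d+)%.(%d+)$')
  if not a then
    return nil
  end
  a, b, c, d = tonumber(a), tonumber(b), tonumber(c), tonumber(d)
  if a > 255 or b > 255 or c > 255 or d > 255 then
    return nil
  end
  return string.format('::ffff:%02x%02x:%02x%02x', a, b, c, d)
end

local mapped = ipv4_to_mapped('192.168.0.1')
```

With every key expressed in one 128-bit address space, a single trie can hold both address families without ambiguous lookups.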

Full list of the meaningful changes

  • [Conf] SPF is no longer a C module
  • [Conf] Update spamtrap map path example
  • [CritFix] Fix html entities decoding
  • [CritFix] Fix re cache when mix of pcre and hyperscan is used
  • [Feature] Allow milter code to deal with multiple headers
  • [Feature] Antivirus: Add avast support
  • [Feature] Dkim_signing: Allow to sign via milter_headers
  • [Feature] Implement content hashes
  • [Feature] Lua_text: Add regexp split iterator method
  • [Feature] Lua_text: Implement flattening of the input tables
  • [Feature] Send quit command to Redis
  • [Feature] Speed up is_ascii function
  • [Feature] Spf: Add external_relay option
  • [Fix] Avoid double escaping
  • [Fix] Fix O(N^2) algorithm
  • [Fix] Fix arc seal validation
  • [Fix] Fix base tag processing according to stupid HTML renderer behaviour
  • [Fix] Fix dealing with \0 in ucl strings and JSON
  • [Fix] Fix gpg parts misdetection
  • [Fix] Fix ignored symbols exporting
  • [Fix] Fix processing of numeric url’s
  • [Fix] Fix processing of the closed tcp connections
  • [Fix] Fix regexp type check for pcre2
  • [Fix] Fix urls encode function
  • [Fix] Fix urls shifting when doing decode to include separators
  • [Fix] Fix white on white rule and add is_leaf flag
  • [Fix] Further fixes in charset detection
  • [Fix] Ignore diacritics in chartable module for specific languages
  • [Fix] Limit size of symbols options by max_opts_len option
  • [Fix] More fixes in html tag content calculations
  • [Fix] Plug memory leak in fuzzy storage
  • [Fix] Process high priority settings even if settings/id has been specified
  • [Fix] Select a different upstream on last retransmit
  • [Fix] Treat soft hyphen as zero width space
  • [Fix] Try harder to watch the lifetime of the key_stat
  • [Fix] Use ipv6-mapped-ipv4 addresses in radix trie
  • [Project] Add logic to break execution when processing symbols
  • [Project] Add methods to set specific content for mime parts from Lua
  • [Project] Lua_content: support PDF files
  • [Project] Move dns_tool to using of the rspamd_spf from FFI module
  • [Project] Preliminary SPF plugin in Lua
  • [Project] Show debug stat for memory pool
  • [Project] Some rework about specific data that is now tagged
  • [Project] Start reworking of the mempool structure
  • [Rework] Allow to add userdata as symbols options
  • [Rework] Change mime part specifics handling
  • [Rework] Move LRU SPF cache from spf plugin
  • [Rework] Rework HTML tags content attachment
  • [Rework] Rework options hash structure
  • [Rework] Start lua_content library
  • [Rework] Stop using of uthash for http headers
  • [Rework] Use faster hashing approach for memory pools variables
  • [Rules] Add PDF related rules