We have released Rspamd 2.3 today.
This release includes various improvements, PDF parsing support and a new SPF plugin.
Numerous bug fixes, including some critical ones, have also been applied during this release cycle.
Here is the list of the most important changes.
Lua content library
This library is part of the content scanning project that enables Rspamd to process content formats commonly found in email. In this release, we have added preliminary support for PDF files.
Rspamd supports the following features so far:
- URLs extraction from PDF files
- JavaScript extraction from actions
- Using the fuzzy hashes storage to store suspicious JavaScript extracted from PDF files
- A couple of rules related to bad PDF files (e.g. PDF_ENCRYPTED)
This library is written in pure Lua, using LPEG and Hyperscan under the hood to provide
memory safety and high processing speed.
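To illustrate the first feature in the list above, here is a small shell sketch that pulls URIs out of PDF link actions. It is not how the Lua content library works: it assumes the objects are stored uncompressed (real PDFs often keep them in FlateDecode object streams, which have to be decompressed first), and pdf_uris is a hypothetical helper name.

```shell
# pdf_uris FILE: hypothetical helper that prints the targets of URI actions,
# e.g. "<< /S /URI /URI (https://example.com/) >>", found in FILE.
# Limitations: only uncompressed objects are scanned; strings with escaped
# parentheses and hex strings (<...>) are not handled.
pdf_uris() {
  grep -ao '/URI *([^)]*)' "$1" |   # -a: treat the binary PDF as text
    sed 's/^\/URI *(//; s/)$//'     # keep only the text between the parentheses
}
```

For example, pdf_uris suspicious.pdf prints one URL per line; anything hidden in compressed streams is simply missed, which is why a real parser has to decode streams first.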
SPF plugin revamped
The SPF plugin is one of the oldest plugins in Rspamd. Unfortunately, because it was written in C, it was very hard to add new functions to it or to keep it in a consistent state. In this release, the SPF plugin has been rewritten in Lua; it keeps a compatible configuration and adds some new features, such as external relay handling (the external_relay option).
Slow rules protection
Rspamd will now check async timers and events when it notices that some rule is taking too much time. This prevents stuck rules from holding up processing for longer than the task timeout. In the future, we plan to tie all internal rule timers to the task timeout.
Other features
Here is a list of other important but not categorised features:
- Allow milter code to deal with multiple headers
- Antivirus: Add Avast support
- Dkim_signing: Allow to sign via milter_headers
- Send quit command to Redis
- Speed up is_ascii function
- Improve memory pool allocation routines and add memory debugging
Important bug fixes
- Critical fix: fix HTML entities decoding. This bug could cause incorrect URLs to be extracted from messages.
- Critical fix: fix the regexp cache when a mix of PCRE and Hyperscan is used. This bug could cause random failures when matching regular expressions that Hyperscan could neither evaluate nor approximate.
- Fix arc seal validation
- Fix base tag processing according to stupid HTML renderer behaviour
- Fix dealing with \0 in UCL strings and JSON
- Fix gpg parts misdetection
- Fix ignored symbols exporting (e.g. to ClickHouse)
- Fix processing of numeric URLs
- Fix processing of the closed tcp connections
- Fix mixed charset rule for some languages (e.g. Czech)
- Fix mixing of the IPv4 and IPv6 addresses in Radix maps
- Fix soft hyphen processing
- Fix O(N^2) algorithm when comparing recipients
- Various stability improvements when dealing with large or specially crafted messages
Full list of the meaningful changes
- [Conf] SPF is no longer a C module
- [Conf] Update spamtrap map path example
- [CritFix] Fix html entities decoding
- [CritFix] Fix re cache when mix of pcre and hyperscan is used
- [Feature] Allow milter code to deal with multiple headers
- [Feature] Antivirus: Add avast support
- [Feature] Dkim_signing: Allow to sign via milter_headers
- [Feature] Implement content hashes
- [Feature] Lua_text: Add regexp split iterator method
- [Feature] Lua_text: Implement flattening of the input tables
- [Feature] Send quit command to Redis
- [Feature] Speed up is_ascii function
- [Feature] Spf: Add external_relay option
- [Fix] Avoid double escaping
- [Fix] Fix O(N^2) algorithm
- [Fix] Fix arc seal validation
- [Fix] Fix base tag processing according to stupid HTML renderer behaviour
- [Fix] Fix dealing with \0 in ucl strings and JSON
- [Fix] Fix gpg parts misdetection
- [Fix] Fix ignored symbols exporting
- [Fix] Fix processing of numeric URLs
- [Fix] Fix processing of the closed tcp connections
- [Fix] Fix regexp type check for pcre2
- [Fix] Fix urls encode function
- [Fix] Fix urls shifting when doing decode to include separators
- [Fix] Fix white on white rule and add is_leaf flag
- [Fix] Further fixes in charset detection
- [Fix] Ignore diacritics in chartable module for specific languages
- [Fix] Limit size of symbols options by max_opts_len option
- [Fix] More fixes in html tag content calculations
- [Fix] Plug memory leak in fuzzy storage
- [Fix] Process high priority settings even if settings/id has been specified
- [Fix] Select a different upstream on last retransmit
- [Fix] Treat soft hyphen as zero width space
- [Fix] Try harder to watch the lifetime of the key_stat
- [Fix] Use ipv6-mapped-ipv4 addresses in radix trie
- [Project] Add logic to break execution when processing symbols
- [Project] Add methods to set specific content for mime parts from Lua
- [Project] Lua_content: support PDF files
- [Project] Move dns_tool to using of the rspamd_spf from FFI module
- [Project] Preliminary SPF plugin in Lua
- [Project] Show debug stat for memory pool
- [Project] Some rework about specific data that is now tagged
- [Project] Start reworking of the mempool structure
- [Rework] Allow to add userdata as symbols options
- [Rework] Change mime part specifics handling
- [Rework] Move LRU SPF cache from spf plugin
- [Rework] Rework HTML tags content attachment
- [Rework] Rework options hash structure
- [Rework] Start lua_content library
- [Rework] Stop using of uthash for http headers
- [Rework] Use faster hashing approach for memory pools variables
- [Rules] Add PDF related rules
We have released Rspamd 2.2 today.
This release contains some new features and many bug fixes. To the best of our knowledge, there are no incompatible changes introduced with this release.
This release includes the following features and important fixes.
Added VirusTotal support
Rspamd now supports VirusTotal as an antivirus plugin. You need to obtain an API key to use this service. All normal antivirus module operations apply to this plugin.
Clickhouse collection rework
Rspamd now performs ClickHouse data collection in a separate periodic event. This allows collection to be triggered by time, by the number of rows (as before) or by the amount of memory used. More details are in the GitHub issue.
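A minimal sketch of the "whichever limit comes first" decision described above, assuming that a batch is sent as soon as any one of the three limits is reached. The function name, variable names and limit values are hypothetical; they are not the ClickHouse plugin's option names or defaults.

```shell
# Hypothetical limits, for illustration only
MAX_ROWS=1000                     # rows buffered before sending
MAX_BYTES=$((50 * 1024 * 1024))   # memory used by the buffer
MAX_AGE=60                        # seconds since the last send

# should_flush ROWS BYTES AGE_SECONDS: exit 0 if any limit has been reached
should_flush() {
  [ "$1" -ge "$MAX_ROWS" ] || [ "$2" -ge "$MAX_BYTES" ] || [ "$3" -ge "$MAX_AGE" ]
}
```

Checking all three limits with a logical OR means a quiet server still flushes on time, while a busy one flushes early on rows or memory.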
ASAN builds
Rspamd packages now have ASAN branches to help debug issues with Rspamd and to provide better feedback to the developers. The details about ASAN builds are covered in this FAQ section.
Faster base64 decoding
We have applied a number of optimizations to improve the performance of base64 decoding on modern hardware (especially with AVX2 and/or SSE4.2 support).
Fast unicode validation library
Rspamd now uses a number of techniques to speed up UTF-8 validation by utilising modern CPU instructions, such as AVX2 and SSE4. This code is based on the work of Yibo Cai and achieves around 0.5 CPU cycles per byte with the AVX2 codec.
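To put 0.5 cycles per byte into perspective: throughput is simply the clock rate divided by the cycles spent per byte. The helper below (a hypothetical name) does that arithmetic; the clock rate is an input you supply, not a measured figure.

```shell
# utf8_throughput GHZ CYCLES_PER_BYTE: implied single-core throughput in GB/s
# (GHz = 1e9 cycles/s, so GHz / cycles-per-byte = 1e9 bytes/s = GB/s)
utf8_throughput() {
  awk -v ghz="$1" -v cpb="$2" 'BEGIN { printf "%.1f GB/s\n", ghz / cpb }'
}
```

For example, on an assumed 3 GHz core, utf8_throughput 3.0 0.5 prints 6.0 GB/s.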
Upstreams fixes
There are a number of significant improvements in the Rspamd upstreams library, including better consistent hashing, better upstream marking logic and improved logging.
Build system rework
The CMake-based build system has been reworked to follow the more modern design practices supported by newer CMake versions (Rspamd now requires CMake 3.9 or newer).
The new build system should improve support for multiple configurations and simplify the CMake build files.
Full list of the meaningful changes
- [Conf] Antivirus: Fix the default config
- [Feature] Add verdict library in lua
- [Feature] Allow exception when choosing upstream
- [Feature] Allow to disable symbols from the metric config
- [Feature] Allow to limit maps per specific worker
- [Feature] Always validate Rspamd protocol output
- [Feature] Antivirus: Add preliminary virustotal support
- [Feature] Clickhouse: Rework Clickhouse collection logic
- [Feature] Improve base64 usage
- [Feature] Shutdown timeout is now associated with task timeout
- [Fix] #3129 Multiple classifiers on redis working incorrectly
- [Fix] Allow real upstreams configuration
- [Fix] Another try to fix slow callbacks and timers
- [Fix] Check results of write message as SSL can bork them
- [Fix] Clickhouse: Avoid potential races in collection
- [Fix] Clickhouse: Fix periodic script
- [Fix] Fail DNS upstream on each retransmit attempt
- [Fix] Fix consistent hashing when upstreams are marked inactive
- [Fix] Fix issues found
- [Fix] Fix off-by-one in retries for the proxy
- [Fix] Fix termination
- [Fix] Fix upstreams exclusion logic
- [Fix] Fix utf8 validation for symbols options and empty strings
- [Fix] Oops, fix maps reload
- [Fix] Rbl: Allow utf8 lookups for IDN domains
- [Fix] Sigh, another try to fix brain-damaged openssl
- [Project] Add fast utf8 validation library
- [Project] Use own utf8 validation instead of glib
- [Rework] Another phase of finish actions rework
- [Rework] Further cmake system rework
- [Rework] Further isolation of the controller’s functions
- [Rework] Make cmake structure more modular
- [Rework] Move cmake modules to a dedicated path
- [Rework] Replace controller functions by any scanner worker if needed
- [Rework] Rework final scripts logic
- [Rework] Rewrite rspamd_str_make_utf_valid function
We have released Rspamd 2.1 today.
This release contains some new features and many bug fixes. To the best of our knowledge, there are no incompatible changes introduced with this release.
This release includes the following features and important fixes.
Add uuencode support
Despite being a very old standard, uuencoded parts are still quite common in the email traffic we observe. Starting with this version, Rspamd supports uuencoded parts (both the classic and the base64 variant).
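For readers unfamiliar with the format: a classic uuencoded block starts with a "begin <mode> <name>" line and ends with "end", while the base64 variant starts with "begin-base64 <mode> <name>" and ends with "====". The hypothetical uu_kind helper below only reports which variant a file contains; it illustrates the markers and is not Rspamd's parser.

```shell
# uu_kind FILE: print "base64", "classic" or "none" depending on the
# uuencode markers found in FILE (trailer of the base64 variant not checked)
uu_kind() {
  body=$(tr -d '\r' < "$1")   # e-mail bodies usually use CRLF line endings
  if printf '%s\n' "$body" | grep -q '^begin-base64 [0-7][0-7]* '; then
    echo base64
  elif printf '%s\n' "$body" | grep -q '^begin [0-7][0-7]* ' &&
       printf '%s\n' "$body" | grep -q '^end$'; then
    echo classic
  else
    echo none
  fi
}
```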
Critical issue found in DKIM verification
There was a critical regression in the 2.0 DKIM verification code that caused verification failures for some valid DKIM signatures. More details are in the GitHub issue.
Improved neural training
There are a number of fixes and improvements in the Neural module. All training samples are now balanced using random sampling, which allows a smoother selection of training vectors. Several bugs have been fixed, and scores are no longer the recommended way to select training vectors: Rspamd automatically applies a heuristic to select messages for learning. Some issues around infinities and the number of learning threads have also been addressed.
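One simple way to balance training samples by random sampling is to undersample the larger class, so both classes contribute the same number of vectors. The sketch assumes ham and spam vectors are stored one per line in two files and uses GNU shuf; it illustrates the idea and is not the Neural module's code.

```shell
# balance HAM_FILE SPAM_FILE: print the same number of randomly chosen lines
# from each file (random undersampling of the larger class)
balance() {
  n_ham=$(wc -l < "$1")
  n_spam=$(wc -l < "$2")
  n_min=$(( n_ham < n_spam ? n_ham : n_spam ))
  shuf -n "$n_min" "$1"   # random subset of ham vectors
  shuf -n "$n_min" "$2"   # random subset of spam vectors
}
```

Undersampling discards data from the majority class; the advantage is that the network cannot reach a low loss simply by always predicting that class.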
Maps fixes
There are a number of fixes and improvements around the maps handling logic. These include fixes for both HTTP and file maps, as well as better handling of timeouts and caches.
Event loop fixes
Rspamd could previously select an inefficient event backend on some OSes, notably BSD and OS X. This version should fix it. It is now also possible to configure the event backend manually in the configuration file.
Full list of the meaningful changes
- [Conf] Update neural.conf
- [CritFix] Fix dkim verification for multiple headers listed
- [Feature] Add support of uudecode
- [Feature] Allow to explicitly set events backend
- [Feature] Implement configurable limits for SPF lookups
- [Feature] Lua_scanners: Use lua magic for inclusion/exclusion logic
- [Feature] Multimap: Do not check files in office archives
- [Feature] Neural: Add sampling when storing training vectors
- [Feature] SPF: Allow to disable AAAA checks in configuration
- [Feature] Spf: Add limits configuration support
- [Feature] Store etag in cached HTTP maps + better logging
- [Feature] Support segwit BTC addresses, fix LTC verification
- [Feature] Support uuencoding
- [Fix] Add configurable number of threads for OpenBLAS
- [Fix] Add workaround for ragel 7 in hyperscan related maps code
- [Fix] Another fix for numeric urls parsing
- [Fix] Correct EMA time calculations
- [Fix] Do not treat archives as text
- [Fix] Do not use strdup on data extracted from lua
- [Fix] Fix a failure calculating URL reputation.
- [Fix] Fix crash due to constructors init order
- [Fix] Fix crash on parts with no cd
- [Fix] Fix empty prefilters that require mime structures
- [Fix] Fix event loop creation
- [Fix] Fix issues sending DMARC reports.
- [Fix] Fix misprint
- [Fix] Fix saving of the file maps
- [Fix] Fix size calculations when converting from utf16
- [Fix] Fix support of disable_monitoring in rbl
- [Fix] Fix use-after-free
- [Fix] Fix zip files check to relax requirements
- [Fix] Important hiredis fixes
- [Fix] Lots of fixes in maps check logic
- [Fix] Lua_tcp: Deal with temporary fails on write
- [Fix] Lua_tcp: Make write errors fatal and rework error handlers
- [Fix] Meta: Filter some more values
- [Fix] Neural: Add protection against infinities
- [Fix] Oops, fix math.huge invocation
- [Fix] Plug memory leak
- [Fix] Sigh, another email to string fix
- [Fix] Try to fix another ownership race in ssl connection
- [Fix] Uuencode: Fix parsing of corrupted uuencode
- [Fix] lua_scanners - razor rename need_check function
- [Rework] Require CMake 3.9 to work, remove manual lto crap
We have released Rspamd 2.0 today. This version introduces a new versioning scheme that will be used in future Rspamd releases: instead of <major>.<minor>.<patch>, Rspamd will use just <major>.<minor>. This is because the <major> number had not been increased for many years, and the <minor> number has effectively served as the real version indicator.
Upgrade notes
There are various important features in this release. The vast majority of them should not have any visible impact on existing systems. However, we recommend reading the Upgrade Notes.
The main potential source of incompatibilities is the deprecation of the surbl and emails modules, which have been replaced with the rbl module. The default Bayes backend has also changed to Redis, while the SQLite backend is now deprecated and not recommended for use. Users of the ip_score, neural and ratelimit modules are strongly advised to read the upgrade notes!
Packages support
In this version of Rspamd, we have dropped support for the following OS variants:
- Ubuntu trusty (reached EOL)
- CentOS 6 (almost reached EOL)
We have added CentOS 8 packages instead.
As usual, the Rspamd project strongly recommends NOT using packages provided by 3rd parties, including your own Linux distribution. These packages are usually out of date or built incorrectly, and accordingly they are not supported by the Rspamd project. Please use only the official packages. FreeBSD ports are considered official packages, as they are supported by the Rspamd project directly (well, strictly speaking, by myself).
Here is the list of the most important changes in this release.
Libevent has been replaced with bundled libev
After many years of using the libevent library, Rspamd has switched to libev. The main reasons were performance and control: many different libevent versions were shipped with the various supported platforms, and many of them lacked important features, such as inotify support on Linux. Switching to libev allowed us to simplify the code, improve signal and timeout handling, and react to file map changes instantly thanks to inotify.
Torch has been dropped from Rspamd
Lua torch has served as a powerful engine for ML and neural networks in Rspamd for quite a long time. However, it is no longer maintained or updated, and supporting it has proven to be a nightmare. There were also important bugs that could not be fixed due to the complexity of the code. Starting from version 2.0, Rspamd uses the kann library, which is much friendlier for embedding and provides very convenient interfaces that are now exported to Lua.
RBL module improvements and replacement of the SURBL and Emails modules
The RBL module has replaced both the emails and surbl modules, unifying all RBL checks in a single place. It adds new RBL types, such as selectors, and makes it simpler to extend existing rules into more powerful ones.
Emails rules with maps instead of DNS RBLs are NO LONGER SUPPORTED. Please use multimap with selectors instead.
New Lua Magic library
For file type detection, Rspamd now uses its own detection library based on Lua and Hyperscan (where possible) instead of libmagic. There are four major reasons for that:
- Libmagic is a generic library that can easily detect the pdp11 a.out format but fails at docx detection surprisingly often
- We need performance and libmagic is not about performance at all
- We want to add new detection heuristics instead of relying on 3rd party strict rules
- Libmagic API is not very suitable for us
With the new library, Rspamd can detect part types in just a couple of microseconds and recognise the vast majority of interesting content, such as executables, archives, images, HTML and so on.
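A much simplified illustration of signature ("magic bytes") detection follows. It checks a few well-known leading byte sequences; Rspamd's Lua Magic library has its own rule set and heuristics, so treat detect_type as a hypothetical helper.

```shell
# detect_type FILE: guess a file type from its first bytes (hex-encoded)
detect_type() {
  magic=$(head -c 8 "$1" | od -An -tx1 | tr -d ' \n')
  case "$magic" in
    25504446*)         echo pdf ;;    # "%PDF"
    504b0304*)         echo zip ;;    # ZIP local header (also docx/xlsx/odt)
    7f454c46*)         echo elf ;;    # ELF executable
    4d5a*)             echo pe ;;     # "MZ": DOS/Windows executable
    89504e470d0a1a0a*) echo png ;;    # PNG signature
    ffd8ff*)           echo jpeg ;;   # JPEG SOI marker
    47494638*)         echo gif ;;    # "GIF8"
    *)                 echo unknown ;;
  esac
}
```

Note that docx, xlsx and odt files all start with the ZIP signature, so telling them apart requires looking inside the archive; this is the kind of extra heuristic a dedicated library can add on top of plain signatures.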
Neural module rework
The neural networks module has been almost completely rewritten to support the KANN library and symbol profiles. Rspamd no longer resets the neural network on every individual symbol change; instead, it tries to use the most appropriate network. Many issues with deadlocks during neural learning have also been addressed.
Clickhouse module improvement
- Added LowCardinality fields to reduce storage requirements
- Fixed retention code
- Significantly optimized memory usage by using userdata instead of interned strings
Multimap module
Various new features, including map combinations and dependent maps (/doc/modules/multimap.html#dependent-maps).
Maillist module
Improved mailing lists detection and reworked detection heuristic.
Heartbeats support
Rspamd workers now send heartbeat events to the main process. In turn, the main process can kill hung workers if a certain number of heartbeats have been lost. This feature is not enabled by default for now.
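A sketch of the watchdog arithmetic: the number of lost heartbeats is the time since the last one divided by the heartbeat interval. The interval, the threshold and the is_hung name are assumptions for illustration; they are not Rspamd's configuration options or defaults.

```shell
HEARTBEAT_INTERVAL=10   # seconds between heartbeats (assumed)
MAX_LOST=5              # tolerated lost heartbeats (assumed)

# is_hung LAST_BEAT_EPOCH NOW_EPOCH: exit 0 if the worker looks hung
is_hung() {
  lost=$(( ($2 - $1) / HEARTBEAT_INTERVAL ))
  [ "$lost" -ge "$MAX_LOST" ]
}
```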
Lua scanners improvements
There are lots of additions to the Lua scanners. Many of them have been contributed by Carsten Rosenberg from HeinleinSupport.
New antivirus engines support:
New external scanners:
- Razor support (by @c-rosenberg)
- Better oletools support (by @c-rosenberg)
- p0f support as a separate module (by Denis Paavilainen - @denpamusic)
Mime modifications
From version 2.0, Rspamd allows modifying messages via Lua API methods. This required a massive rework of the internal structures and has been tested by Migadu. These functions are implemented in the lua_mime library.
Users settings improvements
Rspamd now handles settings differently when they are selected via Settings-Id: there are certain performance benefits and better logging in all modules. It is also possible to bind rules explicitly to certain settings IDs, which allows mail processing flows to be separated more efficiently.
Upstreams library improvements
- Added lazy resolving of the upstreams
- Added SRV upstreams that resolve SRV records for names, ports and priorities (e.g. when using HashiCorp Consul DNS)
- Use random strings for monitoring sanity
- Improved base64 decoding for typical outputs
- Langdet: Limit number of stop words to be checked
- Added a sanity limit for the task:get_urls() method to avoid Lua memory blowup
- Maps: Allow caching for complex maps
- Settings fast path has been added
- Lua core: use lightuserdata to index classes to avoid strings interning
- HTTP(s) keep-alive support has been added
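As an illustration of what an SRV-based upstream list contains, this sketch turns SRV answers (priority, weight, port, target) into host:port entries ordered by priority. For simplicity it orders by weight deterministically, whereas RFC 2782 clients choose randomly within a priority in proportion to weight; srv_to_upstreams is a hypothetical name.

```shell
# srv_to_upstreams: read "priority weight port target." lines (as printed by
# dig +short ... SRV) on stdin and print host:port, lowest priority first and,
# within a priority, highest weight first
srv_to_upstreams() {
  sort -k1,1n -k2,2nr |
    awk '{ sub(/\.$/, "", $4); print $4 ":" $3 }'   # strip the trailing dot
}
```

Against a local Consul agent (assumed to serve DNS on its default port 8600 and to know a service named redis), it could be used as: dig @127.0.0.1 -p 8600 +short redis.service.consul SRV | srv_to_upstreams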
Rules and other improvements
- Added the BITCOIN_ADDR symbol to allow custom composite rules to be created to block scam campaigns
- Support Litecoin addresses
- Implement syntax highlighting for Lua
- Allow execution of async events when hs compiles regexps
- Bayes expiry: eliminate the default expiration mode (use lazy mode all the time)
- Eliminate lua_squeeze as it has shown no improvements
- Drop url tags
- Eliminate virtual scan time as it is useless
- Use replxx instead of linenoise
- Added SSL/STARTTLS support to lua_tcp library
- Implemented SSL graceful closing
This version of Rspamd contains a number of other minor and major improvements and fixes compared to the 1.9 branch, including fixes for bugs that could cause issues, hangs or crashes with certain emails.
Preface
Rspamd has always been performance-oriented, but it has always been quite hard to measure how fast it is, since it normally just runs fast enough.
However, I was recently offered the chance to process the Abusix Intelligence feeds using Rspamd. These feeds are used to improve the quality of Rspamd fuzzy storage and to feed URLs and emails into the DNS blacklists provided by the Rspamd project and used in the SURBL module.
Problem statement
The amount of data to be processed is huge: about 100 million messages per day.
Here is how to count the connections Rspamd handles over a 10-second interval while processing these messages:
$ rspamc stat | \
grep 'Connections count' | \
cut -d' ' -f3 ; \
sleep 10 ; \
rspamc stat | \
grep 'Connections count' | \
cut -d' ' -f3
23548811
23564384
This means that over 10 seconds Rspamd processed around 15,500 messages (23564384 - 23548811 = 15573), which gives us a rate of roughly 1500 messages per second.
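The same calculation can be scripted. conn_count wraps the pipeline shown above, and rate is a small hypothetical helper that turns two counter samples into an average per-second rate using integer arithmetic.

```shell
# conn_count: print the "Connections count" counter reported by rspamc stat
conn_count() {
  rspamc stat | grep 'Connections count' | cut -d' ' -f3
}

# rate FIRST SECOND SECONDS: average events per second between two samples
rate() {
  echo $(( ($2 - $1) / $3 ))
}

# Live usage (needs a running Rspamd):
#   a=$(conn_count); sleep 10; b=$(conn_count); rate "$a" "$b" 10
```

With the two samples shown above, rate 23548811 23564384 10 prints 1557.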
Rspamd setup
The settings used to process this amount of messages are pretty similar to those that are provided by default.
There is also a significant amount of custom Lua scripts that provide the following functionality:
- Deduplication to save time on processing duplicates
- Conditional checks for URL and email blacklisting:
  - checks if a URL is in whitelists (around 5 whitelists stored in Redis are used)
  - checks if a URL is already listed
  - checks if it matches any suspicious patterns
- Checks if a message should be learned on fuzzy storage (various conditions)
- Storage of messages in IMAP folders, with sorting, partitioning and sampling logic
- Various HTTP and Redis queries for servicing purposes
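The deduplication step can be illustrated with a content digest: a message whose digest has already been recorded is skipped. This is a hypothetical sketch that keeps digests in a plain file and assumes GNU coreutils' sha256sum; the actual scripts are written in Lua, and their storage backend is not described here.

```shell
# seen_before MSG_FILE STATE_FILE: exit 0 if an identical message was already
# recorded in STATE_FILE; otherwise record its digest and exit 1
seen_before() {
  digest=$(sha256sum < "$1" | cut -d' ' -f1)
  if grep -qx "$digest" "$2" 2>/dev/null; then
    return 0
  fi
  printf '%s\n' "$digest" >> "$2"
  return 1
}
```

Hashing the raw message treats copies that differ only in headers such as Received or Message-ID as different messages, so a practical deduplicator would likely hash a normalised part of the message instead.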
Hardware
Now a few words about the hardware being used.
Previously, we ran the same setup on a smaller AX-60 machine, where it was loaded at around 80%. We decided to move to a more powerful server to have some headroom for processing more email and for experiments.
So we now rent an AX-160 AMD server at Hetzner. This is quite a powerful machine, and its current load looks like this:
top - 14:36:26 up 23:26, 1 user, load average: 15.76, 13.22, 12.46
Tasks: 511 total, 3 running, 508 sleeping, 0 stopped, 0 zombie
%Cpu(s): 14.1 us, 4.6 sy, 0.0 ni, 78.9 id, 0.0 wa, 0.0 hi, 2.4 si, 0.0 st
MiB Mem : 128802.5 total, 56985.7 free, 27897.5 used, 43919.3 buff/cache
MiB Swap: 4092.0 total, 3925.5 free, 166.5 used. 100018.6 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
14085 unbound 20 0 2058412 1.6g 6852 S 131.2 1.3 1478:04 unbound
66509 _rspamd 20 0 806976 733336 23592 S 68.8 0.6 169:52.21 rspamd
66498 _rspamd 20 0 780144 699540 23852 S 62.5 0.5 156:19.14 rspamd
66502 _rspamd 20 0 816152 744352 23796 S 56.2 0.6 164:26.39 rspamd
66468 _rspamd 20 0 773532 697084 23736 S 50.0 0.5 117:36.32 rspamd
66491 _rspamd 20 0 806652 722340 23728 S 50.0 0.5 148:04.54 rspamd
66476 _rspamd 20 0 767300 705996 23596 S 43.8 0.5 129:04.30 rspamd
66481 _rspamd 20 0 797944 730528 23896 S 43.8 0.6 139:34.35 rspamd
66443 _rspamd 20 0 727632 657104 23372 S 37.5 0.5 88:39.26 rspamd
66451 _rspamd 20 0 742192 665196 23632 S 37.5 0.5 94:49.75 rspamd
66456 _rspamd 20 0 790908 725784 23488 S 37.5 0.6 101:32.06 rspamd
66463 _rspamd 20 0 771540 696064 23692 S 37.5 0.5 108:08.65 rspamd
66487 _rspamd 20 0 780220 713024 23428 S 37.5 0.5 144:51.79 rspamd
66447 _rspamd 20 0 762440 689592 23736 S 31.2 0.5 90:23.93 rspamd
66455 _rspamd 20 0 763520 696108 23580 S 31.2 0.5 97:57.57 rspamd
66464 _rspamd 20 0 764644 688724 23696 S 31.2 0.5 111:32.74 rspamd
66469 _rspamd 20 0 756952 678704 23612 S 31.2 0.5 127:55.02 rspamd
127011 rbldns 20 0 358824 307700 2244 R 31.2 0.2 10:26.14 rbldnsd
10767 redis 20 0 9912104 7.7g 2532 S 25.0 6.1 236:29.63 redis-server
66438 _rspamd 20 0 746772 680624 23424 R 25.0 0.5 82:18.04 rspamd
66433 _rspamd 20 0 751180 687244 23472 S 18.8 0.5 80:12.21 rspamd
66437 _rspamd 20 0 737200 669428 23796 S 18.8 0.5 81:37.81 rspamd
10671 stunnel4 20 0 24.0g 77252 3644 S 12.5 0.1 269:06.53 stunnel4
26994 root 20 0 11900 3984 3072 R 12.5 0.0 0:00.02 top
66442 _rspamd 20 0 808808 707020 23608 S 12.5 0.5 85:11.64 rspamd
17821 clickho+ 20 0 21.8g 3.9g 18964 S 6.2 3.1 116:13.04 clickhouse-serv
Rspamd is fed via a proxy worker that runs on another host: it performs the initial data collection and sends messages over the Internet, using HTTPCrypt for transport encryption. Its CPU usage is negligible, though: on average it uses about 40% of a single CPU core.
Results analytics
As you can see, this machine also runs ClickHouse, Redis and its own recursive resolver (Unbound), and it is still ~80% idle while processing these 1500 messages per second.
If we attach perf to one of the worker processes, we see the following picture:
# timeout 30 perf record -p 66481
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.171 MB perf.data (29833 samples) ]
# perf report
# Overhead Command Shared Object Symbol
# ........ ....... ....................... .......................................................................
#
5.23% rspamd rspamd [.] lj_alloc_free
3.35% rspamd rspamd [.] lj_str_new
3.03% rspamd librspamd-server.so [.] gc_sweep
2.20% rspamd rspamd [.] lj_alloc_malloc
1.94% rspamd rspamd [.] gc_sweep
1.50% rspamd libc-2.28.so [.] __strlen_avx2
1.32% rspamd rspamd [.] release_unused_segments
1.24% rspamd rspamd [.] lj_BC_TGETS
1.17% rspamd libjemalloc.so.2 [.] free
1.04% rspamd librspamd-server.so [.] lj_BC_JLOOP
1.03% rspamd librspamd-server.so [.] propagatemark
1.01% rspamd libpthread-2.28.so [.] __pthread_mutex_lock
1.01% rspamd libglib-2.0.so.0.5800.3 [.] g_hash_table_lookup
0.94% rspamd libjemalloc.so.2 [.] malloc
0.77% rspamd rspamd [.] lj_func_newL_gc
0.76% rspamd rspamd [.] propagatemark
0.75% rspamd rspamd [.] lj_tab_get
0.69% rspamd libpthread-2.28.so [.] __pthread_mutex_unlock_usercnt
0.65% rspamd librspamd-server.so [.] t1ha2_atonce
0.61% rspamd librspamd-server.so [.] newtab
0.60% rspamd libicui18n.so.63.1 [.] icu_63::NGramParser::search
0.59% rspamd [kernel.kallsyms] [k] copy_user_generic_string
0.58% rspamd librspamd-server.so [.] match
0.58% rspamd librspamd-server.so [.] lj_tab_new1
0.56% rspamd librspamd-server.so [.] rspamd_task_find_symbol_result
0.52% rspamd [kernel.kallsyms] [k] _raw_spin_lock_irqsave
0.48% rspamd librspamd-server.so [.] rspamd_vprintf_common
0.46% rspamd librspamd-server.so [.] lj_str_new
0.42% rspamd rspamd [.] index2adr
0.42% rspamd rspamd [.] lj_BC_CALL
0.42% rspamd libc-2.28.so [.] __strcmp_avx2
0.42% rspamd libc-2.28.so [.] __memmove_avx_unaligned_erms
The top consumers are the Lua allocator and the garbage collector. Since we are using the Rspamd experimental package on Debian Buster, it is built with the bundled LuaJIT 2.1 beta3 and the jemalloc allocator. However, there seems to be an issue with this allocator on Debian Buster, so I had to load it manually via the following command:
# systemctl edit rspamd.service
[Service]
Environment="LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2"
Then restart Rspamd (systemctl restart rspamd).
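To confirm that the preload took effect, one can check whether libjemalloc appears in a process's memory map. This sketch is Linux-specific (it reads /proc/<pid>/maps), and maps_has_jemalloc is a hypothetical helper name.

```shell
# maps_has_jemalloc MAPS_FILE: exit 0 if the memory map lists libjemalloc
maps_has_jemalloc() {
  grep -q 'libjemalloc' "$1"
}

# Live usage: check the oldest rspamd process (the main process)
#   maps_has_jemalloc "/proc/$(pgrep -o rspamd)/maps" && echo "jemalloc loaded"
```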
Interestingly, this Rspamd setup accepts all connections encrypted with HTTPCrypt, yet chacha_blocks_avx2 takes less than 0.16% of CPU according to perf report.
This particular instance of Rspamd is slightly tuned to use more memory to save some CPU cycles:
# local.d/options.inc
lua_gc_step = 100;
lua_gc_pause = 400;
full_gc_iters = 10000;
These options tell Rspamd to keep Lua objects in memory for longer. In this mode, each worker performs a full GC cycle every 10,000 scanned messages, and we can observe its GC stats in the log:
$ tail -f /var/log/rspamd/rspamd.log | fgrep 'full gc'
perform full gc cycle; memory stats: 58.66MiB allocated, 62.01MiB active, 6.08MiB metadata, 84.71MiB resident, 90.64MiB mapped; lua memory: 107377 kb -> 38015 kb; 308.0022420035675 ms for gc iter
As you can see, a full GC iteration takes a significant amount of time (about 300 ms here). However, it keeps Lua memory usage sane: in this example, Lua memory dropped from 107377 KB to 38015 KB. The ideas behind this GC mode were taken from the generational GC idea described in the LuaJIT wiki.
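The numbers in such log lines can be summarised with standard tools. The hypothetical gc_summary helper below extracts the Lua memory before and after the full GC cycle (in the kb units Rspamd reports) and the time spent, assuming the log format shown above.

```shell
# gc_summary: read "perform full gc cycle" log lines on stdin and print
# Lua memory before/after, the amount and share reclaimed, and the GC time
gc_summary() {
  sed -n 's/.*lua memory: \([0-9]*\) kb -> \([0-9]*\) kb; \([0-9.]*\) ms for gc iter.*/\1 \2 \3/p' |
    awk '{ printf "before=%d after=%d reclaimed=%d (%.1f%%) time=%.0fms\n",
           $1, $2, $1 - $2, 100 * ($1 - $2) / $1, $3 }'
}
```

For the log line shown above, it prints before=107377 after=38015 reclaimed=69362 (64.6%) time=308ms. To process an existing log, run grep 'full gc' /var/log/rspamd/rspamd.log | gc_summary (behind tail -f, sed and awk may buffer their output).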
Resulting graphs
Here are some UI captures taken from a previous machine:
As you can see, the share of ham increased over the last few days; however, this was caused by the new sampling logic and duplicate filtering added to save CPU resources (such messages are marked as ham and excluded from scanning).
There is also a ClickHouse-based dashboard created with Redash:
Since we have ClickHouse on board, we can run various analytics. Here is the average scan time for messages (in milliseconds):
:) select avg(ScanTimeVirtual) from rspamd where Date=today();
SELECT avg(ScanTimeVirtual)
FROM rspamd
WHERE Date = today()
┌─avg(ScanTimeVirtual)─┐
│ 95.62269064131341 │
└──────────────────────┘
… and the average message size (in bytes):
:) select avg(Size) from rspamd where Date=today();
SELECT avg(Size)
FROM rspamd
WHERE Date = today()
┌──────────avg(Size)─┐
│ 1778.31 │
└────────────────────┘
Conclusions
So, at this load (1500 messages per second) and with an average message size of just under 2 KB, Rspamd processes each message in around 100 ms on average. I hope these numbers give some impression of Rspamd performance in general.
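These figures can be combined using Little's law, which relates the average number of messages being scanned concurrently (L) to the arrival rate (λ) and the average time each message spends being processed (W): L = λW. The hypothetical in_flight helper does the arithmetic with the measured values.

```shell
# in_flight RATE_PER_SEC LATENCY_MS: average number of messages scanned at once
# (Little's law: L = lambda * W, with W converted from ms to seconds)
in_flight() {
  awk -v rate="$1" -v ms="$2" 'BEGIN { printf "%.0f\n", rate * ms / 1000 }'
}
```

With the numbers above, in_flight 1500 95.62 prints 143, so roughly 140 messages are in flight at any moment. Since the host stays about 80% idle, much of that per-message time is presumably spent waiting on asynchronous DNS, Redis and HTTP lookups rather than on CPU work.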
I would like to give special kudos to Abusix, who constantly support the Rspamd project and have generously provided their amazing spam feeds to improve Rspamd quality!