This tutorial presents a step-by-step guide on setting up statistics and fuzzy storage replication on FreeBSD. The configuration procedures for other operating systems are quite similar.
The tutorial focuses on a centralized model where Bayesian classifier and fuzzy storage learning occur on a single host and are then distributed among Rspamd installations in remote locations. For the sake of simplicity, the tutorial covers replication to a single replica database for each of the masters.
To achieve this, we need to replicate the bayes and fuzzy storage backend data to the remote host. Since we don’t want to mirror the entire Redis cache, we should use dedicated Redis instances. It would be wise to separate the bayes and fuzzy storage as well.
We will create three Redis instances on both the master and replica sides: bayes, fuzzy, and redis for the remaining cache.
instance | Redis socket |
---|---|
redis | localhost:6379 |
bayes | localhost:6378 |
fuzzy | localhost:6377 |
To begin, install the databases/redis package by executing the following command:
# pkg install redis
Next, create separate working directories for the instances:
# cd /var/db/redis && mkdir bayes fuzzy && chown redis bayes fuzzy
To enable redis and its specific instances, add the following lines to the /etc/rc.conf file:
redis_enable="YES"
redis_profiles="redis bayes fuzzy"
To enable log rotation for Redis, create a newsyslog configuration file named /usr/local/etc/newsyslog.conf.d/redis.newsyslog.conf:
# logfilename [owner:group] mode count size when flags [/pid_file] [sig_num]
/var/log/redis/redis.log redis:redis 644 5 100 * J
/var/log/redis/bayes.log redis:redis 644 5 100 * J
/var/log/redis/fuzzy.log redis:redis 644 5 100 * J
Generate the default configuration on both the master and replica hosts, which will be common for all instances:
# cp /usr/local/etc/redis.conf.sample /usr/local/etc/redis.conf
For security reasons, Redis should not be exposed on external interfaces. Instead, configure Redis to listen on the loopback interfaces and use stunnel to establish TLS tunnels between the replica and master hosts. Note that this approach has security weaknesses of its own, so do not set up replication if you cannot trust the users of the replica host.
Configure the listening sockets and memory limit (optional) as follows:
# diff -u1 /usr/local/etc/redis.conf.sample /usr/local/etc/redis.conf
--- /usr/local/etc/redis.conf.sample 2016-11-03 06:30:49.000000000 +0300
+++ /usr/local/etc/redis.conf 2016-11-27 13:10:43.671584000 +0300
@@ -60,3 +60,3 @@
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-bind 127.0.0.1
+bind 127.0.0.1 ::1
@@ -537,2 +537,3 @@
# maxmemory <bytes>
+maxmemory 200M
Configure the redis instance on both the master and replica hosts in a way that maintains compatibility with a single instance configuration. This ensures that if you already have a single instance database, it will continue to function properly.
/usr/local/etc/redis-redis.conf:
include /usr/local/etc/redis.conf
On the master host, /usr/local/etc/redis-bayes.conf:
include /usr/local/etc/redis.conf
port 6378
pidfile /var/run/redis/bayes.pid
logfile /var/log/redis/bayes.log
dbfilename bayes.rdb
dir /var/db/redis/bayes/
maxmemory 600M
/usr/local/etc/redis-fuzzy.conf:
include /usr/local/etc/redis.conf
port 6377
pidfile /var/run/redis/fuzzy.pid
logfile /var/log/redis/fuzzy.log
dbfilename fuzzy.rdb
dir /var/db/redis/fuzzy/
If needed, adjust maxmemory for specific instances according to the expected database size, then start the instances:
# service redis start
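To confirm that all three instances came up, each one can be pinged on its loopback port. The following is a minimal sketch, assuming redis-cli (installed with the Redis package) is on the PATH and the ports match the table above. The name check_instances and the REDIS_CLI override variable are hypothetical helpers for illustration, not part of Redis or Rspamd.

```shell
# Hypothetical helper: ping each Redis instance used in this tutorial.
# REDIS_CLI is an assumed override variable; it defaults to redis-cli.
check_instances() {
    rc=0
    for port in 6379 6378 6377; do
        # redis-cli prints PONG when the instance answers
        reply=$("${REDIS_CLI:-redis-cli}" -h 127.0.0.1 -p "$port" ping 2>/dev/null) || reply=""
        if [ "$reply" = "PONG" ]; then
            echo "port $port: ok"
        else
            echo "port $port: no reply"
            rc=1
        fi
    done
    return "$rc"
}
```

Calling check_instances after the service has started should report ok for each port. A "no reply" line usually points at the corresponding /usr/local/etc/redis-<profile>.conf file or its log under /var/log/redis/.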
To set up the TLS tunnels between the hosts, refer to the "Setting up encrypted tunnel using stunnel" guide.
On the replica host, /usr/local/etc/redis-bayes.conf:
include /usr/local/etc/redis.conf
port 6378
pidfile /var/run/redis/bayes.pid
logfile /var/log/redis/bayes.log
dbfilename bayes.rdb
dir /var/db/redis/bayes/
replicaof localhost 6478
maxmemory 600M
/usr/local/etc/redis-fuzzy.conf:
include /usr/local/etc/redis.conf
port 6377
pidfile /var/run/redis/fuzzy.pid
logfile /var/log/redis/fuzzy.log
dbfilename fuzzy.rdb
dir /var/db/redis/fuzzy/
replicaof localhost 6477
As replicas do not connect to masters directly, stunnel's sockets are specified in the replicaof directives.
# service redis start
Check the logs of the replica instances. If resynchronization with the masters was successful, you are done.
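Besides reading the logs, the replication state can be queried from the instance itself with the INFO replication command. The sketch below assumes the output is piped in on standard input; replication_ok is a hypothetical helper name. It relies on two fields that Redis reports for a replica: role (shown as slave in INFO output) and master_link_status.

```shell
# Hypothetical helper: read `INFO replication` output on stdin and succeed
# only if the instance is a replica whose link to the master is up.
replication_ok() {
    # INFO output uses CRLF line endings, so strip carriage returns first
    info=$(tr -d '\r')
    printf '%s\n' "$info" | grep -q '^role:slave$' || return 1
    printf '%s\n' "$info" | grep -q '^master_link_status:up$'
}

# Example, on the replica host (not run here):
#   redis-cli -p 6378 info replication | replication_ok && echo "bayes: link up"
#   redis-cli -p 6377 info replication | replication_ok && echo "fuzzy: link up"
```

A non-zero exit status means the instance is either not configured as a replica or has not finished connecting to its master through the tunnel.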
On the master side, configure Rspamd to use the corresponding Redis instance for each component:
local.d/redis.conf:
servers = "localhost";
local.d/classifier-bayes.conf:
backend = "redis";
servers = "localhost:6378";
override.d/worker-fuzzy.inc:
backend = "redis";
servers = "localhost:6377";
On the replica side, Rspamd should use the local redis instance for both reading and writing, as it is not replicated.
local.d/redis.conf:
servers = "localhost";
Since the local bayes and fuzzy Redis instances are replicas, Rspamd should use them for reading, but write to the replication master.
local.d/classifier-bayes.conf:
backend = "redis";
read_servers = "localhost:6378";
write_servers = "localhost:6478";
override.d/worker-fuzzy.inc:
backend = "redis";
read_servers = "localhost:6377";
write_servers = "localhost:6477";
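Before pointing Rspamd at these sockets, it may help to confirm that each read port really answers as a replica and each write port (the stunnel socket) really reaches a master. This is a rough sketch under the same assumptions as the configuration above; role_of, check_pair and the REDIS_CLI override variable are hypothetical names used only for illustration.

```shell
# Hypothetical helper: print the replication role reported on a port.
# REDIS_CLI is an assumed override variable; it defaults to redis-cli.
role_of() {
    "${REDIS_CLI:-redis-cli}" -p "$1" info replication |
        tr -d '\r' | sed -n 's/^role://p'
}

# Hypothetical check for one read/write pair: the read port must report
# a replica ("slave" in INFO output) and the write port a master.
# Usage: check_pair READ_PORT WRITE_PORT
check_pair() {
    [ "$(role_of "$1")" = "slave" ] && [ "$(role_of "$2")" = "master" ]
}

# Example, on the replica host (not run here):
#   check_pair 6378 6478 && check_pair 6377 6477 && echo "read/write split ok"
```

If check_pair fails for the write port, the tunnel is the first thing to inspect, since Rspamd would otherwise attempt to learn into a socket that does not lead to the master.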