This module checks messages against fuzzy patterns stored in fuzzy storage workers. It is also responsible for learning, that is, for adding message patterns to fuzzy storage.
Rspamd uses the shingles algorithm to perform fuzzy matching of messages. This algorithm
is probabilistic: it uses chains of words to detect common patterns and thereby
identify spam or ham messages. The shingles algorithm is described in the following
research paper. We use 3-grams for this
algorithm and a set of hash functions: siphash, mumhash and others. Currently,
rspamd uses 32 hashes for shingles.
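The idea can be illustrated with a minimal Python sketch. It is not rspamd's implementation: it assumes plain whitespace tokenisation, and it substitutes keyed BLAKE2b from the standard library for siphash, mumhash and the other hash functions. The names `shingles` and `similarity`, and the way the 32 keys are derived, are hypothetical. For each keyed hash the sketch keeps the minimum value over all 3-word windows; two messages are then compared by counting the positions where those minimums agree.

```python
import hashlib

NUM_HASHES = 32  # number of hashes per message, as stated in the text
WINDOW = 3       # 3-grams of words


def _keyed_hash(key, data):
    # Keyed BLAKE2b stands in for siphash/mumhash in this sketch (assumption).
    digest = hashlib.blake2b(data, key=key, digest_size=8).digest()
    return int.from_bytes(digest, "big")


def shingles(text):
    """Return NUM_HASHES minimum hash values over all 3-word windows."""
    words = text.lower().split()
    if len(words) < WINDOW:
        windows = [" ".join(words)]
    else:
        windows = [" ".join(words[i:i + WINDOW])
                   for i in range(len(words) - WINDOW + 1)]
    encoded = [w.encode("utf-8") for w in windows]
    result = []
    for n in range(NUM_HASHES):
        key = b"shingle-key-%d" % n  # hypothetical per-hash key
        result.append(min(_keyed_hash(key, w) for w in encoded))
    return result


def similarity(a, b):
    """Fraction of hash positions on which two shingle sets agree."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / NUM_HASHES
```

Because each position agrees with a probability equal to the share of word windows the two texts have in common, a lightly edited copy of a message still scores high, while an unrelated message scores close to zero.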
Attachments and images are currently not matched against fuzzy hashes; instead, they are checked by means of blake2 digests using strict (exact) matching.
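Strict matching by digest can be sketched as follows. This is an illustration only: the digest size, the in-memory `known` set and the function names are assumptions, not details taken from rspamd.

```python
import hashlib


def attachment_digest(data):
    # Exact-match digest: changing a single byte yields a different value.
    return hashlib.blake2b(data).hexdigest()


# Hypothetical store of digests that were learned earlier.
known = {attachment_digest(b"%PDF-1.4 example payload")}


def is_known(data):
    return attachment_digest(data) in known
```

Unlike shingles, this comparison gives no credit for near-duplicates: an attachment either matches a stored digest byte for byte or it does not match at all.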
The fuzzy check module has several global options and allows multiple match storages to be specified. Global options include:
- symbol: default symbol to insert (if no flags match)
- min_length: minimum length of text parts, in words, required to perform a fuzzy check (default: check all text parts)
- min_bytes: minimum length of attachments and images, in bytes, required to check them in fuzzy storage
- whitelist: IP list for which all fuzzy checks are skipped
- timeout: timeout for waiting for a reply
Fuzzy rules are defined as a set of rule definitions. Each rule must have a list of servers
to check or learn, a set of flags, and optional parameters. Here is an example of
a rule’s settings:
# local.d/fuzzy_check.conf
rule "FUZZY_CUSTOM" {
  # List of servers, can be an array or multi-value item
  servers = "127.0.0.1:11335";
  # List of additional mime types to be checked in this fuzzy ("*" for any)
  mime_types = ["application/*", "*/octet-stream"];
  # Maximum global score for all maps
  max_score = 20.0;
  # Ignore flags that are not listed in maps for this rule
  skip_unknown = yes;
  # If this value is false, then allow learning for this fuzzy rule
  read_only = no;
  # Fast hash type
  algorithm = "mumhash";
}
Each rule can have several maps defined by a flag value. For example, a single
fuzzy storage can contain both good and bad hashes that should have different symbols
and thus different weights. Maps are defined inside fuzzy rules as follows:
# local.d/fuzzy_check.conf
rule "FUZZY_LOCAL" {
  ...
  fuzzy_map = {
    FUZZY_DENIED {
      # Maximum weight for this list
      max_score = 20.0;
      # Flag value
      flag = 1
    }
    FUZZY_PROB {
      max_score = 10.0;
      flag = 2
    }
    FUZZY_WHITE {
      max_score = 2.0;
      flag = 3
    }
  }
}
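The way a flag returned by the storage selects a symbol can be sketched in Python. This is a rough model under stated assumptions, not rspamd's code: the interaction between `skip_unknown` and the global default `symbol` is inferred from the option descriptions, and the default symbol name `FUZZY_UNKNOWN` is hypothetical.

```python
# Mirrors the fuzzy_map example: flag -> (symbol, max_score)
FUZZY_MAP = {
    1: ("FUZZY_DENIED", 20.0),
    2: ("FUZZY_PROB", 10.0),
    3: ("FUZZY_WHITE", 2.0),
}


def symbol_for_flag(flag, default_symbol="FUZZY_UNKNOWN", skip_unknown=True):
    """Pick the symbol for a flag returned by the fuzzy storage."""
    entry = FUZZY_MAP.get(flag)
    if entry is not None:
        return entry[0]
    if skip_unknown:
        # Assumed behaviour: flags missing from the map are ignored.
        return None
    # Assumed behaviour: otherwise fall back to the default symbol.
    return default_symbol
```

With this layout one storage can serve deny, probable-spam and whitelist hashes at once, and each class gets its own symbol and weight.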
The meaning of max_score can be unclear. First of all, every hash in
fuzzy storage has its own weight. For example, if 100 users have marked a hash A
as spam, its weight is 100 * single_vote_weight.
Therefore, if single_vote_weight is 1, the final weight is 100.
max_score is the hash weight required for the rule to add its symbol with the maximum
score of 1.0 (which is then multiplied by the metric’s weight for that symbol). In our example,
if the weight of the hash is 100 and max_score is 99, the symbol is
added with a weight of 1. If max_score is 200, the symbol is added with a
weight of roughly 0.2 (the real function is a hyperbolic tangent). In the following configuration:
metric {
  name = "default";
  ...
  symbol {
    name = "FUZZY_DENIED";
    weight = "10.0";
  }
  ...
}
fuzzy_check {
  rule {
    ...
    fuzzy_map = {
      FUZZY_DENIED {
        # Maximum weight for this list
        max_score = 20.0;
        # Flag value
        flag = 1
      }
      ...
    }
  }
}
If a hash has value 10, then a symbol FUZZY_DENIED with weight of 2.0 will be added.
If a hash has value 100500, then FUZZY_DENIED will have weight 10.0.
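The saturating behaviour can be sketched in Python. The text only states that the real function is a hyperbolic tangent; the scaling by `e` used here is an assumption, as are the function names. The sketch is meant to show the shape of the curve (growth towards 1.0, then multiplication by the symbol's weight), so the intermediate values it produces depend on the assumed scaling and need not reproduce the approximate figures quoted in the text.

```python
import math


def normalized(hash_weight, max_score):
    # ASSUMPTION: tanh scaled by e; only "hyperbolic tangent" comes from the text.
    if max_score == 0:
        return 0.0
    return math.tanh(math.e * hash_weight / max_score)


def symbol_score(hash_weight, max_score, symbol_weight):
    # The normalized value (at most 1.0) is multiplied by the metric's weight.
    return symbol_weight * normalized(hash_weight, max_score)
```

For a hash far above max_score, such as 100500 against a max_score of 20.0, the normalized value saturates at 1.0 and the symbol receives its full metric weight of 10.0. Smaller hash weights always yield proportionally smaller scores.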
The fuzzy_check module also lets you learn messages. You can use the rspamc command or
connect to the controller worker using the HTTP protocol. For learning, check
the following settings:
- The controller worker is reachable by rspamc or HTTP (check bind_socket)
- The controller accepts learning commands from your client (check the enable_password or allow_ip settings)
- The fuzzy_check module is configured with the servers specified
- You know the fuzzy_key and fuzzy_shingles_key needed to operate with this storage
- The fuzzy_check module has fuzzy_map configured with the flags used by the server
- The fuzzy_check rule has the read_only option turned off (read_only = false)
- The fuzzy_storage worker allows updates from the controller’s host (allow_update option)
- The fuzzy storage can be reached over the UDP protocol
If all these conditions are met, you can learn messages with rspamc:
rspamc -w <weight> -f <flag> fuzzy_add ...
or delete hashes:
rspamc -f <flag> fuzzy_del ...
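When learning is scripted, the two command lines can be assembled programmatically. The helpers below are a hypothetical sketch: they use only the `-w`, `-f`, `fuzzy_add` and `fuzzy_del` arguments shown above, and they assume that the trailing arguments, elided as `...` above, are paths to message files.

```python
def fuzzy_add_cmd(weight, flag, files):
    # Builds: rspamc -w <weight> -f <flag> fuzzy_add <files...>
    return ["rspamc", "-w", str(weight), "-f", str(flag), "fuzzy_add", *files]


def fuzzy_del_cmd(flag, files):
    # Builds: rspamc -f <flag> fuzzy_del <files...>
    return ["rspamc", "-f", str(flag), "fuzzy_del", *files]
```

The resulting lists can be passed to `subprocess.run`, which avoids shell quoting problems with unusual file names.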
On learning, rspamd sends commands to all servers listed in the specific rule. On check, rspamd selects a single server in a round-robin manner.
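The difference between the two selection strategies can be sketched in Python. The class and method names are hypothetical, and the sketch ignores server failures and weights, which a real implementation would have to handle.

```python
import itertools


class FuzzyRuleServers:
    """Server selection for one fuzzy rule (illustrative only)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._rotation = itertools.cycle(self.servers)

    def for_learn(self):
        # Learning commands go to every server in the rule.
        return list(self.servers)

    def for_check(self):
        # Checks go to one server at a time, rotating through the list.
        return next(self._rotation)
```

Sending updates to every server keeps the storages consistent with each other, while rotating checks spreads the read load across them.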
TODO: add delhash description