Module rspamd_regexp

Rspamd regexp is a utility module that handles Perl-compatible regular expressions (PCRE) in rspamd.

Example:

local rspamd_regexp = require "rspamd_regexp"

local re = rspamd_regexp.create_cached('/^\\s*some_string\\s*$/i')
re:match('some_string') -- returns true
local split_re = rspamd_regexp.create_cached('/\\s+/')
split_re:split('word word   word') -- returns {'word', 'word', 'word'}

Brief content:

Functions:

Function Description
rspamd_regexp.create(pattern[, flags]) Creates a new rspamd_regexp.
rspamd_regexp.import_glob(glob_pattern[, flags]) Creates a new rspamd_regexp from a glob pattern.
rspamd_regexp.import_plain(plain_string[, flags]) Creates a new rspamd_regexp from a plain string (escaping special characters).
rspamd_regexp.get_cached(pattern) Gets a cached, pre-compiled regexp created by either create or create_cached.
rspamd_regexp.create_cached(pattern[, flags]) Similar to create, but searches for the regexp in the cache first.

Methods:

Method Description
re:get_pattern() Get the pattern of the regexp object.
re:set_limit(lim) Set the maximum length of text to be matched with this regexp (if lim is less than or equal to zero, all texts are checked).
re:set_max_hits(lim) Set the maximum number of hits returned by the regexp.
re:get_max_hits() Get the maximum number of hits returned by the regexp.
re:search(line[, raw[, capture]]) Search for the regular expression in line.
re:match(line[, raw_match]) Match line against the regular expression and return true if it matches.
re:matchn(line, max_matches[, raw_match]) Match line against the regular expression and return the number of matches.
re:split(line) Split line using the regular expression.
re:destroy() Remove the regexp from caches if needed (the object itself is freed by the garbage collector).

Functions

The module rspamd_regexp defines the following functions.

Function rspamd_regexp.create(pattern[, flags])

Creates a new rspamd_regexp.

Parameters:

  • pattern {string}: pattern to build the regexp from. If the pattern is enclosed in slashes (/.../), flags can be appended after the closing slash
  • flags {string}: optional flags to create the regular expression with

Returns:

  • {regexp}: regexp object that is not automatically destroyed

Example:

local regexp = require "rspamd_regexp"

local re = regexp.create('/^test.*[0-9]\\s*$/i')
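
A minimal sketch, assuming it runs inside rspamd's Lua environment (for example a script run via rspamadm lua): the same case-insensitive expression is built once with inline flags and once with the separate flags argument. Passing an unslashed pattern together with flags is an assumption based on the parameter list above, and the matches helper is hypothetical, used only to turn the result into a plain boolean.

```lua
local regexp = require "rspamd_regexp"

-- Hypothetical helper: normalise the result of re:match to a boolean
local function matches(re, line)
  return re:match(line) and true or false
end

-- Flags appended after the closing slash
local re_inline = regexp.create('/^test.*[0-9]\\s*$/i')
-- Assumption: with an explicit flags argument the pattern is unslashed
local re_flags = regexp.create('^test.*[0-9]\\s*$', 'i')

print(matches(re_inline, 'TEST 42'))  -- true (case-insensitive)
print(matches(re_flags, 'TEST 42'))   -- true
print(matches(re_inline, 'test'))     -- false (no trailing digit)
```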

Function rspamd_regexp.import_glob(glob_pattern[, flags])

Creates a new rspamd_regexp from a glob pattern.

Parameters:

  • glob_pattern {string}: glob pattern to build the regexp from
  • flags {string}: optional flags to create the regular expression with

Returns:

  • {regexp}: regexp object that is not automatically destroyed

Example:

local regexp = require "rspamd_regexp"

local re = regexp.import_glob('ab*', 'i')
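
The sketch below, again assuming rspamd's Lua environment, checks a few lines against the glob from the example above, where * stands for any run of characters. Whether the generated regexp is anchored is not stated here, so the test strings are chosen to give the same result either way; the matches helper is hypothetical.

```lua
local regexp = require "rspamd_regexp"

-- Glob wildcard: '*' matches any sequence of characters
local re = regexp.import_glob('ab*', 'i')

-- Hypothetical helper: normalise the result of re:match to a boolean
local function matches(line)
  return re:match(line) and true or false
end

print(matches('abc'))  -- true
print(matches('ABC'))  -- true ('i' flag)
print(matches('xyz'))  -- false
```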

Function rspamd_regexp.import_plain(plain_string[, flags])

Creates a new rspamd_regexp from a plain string, escaping regular expression special characters so that the string is matched literally.

Parameters:

  • plain_string {string}: literal string to build the regexp from
  • flags {string}: optional flags to create the regular expression with

Returns:

  • {regexp}: regexp object that is not automatically destroyed

Example:

local regexp = require "rspamd_regexp"

local re = regexp.import_plain('exact_string_with*', 'i')
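
To show what the escaping changes, this sketch (assuming rspamd's Lua environment; the helper and test strings are illustrative) compiles the same text both with import_plain and with create. In the plain version * is a literal asterisk; in the regexp version h* means "zero or more h characters".

```lua
local regexp = require "rspamd_regexp"

-- Hypothetical helper: normalise the result of re:match to a boolean
local function matches(re, line)
  return re:match(line) and true or false
end

-- '*' is escaped: only a literal asterisk matches
local plain = regexp.import_plain('exact_string_with*', 'i')
-- For contrast: the same text compiled as a regular expression
local pattern = regexp.create('/exact_string_with*/i')

print(matches(plain, 'EXACT_STRING_WITH*'))  -- true
print(matches(plain, 'exact_string_wit'))    -- false
print(matches(pattern, 'exact_string_wit'))  -- true ('h*' matches zero h)
```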

Function rspamd_regexp.get_cached(pattern)

Gets a cached, pre-compiled regexp created by either the create or the create_cached function. If no cached regexp is found, nil is returned.

Parameters:

  • pattern {string}: regexp pattern

Returns:

  • {regexp}: cached regexp structure or nil
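
A minimal sketch, assuming rspamd's Lua environment with its global regexp cache initialised: a regexp compiled with create_cached is then looked up by the same pattern string, while a pattern that was never compiled yields nil. Both patterns are made up for illustration.

```lua
local regexp = require "rspamd_regexp"

local pattern = '/^cached_example_[0-9]+$/'

-- Compiles the expression and stores it in the cache
local re = regexp.create_cached(pattern)
-- Looks the expression up by the same pattern string
local same = regexp.get_cached(pattern)
-- A pattern that has never been compiled is not cached
local missing = regexp.get_cached('/^never_compiled_example$/')

print(same ~= nil)     -- true
print(missing == nil)  -- true
print(same:match('cached_example_7') and true or false)  -- true
```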

Function rspamd_regexp.create_cached(pattern[, flags])

This function is similar to create, but it searches for the regexp in the cache first.

Parameters:

  • pattern {string}: pattern to build the regexp from. If the pattern is enclosed in slashes (/.../), flags can be appended after the closing slash
  • flags {string}: optional flags to create the regular expression with

Returns:

  • {regexp}: regexp object that is not automatically destroyed

Example:

local regexp = require "rspamd_regexp"

local re = regexp.create_cached('/^test.*[0-9]\\s*$/i')
-- ... other code ...
-- This doesn't create new regexp object
local other_re = regexp.create_cached('/^test.*[0-9]\\s*$/i')

Methods

The module rspamd_regexp defines the following methods.

Method re:get_pattern()

Get the pattern of the regexp object.

Parameters:

No parameters

Returns:

  • {string}: pattern line

Method re:set_limit(lim)

Set the maximum length of text (in bytes) to be matched with this regexp. If lim is less than or equal to zero, all texts are checked.

Parameters:

  • lim {number}: limit in bytes

Returns:

No return
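
The sketch below, assuming rspamd's Lua environment, sets a small limit so that the only occurrence of foo falls beyond it. The description above does not say whether longer texts are truncated to lim bytes or skipped entirely; under either reading the limited call should not match, so the example does not depend on that detail.

```lua
local regexp = require "rspamd_regexp"

local re = regexp.create('/foo/')
local text = 'xxxxfoo'  -- 'foo' starts after the first 4 bytes

-- No limit: the whole text is checked
print(re:match(text) and true or false)  -- true

-- 3-byte limit: 'foo' lies beyond the limit
re:set_limit(3)
print(re:match(text) and true or false)  -- false

-- Zero (or a negative value) removes the limit again
re:set_limit(0)
print(re:match(text) and true or false)  -- true
```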

Method re:set_max_hits(lim)

Set the maximum number of hits returned by the regexp.

Parameters:

  • lim {number}: maximum number of hits

Returns:

  • {number}: previous maximum number of hits

Method re:get_max_hits()

Get the maximum number of hits returned by the regexp.

Parameters:

No parameters

Returns:

  • {number}: number of max hits
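
A short sketch of the two accessors together, assuming rspamd's Lua environment. The default limit is not documented here, so the example only checks that set_max_hits returns a number and that get_max_hits reads back the new value.

```lua
local regexp = require "rspamd_regexp"

local re = regexp.create('/[0-9]+/')

-- set_max_hits returns the previous limit
local previous = re:set_max_hits(5)
print(type(previous))     -- number
print(re:get_max_hits())  -- 5
```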

Method re:search(line[, raw[, capture]])

Search for the regular expression in line. If line matches, this function returns a table of matched strings; otherwise, nil is returned. If raw is true, the input is treated as raw data rather than UTF-8 text. If capture is true, each element of the returned table is itself a table of captures: its first element is the whole matched string, and the subsequent elements are the capture groups defined in the pattern, in order.

Parameters:

  • line {string}: line to match against the regexp object
  • raw {bool}: treat the input as raw data instead of UTF-8
  • capture {bool}: perform subpattern capturing

Returns:

  • {table or nil}: table of strings or tables (if capture is true) or nil if not matched

Example:

local regexp = require "rspamd_regexp"

local re = regexp.create_cached('/^\\s*([0-9]+)\\s*$/')
-- returns nil
local m1 = re:search('blah')
local m2 = re:search('   190   ')
-- prints '   190   '
print(m2[1])

-- raw = false, capture = true
local m3 = re:search('   100500 ', false, true)
-- prints '   100500 '
print(m3[1][1])
-- prints '100500' (the first capture group)
print(m3[1][2])

Method re:match(line[, raw_match])

Matches line against the regular expression and returns true if the line matches, either partially or completely.

Parameters:

  • line {string}: line to match against the regexp object
  • raw_match {bool}: treat the input as raw data instead of UTF-8

Returns:

  • {bool}: true if line matches
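
To illustrate the "partially or completely" wording, this sketch (assuming rspamd's Lua environment; the helper is hypothetical) compares an unanchored and an anchored pattern on the same line.

```lua
local regexp = require "rspamd_regexp"

-- Hypothetical helper: normalise the result of re:match to a boolean
local function matches(re, line)
  return re:match(line) and true or false
end

local loose = regexp.create('/foo/')     -- a match anywhere is enough
local strict = regexp.create('/^foo$/')  -- the whole line must match

print(matches(loose, 'xx foo xx'))   -- true (partial match)
print(matches(strict, 'xx foo xx'))  -- false
print(matches(strict, 'foo'))        -- true (complete match)
```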

Method re:matchn(line, max_matches[, raw_match])

Matches line against the regular expression and returns the number of matches found (partial or complete). Matching stops once max_matches is reached. If max_matches is zero, only a single match is counted, which is equivalent to re:match. If max_matches is negative, all matches are counted.

Parameters:

  • line {string}: line to match against the regexp object
  • max_matches {number}: maximum number of matches
  • raw_match {bool}: treat the input as raw data instead of UTF-8

Returns:

  • {number}: number of matches found in the line argument
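
A sketch of how max_matches affects the count, assuming rspamd's Lua environment. The sample line contains three separate runs of digits, so the expected counts follow directly from the rules described above.

```lua
local regexp = require "rspamd_regexp"

local re = regexp.create('/[0-9]+/')
local line = '1 22 333'  -- three separate runs of digits

print(re:matchn(line, -1))   -- 3: negative counts all matches
print(re:matchn(line, 2))    -- 2: counting stops at max_matches
print(re:matchn(line, 0))    -- 1: zero behaves like re:match
print(re:matchn('abc', -1))  -- 0: no matches
```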

Method re:split(line)

Split line using the specified regular expression. Breaks the string on the pattern, and returns an array of the tokens. If the pattern contains capturing parentheses, then the text for each of the substrings will also be returned. If the pattern does not match anywhere in the string, then the whole string is returned as the first token.

Parameters:

  • line {string/text}: line to split

Returns:

  • {table}: table of split line portions (if text was the input, then text is used for return parts)
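
A minimal sketch, assuming rspamd's Lua environment, that splits on a comma followed by optional spaces and also shows the no-match case, where the whole string comes back as the only token.

```lua
local regexp = require "rspamd_regexp"

local re = regexp.create('/,\\s*/')

-- Separators (a comma plus optional spaces) are dropped
local parts = re:split('a, b,c')
-- parts holds 3 strings: 'a', 'b', 'c'

-- No separator at all: the whole string is the only token
local whole = re:split('abc')
-- whole holds 1 string: 'abc'
```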

Method re:destroy()

Remove the regexp from caches if needed (the object itself is freed by the garbage collector).

Parameters:

No parameters

Returns:

No return
