Module rspamd_parsers

This module contains Lua-C interfaces to Rspamd parsers of different kinds.

Brief contents:

Functions:

Function Description
parsers.tokenize_text(input[, exceptions]) Splits a text into word tokens, using an optional list of exceptions.
parsers.parse_html(input) Parses HTML and returns the corresponding plain text.
parsers.parse_mail_address(str[, pool]) Parses email addresses and returns a table of tables, one per address.
parsers.parse_content_type(ct_string, mempool) Parses a content-type string into a table.
parsers.parse_smtp_date(str[, local_tz]) Converts an SMTP date string to a Unix timestamp.

Functions

The module rspamd_parsers defines the following functions.

Function parsers.tokenize_text(input[, exceptions])

Splits a text into tokens (words), using an optional list of exceptions

Parameters:

  • input {text|string}: input data
  • exceptions {table}: optional table of <start_pos,length> pairs marking exception regions in the input

Returns:

  • {table/strings}: list of strings representing words in the text
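
The role of the exceptions list can be illustrated with a rough Python analogue. This is not Rspamd's tokenizer: it assumes 0-based character offsets, assumes excepted ranges are simply dropped from the output, and uses a plain \w+ word pattern in place of Rspamd's real word-boundary rules.

```python
import re

# Simplified word pattern (an assumption of this sketch): runs of word
# characters. Rspamd's real tokenizer uses more elaborate boundary rules.
_WORD_RE = re.compile(r"\w+")


def tokenize_text(text, exceptions=None):
    """Split text into words, skipping (start_pos, length) exception ranges.

    Assumptions: positions are 0-based character offsets and excepted
    ranges are dropped entirely from the resulting word list.
    """
    tokens = []
    pos = 0
    for start, length in sorted(exceptions or []):
        # Tokenize only the plain text between the previous exception and this one.
        tokens.extend(_WORD_RE.findall(text, pos, start))
        pos = max(pos, start + length)
    # Tokenize whatever follows the last exception.
    tokens.extend(_WORD_RE.findall(text, pos))
    return tokens
```

For example, tokenize_text("see http://example.com now", [(4, 18)]) keeps the URL fragments out of the word list and yields ["see", "now"].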

Function parsers.parse_html(input)

Parses HTML and returns the corresponding plain text

Parameters:

  • input {string|text}: input HTML

Returns:

  • {rspamd_text}: processed text with no HTML tags
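
The idea of stripping tags down to plain text can be sketched in Python with the standard html.parser module. This is only an analogue, and Rspamd's own HTML parser is considerably more elaborate: the sketch keeps character data, drops <script>/<style> contents and collapses whitespace. The set of block tags treated as word separators is an assumption.

```python
from html.parser import HTMLParser

# Tags whose contents are not visible text.
_SKIP_TAGS = {"script", "style"}
# Block-level tags treated as word separators (an assumption of this sketch).
_BLOCK_TAGS = {"br", "div", "li", "p", "tr"}


class _TextExtractor(HTMLParser):
    """Collects character data from an HTML document."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in data
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in _SKIP_TAGS:
            self._skip_depth += 1
        elif tag in _BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in _SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag in _BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        # Ignore text that sits inside <script> or <style>.
        if not self._skip_depth:
            self.parts.append(data)


def parse_html(markup):
    """Return the text of markup with runs of whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join("".join(parser.parts).split())
```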

Function parsers.parse_mail_address(str[, pool])

Parses one or more email addresses and returns a table of tables, one per address, with the following fields:

  • raw - the original value without any processing
  • name - display name of the address in UTF-8, e.g. for Vsevolod Stakhov <blah@foo.com> it is Vsevolod Stakhov
  • addr - the address itself, e.g. blah@foo.com
  • user - user part of the address (if present), e.g. blah
  • domain - domain part (if present), e.g. foo.com
  • flags - table in which each of the following keys is set to true if its condition holds:
      • [valid] - valid SMTP address in conformity with https://tools.ietf.org/html/rfc5321#section-4.1
      • [ip] - domain is an IPv4/IPv6 address
      • [braced] - address is enclosed in angle brackets, e.g. <blah@foo.com>
      • [quoted] - user part is quoted
      • [empty] - empty address
      • [backslash] - user part contains a backslash
      • [8bit] - address contains 8-bit characters

Parameters:

  • str {string}: input string
  • pool {rspamd_mempool}: optional memory pool to use

Returns:

  • {table/tables}: parsed list of mail addresses
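
The shape of one returned entry can be sketched in Python with the standard email.utils.parseaddr. This is an illustrative analogue only: it handles a single address per call (the real function returns a list when several addresses are given), fills only a subset of the flags (braced, empty, ip, 8bit) using naive checks, and does not perform the RFC 5321 validation behind [valid].

```python
from email.utils import parseaddr


def parse_mail_address(raw):
    """Parse ONE address into a list holding a single table-like dict."""
    name, addr = parseaddr(raw)
    user, at, domain = addr.rpartition("@")
    if not at:
        # No '@' found: keep everything as the user part (a sketch choice).
        user, domain = addr, ""
    checks = {
        "braced": "<" in raw and ">" in raw,  # naive angle-bracket check
        "empty": addr == "",
        # Domain literal such as [127.0.0.1]; no real IP validation here.
        "ip": domain.startswith("[") and domain.endswith("]"),
        "8bit": any(ord(ch) > 127 for ch in raw),
    }
    return [{
        "raw": raw,
        "name": name,
        "addr": addr,
        "user": user,
        "domain": domain,
        # As in the documented flags table, only true conditions are present.
        "flags": {key: True for key, hit in checks.items() if hit},
    }]
```

For the example above, parse_mail_address("Vsevolod Stakhov <blah@foo.com>") yields one entry with name "Vsevolod Stakhov", user "blah", domain "foo.com" and flags {"braced": True}.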

Function parsers.parse_content_type(ct_string, mempool)

Parses a content-type string into a table with the following fields:

  • type
  • subtype
  • charset
  • boundary
  • other attributes

Parameters:

  • ct_string {string}: content type as string
  • mempool {rspamd_mempool}: memory pool used to store temporary data (e.g. the task pool)

Returns:

  • {table}: parsed content type, or nil if the string cannot be parsed
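
A simplified Python parser can produce a table of this shape. It is a sketch, not Rspamd's implementation: it has no memory pool, lower-cases the type, subtype and parameter names, and collects the remaining parameters under a hypothetical attrs key, since the exact layout of "other attributes" is not specified above.

```python
import re

# One ";name=value" parameter; the value is a quoted string or a bare token.
_PARAM_RE = re.compile(r';\s*([^\s=;]+)\s*=\s*("(?:[^"\\]|\\.)*"|[^;]*)')


def parse_content_type(ct_string):
    """Return a dict with type/subtype/charset/boundary/attrs, or None."""
    head, _, rest = ct_string.partition(";")
    mtype, slash, subtype = head.strip().partition("/")
    mtype, subtype = mtype.strip().lower(), subtype.strip().lower()
    if not slash or not mtype or not subtype:
        return None  # mirrors the documented nil on failure
    result = {"type": mtype, "subtype": subtype, "attrs": {}}
    for name, value in _PARAM_RE.findall(";" + rest):
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] == '"':
            # Drop the surrounding quotes and undo backslash escapes.
            value = re.sub(r"\\(.)", r"\1", value[1:-1])
        name = name.lower()
        if name in ("charset", "boundary"):
            result[name] = value
        else:
            result["attrs"][name] = value  # 'attrs' is a hypothetical key name
    return result
```

Quoted values are matched as a whole, so a boundary such as "a;b" is not split at its inner semicolon.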

Function parsers.parse_smtp_date(str[, local_tz])

Converts an SMTP date string to a Unix timestamp

Parameters:

  • str {string}: input string
  • local_tz {boolean}: if true, convert to the local time zone

Returns:

  • {number}: time as a Unix timestamp (a floating-point number)
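
The conversion can be approximated in Python with the standard email.utils.parsedate_to_datetime. The sketch always returns UTC-based epoch seconds and does not model local_tz; treating a -0000 zone as UTC is an assumption, and unparseable input raises an exception here rather than returning a value.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def parse_smtp_date(date_str):
    """Convert an RFC 5322 style date string to a float Unix timestamp."""
    dt = parsedate_to_datetime(date_str)  # raises on unparseable input
    if dt.tzinfo is None:
        # A "-0000" zone yields a naive datetime; treat it as UTC here.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.timestamp()
```

Because the zone offset is applied, "Thu, 01 Jan 1970 01:00:00 +0100" and "Thu, 01 Jan 1970 00:00:00 +0000" both map to 0.0.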
