Module rspamd_textpart

This module provides methods for manipulating text part data. Text parts can be obtained from an rspamd_task by calling task:get_text_parts().

Example:

rspamd_config.R_EMPTY_IMAGE = function (task)
	-- Use locals so that nothing leaks into the global environment
	local parts = task:get_text_parts()
	if parts then
		for _,part in ipairs(parts) do
			if part:is_empty() then
				-- An empty text part was found
				local texts = task:get_texts()
				if texts then
					return true
				end
				return false
			end
		end
	end
	return false
end

Methods

The module rspamd_textpart defines the following methods.

Method text_part:is_utf()

Returns true if the part is valid UTF-8 text

Parameters:

No parameters

Returns:

  • {boolean}: true if part is valid UTF8 part

Method text_part:has_8bit_raw()

Return TRUE if a part has raw 8bit characters

Parameters:

No parameters

Returns:

  • {boolean}: true if a part has raw 8bit characters

Method text_part:has_8bit()

Return TRUE if a part has encoded 8bit characters

Parameters:

No parameters

Returns:

  • {boolean}: true if a part has encoded 8bit characters

Method text_part:get_content([type])

Get the text of the part. The optional type argument selects which form of the content is returned:

  • content (default): utf8 content with HTML tags stripped and newlines preserved
  • content_oneline: utf8 content with HTML tags and newlines stripped
  • raw: raw content, not mime decoded nor utf8 converted
  • raw_parsed: raw content, mime decoded, not utf8 converted
  • raw_utf: raw content, mime decoded, utf8 converted (but with HTML tags and newlines)

Parameters:

  • type {string}: optional type of content to return (default: content)

Returns:

  • {text}: UTF8 encoded content of the part (zero-copy if not converted to a lua string)
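
As a rough illustration of the type argument, the sketch below builds a short single-line preview of a text part. The helper name part_preview and the 80-byte default are hypothetical, and the sketch assumes that the returned text object can be converted with tostring().

```lua
-- Hypothetical helper (not part of rspamd): return at most `limit` bytes
-- of a text part as a single line.
local function part_preview(part, limit)
  -- 'content_oneline' requests UTF-8 text with HTML tags and newlines stripped
  local content = part:get_content('content_oneline')
  if not content then
    return nil
  end
  -- Assumption: tostring() copies the text object into a plain Lua string
  local str = tostring(content)
  return str:sub(1, limit or 80)
end
```

Converting to a Lua string gives up the zero-copy property, so a helper like this is best kept for logging or debugging.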

Method text_part:get_raw_content()

Get the original text of the part

Parameters:

No parameters

Returns:

  • {text}: UTF8 encoded content of the part (zero-copy if not converted to a lua string)

Method text_part:get_content_oneline()

Get the text of the part (html tags and newlines stripped)

Parameters:

No parameters

Returns:

  • {text}: UTF8 encoded content of the part (zero-copy if not converted to a lua string)

Method text_part:get_length()

Get length of the text of the part

Parameters:

No parameters

Returns:

  • {integer}: length of part in bytes

Method text_part:get_raw_length()

Get length of the raw content of the part (e.g. HTML with tags unstripped)

Parameters:

No parameters

Returns:

  • {integer}: length of part in bytes

Method text_part:get_urls_length()

Get length of the urls within the part

Parameters:

No parameters

Returns:

  • {integer}: length of urls in bytes

Method text_part:get_lines_count()

Get the number of lines in the part

Parameters:

No parameters

Returns:

  • {integer}: number of lines in the part

Method text_part:get_stats()

Returns a table with the following data:

  • lines: number of lines
  • spaces: number of spaces
  • double_spaces: number of double spaces
  • empty_lines: number of empty lines
  • non_ascii_characters: number of non ascii characters
  • ascii_characters: number of ascii characters

Parameters:

No parameters

Returns:

  • {table}: table of stats
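
The stats table can be combined into simple ratios. The following sketch computes the share of non-ASCII characters in a part; non_ascii_ratio is a hypothetical helper name, and missing fields are treated as zero.

```lua
-- Hypothetical helper (not part of rspamd): share of non-ASCII
-- characters in a part, as a number between 0 and 1.
local function non_ascii_ratio(part)
  local stats = part:get_stats()
  if not stats then
    return 0
  end
  local non_ascii = stats.non_ascii_characters or 0
  local ascii = stats.ascii_characters or 0
  local total = non_ascii + ascii
  if total == 0 then
    -- Avoid division by zero for parts without any characters
    return 0
  end
  return non_ascii / total
end
```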

Method text_part:get_words_count()

Get the number of words in the part

Parameters:

No parameters

Returns:

  • {integer}: number of words in the part

Method text_part:get_words([how])

Get words in the part. The optional how argument defines the type of words returned:

  • stem: stemmed words (default)
  • norm: normalised words (utf normalised + lowercased)
  • raw: raw words in utf (if possible)
  • full: list of tables, each table has the following fields:
      • [1] - stemmed word
      • [2] - normalised word
      • [3] - raw word
      • [4] - flags (table of strings)

Parameters:

  • how {string}: optional type of words to return (default: stem)

Returns:

  • {table/strings}: words in the part
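
To show how the full form might be consumed, the sketch below collects normalised words above a given length. The helper long_words is hypothetical; it relies only on the field layout listed above and treats a nil result as an empty list.

```lua
-- Hypothetical helper (not part of rspamd): collect normalised words
-- that are longer than `min_len` bytes.
local function long_words(part, min_len)
  local result = {}
  -- 'full' yields one table per word: {stemmed, normalised, raw, flags}
  for _, w in ipairs(part:get_words('full') or {}) do
    local norm = w[2]
    if norm and #norm > min_len then
      result[#result + 1] = norm
    end
  end
  return result
end
```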

Method text_part:is_empty()

Returns true if the specified part is empty

Parameters:

No parameters

Returns:

  • {bool}: whether a part is empty

Method text_part:is_html()

Returns true if the specified part has HTML content

Parameters:

No parameters

Returns:

  • {bool}: whether a part is HTML part

Method text_part:get_html()

Returns html content of the specified part

Parameters:

No parameters

Returns:

  • {html}: html content

Method text_part:get_language()

Returns the code of the most used unicode script in the text part. Does not work with raw parts

Parameters:

No parameters

Returns:

  • {string}: short abbreviation (such as ru) for the script’s language

Method text_part:get_charset()

Returns the actual charset of the part

Parameters:

No parameters

Returns:

  • {string}: charset of the part

Method text_part:get_languages()

Returns an array of tables, one per language detected for the part. Each table has the following fields:

  • ‘code’: language code (short string)
  • ‘prob’: logarithm of probability

Parameters:

No parameters

Returns:

  • {array|tables}: all languages detected for the part
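
A minimal sketch of picking the most likely language from this array follows. The helper best_language is hypothetical. It assumes that a larger prob value means a more likely language and makes no assumption about the order of the array.

```lua
-- Hypothetical helper (not part of rspamd): return the code of the
-- language with the highest 'prob' value, or nil if none was detected.
local function best_language(part)
  local best
  for _, lang in ipairs(part:get_languages() or {}) do
    -- Assumption: a larger (log) probability means a more likely language
    if not best or lang.prob > best.prob then
      best = lang
    end
  end
  return best and best.code or nil
end
```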

Method text_part:get_fuzzy_hashes(mempool)

Returns the direct hash and an array of shingles, laid out as follows:

  • [1] - fuzzy digest as a string
  • [2..33] - fuzzy hashes as the following tables:
      • [1] - 64 bit integer represented as a string
      • [2..4] - strings used to generate this hash

Parameters:

  • mempool {rspamd_mempool}: - memory pool (usually task pool)

Returns:

  • {string,array|tables}: fuzzy hashes calculated

Method text_part:get_mimepart()

Returns the mime part object corresponding to this text part

Parameters:

No parameters

Returns:

  • {mimepart}: mimepart object

Module rspamd_mimepart

This module provides access to mime parts found in a message

Example:

rspamd_config.MISSING_CONTENT_TYPE = function(task)
	local parts = task:get_parts()
	if parts and #parts > 1 then
		-- We have more than one part
		for _,p in ipairs(parts) do
			local ct = p:get_header('Content-Type')
			-- And some parts have no Content-Type header
			if not ct then
				return true
			end
		end
	end
	return false
end

Methods

The module rspamd_mimepart defines the following methods.

Method mime_part:get_header(name[, case_sensitive])

Get the decoded value of a header, with an optional case_sensitive flag. By default headers are searched in a case-insensitive manner.

Parameters:

  • name {string}: name of header to get
  • case_sensitive {boolean}: case sensitiveness flag to search for a header

Returns:

  • {string}: decoded value of a header

Method mime_part:get_header_raw(name[, case_sensitive])

Get the raw value of a header, with an optional case_sensitive flag. By default headers are searched in a case-insensitive manner.

Parameters:

  • name {string}: name of header to get
  • case_sensitive {boolean}: case sensitiveness flag to search for a header

Returns:

  • {string}: raw value of a header

Method mime_part:get_header_full(name[, case_sensitive])

Get full information about a header, with an optional case_sensitive flag. By default headers are searched in a case-insensitive manner. This method returns a list of tables with the following structure:

  • name - name of a header
  • value - raw value of a header
  • decoded - decoded value of a header
  • tab_separated - true if a header and a value are separated by a tab character
  • empty_separator - true if there is no separator between a header and a value

Parameters:

  • name {string}: name of header to get
  • case_sensitive {boolean}: case sensitiveness flag to search for a header

Returns:

  • {list of tables}: all values of a header as specified above

Example:

function check_header_delimiter_tab(task, header_name)
	-- Fall back to an empty list so that ipairs never receives nil
	for _,rh in ipairs(task:get_header_full(header_name) or {}) do
		if rh['tab_separated'] then return true end
	end
	return false
end

Method mime_part:get_header_count(name[, case_sensitive])

Lightweight alternative for when only a header's count is needed. By default headers are searched in a case-insensitive manner.

Parameters:

  • name {string}: name of header to get
  • case_sensitive {boolean}: case sensitiveness flag to search for a header

Returns:

  • {number}: number of the header's occurrences or 0 if not found
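
As a small sketch of where the count alone is enough, the hypothetical helper below reports whether a part carries a header more than once. The name has_duplicate_header is an assumption made for illustration.

```lua
-- Hypothetical helper (not part of rspamd): true if the given header
-- occurs more than once in a mime part.
local function has_duplicate_header(part, name)
  -- get_header_count returns 0 when the header is absent
  return part:get_header_count(name) > 1
end
```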

Method mime_part:get_content()

Get the parsed content of part

Parameters:

No parameters

Returns:

  • {text}: opaque text object (zero-copy if not cast to a lua string)

Method mime_part:get_raw_content()

Get the raw content of part

Parameters:

No parameters

Returns:

  • {text}: opaque text object (zero-copy if not cast to a lua string)

Method mime_part:get_length()

Get length of the content of the part

Parameters:

No parameters

Returns:

  • {integer}: length of part in bytes

Method mime_part:get_type()

Extract content-type string of the mime part

Parameters:

No parameters

Returns:

  • {string,string}: content type in form 'type', 'subtype'
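
Because two values are returned, both need to be captured on the left-hand side of the assignment. The sketch below joins them into a lowercased type/subtype string; full_type is a hypothetical helper name.

```lua
-- Hypothetical helper (not part of rspamd): return 'type/subtype'
-- in lower case, or nil if either component is missing.
local function full_type(part)
  -- get_type returns two values: the type and the subtype
  local mtype, subtype = part:get_type()
  if not mtype or not subtype then
    return nil
  end
  return (mtype .. '/' .. subtype):lower()
end
```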

Method mime_part:get_type_full()

Extract content-type string of the mime part with all attributes

Parameters:

No parameters

Returns:

  • {string,string,table}: content type in form 'type', 'subtype', {attrs}

Method mime_part:get_cte()

Extract content-transfer-encoding for a part

Parameters:

No parameters

Returns:

  • {string}: content transfer encoding (e.g. base64 or 7bit)

Method mime_part:get_filename()

Extract filename associated with mime part if it is an attachment

Parameters:

No parameters

Returns:

  • {string}: filename or nil if no file is associated with this part

Method mime_part:is_image()

Returns true if mime part is an image

Parameters:

No parameters

Returns:

  • {bool}: true if a part is an image

Method mime_part:get_image()

Returns rspamd_image structure associated with this part. This structure has the following methods:

  • get_width - return width of an image in pixels
  • get_height - return height of an image in pixels
  • get_type - return string representation of image’s type (e.g. ‘jpeg’)
  • get_filename - return string with image’s file name
  • get_size - return size in bytes

Parameters:

No parameters

Returns:

  • {rspamd_image}: image structure or nil if a part is not an image
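
The image methods listed above can be combined into simple checks. The following sketch flags very small images; is_tiny_image and its max_px threshold are hypothetical and not part of rspamd.

```lua
-- Hypothetical helper (not part of rspamd): true if the part is an image
-- whose width and height are both at most `max_px` pixels.
local function is_tiny_image(part, max_px)
  if not part:is_image() then
    return false
  end
  local img = part:get_image()
  if not img then
    -- get_image returns nil when the part is not an image
    return false
  end
  return img:get_width() <= max_px and img:get_height() <= max_px
end
```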

Method mime_part:is_archive()

Returns true if mime part is an archive

Parameters:

No parameters

Returns:

  • {bool}: true if a part is an archive

Method mime_part:is_attachment()

Returns true if mime part looks like an attachment

Parameters:

No parameters

Returns:

  • {bool}: true if a part looks like an attachment

Method mime_part:get_archive()

Returns rspamd_archive structure associated with this part. This structure has the following methods:

  • get_files - return list of strings with filenames inside archive
  • get_files_full - return list of tables with all information about files
  • is_encrypted - return true if an archive is encrypted
  • get_type - return string representation of archive’s type (e.g. ‘zip’)
  • get_filename - return string with archive’s file name
  • get_size - return size in bytes

Parameters:

No parameters

Returns:

  • {rspamd_archive}: archive structure or nil if a part is not an archive
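
As an illustration of the archive methods, the sketch below checks whether an archive contains a file with a given extension. The helper archive_has_extension is hypothetical, and the comparison is a plain case-insensitive suffix match on the names returned by get_files.

```lua
-- Hypothetical helper (not part of rspamd): true if the part is an archive
-- that contains at least one file name ending in '.' .. ext.
local function archive_has_extension(part, ext)
  if not part:is_archive() then
    return false
  end
  local arch = part:get_archive()
  if not arch then
    return false
  end
  local suffix = '.' .. ext:lower()
  for _, fname in ipairs(arch:get_files() or {}) do
    -- Compare the tail of the lowercased file name with the suffix
    if fname:lower():sub(-#suffix) == suffix then
      return true
    end
  end
  return false
end
```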

Method mime_part:is_multipart()

Returns true if mime part is a multipart part

Parameters:

No parameters

Returns:

  • {bool}: true if a part is a multipart part

Method mime_part:get_children()

Returns a table of rspamd_mimepart objects for the part’s children. Returns nil if the mime part is not a multipart or a message part.

Parameters:

No parameters

Returns:

  • {rspamd_mimepart}: table of children
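
Since children can themselves be multipart, a recursive walk is a natural fit. The sketch below counts leaf parts in a MIME tree; count_leaves is a hypothetical helper, and a part without children (including an empty multipart) is counted as one leaf.

```lua
-- Hypothetical helper (not part of rspamd): count the leaf parts
-- below (and including) the given mime part.
local function count_leaves(part)
  local children = part:get_children()
  if not children or #children == 0 then
    -- No children: this part is itself a leaf
    return 1
  end
  local n = 0
  for _, child in ipairs(children) do
    n = n + count_leaves(child)
  end
  return n
end
```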

Method mime_part:is_text()

Returns true if mime part is a text part

Parameters:

No parameters

Returns:

  • {bool}: true if a part is a text part

Method mime_part:get_text()

Returns rspamd_textpart structure associated with this part.

Parameters:

No parameters

Returns:

  • {rspamd_textpart}: textpart structure or nil if a part is not a text part
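
A minimal sketch of moving from a mime part to its text part follows. The helper text_length is hypothetical; it returns 0 for parts that are not text.

```lua
-- Hypothetical helper (not part of rspamd): length of the text of a
-- mime part, or 0 if the part is not a text part.
local function text_length(part)
  if not part:is_text() then
    return 0
  end
  -- get_text returns nil when no text part is associated with this part
  local tp = part:get_text()
  return tp and tp:get_length() or 0
end
```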

Method mime_part:get_digest()

Returns the unique digest for this mime part

Parameters:

No parameters

Returns:

  • {string}: 128 characters hex string with digest of the part

Method mime_part:get_id()

Returns the order of the part in parts list

Parameters:

No parameters

Returns:

  • {number}: index of the part (starting from 1 as it is Lua API)

Method mime_part:is_broken()

Returns true if mime part has incorrectly specified content type

Parameters:

No parameters

Returns:

  • {bool}: true if a part has bad content type

Method mime_part:headers_foreach(callback, [params])

This method calls callback for each header that satisfies some condition. By default, all headers are iterated until the callback returns true; a nil or false return value continues the iteration. The optional params table can contain the following fields:

  • full: header value is a full table of all attributes (see task:get_header_full for details)
  • regexp: process only headers that satisfy the specified regexp

Parameters:

  • callback {function}: function that takes a header name and a header value
  • params {table}: optional parameters

Returns:

No return
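
The callback contract can be sketched as follows. The helper collect_header_names is hypothetical; its callback returns false for every header, so the iteration continues to the end.

```lua
-- Hypothetical helper (not part of rspamd): collect the names of all
-- headers of a mime part in iteration order.
local function collect_header_names(part)
  local names = {}
  part:headers_foreach(function(name, value)
    names[#names + 1] = name
    -- false (or nil) continues the iteration; true would stop it
    return false
  end)
  return names
end
```

Returning true from the callback instead would stop the iteration early, which is useful for finding the first header that matches a condition.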

Method mime_part:get_stats()

Returns a table with the following data:

  • lines: number of lines
  • spaces: number of spaces
  • double_spaces: number of double spaces
  • empty_lines: number of empty lines
  • non_ascii_characters: number of non ascii characters
  • ascii_characters: number of ascii characters

Parameters:

No parameters

Returns:

  • {table}: table of stats
