rspamd_html

This module provides methods for accessing HTML tags. To get the HTML context of an HTML part, use the part:get_html() method. For example:
    rspamd_config.R_EMPTY_IMAGE = function(task)
      local tp = task:get_text_parts() -- get text parts in a message
      for _,p in ipairs(tp) do -- iterate over text parts array using `ipairs`
        if p:is_html() then -- if the current part is html part
          local hc = p:get_html() -- we get HTML context
          local len = p:get_length() -- and part's length
          if len < 50 then -- if we have a part that has less than 50 bytes of text
            local images = hc:get_images() -- then we check for HTML images
            if images then -- if there are images
              for _,i in ipairs(images) do -- then iterate over images in the part
                if i['height'] + i['width'] >= 400 then -- if we have a large image
                  return true -- add symbol
                end
              end
            end
          end
        end
      end
    end

Methods:
The module rspamd_html defines the following methods.

html:has_tag(name)
Checks whether a tag with the specified name is present in the part.
Parameters:
name {string}: name of tag to check
Returns:
{boolean}: true if the tag exists in HTML tree
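As an illustration, a helper along these lines could check a part for several tags at once. This is a minimal sketch: the function name `has_active_content` and the tag list are hypothetical, and `hc` is assumed to be an HTML context obtained from part:get_html().

```lua
-- Hypothetical helper: returns true if the HTML context contains any of the
-- listed tags. The tag list is an illustrative assumption, not an rspamd default.
local function has_active_content(hc)
  local suspicious = { 'iframe', 'script', 'object' }
  for _, name in ipairs(suspicious) do
    if hc:has_tag(name) then
      return true
    end
  end
  return false
end
```

Keeping the check in a plain function that takes the HTML context makes it easy to call from any rule callback that already iterates over text parts.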

html:check_property(name)
Checks whether the HTML has a specific property. The available properties are:
no_html - no html tag is present
bad_element - part has some broken elements
xml - part is xhtml
unknown_element - part has some unknown elements
duplicate_element - part has some duplicate elements that should be unique (namely, title tag)
unbalanced - part has unbalanced tags
Parameters:
name {string}: name of property
Returns:
{boolean}: true if the part has the specified property
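To see which of the properties listed above are set for a given part, one could probe each name in turn. The sketch below assumes `hc` is an HTML context from part:get_html(); the helper name `list_html_properties` is hypothetical.

```lua
-- Hypothetical helper: returns an array with the names of all documented
-- properties that the given HTML context reports as set.
local function list_html_properties(hc)
  local names = {
    'no_html', 'bad_element', 'xml', 'unknown_element',
    'duplicate_element', 'unbalanced',
  }
  local found = {}
  for _, name in ipairs(names) do
    if hc:check_property(name) then
      found[#found + 1] = name
    end
  end
  return found
end
```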

html:get_images()
Returns a table of images found in html. Each image is, in turn, a table with the following fields:
src - link to the source
height - height in pixels
width - width in pixels
embedded - true if an image is embedded in a message
Parameters:
No parameters
Returns:
{table}: table of images in html part
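The `embedded` field can be used to separate images carried inside the message from those referenced externally. A possible sketch, assuming `hc` is an HTML context and that get_images() may return nil when there are no images (as the module example's `if images` check suggests); the helper name is hypothetical.

```lua
-- Hypothetical helper: counts images that are not embedded in the message.
local function count_external_images(hc)
  local images = hc:get_images()
  if not images then
    return 0 -- no images reported for this part
  end
  local n = 0
  for _, img in ipairs(images) do
    if not img['embedded'] then
      n = n + 1
    end
  end
  return n
end
```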

html:get_blocks()
Returns a table of html blocks. Each block provides the following data:
tag - corresponding tag
color - a triplet (r g b) for font color
bgcolor - a triplet (r g b) for background color
style - rspamd{text} with the full style description
Parameters:
No parameters
Returns:
{table}: table of blocks in html part
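Block colors could be used, for instance, to look for text rendered in the same color as its background. The sketch below rests on two assumptions not stated above: that each triplet is an array-like table indexed 1 to 3, and that `color` or `bgcolor` may be nil when a block does not set them. The helper names are hypothetical.

```lua
-- Assumption: a color triplet is an array-like table { r, g, b }.
local function same_color(a, b)
  return a[1] == b[1] and a[2] == b[2] and a[3] == b[3]
end

-- Hypothetical helper: returns true if any block has identical font and
-- background colors.
local function has_invisible_block(hc)
  local blocks = hc:get_blocks()
  if not blocks then
    return false
  end
  for _, b in ipairs(blocks) do
    local fg, bg = b['color'], b['bgcolor']
    if fg and bg and same_color(fg, bg) then
      return true
    end
  end
  return false
end
```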

html:foreach_tag(tagname, callback)
Walks the HTML tree and calls the specified callback for each tag of the specified type.
The callback is called with the following arguments:
tag - html tag structure
content_length - length of the content within the tag
The callback should return true to stop processing and false to continue.
Parameters:
tagname {string}: name of the tag type to process
callback {function}: function called for each matching tag
Returns:
No return value is documented.
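A short sketch of the callback contract described above: the callback receives the tag and its content length, and returns false to keep walking. The helper name `count_empty_tags` is hypothetical, and `hc` is assumed to be an HTML context from part:get_html().

```lua
-- Hypothetical helper: counts tags of the given type that enclose no content.
local function count_empty_tags(hc, tagname)
  local n = 0
  hc:foreach_tag(tagname, function(tag, content_length)
    if content_length == 0 then
      n = n + 1
    end
    return false -- false means "continue with the next tag"
  end)
  return n
end
```

Returning true from the callback instead would stop the walk at the first match, which is useful when a rule only needs to know that at least one such tag exists.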

html_tag:get_type()
Returns the string representation of the HTML type for a tag.
Parameters:
No parameters
Returns:
{string}: type of tag

html_tag:get_extra()
Returns extra data associated with the tag.
Parameters:
No parameters
Returns:
{url|image|nil}: extra data associated with the tag
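Since get_extra() may return nil, callers should check the result before using it. The following sketch gathers the extra data of every tag of a given type; the helper name `collect_extras` is hypothetical, and `hc` is assumed to be an HTML context from part:get_html().

```lua
-- Hypothetical helper: returns an array with the non-nil extra data of all
-- tags of the given type.
local function collect_extras(hc, tagname)
  local extras = {}
  hc:foreach_tag(tagname, function(tag)
    local extra = tag:get_extra()
    if extra then
      extras[#extras + 1] = extra
    end
    return false -- keep processing the remaining tags
  end)
  return extras
end
```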

html_tag:get_parent()
Returns the parent node of the tag.
Parameters:
No parameters
Returns:
{html_tag}: parent object of the tag
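Combined with get_type(), get_parent() allows reconstructing where a tag sits in the tree. The sketch below assumes that get_parent() returns nil for a top-level tag, which is not stated above; the helper name `tag_path` and the depth limit are hypothetical.

```lua
-- Hypothetical helper: builds a path such as "html > body > a" by following
-- parent links. Assumes get_parent() returns nil at the top of the tree.
local function tag_path(tag)
  local parts = {}
  local cur = tag
  local depth = 0
  while cur and depth < 64 do -- depth limit is a precaution against cycles
    table.insert(parts, 1, cur:get_type())
    cur = cur:get_parent()
    depth = depth + 1
  end
  return table.concat(parts, ' > ')
end
```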

html_tag:get_flags()
Returns the flags of the tag. The possible flags are:
closed - tag is properly closed
closing - tag is a closing tag
broken - tag is somehow broken
unbalanced - tag is unbalanced
xml - tag is xml tag
Parameters:
No parameters
Returns:
{table}: table of flags

html_tag:get_content()
Returns the content of the tag (approximate in some cases).
Parameters:
No parameters
Returns:
{rspamd_text}: rspamd text with tag’s content

html_tag:get_content_length()
Returns the length of a tag’s content.
Parameters:
No parameters
Returns:
{number}: size of content enclosed within a tag
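For logging or debugging, the tag accessors can be combined into a one-line summary. A minimal sketch, assuming `tag` is an html_tag object such as the one passed to a foreach_tag callback; the helper name `describe_tag` is hypothetical.

```lua
-- Hypothetical helper: formats a tag as "<type> (<N> bytes)" using only
-- get_type() and get_content_length().
local function describe_tag(tag)
  return string.format('%s (%d bytes)', tag:get_type(), tag:get_content_length())
end
```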