rspamd_url
This module provides routines to handle URLs and to extract URLs from text.
Objects of this class are returned, for example, by `task:get_urls()` or `task:get_emails()`.
You can also create an `rspamd_url` object from any text.
Functions:
| Function | Description |
|---|---|
| `url.create([mempool,] str, [{flags_table}])` | Create a url object from text. |
| `url.init(tld_file)` | Initialize url library if not initialized yet by Rspamd. |
Methods:
| Method | Description |
|---|---|
| `url:get_length()` | Get length of the url. |
| `url:get_host()` | Get domain part of the url. |
| `url:get_port()` | Get port of the url. |
| `url:get_user()` | Get user part of the url (e.g. username in email). |
| `url:get_path()` | Get path of the url. |
| `url:get_query()` | Get query of the url. |
| `url:get_fragment()` | Get fragment of the url. |
| `url:get_text()` | Get full content of the url. |
| `url:tostring()` | Get full content of the url or user@domain in case of email. |
| `url:to_http()` | Get URL suitable for HTTP request (e.g. by trimming fragment and user parts). |
| `url:get_raw()` | Get full content of the url as it was parsed (e.g. with urldecode). |
| `url:is_phished()` | Check whether URL is treated as phished. |
| `url:is_redirected()` | Check whether URL was redirected. |
| `url:is_obscured()` | Check whether URL is treated as obscured or obfuscated (e.g. numbers in IP address or other hacks). |
| `url:is_html_displayed()` | Check whether URL is just displayed in HTML (e.g. NOT a real href). |
| `url:is_subject()` | Check whether URL is found in subject. |
| `url:get_phished()` | Get another URL that pretends to be this URL (e.g. used in phishing). |
| `url:set_redirected(url, pool)` | Set url as redirected to another url. |
| `url:get_tld()` | Get effective second level domain part (eSLD) of the url host. |
| `url:get_protocol()` | Get protocol name. |
| `url:get_count()` | Return number of occurrences for this particular URL. |
| `url:get_visible()` | Get visible part of the url with html tags stripped. |
| `url:to_table()` | Return url as a table of its fields. |
| `url:get_flags()` | Return flags for the URL as a map of flag name to `true` for all flags set. |
The module `rspamd_url` defines the following functions.
url.create([mempool,] str, [{flags_table}])
Parameters:
memory {rspamd_mempool}
: pool for URL, e.g. task:get_mempool()
text {string}
: string that contains a URL (it can also contain other text)
Returns:
{url}
: new url object that exists as long as the corresponding mempool exists
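As a minimal sketch of the call above, the helper below parses the first URL found in a string and returns its host. It assumes that the code runs inside Rspamd's Lua environment, that `rspamd_mempool.create()` and `pool:destroy()` are available for a standalone pool, and that `url.create` returns `nil` when no URL is found. The name `first_url_host` is hypothetical.

```lua
-- Hypothetical helper: return the host of the first URL found in `text`,
-- or nil when no URL could be parsed (assumed behaviour of url.create).
local function first_url_host(text)
  -- Modules are required lazily, on the first call of the helper
  local rspamd_url = require "rspamd_url"
  local rspamd_mempool = require "rspamd_mempool"

  local pool = rspamd_mempool.create()
  local url = rspamd_url.create(pool, text)
  -- Copy the host into a Lua string before the pool (and the url) goes away
  local host = url and url:get_host() or nil
  pool:destroy()
  return host
end
```

Inside a task callback, `task:get_mempool()` can be passed instead of a standalone pool, so that the url object lives as long as the task.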
url.init(tld_file)
Initialize url library if not initialized yet by Rspamd
Parameters:
tld_file {string}
: path to effective_tld_names.dat file (public suffix list)
Returns:
No return value
The module `rspamd_url` defines the following methods.
url:get_length()
Get length of the url
Parameters:
No parameters
Returns:
{number}
: length of url in bytes
url:get_host()
Get domain part of the url
Parameters:
No parameters
Returns:
{string}
: domain part of URL
url:get_port()
Get port of the url
Parameters:
No parameters
Returns:
{number}
: url port
url:get_user()
Get user part of the url (e.g. username in email)
Parameters:
No parameters
Returns:
{string}
: user part of URL
url:get_path()
Get path of the url
Parameters:
No parameters
Returns:
{string}
: path part of URL
url:get_query()
Get query of the url
Parameters:
No parameters
Returns:
{string}
: query part of URL
url:get_fragment()
Get fragment of the url
Parameters:
No parameters
Returns:
{string}
: fragment part of URL
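The component getters described so far can be combined to break a URL into its parts. The sketch below assumes that `url` is an object obtained from `task:get_urls()` or `url.create`; parts that are absent are assumed to come back as `nil` and are then simply missing from the result. The name `url_parts` is hypothetical.

```lua
-- Hypothetical helper: collect the individual components of a url object
-- into a plain Lua table that no longer depends on the url's mempool.
local function url_parts(url)
  return {
    host = url:get_host(),
    port = url:get_port(),
    user = url:get_user(),
    path = url:get_path(),
    query = url:get_query(),
    fragment = url:get_fragment(),
  }
end
```

Because the values are copied into ordinary Lua strings and numbers, the resulting table can be kept after the url object itself is gone.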
url:get_text()
Get full content of the url
Parameters:
No parameters
Returns:
{string}
: url string
url:tostring()
Get full content of the url or user@domain in case of email
Parameters:
No parameters
Returns:
{string}
: url as a string
url:to_http()
Get URL suitable for HTTP request (e.g. by trimming fragment and user parts)
Parameters:
No parameters
Returns:
{string}
: url as a string
url:get_raw()
Get full content of the url as it was parsed (e.g. with urldecode)
Parameters:
No parameters
Returns:
{string}
: url string
url:is_phished()
Check whether URL is treated as phished
Parameters:
No parameters
Returns:
{boolean}
: `true` if URL is phished
url:is_redirected()
Check whether URL was redirected
Parameters:
No parameters
Returns:
{boolean}
: `true` if URL is redirected
url:is_obscured()
Check whether URL is treated as obscured or obfuscated (e.g. numbers in IP address or other hacks)
Parameters:
No parameters
Returns:
{boolean}
: `true` if URL is obscured
url:is_html_displayed()
Check whether URL is just displayed in HTML (e.g. NOT a real href)
Parameters:
No parameters
Returns:
{boolean}
: `true` if URL is displayed only
url:is_subject()
Check whether URL is found in subject
Parameters:
No parameters
Returns:
{boolean}
: `true` if URL is found in subject
url:get_phished()
Get another URL that pretends to be this URL (e.g. used in phishing)
Parameters:
No parameters
Returns:
{url}
: phished URL
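A short sketch of how `url:is_phished()` and `url:get_phished()` might be used together is given below. It assumes that `get_phished()` can return `nil` even when the phished flag is set. Which of the two urls is the displayed link and which is the real target is not stated here, so the sketch reports both hosts without labelling them. The name `phishing_summary` is hypothetical.

```lua
-- Hypothetical helper: describe a phished url as "<host> <-> <other host>",
-- or return nil when the url is not treated as phished.
local function phishing_summary(url)
  if not url:is_phished() then
    return nil
  end
  local other = url:get_phished()
  if not other then
    -- Flag is set but no counterpart url is attached (assumed possible)
    return tostring(url:get_host())
  end
  return string.format('%s <-> %s',
    tostring(url:get_host()), tostring(other:get_host()))
end
```

In a plugin this could be applied to every entry of `task:get_urls()` to log the pairs that triggered phishing detection.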
url:set_redirected(url, pool)
Set url as redirected to another url
Parameters:
url {string|url}
: new url that is redirecting an old one
pool {pool}
: memory pool to allocate memory if needed
Returns:
{url}
: parsed redirected url (if needed)
url:get_tld()
Get effective second level domain part (eSLD) of the url host
Parameters:
No parameters
Returns:
{string}
: effective second level domain part (eSLD) of the url host
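One common use of the eSLD is grouping URLs by the domain that actually controls them. The following sketch counts urls per eSLD; it assumes `urls` is an array of url objects such as the one returned by `task:get_urls()`, and that `get_tld()` may return `nil` for hosts without a known suffix. The name `count_by_tld` is hypothetical.

```lua
-- Hypothetical helper: count how many urls share each eSLD
local function count_by_tld(urls)
  local counts = {}
  for _, u in ipairs(urls) do
    local tld = u:get_tld()
    if tld then
      counts[tld] = (counts[tld] or 0) + 1
    end
  end
  return counts
end
```

With this grouping, `www.example.com` and `mail.example.com` would fall under the same `example.com` key, provided the public suffix list has been loaded.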
url:get_protocol()
Get protocol name
Parameters:
No parameters
Returns:
{string}
: protocol as a string
url:get_count()
Return number of occurrences for this particular URL
Parameters:
No parameters
Returns:
{number}
: number of occurrences
url:get_visible()
Get visible part of the url with html tags stripped
Parameters:
No parameters
Returns:
{string}
: url string
url:to_table()
Return url as a table with the following fields:
url
: full content
host
: hostname part
user
: user part
path
: path part
tld
: effective second level domain (eSLD), as returned by `url:get_tld()`
protocol
: url protocol
Parameters:
No parameters
Returns:
{table}
: URL as a table
url:get_flags()
Return flags for the URL as a map of flag name to `true` for all flags that are set. Possible flags are:
phished
: URL is likely phished
numeric
: URL is numeric (e.g. IP address)
obscured
: URL was obscured
redirected
: URL comes from a redirector
html_displayed
: URL is used just for displaying purposes
text
: URL comes from the text
subject
: URL comes from the subject
host_encoded
: URL host part is encoded
schema_encoded
: URL schema part is encoded
query_encoded
: URL query part is encoded
missing_slashes
: URL has some slashes missing
idn
: URL has international characters
has_port
: URL has a port
has_user
: URL has a user part
schemaless
: URL has no schema
unnormalised
: URL contains unnormalised Unicode
zw_spaces
: URL contains zero-width spaces
url_displayed
: URL has some other url-like string in its visible part
image
: URL is from the src attribute of an img HTML tag
Parameters:
No parameters
Returns:
{table}
: URL flags
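Since only the flags that are set appear in the returned map, the result can be inspected with plain table lookups. The sketch below lists the set flags and applies a simple check; the choice of `phished`, `obscured` and `zw_spaces` as "suspicious" is purely illustrative, and the names `flag_names` and `is_suspicious` are hypothetical.

```lua
-- Hypothetical helper: return the names of all set flags, sorted
local function flag_names(url)
  local names = {}
  for name, is_set in pairs(url:get_flags()) do
    if is_set then
      names[#names + 1] = name
    end
  end
  table.sort(names)
  return names
end

-- Hypothetical helper: true if any of a few illustrative flags is set
local function is_suspicious(url)
  local flags = url:get_flags()
  return (flags.phished or flags.obscured or flags.zw_spaces) and true or false
end
```

Sorting the names gives a stable order for logging, because `pairs` does not guarantee any particular iteration order.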