Version: Self Hosted Lite

Extract Functions

Overview of Extract Functions

Extract functions allow extracting specific information from the log line.

extractword

extractchar

extractnum

extracttime

extractuntil

extractpattern

extractjson

extractregex

extractregexgroup

extractkeyvalue

extractjsonkeys

extracttillend

inextractkeyvalue

extractword

This function, extractword(N), extracts N words and stores the result as a value in a key. A word is a continuous set of characters, and two words are separated by a space. After extraction, the pointer position is always after the Nth word.

Syntax

var=extractword(N), where N is an integer which denotes the number of words to extract and var is the variable name.

Example

In this example, the pointer or imaginary cursor is denoted by |.

Consider the Log line Error404 received due to Authentication token failure

Initial pointer position

|Error404 received due to Authentication token failure

Using test=extractword(2), the variable test will now hold Error404 received.

Updated pointer position

Error404 received| due to Authentication token failure
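The pointer behaviour described above can be modelled in a few lines of Python. This is only an illustrative sketch, not the product's implementation: the function signature and the explicit pos argument (the cursor index) are assumptions introduced for the example.

```python
def extractword(line: str, pos: int, n: int) -> tuple[str, int]:
    """Return the next n space-separated words from pos, plus the new cursor."""
    end = pos
    for _ in range(n):
        # Skip the spaces that separate words, then consume one word.
        while end < len(line) and line[end] == " ":
            end += 1
        while end < len(line) and line[end] != " ":
            end += 1
    return line[pos:end], end


line = "Error404 received due to Authentication token failure"
test, cursor = extractword(line, 0, 2)
print(test)  # Error404 received
print(repr(line[cursor:]))  # the remainder starts right after the 2nd word
```

The returned cursor can be passed as pos to the next extract call, which mirrors how the rules are chained on one log line.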

extractchar

This function, extractchar(N), extracts N number of characters, including spaces and stores the result in a variable. Pointer position moves to the end of N characters.

Syntax

var=extractchar(N), where N is an integer which denotes the number of characters to extract and var is the variable name.

Example

In this example, the pointer or imaginary cursor is denoted by |.

Consider the Log line Running full sweep for node-116

Initial pointer position

|Running full sweep for node-116

Using new-word=extractchar(7), seven characters (Running) are extracted and stored in the variable new-word.

Updated pointer position

Running| full sweep for node-116
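As a rough model of this behaviour, extracting N characters amounts to a slice plus a cursor update. The helper below is a hypothetical sketch; the pos argument and the clamping at the end of the line are assumptions, not documented behaviour.

```python
def extractchar(line: str, pos: int, n: int) -> tuple[str, int]:
    """Return n characters (spaces included) from pos, plus the new cursor."""
    end = min(pos + n, len(line))  # assumption: stop at the end of the line
    return line[pos:end], end


line = "Running full sweep for node-116"
new_word, cursor = extractchar(line, 0, 7)
print(new_word)  # Running
```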

extractnum

This function extracts a number at the pointer position, and stores it in a specified format, integer or floating point. Default is integer. After extraction, the pointer position moves to the end of the number.

Syntax

var=extractnum().format(type), where var is the variable name and type can be either "int" or "float"

Example

In this example, the pointer or imaginary cursor is denoted by |.

Consider the Log line 96.5 is the percentage usage of RAM

Initial pointer position

|96.5 is the percentage usage of RAM

Using RAM-util=extractnum() or RAM-util=extractnum().format("int") extracts the value 96 and stores it in the variable RAM-util.

Using RAM-util=extractnum().format("float") extracts the value 96.5 and stores it in the variable RAM-util.

Updated pointer position

Note that the pointer skips the whole number, including the fractional part, for both the integer and the float format.

96.5| is the percentage usage of RAM
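The difference between the two formats can be sketched as follows. This is an approximation for illustration: the regular expression used to recognise a number and the truncation applied for "int" are assumptions chosen to reproduce the example above.

```python
import re

# Assumed shape of a number: optional sign, digits, optional fractional part.
_NUMBER = re.compile(r"[+-]?\d+(?:\.\d+)?")


def extractnum(line: str, pos: int, fmt: str = "int"):
    """Return the number at pos in the requested format, plus the new cursor."""
    m = _NUMBER.match(line, pos)
    if m is None:
        raise ValueError("no number at the pointer position")
    text = m.group()
    value = float(text) if fmt == "float" else int(float(text))
    # The cursor moves past the whole number for both formats.
    return value, m.end()


line = "96.5 is the percentage usage of RAM"
as_int, cursor_int = extractnum(line, 0)
as_float, cursor_float = extractnum(line, 0, "float")
print(as_int, as_float)  # 96 96.5
```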

extracttime

This function extracts a timestamp in any format from the log line and, if required, converts it to the UTC format described below.

UTC Time format
YYYY-MM-DDThh:mm:ss.sTZD (eg 1997-07-16T19:20:30.45+01:00)

where

YYYY = four-digit year
MM = two-digit month (01=January, etc.)
DD = two-digit day of month (01 through 31)
hh = two digits of hour (00 through 23) (am/pm NOT allowed)
mm = two digits of minute (00 through 59)
ss = two digits of second (00 through 59)
s = one or more digits representing a decimal fraction of a second
TZD = time zone designator (Z or +hh:mm or -hh:mm)

An offset can also be added to the time data. Offsets are useful when incoming logs have timestamps in a timezone other than UTC. After extraction, the pointer position moves to the end of the time data.

Syntax

extracttime(offset), where offset specifies the offset in hours and minutes in the format +hhmm. For example, +0530 means the log time is ahead of UTC time by 5 hours and 30 minutes.

If log time is the same as UTC time, offset will be +0000.

Usage: var=extracttime(offset), where var denotes the variable to which the extracted value is assigned.

Example

In this example, the pointer or imaginary cursor is denoted by |.

Consider the log line Log index copied to Master-node-1 at 2022-10-20T12:45:35.865960748Z

Initial pointer position

Log index copied to Master-node-1 at |2022-10-20T12:45:35.865960748Z

Using res=extracttime("+0000") extracts the time 2022-10-20T12:45:35.865960748Z without applying any offset and stores it in the variable res.

Using res=extracttime("+0530") extracts the time, adjusts it for a log time that is 5 hours and 30 minutes ahead of UTC, and stores the result in the variable res.

Updated pointer position

In both the above examples, the pointer position moves to the end of the timestamp

Log index copied to Master-node-1 at 2022-10-20T12:45:35.865960748Z|
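The offset arithmetic can be illustrated with Python's datetime module. This sketch assumes, based on the statement that +0530 means the log time is ahead of UTC, that the offset is subtracted from the log time to obtain UTC; the product's actual conversion may differ. The helper names parse_offset and to_utc are hypothetical, and the fractional seconds are dropped for brevity.

```python
from datetime import datetime, timedelta, timezone


def parse_offset(offset: str) -> timedelta:
    """Turn an offset such as '+0530' or '-0800' into a timedelta."""
    sign = -1 if offset[0] == "-" else 1
    digits = offset.lstrip("+-")
    return sign * timedelta(hours=int(digits[:2]), minutes=int(digits[2:4]))


def to_utc(log_time: datetime, offset: str) -> datetime:
    """Assumption: a log time that is ahead of UTC is shifted back by the offset."""
    return (log_time - parse_offset(offset)).replace(tzinfo=timezone.utc)


log_time = datetime(2022, 10, 20, 12, 45, 35)
no_shift = to_utc(log_time, "+0000")
shifted = to_utc(log_time, "+0530")
print(no_shift.isoformat())  # 2022-10-20T12:45:35+00:00
print(shifted.isoformat())   # 2022-10-20T07:15:35+00:00
```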

extractuntil

This function extracts all characters until one of the following is found:

  • A specified string
  • Any number
  • Any special character

It extracts only until the first occurrence of the given input.

Syntax

extractuntil(wordToExtract, includeWord)

where wordToExtract can be a string until which the extraction is to be done.

(or)

$numeric$ In this case, everything until the first occurrence of any number is extracted.

(or)

$special$ In this case, everything until the first occurrence of any special character is extracted.

includeWord is an integer and denotes whether to include the wordToExtract in the extracted value. As used in the examples below:

  • Use 1 to include the wordToExtract
  • Use 0 to exclude the wordToExtract

Pointer position

The pointer position is always after the wordToExtract, irrespective of whether includeWord is 0 or 1.

Example

In this example, the pointer or imaginary cursor is denoted by |.

Consider the Log line 3675 errors in backend $audit-log

Initial pointer position

|3675 errors in backend $audit-log

Using test-word=extractuntil("errors",1) stores all characters in the log line up to and including the first occurrence of errors, i.e. 3675 errors, in the variable test-word.

Using test-word=extractuntil("errors",0) stores all characters in the log line up to the first occurrence of errors but excludes the string errors, i.e. 3675, in the variable test-word.

In both the above examples, the pointer position moves to the end of the string errors.

3675 errors| in backend $audit-log
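The three stop conditions and the include flag can be modelled as below. This is a sketch under stated assumptions: the flag follows the examples above (1 includes the match, 0 excludes it), a "special character" is taken to be anything that is neither alphanumeric, underscore, nor whitespace, and whether trailing whitespace is trimmed from the result is not specified, so the sketch leaves it in place.

```python
import re


def extractuntil(line: str, pos: int, target: str, include_word: int):
    """Return the text from pos up to the first match of target, plus the new cursor."""
    if target == "$numeric$":
        pattern = r"\d"
    elif target == "$special$":
        pattern = r"[^\w\s]"  # assumed definition of a special character
    else:
        pattern = re.escape(target)
    m = re.compile(pattern).search(line, pos)
    if m is None:
        raise ValueError("target not found after the pointer position")
    end = m.end() if include_word else m.start()
    # The cursor always lands after the matched target.
    return line[pos:end], m.end()


line = "3675 errors in backend $audit-log"
included, cursor = extractuntil(line, 0, "errors", 1)
excluded, _ = extractuntil(line, 0, "errors", 0)
before_special, _ = extractuntil(line, 0, "$special$", 0)
print(included)  # 3675 errors
```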

extracttillend

This function is used to extract all characters till the end of the log line and store the extracted string as a value of a key as specified by the user.

Syntax

var=extracttillend() where var is the variable that will store the extracted string

Example

In this example, the pointer or imaginary cursor is denoted by |.

Consider the Log line Error404 received due to Authentication token failure

Initial pointer position

|Error404 received due to Authentication token failure

Using final-word=extracttillend() stores all characters until the end of the log line in the variable final-word

Updated pointer position

Error404 received due to Authentication token failure|

extractpattern

This function is used to extract predefined patterns such as IP addresses, MAC addresses, URLs and hostnames. The function only identifies and extracts the first matching string, and the pointer position updates to the end of the matching string.

Syntax

extractpattern(pattern), where pattern specifies the pattern to be identified and extracted.

Supported patterns

"IPV4" - This extracts IPV4 address in the format x.x.x.x

"IPV6" - This extracts IPV6 address in the format y:y:y:y:y:y:y:y or y:y:y:y:y:y:x.x.x.x

"mac" - This extracts MAC address in the format x:x:x:x:x:x

"url" - This extracts URL in the format http://abc.xyz.abc or https://abc.xyz.abc

"hostname" - This extracts hostnames from urls

Example

Ipv4 address

Consider the log line 127.0.0.1 is an example IPv4 address

In this example, the pointer or imaginary cursor is denoted by |.

Initial pointer position

|127.0.0.1 is an example IPv4 address

Using res=extractpattern("IPv4"), the IP address 127.0.0.1 is extracted and stored in the variable res.

Updated pointer position

127.0.0.1| is an example IPv4 address


IPv6 address

Consider the log line 2001:0db8:85a3:0000:0000:8a2e:0370:7334 is an example IPv6 address

In this example, the pointer or imaginary cursor is denoted by |.

Initial pointer position

|2001:0db8:85a3:0000:0000:8a2e:0370:7334 is an example IPv6 address

Using res=extractpattern("IPv6"), the IP address 2001:0db8:85a3:0000:0000:8a2e:0370:7334 is extracted and stored in the variable res.

Updated pointer position

2001:0db8:85a3:0000:0000:8a2e:0370:7334| is an example IPv6 address


MAC address

Consider the log line 00:00:5e:00:53:af is an example MAC address

In this example, the pointer or imaginary cursor is denoted by |.

Initial pointer position

|00:00:5e:00:53:af is an example MAC address

Using res=extractpattern("MAC"), the MAC address 00:00:5e:00:53:af is extracted and stored in the variable res.

Updated pointer position

00:00:5e:00:53:af| is an example MAC address


URL

Consider the log line https://www.google.com is an example URL address

In this example, the pointer or imaginary cursor is denoted by |.

Initial pointer position

|https://www.google.com is an example URL address

Using res=extractpattern("URL"), the URL https://www.google.com is extracted and stored in the variable res.

Updated pointer position

https://www.google.com| is an example URL address


Hostname

Consider the log line ec2-35-88-174-201.us-west-2.compute.amazonaws.com is an example hostname

In this example, the pointer or imaginary cursor is denoted by |.

Initial pointer position

|ec2-35-88-174-201.us-west-2.compute.amazonaws.com is an example hostname

Using res=extractpattern("hostname"), the hostname ec2-35-88-174-201.us-west-2.compute.amazonaws.com is extracted and stored in the variable res.

Updated pointer position

ec2-35-88-174-201.us-west-2.compute.amazonaws.com| is an example hostname.
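A predefined-pattern extractor of this kind can be approximated with a lookup table of regular expressions. The expressions below are simplified stand-ins chosen for this sketch (for example, the IPv4 one does not check that each octet is at most 255); they are not the patterns the product uses, and treating the pattern name as case-insensitive is also an assumption.

```python
import re

# Simplified, illustrative patterns only.
PATTERNS = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "mac": re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b"),
    "url": re.compile(r"https?://\S+"),
}


def extractpattern(line: str, pos: int, name: str):
    """Return the first match of the named pattern after pos, plus the new cursor."""
    m = PATTERNS[name.lower()].search(line, pos)
    if m is None:
        raise ValueError(f"no {name} found after the pointer position")
    return m.group(), m.end()


ip, ip_cursor = extractpattern("127.0.0.1 is an example IPv4 address", 0, "IPv4")
mac, mac_cursor = extractpattern("00:00:5e:00:53:af is an example MAC address", 0, "MAC")
url, url_cursor = extractpattern("https://www.google.com is an example URL address", 0, "URL")
print(ip, mac, url)
```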

extractjson

This function is used to extract JSON string from the log line. The extracted JSON string will be stored in a variable as specified by the user. The pointer position updates to the end of the extracted JSON.

Limitation

This function doesn't work if there are two or more consecutive JSON objects.

Syntax

var=extractjson(), where var denotes the variable in which the extracted JSON string is stored.

Example

Log line: {"key1": "value1", "key2": "value2"} is an example json.

Rule: res=extractjson()

Updated pointer position (|): {"key1": "value1", "key2": "value2"}| is an example json.

Output: "res": """{"key1": "value1", "key2": "value2"}"""
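One way to find where an embedded JSON object ends is Python's json.JSONDecoder.raw_decode, which parses a value starting at a given index and reports the index where it stops. The sketch below uses it to mimic the behaviour described above; the function signature and the search for the first opening brace are assumptions made for illustration.

```python
import json


def extractjson(line: str, pos: int):
    """Return the first JSON object after pos as a string, plus the new cursor."""
    start = line.find("{", pos)
    if start == -1:
        raise ValueError("no JSON object found after the pointer position")
    # raw_decode returns the parsed value and the index just past it.
    _, end = json.JSONDecoder().raw_decode(line, start)
    return line[start:end], end


line = '{"key1": "value1", "key2": "value2"} is an example json.'
res, cursor = extractjson(line, 0)
print(res)  # {"key1": "value1", "key2": "value2"}
```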

extractregex

This function extracts the string from the log line which matches the regex pattern as specified. The pointer position updates to the end of the extracted match. The limitation here is that it does not work for more than one match.

To learn more about regex patterns, go to https://regex101.com/ . Select flavor as Java 8.

Syntax

var=extractregex(toMatch), where toMatch is the regex pattern of type string that is used to match a particular string in the log line. The regex pattern is to be enclosed within double quotes and forward slashes "/…/" without leaving any spaces. var denotes the variable in which the extracted value is stored.

Example

Consider the log line: [a, b, c] is an example list

Using res=extractregex("/^\[.*\]/"), "[a, b, c]" is extracted and stored in the variable res.
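The same match can be reproduced with Python's re module to check a pattern before using it in a rule. Note that the documentation points to the Java 8 regex flavor, while this sketch uses Python's flavor; the two agree for this simple pattern but can differ for others. The slash-stripping and the function signature are assumptions.

```python
import re


def extractregex(line: str, pos: int, to_match: str):
    """Return the first regex match at or after pos, plus the new cursor."""
    # Drop the surrounding forward slashes of the "/…/" notation.
    pattern = to_match[1:-1] if to_match.startswith("/") and to_match.endswith("/") else to_match
    m = re.search(pattern, line[pos:])
    if m is None:
        raise ValueError("pattern did not match")
    return m.group(), pos + m.end()


line = "[a, b, c] is an example list"
res, cursor = extractregex(line, 0, r"/^\[.*\]/")
print(res)  # [a, b, c]
```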

extractregexgroup

This function extracts all the matched groups from the log line for a regex pattern as specified. The pointer position updates to the end of the last matched group.

Syntax

extractregexgroup(toMatch, extractToKey1.format1, extractToKey2.format2, …)

where,

  • toMatch is the regex pattern of type string which is used to match all the groups from the log line. The regex pattern is to be enclosed within double quotes and forward slashes "/…/" without leaving any spaces.
  • extractToKey is the name of the key where the extracted matched group is to be stored.
  • format specifies the conversion format for the extracted value (string, int, float).

Example

Consider the log line root : TTY=unknown ; PWD=/home/centos ; USER=root ; COMMAND=/bin/rm -rf jmeter.log

Using extractregexgroup("/TTY=\w+ ; PWD=([^\s]+) ; USER=(\w+) ; COMMAND=(.*)$/", path.string, username.string, cmd.string) extracts

/home/centos to key path

root to key username

/bin/rm -rf jmeter.log to key cmd
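The mapping from capture groups to typed keys can be sketched in Python as follows. This is an illustration rather than the product's implementation: it uses Python's regex flavor instead of Java 8, matches PWD= as it appears in the log line, and the CASTS table and function signature are hypothetical.

```python
import re

# Assumed conversions for the three documented formats.
CASTS = {"string": str, "int": int, "float": float}


def extractregexgroup(line: str, pos: int, to_match: str, *targets: str):
    """Map each capture group to a 'key.format' target; return the map and new cursor."""
    m = re.search(to_match[1:-1], line[pos:])  # strip the surrounding slashes
    if m is None:
        raise ValueError("pattern did not match")
    out = {}
    for group, target in zip(m.groups(), targets):
        key, fmt = target.rsplit(".", 1)
        out[key] = CASTS[fmt](group)
    # The cursor moves to the end of the last matched group.
    return out, pos + m.end(m.lastindex or 0)


line = "root : TTY=unknown ; PWD=/home/centos ; USER=root ; COMMAND=/bin/rm -rf jmeter.log"
rule = r"/TTY=\w+ ; PWD=([^\s]+) ; USER=(\w+) ; COMMAND=(.*)$/"
fields, cursor = extractregexgroup(line, 0, rule, "path.string", "username.string", "cmd.string")
print(fields)
```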

extractkeyvalue

This function is used to extract the key-value pairs from the log line. It scans the line from the current position till the end for all the possible key-value pairs. Every key-value pair should end with a delimiter. After extraction, the pointer position updates to the end of the last delimiter.

Syntax

extractkeyvalue(sep, del, numOfPairs)

where,

  • sep is a string and denotes the separator separating key from value.

  • del is a string and denotes the delimiter which separates the key-value pairs from each other.

  • numOfPairs is an integer and denotes the number of pairs to be extracted.

Example

Consider the log line key1:val1; key2:val2; key3:val3; are example key-value pairs

Using var = extractkeyvalue(":", ";", 2) extracts key1 : val1, key2 : val2 and stores them in the variable var.
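The scan described above (split on the delimiter, then on the separator, and stop after numOfPairs pairs) can be sketched like this. Trimming whitespace around keys and values is an assumption, as is the function signature with an explicit pos cursor.

```python
def extractkeyvalue(line: str, pos: int, sep: str, delim: str, num_pairs: int):
    """Return up to num_pairs key-value pairs found from pos, plus the new cursor."""
    pairs = {}
    cursor = pos
    while len(pairs) < num_pairs:
        end = line.find(delim, cursor)
        if end == -1:
            break  # every pair must end with a delimiter
        key, found, value = line[cursor:end].partition(sep)
        if not found:
            break  # this chunk is not a key-value pair
        pairs[key.strip()] = value.strip()
        cursor = end + len(delim)  # move past the delimiter
    return pairs, cursor


line = "key1:val1; key2:val2; key3:val3; are example key-value pairs"
var, cursor = extractkeyvalue(line, 0, ":", ";", 2)
print(var)  # {'key1': 'val1', 'key2': 'val2'}
```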

extractjsonkeys

This function is used to extract the values for the keys in the JSON. The limitation here is that the JSON cannot have duplicate keys. The pointer position is moved to the position after the JSON. The values extracted are stored in the format as specified by the user.

Syntax

extractjsonkeys(key_name1.format1, key_name2.format2, …), where key_name is the key whose value is to be extracted and format is the format for the extracted value.

Acceptable format types are string, int and float

Example

Consider the log line abc {"key1": "val1", "key2": "val2"} is an example log line

Using skip("abc") extractjsonkeys(key1.string, key2.string) extracts and stores the following data

key1 : val1,

key2 : val2
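A comparable result can be obtained in Python by decoding the embedded JSON object and converting the requested keys. This is a sketch under assumptions: the CASTS table and function signature are hypothetical, and the cursor is simply started after "abc" to stand in for the skip("abc") rule.

```python
import json

# Assumed conversions for the three documented formats.
CASTS = {"string": str, "int": int, "float": float}


def extractjsonkeys(line: str, pos: int, *targets: str):
    """Return the requested keys from the first JSON object after pos, plus the new cursor."""
    start = line.find("{", pos)
    if start == -1:
        raise ValueError("no JSON object found after the pointer position")
    obj, end = json.JSONDecoder().raw_decode(line, start)
    out = {}
    for target in targets:
        key, fmt = target.rsplit(".", 1)
        out[key] = CASTS[fmt](obj[key])
    # The cursor moves to the position after the JSON.
    return out, end


line = 'abc {"key1": "val1", "key2": "val2"} is an example log line'
values, cursor = extractjsonkeys(line, len("abc"), "key1.string", "key2.string")
print(values)  # {'key1': 'val1', 'key2': 'val2'}
```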

inextractkeyvalue

This function supports position-independent parsing. It extracts all the key-value pairs separated by a separator and delimited by a delimiter from the whole log line and stores them in a map. The limitation here is that every key-value pair must end with a delimiter.

Syntax

inextractkeyvalue(sep, del)

where,

  • sep is a string and denotes the separator separating key from value.
  • del is a string and denotes the delimiter which separates the key-value pairs from each other.

Example

Consider the log line key1:val1; key2:val2; are example key-value pairs key3:val3;

Using inextractkeyvalue(":", ";")

extracts

"key1" : "val1",

"key2" : "val2",

"key3" : "val3"
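Position-independent extraction can be approximated with a single regular expression that finds every key, separator, value, delimiter sequence anywhere in the line. How the product decides where a key starts inside free text is not documented, so this sketch assumes that keys are made of word characters and that values contain no whitespace.

```python
import re


def inextractkeyvalue(line: str, sep: str, delim: str) -> dict:
    """Collect every key<sep>value<delim> occurrence in the whole line into a map."""
    pattern = re.compile(rf"(\w+){re.escape(sep)}(\S+?){re.escape(delim)}")
    return dict(pattern.findall(line))


line = "key1:val1; key2:val2; are example key-value pairs key3:val3;"
pairs = inextractkeyvalue(line, ":", ";")
print(pairs)  # {'key1': 'val1', 'key2': 'val2', 'key3': 'val3'}
```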