Version: 3.3.0

Enriching Data

After converting the raw data into the desired JSON format (see Working with Data), data is then enriched: reshaped and annotated so it can be consumed and stored more easily.

Constant-value fields can be added with add, and calculated fields can be added, optionally under a condition, with script. enrich performs general CSV lookups, and inputs used as actions can perform arbitrary retrieval.

Fields with Agent-specific Values

Data needs to be tagged with its location at source. There are standard context variables that are different for each agent, and more can be added using the pipe contexts:

- add:
    output-fields:
    - site: '{{name}}'
    - pipe: '{{pipe}}'

Generated Fields

All data needs a timestamp - this is the processing time at the agent:

- time:
    output-field: '@timestamp'

A sequence number can be added to each event (although this is not persistent across restarts):

- script:
    let:
    - seq: 'count()'

A better method is to use uuid(), which gives each event a unique id.

Calculated Fields

The script action can be used to calculate values for fields. For example, using script let one can anonymize data using hash functions.

script:
let:
- name_hash: md5(name)
- address_hash: md5(address)
remove:
- name
- address

This can be used to clean up data that will be stored and further processed outside your private networks.
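To make the effect of the anonymizing snippet concrete, here is a minimal Python sketch of the same transformation applied to a single event. It is an illustration only, not the agent's implementation: the helper name anonymize is hypothetical, and it assumes that md5() yields the hexadecimal digest of the field's text value.

```python
import hashlib

def anonymize(event, fields):
    """Return a copy of event where each named field is replaced by <field>_hash.

    Assumption: the hash is the hex MD5 digest of the UTF-8 text of the value.
    """
    out = dict(event)
    for field in fields:
        if field in out:
            value = str(out.pop(field))  # drop the original, keep only the hash
            out[field + "_hash"] = hashlib.md5(value.encode("utf-8")).hexdigest()
    return out

event = {"id": 23, "name": "Alice", "address": "1 Main Rd"}
print(anonymize(event, ["name", "address"]))
```

The same input always yields the same hash, so hashed fields can still be used for grouping and joining downstream.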

Hashes are one-way functions; if the original values must be recoverable, sensitive fields can be encrypted with encrypt() instead. See the available scripting functions.

Conditional Fields

If condition is defined, then script will only add fields if the condition is true.

When adding literal strings, it is easier to use set rather than let. Since add and script never overwrite existing fields by default, this snippet has the effect of adding the field quality to the event, with the value "good" if the value of field a is greater than 1, and "bad" otherwise.

- script:
    condition: a > 1
    set:
    - quality: good
- script:
    set:
    - quality: bad

The cond function provides a more elegant solution:

- script:
    let:
    - quality: cond(a > 1,"good","bad")
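The two approaches in this section can be compared in a small Python sketch. It models the "never overwrite existing fields" rule described above and shows that the two-step version and the cond version agree. The function names are hypothetical, and treating a missing a as 0 is an assumption made only to keep the sketch runnable.

```python
def set_if_absent(event, field, value, condition=True):
    """Add field only if the condition holds and the field is not already set."""
    if condition and field not in event:
        event[field] = value
    return event

def classify_two_steps(event):
    # First script: conditional set. Second script: unconditional set that
    # becomes a no-op when quality already exists.
    event = dict(event)
    set_if_absent(event, "quality", "good", condition=event.get("a", 0) > 1)
    set_if_absent(event, "quality", "bad")
    return event

def classify_cond(event):
    # Equivalent of cond(a > 1, "good", "bad") as a single expression.
    event = dict(event)
    event["quality"] = "good" if event.get("a", 0) > 1 else "bad"
    return event

for a in (0, 1, 2):
    assert classify_two_steps({"a": a}) == classify_cond({"a": a})
print(classify_cond({"a": 2}))  # prints {'a': 2, 'quality': 'good'}
```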

Table Lookup

enrich is an efficient and general way to enrich data with tables read from a CSV file. In the simplest case, if the value of an event field matches a value in a lookup column, then another column's value from the same row is added as a new field.

Note that the lookup files need to be attached to the pipe using a files: section; see the complete example at the end of this section.

id,name,nick,office
23,Alice,bbye,head
12,Bob,wkr,kzn
13,John,nomo,wcape

So if iden in the event is the same as id in the table (names.csv, shown above), then nice_name is set to the value of name.

# input: {"iden":12}
# output: {"iden":12,"nice_name":"Bob"}
enrich:
- lookup-file: names.csv
  match:
  - type: num
    event-field: iden
    lookup-field: id
  add:
    event-field: nice_name
    lookup-field: name

You have to specify a type for the match:

  • str: text values
  • num: numbers
  • ip: IPv4 addresses
  • cidr: IPv4 address ranges, like '192.168.1.0/16'
  • num-list: numbers separated by commas, like '10,20,30'
  • str-list: strings separated by commas, like 'office,home'
  • num-range: numeric ranges, like '10-23'
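As a rough illustration of what each match type means, the Python sketch below compares one event value against one lookup cell. It is a sketch under stated assumptions, not the agent's matching code: the function name matches is hypothetical, the list, range and CIDR forms are assumed to live in the lookup cell, and ranges are assumed to be inclusive and non-negative.

```python
import ipaddress

def matches(match_type, event_value, lookup_value):
    """Return True if event_value matches the lookup cell for the given type."""
    if match_type == "str":
        return str(event_value) == lookup_value
    if match_type == "num":
        return float(event_value) == float(lookup_value)
    if match_type == "ip":
        return ipaddress.ip_address(event_value) == ipaddress.ip_address(lookup_value)
    if match_type == "cidr":
        # strict=False accepts ranges written with host bits set, like 192.168.1.0/16
        network = ipaddress.ip_network(lookup_value, strict=False)
        return ipaddress.ip_address(event_value) in network
    if match_type == "num-list":
        return float(event_value) in [float(x) for x in lookup_value.split(",")]
    if match_type == "str-list":
        return str(event_value) in lookup_value.split(",")
    if match_type == "num-range":
        # Assumption: inclusive bounds, non-negative numbers (split on "-")
        low, high = (float(x) for x in lookup_value.split("-"))
        return low <= float(event_value) <= high
    raise ValueError("unknown match type: " + match_type)

print(matches("cidr", "192.168.7.9", "192.168.1.0/16"))  # prints True
print(matches("num-range", 24, "10-23"))                 # prints False
```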

There can be multiple matches, all of which must be satisfied.

# input: {"iden":12,"office":"kzn"}
# output: {"iden":12,"office":"kzn","nice_name":"Bob"}
enrich:
- lookup-file: names.csv
  match:
  - type: num
    event-field: iden
    lookup-field: id
  - type: str
    event-field: office
    lookup-field: office
  add:
    event-field: nice_name
    lookup-field: name

Adding multiple values with enrich can be tedious, since the match must be repeated:

# input: {"iden":12}
# output: {"iden":12,"nice_name":"Bob","nickname":"wkr"}
enrich:
- lookup-file: names.csv
  match:
  - type: num
    event-field: iden
    lookup-field: id
  add:
    event-field: nice_name
    lookup-field: name
- lookup-file: names.csv
  match:
  - type: num
    event-field: iden
    lookup-field: id
  add:
    event-field: nickname
    lookup-field: nick

If the fields to be added are the same as the lookup names, then there is a convenient shortcut:

# input: {"iden":12}
# output: {"iden":12,"name":"Bob","nick":"wkr"}
enrich:
- lookup-file: names.csv
  match:
  - type: num
    event-field: iden
    lookup-field: id
  add:
    event-fields:
    - name: <unknown>
    - nick: ''

Each event-fields entry gives a field name, which must match a column name in the CSV file; the value after the colon is its default value.
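The lookup-with-defaults behaviour can be sketched in Python using the sample table from this section. This is only an illustration: the function name enrich_event is hypothetical, and it assumes that the default value is what gets written when no row matches.

```python
import csv
import io

NAMES_CSV = """id,name,nick,office
23,Alice,bbye,head
12,Bob,wkr,kzn
13,John,nomo,wcape
"""

def enrich_event(event, rows, event_field, lookup_field, defaults):
    """Copy the columns named in defaults from the first row whose lookup_field
    equals the event's event_field (numeric comparison, as with type: num)."""
    out = dict(event)
    row = None
    if event_field in event:
        for candidate in rows:
            if float(candidate[lookup_field]) == float(event[event_field]):
                row = candidate
                break
    for field, default in defaults.items():
        # Assumption: the default is used when no row matched
        out[field] = row[field] if row is not None else default
    return out

rows = list(csv.DictReader(io.StringIO(NAMES_CSV)))
defaults = {"name": "<unknown>", "nick": ""}
print(enrich_event({"iden": 12}, rows, "iden", "id", defaults))
# prints {'iden': 12, 'name': 'Bob', 'nick': 'wkr'}
print(enrich_event({"iden": 99}, rows, "iden", "id", defaults))
# prints {'iden': 99, 'name': '<unknown>', 'nick': ''}
```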

If the lookup CSV file is modified, it will be reloaded. This allows other pipes to modify the enrichment globally.

The following complete example includes a file called fruits.csv from an optional subdirectory called lookups:

# output: {"that":"b","this":"a","fruit":"Apple"}
name: simple_echo_with_enrich
files:
- lookups/fruits.csv
input:
  echo:
    json: true
    event: |
      { "this": "a", "that": "b" }
actions:
- enrich:
    lookup-file: fruits.csv
    add:
      event-field: fruit
      lookup-field: output_column
    match:
    - type: str
      event-field: this
      lookup-field: input_column
output:
  print: STDOUT

The lookups/fruits.csv file:

input_column,output_column
a,"Apple"
b,"Banana"

Enriching with Input

Using inputs as actions is a powerful technique. Say we have an HTTP endpoint that is given a name and returns the city where the person lives as {"city":"NAME"}. Then events containing name can get a city field as below:

name: http-enrich
input:
  exec:
    command: echo '{"name":"Joe"}'
    raw: true
actions:
- input:
    http-poll:
      address: http://127.0.0.1:3030
      query:
      - name: ${name}
      raw: true
output:
  write: console
----
{"city":"Johannesburg","name":"Joe"}

Much functionality on Linux is, of course, provided through the command line. Fortunately, we can execute commands as actions. The basic command host can do forward or reverse DNS lookups:

name: host-enrich
input:
  exec:
    command: echo '{"ip":"98.137.246.7"}'
    raw: true
actions:
- exec:
    command: host ${ip}
    result:
      stdout-field: host
- raw:
    extract:
      input-field: host
      pattern: '(\S+)\.$'
output:
  write: console
----
{"ip":"98.137.246.7","host":"media-router-fp1.prod1.media.vip.gq1.yahoo.com"}

All that was needed was to extract the hostname from the end of the output afterward.
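The extraction pattern can be tried out in isolation with a short Python sketch. The sample line below is an assumption about the typical shape of reverse-lookup output from host (the hostname itself is taken from the example above), and the helper name extract_host is hypothetical.

```python
import re

# Same pattern as in the pipe: capture the last whitespace-free token,
# without the trailing dot that ends a fully qualified domain name.
HOST_PATTERN = re.compile(r"(\S+)\.$")

def extract_host(line):
    """Return the hostname at the end of a line of host output, or None."""
    found = HOST_PATTERN.search(line.strip())
    return found.group(1) if found else None

# Assumed typical output of `host 98.137.246.7`
sample = ("7.246.137.98.in-addr.arpa domain name pointer "
          "media-router-fp1.prod1.media.vip.gq1.yahoo.com.")
print(extract_host(sample))
# prints media-router-fp1.prod1.media.vip.gq1.yahoo.com
```

Lines that do not end in a dot, such as error messages, yield None, so a failed lookup does not produce a misleading host value in this sketch.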

(If you wanted the actual ASN, then the script function ip2asn is more appropriate - it uses the Team Cymru service. In this case, it gives "YAHOO-GQ1, US")

The Redis input can be used to look up a field in a hash. This works particularly well for simple lookups whose data is written often.