Enriching Data
After converting the raw data into the desired JSON format (see Working with Data), data is then enriched: reshaped and annotated so it can be consumed and stored more easily.
Constant value fields can be added with add, and calculated fields can be conditionally added using script. enrich does general CSV lookup, and inputs used as actions can do arbitrary retrieval.
Fields with Agent-specific Values
Data needs to be tagged with its location at the source. There are standard context variables that differ for each agent, and more can be added to the pipe contexts:
- add:
    output-fields:
    - site: '{{name}}'
    - pipe: '{{pipe}}'
Generated Fields
All data needs a timestamp - this is the processing time at the agent:
- time:
    output-field: '@timestamp'
A sequence number can be added to each event (although this is not persistent across restarts):
- script:
    let:
    - seq: 'count()'
A better method would be to use uuid(), which gives a unique id.
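For example, a minimal sketch of assigning such an id (the field name event_id is illustrative, and uuid() is assumed to be callable like the other script functions in this section):

```yaml
- script:
    let:
    - event_id: 'uuid()'   # event_id is an illustrative field name
```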
Calculated Fields
The script action can be used to calculate values for fields. For example, using script let, one can anonymize data using hash functions.
script:
  let:
  - name_hash: md5(name)
  - address_hash: md5(address)
  remove:
  - name
  - address
This can be used to clean up data that will be stored and further processed outside your private networks.
Hashes are one-way functions; if the original values must be recoverable, sensitive fields can instead be encrypted with encrypt().
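A sketch of the same idea with encryption instead of hashing; note that the exact signature of encrypt() and its key configuration are not covered in this section, so this call shape is an assumption (check the scripting reference):

```yaml
- script:
    let:
    - name_enc: encrypt(name)   # assumed call shape; key setup not shown here
    remove:
    - name
```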
See the available scripting functions.
Conditional Fields
If condition is defined, then script will only add fields if the condition is true.
When adding literal strings, it is easier to use set rather than let. Since add and script never overwrite existing fields by default, this snippet has the effect of adding the field quality to the event, set to "good" if the field a has a value greater than 1, and "bad" otherwise.
- script:
    condition: a > 1
    set:
    - quality: good
- script:
    set:
    - quality: bad
The cond function provides a more elegant solution:
- script:
    let:
    - quality: cond(a > 1,"good","bad")
Table Lookup
enrich is an efficient and general way to enrich data with tables read from a CSV file.
In the simplest case, if the value of an event field matches a column, then we take another column's value on the same row as a new field.
Note that the lookup files need to be attached to the pipe using a files: section; see the complete example at the end of this section.
id,name,nick,office
23,Alice,bbye,head
12,Bob,wkr,kzn
13,John,nomo,wcape
So if iden in the event is the same as id in the table, then we will set nice_name to the value of name.
# input: {"iden":12}
# output: {"iden":12,"nice_name":"Bob"}
enrich:
- lookup-file: names.csv
  match:
  - type: num
    event-field: iden
    lookup-field: id
  add:
    event-field: nice_name
    lookup-field: name
You have to specify a type for the match:

- str: text values
- num: numbers
- ip: IPv4 addresses
- cidr: IPv4 address ranges, like '192.168.1.0/16'
- num-list: numbers separated by commas, like '10,20,30'
- str-list: strings separated by commas, like 'office,home'
- num-range: numeric ranges, like '10-23'
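For instance, a cidr match might look like this sketch, where the lookup file networks.csv and its columns network and zone are hypothetical:

```yaml
enrich:
- lookup-file: networks.csv   # hypothetical file with 'network' and 'zone' columns
  match:
  - type: cidr
    event-field: src_ip       # illustrative event field holding an IPv4 address
    lookup-field: network     # column containing ranges like '10.0.0.0/8'
  add:
    event-field: zone
    lookup-field: zone
```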
There can be multiple matches, all of which must be satisfied.
# input: {"iden":12,"office":"kzn"}
# output: {"iden":12,"office":"kzn","nice_name":"Bob"}
enrich:
- lookup-file: names.csv
  match:
  - type: num
    event-field: iden
    lookup-field: id
  - type: str
    event-field: office
    lookup-field: office
  add:
    event-field: nice_name
    lookup-field: name
Adding multiple values with enrich can be tedious, since the match must be repeated:
# input: {"iden":12}
# output: {"iden":12,"nice_name":"Bob","nickname":"wkr"}
enrich:
- lookup-file: names.csv
  match:
  - type: num
    event-field: iden
    lookup-field: id
  add:
    event-field: nice_name
    lookup-field: name
- lookup-file: names.csv
  match:
  - type: num
    event-field: iden
    lookup-field: id
  add:
    event-field: nickname
    lookup-field: nick
If the fields to be added are the same as the lookup names, then there is a convenient shortcut:
# input: {"iden":12}
# output: {"iden":12,"name":"Bob","nick":"wkr"}
enrich:
- lookup-file: names.csv
  match:
  - type: num
    event-field: iden
    lookup-field: id
  add:
    event-fields:
    - name: <unknown>
    - nick: ''
event-fields gives the field name (which must match the same column in the CSV file), and the value after the colon is the default value.
If the lookup CSV file is modified, it will be reloaded. This allows other pipes to modify the enrichment globally.
Here is a complete example illustrating the inclusion of a file called fruits.csv in an optional subdirectory called lookups:
# output: {"that":"b","this":"a","fruit":"Apple"}
name: simple_echo_with_enrich
files:
- lookups/fruits.csv
input:
  echo:
    json: true
    event: |
      { "this": "a", "that": "b" }
actions:
- enrich:
    lookup-file: fruits.csv
    add:
      event-field: fruit
      lookup-field: output_column
    match:
    - type: str
      event-field: this
      lookup-field: input_column
output:
  print: STDOUT
The lookups/fruits.csv file:
input_column,output_column
a,"Apple"
b,"Banana"
Enriching with Input
Using inputs as actions is a powerful technique. Say we have an HTTP endpoint that is given a name and returns the city where the person lives as {"city":"NAME"}.
Then events containing name can get a city field as below:
name: http-enrich
input:
  exec:
    command: echo '{"name":"Joe"}'
    raw: true
actions:
- input:
    http-poll:
      address: http://127.0.0.1:3030
      query:
      - name: ${name}
      raw: true
output:
  write: console
----
{"city":"Johannesburg","name":"Joe"}
Much functionality on Linux is of course provided through the command line. Fortunately, we can execute commands as actions. The basic command host can do forward or reverse DNS lookup:
name: host-enrich
input:
  exec:
    command: echo '{"ip":"98.137.246.7"}'
    raw: true
actions:
- exec:
    command: host ${ip}
    result:
      stdout-field: host
- raw:
    extract:
      input-field: host
      pattern: '(\S+)\.$'
output:
  write: console
----
{"ip":"98.137.246.7","host":"media-router-fp1.prod1.media.vip.gq1.yahoo.com"}
All that was needed afterward was to extract the hostname from the end of the output.
(If you wanted the actual ASN, then the script function ip2asn is more appropriate; it uses the Team Cymru service. In this case, it gives "YAHOO-GQ1, US".)
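A sketch of that ASN lookup, assuming ip2asn can be called like the other script functions in this section (the field name asn is illustrative):

```yaml
- script:
    let:
    - asn: ip2asn(ip)   # asn is an illustrative field name
```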
The Redis input can be used to look up a field in a hash. This works particularly well for simple lookups that are frequently written to by other processes.
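A sketch of how this might look, by analogy with the http-poll input-as-action above; the configuration keys shown for the Redis input (address, command) are assumptions here, so consult the inputs reference for the actual schema:

```yaml
actions:
- input:
    redis:                         # hypothetical configuration keys below
      address: 127.0.0.1:6379
      command: HGET cities ${name}  # look up the event's name in the 'cities' hash
      raw: true
```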