Working with Data
JSON
Generally, all actions operate on valid JSON data. Each input line is a JSON document delimited by a line feed, which we call an 'event'. The JSON document is composed of keys followed by values, e.g. "key":"value". Values can be string (text), number, or boolean (true or false). All numbers are stored as double-precision floating-point numbers (there is no integer/float distinction).
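For example, a small event using each of the value types (the field names here are purely illustrative):

{"host":"web-1","load":0.48,"idle":true}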
Also generally, all inputs provide JSON data. A line of output is made into a JSON document, for example: {"_raw":"the line"}. (There may be other fields as well, as with TCP/UDP inputs.)
So the default output of exec with the command uptime will be something like:

{"_raw":" 13:46:33 up 2 days, 4:25, 1 user, load average: 0.48, 0.39, 0.31"}
Using extract to Extract Fields Using Patterns
This can be passed to the extract action as below:
- extract:
    input-field: _raw
    remove: true
    pattern: 'load average: (\S+), (\S+), (\S+)'
    output-fields: [m1, m5, m15]
# {"m1":"0.48","m5":"0.39","m15":"0.31"}
(If we did not say remove: true then the output event would still contain _raw.)
By default, extract is tolerant: if it cannot match the data, it lets the event pass through unaltered unless you say drop: true. It will also not complain about failed matches unless you say warning: true.
The reason for such tolerance is that you might wish to pass the same data through various patterns.
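If you do want strict behaviour, here is a sketch combining both options with the earlier pattern:

- extract:
    input-field: _raw
    pattern: 'load average: (\S+), (\S+), (\S+)'
    output-fields: [m1, m5, m15]
    drop: true     # drop events that do not match
    warning: true  # and log a warning when that happens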
This is the most general way to convert data and requires some familiarity with regular expressions; there is a guide to the dialect understood by Pipes. If possible, use expand for delimited data.
Number and Unit Conversion
extract does not automatically convert strings into numbers; that is the function of convert.
# {"m1":"0.48","m5","0.39","m15","0.31"}
- convert
- m1: num
- m5: num
- m15: num
# {"m1":0.48,"m5",0.39,"m15",0.31}
The usual JSON types are covered by "num", "str", and "bool", but convert can also convert between units of time and storage.
For example, if the field mem was "512K" and the field time was "252ms", then we can convert them into different units:
- convert:
  - mem: M   # memory as MB
  - time: S  # time as fractional seconds
# {"mem":0.5,"time":0.252}
Here is an example of extract followed by convert. The output of hotrod server traffic is a useful way to track the incoming and outgoing traffic of a Hotrod Server:
hotrod server traffic:
metrics        644.00 B
logs           1.05 kiB
unsent logs    0.00 B
tarballs sent  213.34 kiB
The pattern in extract can be multiline, and we can ask for whitespace-insensitive patterns with "(?x)". In this mode, any whitespace has to be explicitly specified (as '\s' or '\n'). The pattern itself can extend over several lines, and even include comments beginning with '#'. This can make longer regular expressions easier to read afterwards!
Assume the above output is saved in traffic.txt:
name: traffic1
input:
  exec:
    command: cat traffic.txt
    ignore-linebreaks: true
    interval: 1s
    count: 1
actions:
- extract:
    remove: true
    pattern: |
      (?x)
      metrics\s+(.+)\n
      logs\s+(.+)\n
      unsent\slogs\s+.+\n
      tarballs\ssent\s+(.+)
    output-fields: [metrics,logs,tarballs]
- convert:
  - metrics: K
  - logs: K
  - tarballs: K
output:
  write: console
# {"metrics":0.62890625,"logs":1.05,"tarballs":213.34}
Working with Raw Text with raw
Sometimes data needs to enter the Pipe as raw text.
Suppose there is a tool with output like this:
netter v0.1
copyright Netter Corp
output
port,throughput
1334,45552
1335,5666
Suppose also that we would like to treat it as CSV (and assume there's no --shutup flag).
So we need to skip until that header line. After that, just wrap up each line as _raw for later processing.
We've put this text into netter.txt and run this pipe. We skip until the line that starts with "port,": raw: true stops exec 'quoting' the line, discard-until does the skipping, and to-json quotes the line as JSON.
name: netter
input:
  exec:
    command: 'cat netter.txt'
    raw: true
actions:
- raw:
    discard-until: '^port,'
- raw:
    to-json: _raw
output:
  write: console
# {"_raw":"port,throughput"}
# {"_raw":"1334,45552"}
# {"_raw":"1335,5666"}
The particular super-power of raw is that it can work with any text, not just JSON.
raw does other text operations, like replacement. It's clearer (and easier to maintain) to do this rather than relying on shell commands like tr:
# Hello Hound
- raw:
    replace:
      pattern: H
      substitution: h
# hello hound
raw-extract will extract matches from text:
# Hello Dolly
- raw:
    extract:
      pattern: Hello (\S+)
# Dolly
Both replace and extract can be provided with input-field, in which case they operate on the text in that field; otherwise, they operate on the whole line.
A replacement can be provided, which can contain regex group specifiers; in this case, the first matched group is $1. (Note this notation is different from the \1 used by most Unix tools.)
# {"greeting":"Hello Dolly"}
- raw:
extract:
input-field: greeting
pattern: Hello (\S+)
replace: Goodbye $1
# {"greeting":"Goodbye Dolly"}
If there's no pattern, then all of the text is available as $0.
In this way, we minimize the need for Unix pipeline tricks involving sed etc., and the result is guaranteed to work the same way on all supported platforms.
Converting from CSV
Once input data is in this form, we can use expand to convert CSV data.
# {"_raw":"port,throughput"}
# {"_raw":"1334,45552"}
# {"_raw":"1335,5666"}
- expand:
remove: true
csv:
header: true
# {"port":1334,"throughput":45552}
# {"port":1335,"throughput":5666}
Please note that by default expand assumes comma-separated fields, but you can specify the delimiter using delim.
Using an existing header is convenient, but the actual types of the fields are worked out by auto-conversion, which may not be what you want. With autoconvert: false the fields will all remain text:
csv:
  header: true
  autoconvert: false
If the source generates headers each time it is run, say when scheduled with input-exec, then expand-csv needs a field to flag these first lines. Use begin-marker-field to specify the field name, corresponding to the same field in batch with exec.
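For instance, a sketch (the exact placement of begin-marker-field under exec's batch option is an assumption here, and stats.sh is a hypothetical command that prints a header on every run):

input:
  exec:
    command: ./stats.sh
    interval: 10s
    batch:
      begin-marker-field: begin  # assumed: flags the first event of each run
actions:
- expand:
    csv:
      header: true
      begin-marker-field: begin  # matches the marker set by batch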
Alternatively, fields will specify the name and the type of the columns. The allowed types are "str", "num", "null" or "bool". Finally, field-file is a file containing "name:type" lines. Provide either fields or field-file.
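A sketch of each route (columns.txt is a hypothetical file which, per the above, would contain "name:type" lines such as port:num; the inline list syntax for fields is an assumption):

- expand:
    csv:
      fields: [port:num, throughput:num]

or:

- expand:
    csv:
      field-file: columns.txt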
Headers may also be specified as a field header-field containing the column names separated by the delimiter. If header-field-types: true then the format is 'name:type'.
This header-field only needs to be specified at the start, but can be specified again when the schema changes (i.e. the names and/or types of the columns change). collapse with header-field-on-change: true will write events in this format.
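For instance, a sketch where the first event carries the header (the field name hdr and the data are illustrative):

# {"hdr":"port:num,throughput:num"}
# {"_raw":"1334,45552"}
- expand:
    input-field: _raw
    csv:
      header-field: hdr
      header-field-types: true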
In the total absence of any column information, we can use gen_headers, and the column names will be "_0", "_1", etc.
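A sketch, assuming gen_headers is a flag inside the csv section:

# {"_raw":"1334,45552"}
- expand:
    remove: true
    csv:
      gen_headers: true
# {"_0":1334,"_1":45552}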
Some formats use a special marker to indicate null fields, like "-"; this is the purpose of null, which is an array of such markers.
If the fields were separated by spaces, then we would add delim: ' ' to the csv section. (This is a special case and will skip any whitespace between fields.) '\t' is also understood, for tab-separated fields.
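A sketch combining these options (the choice of '-' as the null marker is illustrative):

- expand:
    remove: true
    csv:
      header: true
      delim: ' '
      null: ['-']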
So expand takes a field containing some data separated by a delimiter and converts it into JSON, possibly removing the original field. Prefer this to extract where possible, because you will not have to write regular expressions.
And expand has more powers!
Converting from Key-Value Pairs
A fairly popular data format is 'key-value pairs'.
# {"_raw":"a=1 b=2"}
- expand:
input-field: _raw
remove: true
delim: ' '
key-value:
autoconvert: true
# output: {"a":1,"b":2}
You can also set the separator between the key and the value:
# {"_raw":"name:\"Arthur\",age:42"}
- expand:
input-field: _raw
remove: true
delim: ','
key-value:
autoconvert: true
key-value-delim: ':'
# output: {"name":"Arthur","age":42}
The separator can be a newline (delim: '\n'). If your incoming string looked like this:

name=dolly
age=42

then you can easily convert it into {"name":"dolly","age":42}.
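Following the pattern of the examples above:

# {"_raw":"name=dolly\nage=42"}
- expand:
    input-field: _raw
    remove: true
    delim: '\n'
    key-value:
      autoconvert: true
# {"name":"dolly","age":42}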
Working with Input JSON
If a field contains quoted JSON, then expand with json: true will parse and extract the fields, merging them with the existing event.
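A sketch (assuming json: true sits directly under expand, as key-value does above):

# {"id":42,"data":"{\"name\":\"dolly\"}"}
- expand:
    input-field: data
    remove: true
    json: true
# {"id":42,"name":"dolly"}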
Another option is expand events. This is different because it converts one event into multiple events by splitting the value of input-field with the delimiter.
# json: {"family":"baggins","data":"frodo bilbo"}
- expand:
input-field: data
remove: true
delim: ' '
events:
output-split-field: name
# output:
{"family":"baggins","name":"frodo"}
{"family":"baggins","name":"bilbo"}
Output as Raw
Generally we pass on the final events as JSON, but sometimes the situation requires less structured output. For instance, 'classic' Hotrod 2 pipes have their output captured by systemd, passed to the server through rsyslog, unpacked using logstash, and routed into Elasticsearch.
To send events back using this route, you will need to prepend the event with "@cee: " using the raw action, as the final action below:
- raw:
    extract:
      replace: "@cee: $0"
($0 is the full match over the whole line.)
Outputs usually receive events as JSON documents separated by newlines (so-called 'streaming JSON'), but this is not essential: single lines of text can be passed in most cases, and creating and passing multi-line data is also possible.
With add, if template-result-field is provided, then the template can be in some arbitrary format like YAML (note the ${field} expansions):
# {"one":1,"two":2}
- add:
template-result-field: result
template: |
results:
one: ${one}
two: ${two}
# {"one":1,"two": 2,"result":"results:\n one: 1\n two: 2\n"}
Let's say you need to POST this arbitrary data to a server; then set body-field to be the 'result' field:
output:
  http-post:
    body-field: result
    url: 'http://localhost:3030'
Similarly, exec has input-field:
input:
  text: '{"name":"dolly"}'
actions:
- time:
    output-field: tstamp
- add:
    template-result-field: greeting
    template: |
      time: ${tstamp}
      hello ${name}
      goodbye ${name}
      ----------------
output:
  exec:
    command: 'cat'
    input-field: greeting
# output:
# time: 2019-02-19T09:27:03.943Z
# hello dolly
# goodbye dolly
# ----------------
The command itself can contain field expansions, like ${name}. Assume there is also a field called 'file'; then the document will be appended to that file:
output:
  exec:
    command: 'cat >> ${file}'
    input-field: greeting