
Working with Data

JSON

Generally, all actions operate on valid JSON data. Each input line is a JSON document delimited by a line feed, which we call an 'event'. The JSON document is composed of keys followed by values, e.g. "key":"value". Values can be strings (text), numbers, or booleans (true or false). All numbers are stored as double-precision floating-point numbers (there is no 'integer or float' distinction).
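For example, a single event using all three value types might look like this (the field names are purely illustrative):

{"host":"web-1","load":0.48,"alive":true}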

Also generally, all inputs provide JSON data. Each line read by an input is made into a JSON document, for example: {"_raw":"the line"}. (There may be other fields as well, as with the TCP/UDP inputs.)

So the default output of exec with command uptime will be something like {"_raw":" 13:46:33 up 2 days, 4:25, 1 user, load average: 0.48, 0.39, 0.31"}.
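A minimal input section that produces such an event looks like this (interval and count are used here, as in the later examples, just to run uptime once):

input:
  exec:
    command: uptime
    interval: 1s
    count: 1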

Using extract to Extract Fields using Patterns

This can be passed to the extract action as below:

- extract:
    input-field: _raw
    remove: true
    pattern: 'load average: (\S+), (\S+), (\S+)'
    output-fields: [m1, m5, m15]
# {"m1":"0.48","m5":"0.39","m15":"0.31"}

(If we did not say remove: true then the output event would still contain _raw.)

By default, extract is tolerant: if it cannot match the data, it lets the event pass through unaltered unless you say drop: true. It also will not complain about a failed match unless you say warning: true. The reason for such tolerance is that you might wish to pass the same data through several patterns.
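For example, a sketch of the same extract that drops non-matching events and warns about them, assuming drop and warning sit alongside the other extract options:

- extract:
    input-field: _raw
    remove: true
    drop: true      # discard events that do not match
    warning: true   # log a warning when a match fails
    pattern: 'load average: (\S+), (\S+), (\S+)'
    output-fields: [m1, m5, m15]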

This is the most general way to convert data and requires some familiarity with regular expressions. Here is a guide to the dialect understood by pipes. If possible, use expand for delimited data.

Number and Unit Conversion

extract does not automatically convert strings into numbers. That is the function of convert.

# {"m1":"0.48","m5","0.39","m15","0.31"}
- convert
- m1: num
- m5: num
- m15: num
# {"m1":0.48,"m5",0.39,"m15",0.31}

The usual JSON types are covered by "num", "str", and "bool", but convert can also handle units of time and storage.

For example, if the field mem was "512K" and the field time was "252ms" then we can convert them into different units:

- convert:
  - mem: M   # memory as MB
  - time: S  # time as fractional seconds
# {"mem":0.5,"time":0.252}

Here is an example of extract followed by convert. The output of hotrod server traffic is a useful way to track the incoming and outgoing traffic of a Hotrod server:

hotrod server traffic:
metrics         644.00 B
logs            1.05 kiB
unsent logs     0.00 B
tarballs sent   213.34 kiB

The pattern in extract can be multiline, and we can ask for whitespace-insensitive patterns with "(?x)". With this flag, any whitespace to be matched (like '\s' or '\n') has to be written explicitly. The pattern itself can then extend over several lines and even include comments beginning with '#'. This can make longer regular expressions much easier to read later!

Assume the above output is saved in traffic.txt:

name: traffic1
input:
  exec:
    command: cat traffic.txt
    ignore-linebreaks: true
    interval: 1s
    count: 1
actions:
- extract:
    remove: true
    pattern: |
      (?x)
      metrics\s+(.+)\n
      logs\s+(.+)\n
      unsent\slogs\s+.+\n
      tarballs\ssent\s+(.+)
    output-fields: [metrics,logs,tarballs]
- convert:
  - metrics: K
  - logs: K
  - tarballs: K
output:
  write: console
# {"metrics":0.62890625,"logs":1.05,"tarballs":213.34}

Working with Raw Text with raw

Sometimes data needs to enter the Pipe as raw text.

Suppose there is a tool with output like this:

netter v0.1
copyright Netter Corp
output
port,throughput
1334,45552
1335,5666

Suppose also that we would like to treat it as CSV (and assume there's no --shutup flag). So we need to skip until that header line; after that, just wrap each line up as _raw for later processing.

We've put this text into netter.txt and run this pipe: raw: true stops exec from 'quoting' each line as JSON, raw with discard-until discards lines until it sees the line that starts with "port,", and raw with to-json then quotes each remaining line as JSON.

name: netter
input:
  exec:
    command: 'cat netter.txt'
    raw: true
actions:
- raw:
    discard-until: '^port,'
- raw:
    to-json: _raw
output:
  write: console
# {"_raw":"port,throughput"}
# {"_raw":"1334,45552"}
# {"_raw":"1335,5666"}

The particular super-power of raw is that it can work with any text, not just JSON.

raw does other text operations, like replacement. It's clearer (and easier to maintain) to do this here rather than relying on shell commands like tr:

# Hello Hound
- raw:
    replace:
      pattern: H
      substitution: h
# hello hound

raw with extract will extract matches from text:

# Hello Dolly
- raw:
    extract:
      pattern: Hello (\S+)
# Dolly

Both replace and extract can be given input-field, in which case they operate on the text in that field; otherwise they operate on the whole line.

A replacement can be provided, which can contain regex group specifiers, as in this case: the first matched group is $1. (Note that this notation differs from the \1 used by most Unix tools.)

# {"greeting":"Hello Dolly"}
- raw:
extract:
input-field: greeting
pattern: Hello (\S+)
replace: Goodbye $1
# {"greeting":"Goodbye Dolly"}

If there's no pattern, then all of the text is available as $0.
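For instance, a replacement on its own can prefix every line, since $0 is the whole line (the same trick is used for the "@cee: " example near the end of this page; the prefix here is just illustrative):

# Hello Dolly
- raw:
    extract:
      replace: 'seen: $0'
# seen: Hello Dolly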

In this way, we minimize the need for Unix pipeline tricks involving sed etc, and the result is guaranteed to work on all supported platforms in the same way.

Converting from CSV

Once input data is in this form, we can use expand to convert CSV data.

# {"_raw":"port,throughput"}
# {"_raw":"1334,45552"}
# {"_raw":"1335,5666"}
- expand:
    remove: true
    csv:
      header: true
# {"port":1334,"throughput":45552}
# {"port":1335,"throughput":5666}

Please note that by default expand assumes comma-separated fields, but you can specify the delimiter using delim.

Using an existing header is convenient but the actual types of the fields are worked out by auto-conversion. This may not be what you want.

With autoconvert: false the fields will all remain text.

    csv:
      header: true
      autoconvert: false

If the source generates headers each time it is run, say when scheduled with an exec input, then expand csv needs a field that flags these first lines. Use begin-marker-field to specify the field name, corresponding to the same field in batch with exec.

Alternatively, fields specifies the names and types of the columns. The allowed types are "str", "num", "null" or "bool". Finally, field-file is a file containing "name:type" lines. Provide either fields or field-file.
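For example, a field-file for the netter data above would contain one "name:type" line per column (the file name columns.txt is chosen just for this sketch):

port:num
throughput:num

and the csv section would then point at it:

    csv:
      field-file: columns.txt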

Headers may also be specified as a field header-field containing the column names separated by the delimiter. If header-field-types: true then the format is 'name:type'.

This header-field only needs to be specified at the start, but can be specified again when the schema changes (i.e. the names and/or types of the columns change). collapse with header-field-on-change: true will write events in this format.

In the total absence of any column information, we can use gen_headers and the column names will be "_0", "_1", etc.
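A sketch of that case with the netter values again, assuming gen_headers is switched on as a boolean:

# {"_raw":"1334,45552"}
- expand:
    remove: true
    csv:
      gen_headers: true
# {"_0":1334,"_1":45552}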

Some formats use a special marker to indicate null fields, like "-"; this is the purpose of null, which is an array.
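For instance, to treat "-" as a null marker (a sketch, assuming the option is written literally as null alongside the other csv options):

    csv:
      null: ['-']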

If the fields were separated by spaces then we would add delim: ' ' to the csv section. (This is a special case and will skip any whitespace between fields.) '\t' is also understood, for tab-separated fields.
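For example, space-separated data with a header line (the netter values once more):

# {"_raw":"port throughput"}
# {"_raw":"1334 45552"}
- expand:
    remove: true
    csv:
      header: true
      delim: ' '
# {"port":1334,"throughput":45552}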

So expand takes a field containing data separated by a delimiter and converts it into JSON, possibly removing the original field. Prefer it to extract for such data, because you will not have to write regular expressions.

And expand has more powers!

Converting from Key-Value Pairs

A fairly popular data format is 'key-value pairs'.

# {"_raw":"a=1 b=2"}
- expand:
    input-field: _raw
    remove: true
    delim: ' '
    key-value:
      autoconvert: true
# output: {"a":1,"b":2}

You can also set the separator between the key and the value:

# {"_raw":"name:\"Arthur\",age:42"}
- expand:
    input-field: _raw
    remove: true
    delim: ','
    key-value:
      autoconvert: true
      key-value-delim: ':'
# output: {"name":"Arthur","age":42}

The separator can be a newline (delim: '\n'). If your incoming string looked like this:

name=dolly
age=42

then you can easily convert this into {"name":"dolly","age":42}.
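A sketch of that case, relying on '=' being the default key-value separator as in the first example:

# {"_raw":"name=dolly\nage=42"}
- expand:
    input-field: _raw
    remove: true
    delim: '\n'
    key-value:
      autoconvert: true
# {"name":"dolly","age":42}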

Working with Input JSON

If a field contains quoted JSON, then expand with json: true will parse and extract the fields, merging with the existing event.
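A sketch of what that looks like, assuming the quoted JSON lives in a field called info:

# {"info":"{\"name\":\"dolly\",\"age\":42}"}
- expand:
    input-field: info
    remove: true
    json: true
# {"name":"dolly","age":42}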

Another option is expand events. This is different because it converts one event into multiple events by splitting the value of input-field with the delimiter.

# json: {"family":"baggins","data":"frodo bilbo"}
- expand:
    input-field: data
    remove: true
    delim: ' '
    events:
      output-split-field: name
# output:
# {"family":"baggins","name":"frodo"}
# {"family":"baggins","name":"bilbo"}

Output as Raw

Generally we pass on the final events as JSON, but sometimes the situation requires plain unstructured lines. For instance, 'classic' Hotrod 2 pipes have their output captured by systemd, passed to the server through rsyslog, unpacked using Logstash and routed into Elasticsearch.

To send events back using this route, you will need to prepend the event with "@cee: " using the raw action.

As the final action below:

- raw:
    extract:
      replace: "@cee: $0"

($0 is the full match over the whole line.)

Outputs usually receive events as JSON documents separated by line feeds (so-called 'streaming JSON'), but this is not essential: single lines of text can be passed in most cases.

But creating and passing multi-line data is possible.

With add, if template-result-field is provided, then the template can be in some arbitrary format like YAML (note the ${field} expansions):

# {"one":1,"two":2}
- add:
template-result-field: result
template: |
results:
one: ${one}
two: ${two}
# {"one":1,"two": 2,"result":"results:\n one: 1\n two: 2\n"}

Let's say you need to POST this arbitrary data to a server: then set body-field to be the 'result' field:

output:
  http-post:
    body-field: result
    url: 'http://localhost:3030'

Similarly, the exec output has input-field:

input:
  text: '{"name":"dolly"}'
actions:
- time:
    output-field: tstamp
- add:
    template-result-field: greeting
    template: |
      time: ${tstamp}
      hello ${name}
      goodbye ${name}
      ----------------
output:
  exec:
    command: 'cat'
    input-field: greeting
# output
# time: 2019-02-19T09:27:03.943Z
# hello dolly
# goodbye dolly
# ----------------

The command itself can contain field expansions, like ${name}.

Assume there is also a field called 'file'; then the document will be appended to that file:

output:
  exec:
    command: 'cat >> ${file}'
    input-field: greeting