Logging
Structured Audit Logging
The Server aggregates structured logs from the various platform components, which are stored in the Server DB. Various logs
commands are available to inspect the behaviour of the Server, Agents, and Pipes.
Let’s start with a simple Pipe:
name: uptimex
context:
interval: 1s
input:
exec:
command: uptime
interval: '{{interval}}'
actions:
- extract:
input-field: _raw
remove: true
pattern: 'load average: (\S+), (\S+), (\S+)'
output-fields: [m1, m5, m15]
- convert:
- m1: num
- m5: num
- m15: num
- script:
let:
- k: count()
- filter:
condition: k % 2 == 0
output:
exec:
command: cat
Assume we have an Agent, have loaded this Pipe, and have attached it to the Agent:
$> hotrod agents add test0 --id 0
$> hotrod pipes add --file uptimex.yml
$> hotrod agents update test0 --add-pipe uptimex
[INFO] user=admin adding pipe=uptimex to agent=test0
An example of the Server logs
is below and there is an audit trail detailing the operations and the responsible user
:
$> hotrod server logs
timestamp | level | message
---------------------+-------+--------------------------------------------------
2020-06-16T14:17:50 | INFO | user=admin successful login from 127.0.0.1:38212
2020-06-16T14:17:59 | INFO | user=admin assigning id=0 to agent='test0'
2020-06-16T14:18:22 | INFO | user=admin adding pipe=uptimex
2020-06-16T14:18:22 | INFO | user=admin rebuilding agents: []
2020-06-16T14:18:22 | INFO | user=admin rebuilding agents: []
2020-06-16T14:18:36 | INFO | user=admin adding pipe=uptimex to agent=test0
Agent logs
displays the Pipe being attached:
$> hotrod agents logs
timestamp | agent | level | message
---------------------+--------+-------+----------------------------------------
2020-06-16T14:18:13 | test0 | INFO | current []
2020-06-16T14:18:13 | test0 | INFO | allowed []
2020-06-16T14:18:13 | test0 | INFO | agent connected to hotrodd
2020-06-16T14:18:22 | test0 | INFO | current []
2020-06-16T14:18:22 | test0 | INFO | allowed []
2020-06-16T14:18:38 | test0 | INFO | New Pipe detected: hotrod-pipe-uptimex
2020-06-16T14:18:39 | test0 | INFO | current ["hotrod-pipe-uptimex"]
2020-06-16T14:18:39 | test0 | INFO | allowed ["hotrod-pipe-uptimex"]
2020-06-16T14:18:39 | test0 | INFO | Pipes Updated...
Pipe logs
displays the Pipe being launched:
$> hotrod pipes logs
timestamp | agent | pipe | level | step | function | message
---------------------+--------+---------+-------+------+----------+----------------------
2020-06-16T14:18:38 | test0 | uptimex | WARN | 0 | | pipe started
2020-06-16T14:18:39 | test0 | uptimex | INFO | 0 | | systemd pipe started
Pipe Logging
Logging from Pipes is not enabled by default. You need to enable logging on an Agent-by-Agent basis.
$> hotrod agents settings --enable-pipe-logging test0
You can switch on debug
mode for the Pipe using:
$> hotrod pipes trace uptimex --enable-debugging --agent test0
The Pipe may only be scheduled to run in a few minutes, but a restart
can be
forced:
$> hotrod pipes trace uptimex --command restart --agent test0
After waiting a few seconds for the samples to collect, you can inspect the Pipe logs
:
$> hotrod pipes logs
timestamp | agent | pipe | level | step | function | message
---------------------+--------+---------+-------+------+-------------+-------------------------------------------
2020-06-16T14:18:38 | test0 | uptimex | WARN | 0 | | pipe started
2020-06-16T14:18:39 | test0 | uptimex | INFO | 0 | | systemd pipe started
2020-06-16T14:42:49 | test0 | uptimex | DEBUG | 5 | output-exec | {"m1":0.29,"m5":0.38,"m15":0.44,"k":1452}
2020-06-16T14:42:51 | test0 | uptimex | DEBUG | 5 | output-exec | {"m1":0.29,"m5":0.38,"m15":0.44,"k":1454}
2020-06-16T14:42:53 | test0 | uptimex | DEBUG | 5 | output-exec | {"m1":0.27,"m5":0.37,"m15":0.44,"k":1456}
DEBUG
mode prints out the event that will be written out. In this case, events are sent to systemd-journald
using HTTP POST, plain TCP, etc. Inspecting Pipe data should not require shell access to remote devices. step
is the zero-based index of the Pipe step
, where the input is 0
and the output is 5
.
The field "k"
comes from the Lua function count()
which increments endlessly and is even — we use a filter to clear out odd values.
Debugging needs to be switched off explicitly using --disable-debugging
.
trace
provides further insights into the operation of a Pipe. It prints out the trace id
which then references the result of this trace.
$> hotrod pipes trace --enable-tracing uptimex -t test0
trace id 1
By default, this will run trace
for 5 samples. After a few seconds, pass the trace id
to view the resulting Pipe logs
:
$> hotrod pipes logs --trace-id 1
timestamp | agent | pipe | level | step | function | message
---------------------+--------+---------+-------+------+----------------+------------------------------------------------------------------------------------------------------------------------------
2020-06-16T15:41:14 | test0 | uptimex | TRACE | 1 | action-extract | {"_raw":" 15:41:14 up 50 days, 21:20, 1 user, load average: 0.42, 0.44, 0.44"} -> [{"m1":"0.42","m5":"0.44","m15":"0.44"}]
2020-06-16T15:41:14 | test0 | uptimex | TRACE | 2 | action-convert | {"m1":"0.42","m5":"0.44","m15":"0.44"} -> [{"m1":0.42,"m5":0.44,"m15":0.44}]
2020-06-16T15:41:14 | test0 | uptimex | TRACE | 3 | action-script | {"m1":0.42,"m5":0.44,"m15":0.44} -> [{"m1":0.42,"m5":0.44,"m15":0.44,"k":4957}]
2020-06-16T15:41:14 | test0 | uptimex | TRACE | 4 | action-filter | {"m1":0.42,"m5":0.44,"m15":0.44,"k":4957} -> []
2020-06-16T15:41:15 | test0 | uptimex | TRACE | 1 | action-extract | {"_raw":" 15:41:15 up 50 days, 21:20, 1 user, load average: 0.39, 0.43, 0.44"} -> [{"m1":"0.39","m5":"0.43","m15":"0.44"}]
2020-06-16T15:41:15 | test0 | uptimex | TRACE | 2 | action-convert | {"m1":"0.39","m5":"0.43","m15":"0.44"} -> [{"m1":0.39,"m5":0.43,"m15":0.44}]
2020-06-16T15:41:15 | test0 | uptimex | TRACE | 3 | action-script | {"m1":0.39,"m5":0.43,"m15":0.44} -> [{"m1":0.39,"m5":0.43,"m15":0.44,"k":4958}]
2020-06-16T15:41:15 | test0 | uptimex | TRACE | 4 | action-filter | {"m1":0.39,"m5":0.43,"m15":0.44,"k":4958} -> [{"m1":0.39,"m5":0.43,"m15":0.44,"k":4958}]
2020-06-16T15:41:15 | test0 | uptimex | TRACE | 5 | output-exec | {"m1":0.39,"m5":0.43,"m15":0.44,"k":4958}
...
This provides a step-by-step view of the data flow. Here, at line four, the value "k":4957
, is not passed through by action-filter
. In the next sample, data arrives at output
.
Traces can be enabled for Pipes running on multiple Agents. To distinguish the outputs, hotrod agents logs
and hotrod pipes logs
both have an --agent
option to specify one or more Agents to search for.
Pipe Errors
The general rule of the Pipe runtime is to not discard data; this leads to a tolerant way of viewing errors. But without good diagnostics, it is difficult to troubleshoot Pipe problems, since we are usually only one typo away from success.
Consider this Pipe, containing a flaw. The intention of this is to extract the number of days that this computer has been running for:
name: err
input:
exec:
command: uptime
interval: 2s
actions:
- extract:
warning: true
pattern: UP (\d+)
output-fields: [up]
- convert:
- up: num
output:
write: console
You will notice actions['extract'].warning: true
— this is because not matching a pattern is not always an error. However, we need to be sure of this.
After adding this Pipe and attaching it, here are the logs
(with --pipe err
to exclude uptimex logs
):
$> hotrod pipes logs --pipe err
timestamp | agent | pipe | level | step | function | message
---------------------+--------+------+-------+------+----------------+----------------------------------
2020-06-16T16:10:19 | test0 | err | INFO | 0 | | systemd pipe started
2020-06-16T16:10:20 | test0 | err | WARN | 1 | action-extract | extract: no captures with regex
2020-06-16T16:10:22 | test0 | err | WARN | 1 | action-extract | extract: no captures with regex
2020-06-16T16:10:24 | test0 | err | WARN | 1 | action-extract | extract: no captures with regex
2020-06-16T16:10:26 | test0 | err | WARN | 1 | action-extract | extract: no captures with regex
2020-06-16T16:10:28 | test0 | err | WARN | 1 | action-extract | extract: no captures with regex
To cover all the issues, switch on debugging as before:
$> hotrod pipes logs --pipe err
timestamp | agent | pipe | level | step | function | message
---------------------+--------+------+-------+------+----------------+----------------------------------------------------------------------------------
2020-06-16T16:06:14 | test0 | err | DEBUG | 3 | output-exec | {"_raw":" 16:06:14 up 50 days, 21:45, 1 user, load average: 0.59, 0.57, 0.51"}
2020-06-16T16:06:16 | test0 | err | WARN | 1 | action-extract | extract: no captures with regex
2020-06-16T16:06:16 | test0 | err | WARN | 2 | action-convert | fields not found up
2020-06-16T16:06:16 | test0 | err | DEBUG | 3 | output-exec | {"_raw":" 16:06:16 up 50 days, 21:45, 1 user, load average: 0.94, 0.64, 0.53"}
2020-06-16T16:06:18 | test0 | err | WARN | 1 | action-extract | extract: no captures with regex
2020-06-16T16:06:18 | test0 | err | WARN | 2 | action-convert | fields not found up
2020-06-16T16:06:18 | test0 | err | DEBUG | 3 | output-exec | {"_raw":" 16:06:18 up 50 days, 21:45, 1 user, load average: 0.94, 0.64, 0.53"}
Here we have two problems, flowing from the fact that UP
does not match up
:
The pattern extraction.
The failed conversion.
Again, a field not found in convert
might be fine, when dealing with events following different schemas.
So, the raw data — which may be valuable — will always flow through.
Controlling Log Output
Logs have a level: trace
, debug
, warn
, and error
.
The minimum logging level is set with --level
(default is trace
).
Use --level error
to only show errors.
It is always possible to display logs in csv
format — this is one of the strengths of structured logging.
$> hotrod pipes logs --format csv
timestamp,agent,pipe,level,step,function,message
2020-06-16T16:32:56,test0,err,WARN,1,action-extract,extract: no captures with regex
2020-06-16T16:32:58,test0,err,WARN,1,action-extract,extract: no captures with regex
2020-06-16T16:33:00,test0,err,WARN,1,action-extract,extract: no captures with regex
2020-06-16T16:33:02,test0,err,WARN,1,action-extract,extract: no captures with regex
...
By default, you will receive the last 25 records — use --num-records
to change this.
A start time can be specified, in a fairly relaxed format similar to that used by the date
(see man date
) command:
$> hotrod pipes logs --start-time '16 June' --num-records 10
timestamp | agent | pipe | level | step | function | message
---------------------+--------+---------+-------+------+-------------+-------------------------------------------
2020-06-16T14:18:38 | test0 | uptimex | WARN | 0 | | pipe started
2020-06-16T14:18:39 | test0 | uptimex | INFO | 0 | | systemd pipe started
2020-06-16T14:42:49 | test0 | uptimex | DEBUG | 5 | output-exec | {"m1":0.29,"m5":0.38,"m15":0.44,"k":1452}
2020-06-16T14:42:51 | test0 | uptimex | DEBUG | 5 | output-exec | {"m1":0.29,"m5":0.38,"m15":0.44,"k":1454}
2020-06-16T14:42:53 | test0 | uptimex | DEBUG | 5 | output-exec | {"m1":0.27,"m5":0.37,"m15":0.44,"k":1456}
2020-06-16T14:42:55 | test0 | uptimex | DEBUG | 5 | output-exec | {"m1":0.27,"m5":0.37,"m15":0.44,"k":1458}
2020-06-16T14:42:57 | test0 | uptimex | DEBUG | 5 | output-exec | {"m1":0.32,"m5":0.38,"m15":0.44,"k":1460}
2020-06-16T14:42:59 | test0 | uptimex | DEBUG | 5 | output-exec | {"m1":0.32,"m5":0.38,"m15":0.44,"k":1462}
2020-06-16T14:43:01 | test0 | uptimex | DEBUG | 5 | output-exec | {"m1":0.32,"m5":0.38,"m15":0.44,"k":1464}
2020-06-16T14:43:03 | test0 | uptimex | DEBUG | 5 | output-exec | {"m1":0.3,"m5":0.37,"m15":0.44,"k":1466}
Specifying a date implies the time starts from midnight — in the above case the data was first created at 14:00
.
One can also specify a time-range explicitly. Times without dates are assumed to refer to the current day:
$> hotrod pipes logs --start-time '14:00' --end-time '17:00' --file csv >logs.csv
Metrics and Volumes
$> hotrod server traffic
metrics 43.47 kiB
logs 2.80 MiB
unsent logs 0.00 B
tarballs sent 599.50 kiB
$> hotrod volumes
date | agents | bboxes | pipes | lines | volume
------------+---------+--------+-------+-------+------------
2020.06.17 | 2 | 0 | 3 | 3629 | 287.11 kiB
2020.06.16 | 1 | 0 | 2 | 22992 | 1.78 MiB