Logging
Structured Audit Logging
Hotrod is a distributed system with remote structured logging, saved in a database. The various log commands can help you track the server, the agents, and the pipes running on those agents.
Starting with a simple pipe:
# uptimex.yml
name: uptimex
context:
interval: 1s
input:
exec:
command: uptime
interval: '{{interval}}'
actions:
- extract:
input-field: _raw
remove: true
pattern: 'load average: (\S+), (\S+), (\S+)'
output-fields: [m1, m5, m15]
- convert:
- m1: num
- m5: num
- m15: num
- script:
let:
- k: count()
- filter:
condition: k % 2 == 0
output:
exec:
command: cat
Assume we have an agent, have loaded this pipe, and attached it to the agent:
$ hotrod agents add test0 --id 0
$ hotrod pipes add --file uptimex.yml
$ hotrod agents update test0 --add-pipe uptimex
[INFO] user=admin adding pipe=uptimex to agent=test0
At this point, the server logs look like shown below - there's an audit trail detailing these operations and the responsible user:
$ hotrod server logs
timestamp | level | message
---------------------+-------+--------------------------------------------------
2020-06-16T14:17:50 | INFO | user=admin successful login from 127.0.0.1:38212
2020-06-16T14:17:59 | INFO | user=admin assigning id=0 to agent='test0'
2020-06-16T14:18:22 | INFO | user=admin adding pipe=uptimex
2020-06-16T14:18:22 | INFO | user=admin rebuilding agents: []
2020-06-16T14:18:22 | INFO | user=admin rebuilding agents: []
2020-06-16T14:18:36 | INFO | user=admin adding pipe=uptimex to agent=test0
The agent logs show the pipe being attached:
$ hotrod agents logs
timestamp | agent | level | message
---------------------+--------+-------+----------------------------------------
2020-06-16T14:18:13 | test0 | INFO | current []
2020-06-16T14:18:13 | test0 | INFO | allowed []
2020-06-16T14:18:13 | test0 | INFO | agent connected to hotrodd
2020-06-16T14:18:22 | test0 | INFO | current []
2020-06-16T14:18:22 | test0 | INFO | allowed []
2020-06-16T14:18:38 | test0 | INFO | New Pipe detected: hotrod-pipe-uptimex
2020-06-16T14:18:39 | test0 | INFO | current ["hotrod-pipe-uptimex"]
2020-06-16T14:18:39 | test0 | INFO | allowed ["hotrod-pipe-uptimex"]
2020-06-16T14:18:39 | test0 | INFO | Pipes Updated...
And the pipe logs show the pipe being launched:
$ hotrod pipes logs
timestamp | agent | pipe | level | step | function | message
---------------------+--------+---------+-------+------+----------+----------------------
2020-06-16T14:18:38 | test0 | uptimex | WARN | 0 | | pipe started
2020-06-16T14:18:39 | test0 | uptimex | INFO | 0 | | systemd pipe started
Pipe Logging
Logging from pipes is not enabled by default. It is opt-in, on an agent-by-agent basis.
$ hotrod agents settings --enable-pipe-logging test0
You can switch on debug mode for the pipe:
$ hotrod pipes trace uptimex --enable-debugging --agent test0
The pipe may only be scheduled to run in a few minutes, but you can always force a restart:
$ hotrod pipes trace uptimex --command restart --agent test0
After waiting a few seconds for samples to collect, you can inspect the pipe logs:
$ hotrod pipes logs
timestamp | agent | pipe | level | step | function | message
---------------------+--------+---------+-------+------+-------------+-------------------------------------------
2020-06-16T14:18:38 | test0 | uptimex | WARN | 0 | | pipe started
2020-06-16T14:18:39 | test0 | uptimex | INFO | 0 | | systemd pipe started
2020-06-16T14:42:49 | test0 | uptimex | DEBUG | 5 | output-exec | {"m1":0.29,"m5":0.38,"m15":0.44,"k":1452}
2020-06-16T14:42:51 | test0 | uptimex | DEBUG | 5 | output-exec | {"m1":0.29,"m5":0.38,"m15":0.44,"k":1454}
2020-06-16T14:42:53 | test0 | uptimex | DEBUG | 5 | output-exec | {"m1":0.27,"m5":0.37,"m15":0.44,"k":1456}
Debug mode prints out the event that will be written out. In this case, events just go to the journal, but they may be sent using HTTP POST, plain TCP, etc. Inspecting pipe data should not require shell access to remote devices. 'step' is the zero-based index of the pipe step, where the input is 0 and here the output is 5.
The field k
comes from the Lua function count()
which increments endlessly and is even, because we use filter
to drop
out odd values.
Debugging needs to be switched off explicitly with --disable-debugging
.
Tracing gives a lot more insight into the operation of a pipe. It prints out the trace id which thereafter references the result of this trace.
$ hotrod pipes trace --enable-tracing uptimex -t test0
trace id 1
By default, this will trace 5 samples and then switch off. After a few seconds, pass the trace id to see the resulting pipe logs:
$ hotrod pipes logs --trace-id 1
timestamp | agent | pipe | level | step | function | message
---------------------+--------+---------+-------+------+----------------+------------------------------------------------------------------------------------------------------------------------------
2020-06-16T15:41:14 | test0 | uptimex | TRACE | 1 | action-extract | {"_raw":" 15:41:14 up 50 days, 21:20, 1 user, load average: 0.42, 0.44, 0.44"} -> [{"m1":"0.42","m5":"0.44","m15":"0.44"}]
2020-06-16T15:41:14 | test0 | uptimex | TRACE | 2 | action-convert | {"m1":"0.42","m5":"0.44","m15":"0.44"} -> [{"m1":0.42,"m5":0.44,"m15":0.44}]
2020-06-16T15:41:14 | test0 | uptimex | TRACE | 3 | action-script | {"m1":0.42,"m5":0.44,"m15":0.44} -> [{"m1":0.42,"m5":0.44,"m15":0.44,"k":4957}]
2020-06-16T15:41:14 | test0 | uptimex | TRACE | 4 | action-filter | {"m1":0.42,"m5":0.44,"m15":0.44,"k":4957} -> []
2020-06-16T15:41:15 | test0 | uptimex | TRACE | 1 | action-extract | {"_raw":" 15:41:15 up 50 days, 21:20, 1 user, load average: 0.39, 0.43, 0.44"} -> [{"m1":"0.39","m5":"0.43","m15":"0.44"}]
2020-06-16T15:41:15 | test0 | uptimex | TRACE | 2 | action-convert | {"m1":"0.39","m5":"0.43","m15":"0.44"} -> [{"m1":0.39,"m5":0.43,"m15":0.44}]
2020-06-16T15:41:15 | test0 | uptimex | TRACE | 3 | action-script | {"m1":0.39,"m5":0.43,"m15":0.44} -> [{"m1":0.39,"m5":0.43,"m15":0.44,"k":4958}]
2020-06-16T15:41:15 | test0 | uptimex | TRACE | 4 | action-filter | {"m1":0.39,"m5":0.43,"m15":0.44,"k":4958} -> [{"m1":0.39,"m5":0.43,"m15":0.44,"k":4958}]
2020-06-16T15:41:15 | test0 | uptimex | TRACE | 5 | output-exec | {"m1":0.39,"m5":0.43,"m15":0.44,"k":4958}
...
This provides a step-by-step view of the data flow; we can see that when t
is 4957, it is not passed through
by action-filter
. In the next sample, data arrives at the output.
Traces can be enabled for pipes running on multiple agents. To distinguish the outputs, hotrod agents logs
and hotrod pipes logs
both have a --agent
flag to specify one or more agents to search for.
Pipe Errors
The general principle with the pipe framework is to not discard data. This leads to a tolerant way of viewing errors. But without good diagnostics this makes it hard to troubleshoot pipe problems, since we are often only one typo away from success.
Consider this pipe, containing a flaw. The intention is to extract the number of days this particular machine has been up.
# err.yml
name: err
input:
exec:
command: uptime
interval: 2s
actions:
- extract:
warning: true
pattern: UP (\d+)
output-fields: [up]
- convert:
- up: num
output:
write: console
You will notice warning: true
- Since not matching a pattern is not always an error. However, we want to be sure in this case.
After adding this pipe and attaching it, here are the logs (with --pipe err
to exclude uptimex
logs):
$ hotrod pipes logs --pipe err
timestamp | agent | pipe | level | step | function | message
---------------------+--------+------+-------+------+----------------+----------------------------------
2020-06-16T16:10:19 | test0 | err | INFO | 0 | | systemd pipe started
2020-06-16T16:10:20 | test0 | err | WARN | 1 | action-extract | extract: no captures with regex
2020-06-16T16:10:22 | test0 | err | WARN | 1 | action-extract | extract: no captures with regex
2020-06-16T16:10:24 | test0 | err | WARN | 1 | action-extract | extract: no captures with regex
2020-06-16T16:10:26 | test0 | err | WARN | 1 | action-extract | extract: no captures with regex
2020-06-16T16:10:28 | test0 | err | WARN | 1 | action-extract | extract: no captures with regex
To get all the issues, switch on debugging as before:
$ hotrod pipes logs --pipe err
timestamp | agent | pipe | level | step | function | message
---------------------+--------+------+-------+------+----------------+----------------------------------------------------------------------------------
2020-06-16T16:06:14 | test0 | err | DEBUG | 3 | output-exec | {"_raw":" 16:06:14 up 50 days, 21:45, 1 user, load average: 0.59, 0.57, 0.51"}
2020-06-16T16:06:16 | test0 | err | WARN | 1 | action-extract | extract: no captures with regex
2020-06-16T16:06:16 | test0 | err | WARN | 2 | action-convert | fields not found up
2020-06-16T16:06:16 | test0 | err | DEBUG | 3 | output-exec | {"_raw":" 16:06:16 up 50 days, 21:45, 1 user, load average: 0.94, 0.64, 0.53"}
2020-06-16T16:06:18 | test0 | err | WARN | 1 | action-extract | extract: no captures with regex
2020-06-16T16:06:18 | test0 | err | WARN | 2 | action-convert | fields not found up
2020-06-16T16:06:18 | test0 | err | DEBUG | 3 | output-exec | {"_raw":" 16:06:18 up 50 days, 21:45, 1 user, load average: 0.94, 0.64, 0.53"}
Here we have two problems, flowing from the fact that 'UP' does not match 'up':
- the pattern extraction.
- the failed conversion.
Again, a field not found in convert
might be fine, when dealing with events with different schemas.
So, the raw data - which may be valuable - will always flow through.
Controlling Log Output
Logs have a level: trace, debug, warn, error.
The minimum logging level is set with --level
(default 'trace').
Use --level error
to only show errors.
It is always possible to display logs in CSV format - this is one of the strengths of structured logging.
$ hotrod pipes logs --file csv
timestamp,agent,pipe,level,step,function,message
2020-06-16T16:32:56,test0,err,WARN,1,action-extract,extract: no captures with regex
2020-06-16T16:32:58,test0,err,WARN,1,action-extract,extract: no captures with regex
2020-06-16T16:33:00,test0,err,WARN,1,action-extract,extract: no captures with regex
2020-06-16T16:33:02,test0,err,WARN,1,action-extract,extract: no captures with regex
...
By default, you will get the last 25 records - use --num-records
to change this.
A start time can be specified, in a fairly relaxed format similar to that used by the Unix date
command:
$ hotrod pipes logs --start-time '16 June' --num-records 10
timestamp | agent | pipe | level | step | function | message
---------------------+--------+---------+-------+------+-------------+-------------------------------------------
2020-06-16T14:18:38 | test0 | uptimex | WARN | 0 | | pipe started
2020-06-16T14:18:39 | test0 | uptimex | INFO | 0 | | systemd pipe started
2020-06-16T14:42:49 | test0 | uptimex | DEBUG | 5 | output-exec | {"m1":0.29,"m5":0.38,"m15":0.44,"k":1452}
2020-06-16T14:42:51 | test0 | uptimex | DEBUG | 5 | output-exec | {"m1":0.29,"m5":0.38,"m15":0.44,"k":1454}
2020-06-16T14:42:53 | test0 | uptimex | DEBUG | 5 | output-exec | {"m1":0.27,"m5":0.37,"m15":0.44,"k":1456}
2020-06-16T14:42:55 | test0 | uptimex | DEBUG | 5 | output-exec | {"m1":0.27,"m5":0.37,"m15":0.44,"k":1458}
2020-06-16T14:42:57 | test0 | uptimex | DEBUG | 5 | output-exec | {"m1":0.32,"m5":0.38,"m15":0.44,"k":1460}
2020-06-16T14:42:59 | test0 | uptimex | DEBUG | 5 | output-exec | {"m1":0.32,"m5":0.38,"m15":0.44,"k":1462}
2020-06-16T14:43:01 | test0 | uptimex | DEBUG | 5 | output-exec | {"m1":0.32,"m5":0.38,"m15":0.44,"k":1464}
2020-06-16T14:43:03 | test0 | uptimex | DEBUG | 5 | output-exec | {"m1":0.3,"m5":0.37,"m15":0.44,"k":1466}
$
Specifying a date implies the time starts from midnight - in the above case the data was first created at 14:00.
One can also specify a time-range explicitly. Times without dates are assumed to refer to the current day:
$ hotrod pipes logs --start-time '14:00' --end-time '17:00' --file csv > logs.csv
Metrics and Volumes
$ hotrod server traffic
metrics 43.47 kiB
logs 2.80 MiB
unsent logs 0.00 B
tarballs sent 599.50 kiB
$ hotrod volumes
date | agents | bboxes | pipes | lines | volume
------------+---------+--------+-------+-------+------------
2020.06.17 | 2 | 0 | 3 | 3629 | 287.11 kiB
2020.06.16 | 1 | 0 | 2 | 22992 | 1.78 MiB