Architecture
The platform is a distributed system, architected to process and manage data efficiently and at scale.
Components
The diagram below provides an overview of the components and how they work together:
Pipes
Pipes are the foundation of the platform. A Pipe is a data processing pipeline that represents a specific data processing workload and is defined in Pipe Language.
Agents
Agents are the management software for Pipes, responsible for scheduling and executing them. Agents can run anywhere: on virtual machines or physical servers, on-premises or in the cloud.
Server
The Server coordinates the allocation of Pipes to Agents and provides other centralized management and monitoring capabilities. These capabilities are exposed through a CLI (Command Line Interface), an HTTP API, and a web-based UI (User Interface) that features a Pipe Editor for designing, testing, and deploying Pipes.
Both the CLI and the UI use the Server HTTP API, which is also available for automation and introspection.
Deployment Model
A deployment consists of a single Server. A Server can manage hundreds or even thousands of Agents, which are in turn responsible for executing Pipes.
Agents can be co-located with the Server and with other Agents. They can also run on separate machines or in containers.
All components are deployed from a single binary, including the Server, Agents, and CLI. Each component is configured through environment variables or command line options. A locked-down binary that excludes Server components is available upon request. Please contact support to learn more.
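As a rough illustration of configuring a component from both sources, the sketch below merges settings from environment variables and command-line options. The variable names (`EXAMPLE_SERVER_URL`, `EXAMPLE_AGENT_ID`), the flags, and the rule that a command-line option overrides an environment variable are assumptions made for this example; they are not the product's actual settings or documented precedence.

```python
import argparse
import os


def resolve_config(argv, env=None):
    """Merge settings from environment variables and command-line options.

    Every name here is hypothetical. The precedence (flag over environment
    variable) is an assumption of this sketch, not documented behavior.
    """
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="example-agent")
    # Environment variables supply the defaults; explicit flags override them.
    parser.add_argument("--server-url", default=env.get("EXAMPLE_SERVER_URL"))
    parser.add_argument("--agent-id", default=env.get("EXAMPLE_AGENT_ID"))
    args = parser.parse_args(argv)
    return {"server_url": args.server_url, "agent_id": args.agent_id}
```

Calling `resolve_config(["--agent-id", "a2"], {"EXAMPLE_AGENT_ID": "a1"})` would yield an `agent_id` of `a2` under the assumed precedence.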
Agent-Server Communication
Every Agent is configured to attach to one Server and communicates with it over HTTP(S). This communication serves the following purposes:
Obtaining the Pipes and their parameters to be scheduled to run.
Reporting on the state of the Pipes that the Agent is running.
Checking for configuration changes that control how logs and metrics are sent.
The Server does not initiate requests to Agents, meaning that they do not need to be network-accessible from the Server. However, the Server HTTP API must be network-accessible from the Agents.
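The pull-based model described above can be sketched with an in-memory stand-in for the Server. Everything in this example is hypothetical: the class names, method names, state values, and the order of the three calls within a poll are assumptions chosen to illustrate that the Agent initiates every exchange, and they do not reflect the real HTTP API.

```python
from dataclasses import dataclass, field


@dataclass
class FakeServer:
    """In-memory stand-in for the Server HTTP API (hypothetical)."""
    assignments: dict = field(default_factory=dict)  # agent_id -> pipe names
    reported: dict = field(default_factory=dict)     # agent_id -> {pipe: state}
    telemetry: dict = field(default_factory=lambda: {"logs": "stdout"})

    # The Server only answers requests; it holds no reference to any Agent.
    def get_pipes(self, agent_id):
        return list(self.assignments.get(agent_id, []))

    def report_state(self, agent_id, states):
        self.reported[agent_id] = dict(states)

    def get_telemetry_config(self):
        return dict(self.telemetry)


@dataclass
class Agent:
    agent_id: str
    server: FakeServer
    running: dict = field(default_factory=dict)
    telemetry: dict = field(default_factory=dict)

    def poll_once(self):
        # 1. Obtain the Pipes this Agent should be running.
        wanted = self.server.get_pipes(self.agent_id)
        self.running = {name: "running" for name in wanted}
        # 2. Report the state of those Pipes back to the Server.
        self.server.report_state(self.agent_id, self.running)
        # 3. Check for changes to the log and metric settings.
        self.telemetry = self.server.get_telemetry_config()
```

Because `FakeServer` never calls into `Agent`, the sketch mirrors the network requirement in the text: only the Server side needs to be reachable.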
Agent-Server Authentication
The Server enforces Agent authentication in one of two ways:
Agent API Keys
This is the default behavior. Each Agent must be manually added to the Server before any communication from that Agent is allowed. An API key must then be created on the Server for use by one or more Agents. Finally, each Agent is configured with its Agent ID and an API key at startup.
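A minimal sketch of the check this scheme implies is shown below, assuming the Server keeps a set of manually added Agent IDs and a set of issued keys. The class, its methods, and the key format are hypothetical, and a real implementation would likely store key hashes rather than the keys themselves.

```python
import hmac
import secrets


class ApiKeyRegistry:
    """Hypothetical Server-side registry for the API-key scheme."""

    def __init__(self):
        self.agents = set()    # Agent IDs added manually by an operator
        self.api_keys = set()  # keys created on the Server

    def add_agent(self, agent_id):
        self.agents.add(agent_id)

    def create_api_key(self):
        key = secrets.token_hex(16)  # random 32-character hex string
        self.api_keys.add(key)
        return key

    def authenticate(self, agent_id, api_key):
        # An Agent that was never added is rejected, even with a valid key.
        if agent_id not in self.agents:
            return False
        # compare_digest avoids leaking key contents through timing.
        return any(hmac.compare_digest(api_key, k) for k in self.api_keys)
```

In this sketch one key can authenticate several registered Agents, matching the "one or more Agents" wording above.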
Auto-Enrollment
In some deployment scenarios, manually adding Agents to the Server may be undesirable, so users can opt for an alternative authentication scheme via Auto-Enrollment. The Server is configured with an Auto-Enrollment key at startup. This same key is then used for all Agent configuration and authentication with the Server. The Server automatically creates internal Agent entries for each new Agent.
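Auto-Enrollment can be sketched in the same spirit. The example assumes each Agent presents an identifier of its own along with the shared key; the source does not say how Agents are identified under this scheme, so that detail, like the class and field names, is hypothetical.

```python
import hmac


class AutoEnrollServer:
    """Hypothetical Server-side handling of Auto-Enrollment."""

    def __init__(self, enrollment_key):
        # The single key the Server is configured with at startup.
        self._enrollment_key = enrollment_key
        self.agents = {}  # agent_id -> internal entry, created on first contact

    def authenticate(self, agent_id, presented_key):
        # Every Agent presents the same shared key.
        if not hmac.compare_digest(presented_key, self._enrollment_key):
            return False
        # First successful contact creates the Agent entry automatically.
        self.agents.setdefault(agent_id, {"enrolled": True})
        return True
```

The trade-off is visible in the sketch: no per-Agent setup is needed, but anyone holding the shared key can enroll a new Agent.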