
Pipe Introduction

This section of the Guide covers the operation and design of Pipes. It assumes you are familiar with the introduction and architecture sections of this Guide.

Pipes are high-level descriptions of data engineering workloads or jobs. Pipes are defined in the Pipe Language, which is executed by the Pipe Runtime within Agents. Learning how to work with Hotrod Pipes involves the following general steps:

  • Understanding the mental model of Pipes
  • Learning a few details about the Pipe Runtime
  • Becoming familiar with the Pipe Language and all its various capabilities

Pipes as a Mental Model

A key advantage of the platform is its simplified mental model for handling a variety of data engineering, data processing and integration problems. A small investment of time in learning the Pipe Language therefore provides significant leverage. Having a simplified mental model also reduces the learning curve.

At minimum, a Pipe has an input and an output. Inputs read or ingest data from some external source. Outputs write data to some external destination.

The source and destination may be files, object storage, databases, queues, TCP/UDP ports, HTTP, or any other transport supported by the Pipe Language.

A Pipe may further contain any number of actions, which perform different kinds of transformations and manipulations on data flowing through the Pipe.
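The input, actions and output shape can be pictured with a small Python sketch. It only illustrates the mental model and is not how the Pipe Runtime is implemented; `run_pipe`, the in-memory source and sink, and the two example actions are hypothetical names invented for this illustration.

```python
from typing import Callable, Iterable, Optional

Record = dict
# An action takes a record and returns a transformed record,
# or None to drop it from the flow. (Hypothetical convention.)
Action = Callable[[Record], Optional[Record]]


def run_pipe(source: Iterable[Record], actions: list[Action],
             sink: Callable[[Record], None]) -> None:
    """Read each record from the source, apply the actions, write to the sink."""
    for record in source:
        current: Optional[Record] = record
        for action in actions:
            current = action(current)
            if current is None:  # an action dropped the record
                break
        if current is not None:
            sink(current)


# Hypothetical actions: one that filters records, one that adds a field.
def keep_errors(record: Record) -> Optional[Record]:
    return record if record.get("level") == "error" else None


def add_source(record: Record) -> Record:
    return {**record, "source": "example"}


events = [
    {"level": "info", "msg": "started"},
    {"level": "error", "msg": "disk full"},
]
written: list[Record] = []
run_pipe(events, [keep_errors, add_source], written.append)
print(written)
```

Here the list `events` stands in for an input and `written.append` for an output; in a real Pipe these would be transports such as files or queues.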

The actions in a Pipe run in the order in which they are specified. Strict serial execution makes data transformations easier to reason about. Note, however, that the runtime will use as many CPU cores as it can to process data through actions while still maintaining strict serial execution. The Pipe Runtime section covers some strategies and optimizations that take advantage of this.
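One way to picture "parallel across workers, serial within the Pipe" is the sketch below: every record still passes through the actions in their declared order, while different records are handled by different workers. This is an assumption about the general idea, not a description of how the Pipe Runtime actually schedules work; `apply_actions` and the two arithmetic actions are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def apply_actions(actions, record):
    """Apply every action to one record, strictly in the order given."""
    for action in actions:
        record = action(record)
    return record


def double(n):
    return n * 2


def add_one(n):
    return n + 1


records = [1, 2, 3, 4]

# Threads are used here only for brevity; they illustrate the ordering,
# not a real multi-core speedup. Executor.map hands records to several
# workers but yields results in the same order as the input.
with ThreadPoolExecutor(max_workers=4) as pool:
    doubled_then_incremented = list(
        pool.map(partial(apply_actions, [double, add_one]), records))
    incremented_then_doubled = list(
        pool.map(partial(apply_actions, [add_one, double]), records))

print(doubled_then_incremented)  # [3, 5, 7, 9]
print(incremented_then_doubled)  # [4, 6, 8, 10]
```

Swapping the two actions changes the result, which is why the order in which actions are listed matters even when records are processed concurrently.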

Pipe Definitions

When expressed in the Pipe Language, a Pipe definition looks like this:


# =================================== unique pipe name (required)
name: example-pipe

# =================================== input definition (required)
input:
  s3:
    # input-specific options...

# =================================== action definitions (optional)
actions:

  # first action
  - filter:
      # action-specific options...

  # second action
  - add:
      # action-specific options...

# =================================== output definition (required)
output:
  http-post:
    # output-specific options...

While the above example omits specific options, notice two important things:

  1. All Pipes will follow this basic pattern (name, input, actions, output).
  2. Data will flow from the input, through the actions (if any), to the output.
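The basic pattern can be expressed as a simple shape check. The Python sketch below is not an official validator and does not reflect any tooling shipped with the platform; `check_pipe_shape` is a hypothetical helper, and the dictionary is only an approximation of what a YAML parser might produce for the example definition, with empty maps standing in for the omitted options.

```python
REQUIRED_KEYS = ("name", "input", "output")


def check_pipe_shape(pipe: dict) -> list[str]:
    """Return a list of problems; an empty list means the basic shape is present."""
    problems = [
        f"missing required key: {key}"
        for key in REQUIRED_KEYS
        if key not in pipe
    ]
    # Actions are optional, but when present they form an ordered list.
    actions = pipe.get("actions") or []
    if not isinstance(actions, list):
        problems.append("actions must be a list")
    return problems


# The example definition, roughly as a YAML parser might load it.
example = {
    "name": "example-pipe",
    "input": {"s3": {}},
    "actions": [{"filter": {}}, {"add": {}}],
    "output": {"http-post": {}},
}

print(check_pipe_shape(example))
print(check_pipe_shape({"name": "no-output", "input": {"s3": {}}}))
```

The first call reports no problems; the second reports the missing output, reflecting that name, input and output are required while actions are optional.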

The Pipe Language defines all the available inputs, outputs and actions, along with their respective options.

Key Resources

There are two key resources for learning about working with Pipes:

  1. This Guide, which is organized into basic, intermediate and advanced sections.
  2. The complete Pipe Language Reference.