s3
Stream data from an S3 object
Available from Hotrod: 3.1
Field Name | Description | Type | Default |
---|---|---|---|
interval | How often to run the command | duration | - |
cron | How often to run the command. Note that Hotrod's format differs from standard cron: it includes a column for seconds. See full discussion | cron | - |
immediate | Run as soon as invoked, instead of waiting for the specified cron interval | bool | false |
random-offset | Sets a random offset to the schedule, then sticks to it | duration | 0s |
window | For resources that need a time window to be specified | Window | - |
block | Block further input schedules from triggering if the pipe output is retrying | bool | false |
bucket-name | The storage service container for created blobs | string | - |
object-names | The names of the blobs | array of strings | - |
object-name-field | The field that a blob name from an operation should be stored in | field | - |
creation-time-field | The field that the blob creation time should be stored in | field | - |
last-modified-field | The field that the blob last modified time should be stored in | field | - |
content-length-field | The field that the blob content length information should be stored in | field | - |
content-type-field | The field that the blob content type information should be stored in | field | - |
etag-field | The field that the object ETag should be stored in | field | - |
data-field | A field that the blob data should be nested in | field | - |
region | Region | string | - |
endpoint | S3 Endpoint | string | - |
access-key | Access Key ID | string | - |
secret-key | Secret Access Key | string | - |
security-token | Security Token | string | - |
session-token | Session Token | string | - |
timestamp-mode | Derive a timestamp for this blob for filtering purposes based on the selected strategy. | S3ObjectTimestampMode | - |
maximum-age | Remove any object older than this age from the candidate list | MaxAgeSpecifier | - |
mode | The operating mode for this input | S3BlockInputMode | - |
fingerprinting | Enable object fingerprinting, which causes each object to be downloaded only once | bool | false |
maximum-fingerprint-age | Remove any object fingerprints older than this from the tracker | MaxAgeSpecifier | 30 days |
preprocessors | Preprocessors process downloaded data before making it available to the pipeline. They are run in the order they are specified | PreProcessor | - |
interval
How often to run the command
By default, interval is 0s, which means the command runs once.
Note that scheduled inputs set document markers.
See full discussion
Type: duration
Example
action:
  exec:
    command: echo 'once a day'
    interval: 1d
cron
How often to run the command. Note that Hotrod's format differs from standard cron: it includes a column for seconds. See full discussion
Type: cron
Example: Once a day
action:
  exec:
    command: echo 'once a day'
    cron: '0 0 0 * * *'
Example: Once a day, using a convenient shortcut
action:
  exec:
    command: echo 'once a day'
    cron: '@daily'
immediate
Run as soon as invoked, instead of waiting for the specified cron interval
Type: bool
Example: Run immediately on invocation, and thereafter at 10:00 every morning
action:
  exec:
    command: echo 'hello'
    immediate: true
    cron: '0 0 10 * * *'
random-offset
Sets a random offset to the schedule, then sticks to it
This can help avoid the thundering herd problem, for example when many schedules would otherwise hit the same service at exactly 00:00:00.
Type: duration
Example: Fires at a fixed random offset of up to one minute after every hour
action:
  exec:
    command: echo 'hello'
    random-offset: 1m
    cron: '0 0 * * * *'
window
For resources that need a time window to be specified
Type: Window
Field Name | Description | Type | Default |
---|---|---|---|
size | Window size | duration | - |
offset | Window offset | duration | 0s |
start-time | Allows the windowing to start at a specified time | time | - |
highwatermark-file | File in which the timestamp is stored so that windowing can resume after the pipe has been restarted | path | - |
size
Window size
Type: duration
Example
action:
  exec:
    command: echo 'one two'
    window:
      size: 1m
offset
Window offset
Type: duration
Example
action:
  exec:
    command: echo 'one two'
    window:
      size: 1m
      offset: 10s
start-time
Allows the windowing to start at a specified time
It should be in the following format: 2019-07-10 18:45:00.000 +0200
Type: time
Example
action:
  exec:
    command: echo 'one two'
    window:
      size: 1m
      start-time: '2019-07-10 18:45:00.000 +0200'
highwatermark-file
File in which the timestamp is stored so that windowing can resume after the pipe has been restarted
Type: path
Example
action:
  exec:
    command: echo 'one two'
    window:
      size: 1m
      highwatermark-file: /tmp/mark.txt
block
Block further input schedules from triggering if the pipe output is retrying
Type: bool
bucket-name
The storage service container for created blobs
Type: string
object-names
The names of the blobs
Type: array of strings
object-name-field
The field that a blob name from an operation should be stored in
Type: field
creation-time-field
The field that the blob creation time should be stored in
Type: field
last-modified-field
The field that the blob last modified time should be stored in
Type: field
content-length-field
The field that the blob content length information should be stored in
Type: field
content-type-field
The field that the blob content type information should be stored in
Type: field
etag-field
The field that the object ETag should be stored in
Type: field
data-field
A field that the blob data should be nested in
Type: field
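To show how the metadata fields above might be combined, here is a minimal sketch. It assumes the options nest directly under an `s3` key, in the same way the options in the earlier examples nest under `exec`, and it omits the enclosing pipe definition. The bucket name and every field name on the right-hand side are hypothetical placeholders.

```yaml
# Sketch only: the nesting is assumed; bucket and field names are hypothetical.
s3:
  bucket-name: example-bucket
  object-name-field: object_name        # where the object name is stored
  last-modified-field: last_modified    # where the last modified time is stored
  content-length-field: content_length  # where the content length is stored
  content-type-field: content_type      # where the content type is stored
  etag-field: etag                      # where the ETag is stored
  data-field: data                      # the object data is nested under this field
```

Under these assumptions, each record would be expected to carry the object data under `data` alongside the listed metadata fields.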
region
Region
Type: string
endpoint
S3 Endpoint
Type: string
access-key
Access Key ID
Type: string
secret-key
Secret Access Key
Type: string
security-token
Security Token
Type: string
session-token
Session Token
Type: string
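The connection and credential options could be set together as in the sketch below. This is an assumption-laden illustration rather than a verified configuration: the `s3` nesting is inferred from the other examples on this page, the endpoint is a hypothetical S3-compatible URL, and all credential values are placeholders. The source does not say how `security-token` and `session-token` relate, so only one of them is shown.

```yaml
# Sketch only: all values are placeholders, not working credentials.
s3:
  bucket-name: example-bucket
  region: us-east-1
  endpoint: https://s3.example.com   # hypothetical S3-compatible endpoint
  access-key: EXAMPLE_ACCESS_KEY_ID
  secret-key: EXAMPLE_SECRET_ACCESS_KEY
  session-token: EXAMPLE_SESSION_TOKEN
```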
timestamp-mode
Derive a timestamp for this blob for filtering purposes based on the selected strategy.
Type: S3ObjectTimestampMode
Field Name | Description | Type | Default |
---|---|---|---|
none | The default mode; do not filter objects based on timestamps | - | - |
last-modified | Filter objects on the last-modified timestamp reported by the service | - | - |
blob-name-pattern | Filter blobs on the timestamp derived from the object name, for example: object-name-pattern: =(?P<Y>[\\d]{4,4})-(?P<m>[\\d]{2,2})-(?P<d>[\\d]{2,2})/ | string | - |
none
The default mode; do not filter objects based on timestamps
last-modified
Filter objects on the last-modified timestamp reported by the service
blob-name-pattern
Filter blobs on the timestamp derived from the object name, for example: object-name-pattern: =(?P<Y>[\\d]{4,4})-(?P<m>[\\d]{2,2})-(?P<d>[\\d]{2,2})/
Type: string
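The pattern-based mode may be easier to read as a configuration fragment. The sketch below assumes that a mode carrying a value is written as a nested key under `timestamp-mode`, and it uses the key name `blob-name-pattern` from the table, although the inline example above spells it `object-name-pattern`; check which spelling your Hotrod version accepts. The bucket name is hypothetical.

```yaml
# Sketch only: the variant-as-nested-key layout is an assumption.
s3:
  bucket-name: example-bucket
  timestamp-mode:
    # Double-quoted YAML string, so "\\d" reaches the regex engine as \d.
    # Named groups: Y = four-digit year, m = two-digit month, d = two-digit day.
    blob-name-pattern: "=(?P<Y>[\\d]{4,4})-(?P<m>[\\d]{2,2})-(?P<d>[\\d]{2,2})/"
```

This pattern matches names containing an `=` followed by a date and a slash, such as the hypothetical `logs/dt=2019-07-10/part-0.json`, where `Y`, `m` and `d` capture `2019`, `07` and `10`.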
maximum-age
Remove any object older than this age from the candidate list
Type: MaxAgeSpecifier
Field Name | Description | Type | Default |
---|---|---|---|
seconds | Specify the maximum age in number of seconds | integer | - |
duration | Specify the maximum age as a human readable duration (example: 1 hour) | string | - |
seconds
Specify the maximum age in number of seconds
Type: integer
duration
Specify the maximum age as a human readable duration (example: 1 hour)
Type: string
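As a rough illustration of the two ways to express a maximum age, the fragment below pairs `maximum-age` with the `last-modified` timestamp mode. The nesting under `s3`, the plain-string form of `timestamp-mode`, and the idea that `seconds` and `duration` are alternatives rather than combinable are all assumptions; the bucket name is hypothetical.

```yaml
# Sketch only: the nesting is assumed from the tables above.
s3:
  bucket-name: example-bucket
  timestamp-mode: last-modified
  maximum-age:
    seconds: 3600        # one hour, as a number of seconds
    # Alternative form, assumed to replace 'seconds' rather than add to it:
    # duration: 1 hour
```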
mode
The operating mode for this input
Type: S3BlockInputMode
Field Name | Description | Type | Default |
---|---|---|---|
list-objects | List Objects | - | - |
download-objects | Download Given Objects | - | - |
list-and-download-objects | List Objects and Download | - | - |
list-objects
List Objects
download-objects
Download Given Objects
list-and-download-objects
List Objects and Download
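A short sketch of selecting an operating mode follows. It assumes that `mode` takes the variant name as a plain string and that `download-objects` reads its targets from `object-names`; neither point is confirmed by the text above. The bucket and object names are hypothetical.

```yaml
# Sketch only: the pairing of download-objects with object-names is assumed.
s3:
  bucket-name: example-bucket
  mode: download-objects
  object-names:
    - reports/2019-07-10.json   # hypothetical object names
    - reports/2019-07-11.json
```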
fingerprinting
Enable object fingerprinting, which causes each object to be downloaded only once
Type: bool
maximum-fingerprint-age
Remove any object fingerprints older than this from the tracker
Type: MaxAgeSpecifier
Field Name | Description | Type | Default |
---|---|---|---|
seconds | Specify the maximum age in number of seconds | integer | - |
duration | Specify the maximum age as a human readable duration (example: 1 hour) | string | - |
seconds
Specify the maximum age in number of seconds
Type: integer
duration
Specify the maximum age as a human readable duration (example: 1 hour)
Type: string
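Fingerprinting is most relevant when the input runs repeatedly over the same bucket. The following sketch assumes the `s3` nesting used in the other examples and a plain-string `mode`; the bucket name is hypothetical, and `7 days` is simply an example of the human readable duration form.

```yaml
# Sketch only: polls every five minutes and skips objects already seen.
s3:
  bucket-name: example-bucket
  interval: 5m
  mode: list-and-download-objects
  fingerprinting: true
  maximum-fingerprint-age:
    duration: 7 days   # fingerprints older than this are dropped from the tracker
```

Keeping `maximum-fingerprint-age` longer than the period over which objects can reappear in the listing would be the cautious choice, since an object whose fingerprint has been dropped would presumably be downloaded again.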
preprocessors
Preprocessors process downloaded data before making it available to the pipeline. They are run in the order they are specified
Type: PreProcessor
Field Name | Description | Type | Default |
---|---|---|---|
extension | Preprocess the object or blob based on the extension of the object or blob name (.gz, .parquet) | - | - |
gzip | Decompress (gunzip) the received data | - | - |
parquet | Extract the received data as JSON rows from a parquet file | - | - |
extension
Preprocess the object or blob based on the extension of the object or blob name (.gz, .parquet)
gzip
Decompress (gunzip) the received data
parquet
Extract the received data as JSON rows from a parquet file
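Because preprocessors run in the order given, the order of the entries matters. The fragment below is a sketch that assumes `preprocessors` is written as a YAML list of variant names and that the objects are gzip-compressed parquet files; both are assumptions, and the bucket name is hypothetical.

```yaml
# Sketch only: the list syntax for preprocessors is assumed.
s3:
  bucket-name: example-bucket
  mode: list-and-download-objects
  preprocessors:
    - gzip      # first decompress the downloaded data
    - parquet   # then extract JSON rows from the parquet file
```

For buckets holding a mix of formats, the `extension` preprocessor described above may be the simpler choice, since it selects the processing from the object name.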