Input: azure-blob
Send data to a Microsoft Azure Storage Blob (Block Storage)
Field Summary
Field Name | Type | Description | Default |
---|---|---|---|
when | message_filter | Fire this input when a specific internal message occurs | - |
interval | duration | How often to run the command | - |
cron | cron | How often to run the command. Note that unlike standard Cron, Pipes use a Cron syntax that includes a column for seconds. See full discussion | - |
immediate | bool | Run as soon as invoked, instead of waiting for the specified cron interval | false |
random-offset | duration | Sets a random offset to the schedule, then sticks to it | 0s |
window | Window | For resources that need a time window to be specified | - |
block | bool | Block further input schedules from triggering if the pipe output is retrying | false |
container-name | string | The storage service container for created blobs | - |
blob-names | array of strings | The name for the blob | - |
blob-name-field | field | The field that a blob name from an operation should be stored in | - |
creation-time-field | field | The field that the blob creation time should be stored in | - |
last-modified-field | field | The field that the blob last modified time should be stored in | - |
content-length-field | field | The field that the blob content length information should be stored in | - |
content-type-field | field | The field that the blob content type information should be stored in | - |
content-md5-field | field | The field that the blob content md5 should be stored in | - |
data-field | field | A field that the blob data should be nested in | - |
storage-account | string | The Storage Account Name to be used (credential) | - |
storage-master-key | string | The Storage Master Key to be used (credential) | - |
timestamp-mode | AzureBlobTimestampMode | Derive a timestamp for this blob for filtering purposes based on the selected strategy. | - |
maximum-age | MaxAgeSpecifier | Remove any blobs older than this many seconds from the candidate list | - |
mode | AzureBlobInputMode | The operating mode for this input | - |
fingerprinting | bool | Enable object fingerprinting, which will cause a object to only be downloaded once | false |
fingerprinting-db-path | path | Specify a custom path for the fingerprinting database | - |
maximum-fingerprint-age | MaxAgeSpecifier | Remove any object fingerprints older than this from the tracker | 30 days |
preprocessors | PreProcessor | Preprocessors (process downloaded data before making it available to the pipeline) these processors will be run in the order they are specified | - |
Fields
when
Type: message_filter
Fire this input when a specific internal message occurs
This field overloads time-based scheduling with a scheduler that fires on matching messages.
Example
Pipe Language Snippet:
input:
http-poll:
when:
message-received:
filter-type:
- pipe-idle
url: "http://localhost:8888"
raw: true
ignore-line-breaks: true
interval
Type: duration
How often to run the command
By default, interval: 0s
which means: once.
Note that scheduled inputs set document markers.
See full discussion
Example
Pipe Language Snippet:
exec:
command: echo 'once a day'
interval: 1d
cron
Type: cron
How often to run the command. Note that unlike standard Cron, Pipes use a Cron syntax that includes a column for seconds. See full discussion
Example: Once a day
Pipe Language Snippet:
exec:
command: echo 'once a day'
cron: '0 0 0 * * *'
Example: Once a day, using a convenient shortcut
Pipe Language Snippet:
exec:
command: echo 'once a day'
cron: '@daily'
immediate
Type: bool
Default: false
Run as soon as invoked, instead of waiting for the specified cron interval
Example: Run immediately on invocation, and thereafter at 10h every morning
Pipe Language Snippet:
exec:
command: echo 'hello'
immediate: true
cron: '0 0 10 * * *'
random-offset
Type: duration
Default: 0s
Sets a random offset to the schedule, then sticks to it
This can help avoid the thundering herd problem, where you do not, for example, want to overload some service at 00:00:00
Example: Would fire up to a minute after every hour
Pipe Language Snippet:
exec:
command: echo 'hello'
random-offset: 1m
cron: '0 0 * * * *'
window
Type: Window
For resources that need a time window to be specified
Field Name | Type | Description | Default |
---|---|---|---|
size | duration | Window size | - |
offset | duration | Window offset | 0s |
start-time | time | Allows the windowing to start at a specified time | - |
highwatermark-file | path | Specify file where timestamp would be stored in order to resume, for when Pipe has been restarted | - |
size
Type: duration
Window size
Example
Pipe Language Snippet:
exec:
command: echo 'one two'
window:
size: 1m
offset
Type: duration
Default: 0s
Window offset
Example
Pipe Language Snippet:
exec:
command: echo 'one two'
window:
size: 1m
offset: 10s
start-time
Type: time
Allows the windowing to start at a specified time
It should in the following format: 2019-07-10 18:45:00.000 +0200
Example
Pipe Language Snippet:
exec:
command: echo 'one two'
window:
size: 1m
start-time: 10s
highwatermark-file
Type: path
Specify file where timestamp would be stored in order to resume, for when Pipe has been restarted
Example
Pipe Language Snippet:
exec:
command: echo 'one two'
window:
size: 1m
highwatermark-file:: /tmp/mark.txt
block
Type: bool
Default: false
Block further input schedules from triggering if the pipe output is retrying
container-name
Type: string
The storage service container for created blobs
blob-names
Type: array of strings
The name for the blob
blob-name-field
Type: field
The field that a blob name from an operation should be stored in
creation-time-field
Type: field
The field that the blob creation time should be stored in
last-modified-field
Type: field
The field that the blob last modified time should be stored in
content-length-field
Type: field
The field that the blob content length information should be stored in
content-type-field
Type: field
The field that the blob content type information should be stored in
content-md5-field
Type: field
The field that the blob content md5 should be stored in
data-field
Type: field
A field that the blob data should be nested in
storage-account
Type: string
The Storage Account Name to be used (credential)
storage-master-key
Type: string
The Storage Master Key to be used (credential)
timestamp-mode
Type: AzureBlobTimestampMode
Derive a timestamp for this blob for filtering purposes based on the selected strategy.
Field Name | Type | Description | Default |
---|---|---|---|
none | The default mode, do not filter blobs based on timestamps | - | |
creation-time | Filter blobs on the creation-time timestamp reported by the service | - | |
last-modified | Filter blobs on the last-modified timestamp reported by the service | - | |
blob-name-pattern | string | Filter blobs on the timestamp derived from the blob name for example: blob-name-pattern: =(?P<Y>[\\d]{4,4})-(?P<m>[\\d]{2,2})-(?P<d>[\\d]{2,2})/ | - |
none
The default mode, do not filter blobs based on timestamps
creation-time
Filter blobs on the creation-time timestamp reported by the service
last-modified
Filter blobs on the last-modified timestamp reported by the service
blob-name-pattern
Type: string
Filter blobs on the timestamp derived from the blob name for example: blob-name-pattern: =(?P<Y>[\\d]{4,4})-(?P<m>[\\d]{2,2})-(?P<d>[\\d]{2,2})/
maximum-age
Type: MaxAgeSpecifier
Remove any blobs older than this many seconds from the candidate list
Field Name | Type | Description | Default |
---|---|---|---|
seconds | integer | Specify the maximum age in number of seconds | - |
duration | string | Specify the maximum age as a human readable duration (example: 1 hour) | - |
seconds
Type: integer
Specify the maximum age in number of seconds
duration
Type: string
Specify the maximum age as a human readable duration (example: 1 hour)
mode
Type: AzureBlobInputMode
The operating mode for this input
Field Name | Type | Description | Default |
---|---|---|---|
list-blobs | List Blobs | - | |
download-blob | Download Given Blobs | - | |
list-and-download-blobs | List Blobs and Download | - |
list-blobs
List Blobs
download-blob
Download Given Blobs
list-and-download-blobs
List Blobs and Download
fingerprinting
Type: bool
Default: false
Enable object fingerprinting, which will cause a object to only be downloaded once
fingerprinting-db-path
Type: path
Specify a custom path for the fingerprinting database
maximum-fingerprint-age
Type: MaxAgeSpecifier
Default: 30 days
Remove any object fingerprints older than this from the tracker
Field Name | Type | Description | Default |
---|---|---|---|
seconds | integer | Specify the maximum age in number of seconds | - |
duration | string | Specify the maximum age as a human readable duration (example: 1 hour) | - |
seconds
Type: integer
Specify the maximum age in number of seconds
duration
Type: string
Specify the maximum age as a human readable duration (example: 1 hour)
preprocessors
Type: PreProcessor
Preprocessors (process downloaded data before making it available to the pipeline) these processors will be run in the order they are specified
Field Name | Type | Description | Default |
---|---|---|---|
extension | Preprocess the object or blob based on the extension of the object or blob name (.gz, .parquet) | - | |
gzip | UnGzip the received data | - | |
parquet | Extract the received data as JSON rows from a parquet file | - |
extension
Preprocess the object or blob based on the extension of the object or blob name (.gz, .parquet)
gzip
UnGzip the received data
parquet
Extract the received data as JSON rows from a parquet file