Data flows perform tasks in parallel. A data flow begins executing all of its steps concurrently. Any rows generated by input steps, such as database query steps or CSV reading steps, are passed on to connected steps down the line and processed as soon as they arrive.
Above is a basic data flow structure. Records are read from a database, enriched with master data coming from a file, and written back to the database. All steps execute concurrently, and only a subset of all processed records is in memory at any given time.
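The read, enrich, write structure can be pictured with a small Python sketch. This is a conceptual model only, not how the product implements flows: each step runs in its own thread, and bounded queues stand in for the connections between steps, so only a few rows are in flight at any time. The step functions, the field names, and the `master` lookup table are all hypothetical.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the row stream


def read_step(out_q):
    # Hypothetical input step: stands in for a database query.
    for i in range(5):
        out_q.put({"id": i})
    out_q.put(DONE)


def enrich_step(in_q, out_q, master):
    # Handles each row as soon as it arrives from the upstream step.
    while (row := in_q.get()) is not DONE:
        out_q.put({**row, "name": master.get(row["id"], "unknown")})
    out_q.put(DONE)


def write_step(in_q, sink):
    # Hypothetical output step: appends to a list instead of a database.
    while (row := in_q.get()) is not DONE:
        sink.append(row)


def run_flow():
    master = {0: "a", 1: "b"}      # stands in for master data from a file
    q1 = queue.Queue(maxsize=2)    # bounded: at most 2 rows waiting per hop
    q2 = queue.Queue(maxsize=2)
    sink = []
    threads = [
        threading.Thread(target=read_step, args=(q1,)),
        threading.Thread(target=enrich_step, args=(q1, q2, master)),
        threading.Thread(target=write_step, args=(q2, sink)),
    ]
    for t in threads:              # all steps start at once
        t.start()
    for t in threads:              # the flow finishes when every step has finished
        t.join()
    return sink
```

Calling `run_flow()` returns the five enriched rows in their original order, because each connection has a single producer and a single consumer.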
- import standard libraries from
- import Tweakstreet libraries from
- import from your own modules using a path relative to the flow location such as
Parameters are named expression values that can be passed in when executing a flow. Parameter values are available in the entire flow. They are declared with default values which are used when the flow is invoked without specifying a value for a parameter.
Flow variables are named expression values that are available in the entire flow. They are typically used to specify flow-wide constants. They are also a good place to validate parameters, or calculate derived values from parameters.
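As a rough illustration of how parameters and flow variables relate, the following Python sketch resolves parameters against declared defaults and then derives flow variables from them. It is an analogy only and does not use the product's expression language; the parameter names `batch_size` and `target_table`, and both function names, are assumptions made up for this example.

```python
# Hypothetical parameter declarations with their default values.
DEFAULT_PARAMS = {"batch_size": 100, "target_table": "customers"}


def resolve_params(passed=None):
    # Values passed at invocation override the defaults;
    # parameters that are not passed keep their declared default.
    return {**DEFAULT_PARAMS, **(passed or {})}


def flow_variables(params):
    # Flow variables: validate parameters and compute derived values.
    if params["batch_size"] <= 0:
        raise ValueError("batch_size must be positive")
    return {"staging_table": params["target_table"] + "_staging"}


params = resolve_params({"batch_size": 500})
variables = flow_variables(params)
```

Here `params` holds the overridden `batch_size` alongside the default `target_table`, and `variables` holds a value derived from the parameters.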
Services are named expression values that are available in the entire flow. They describe various kinds of resources or configuration such as database connections, server credentials, etc.
Specifying these items in the services section of the flow makes it easier to define them and reference them when configuring steps to use them.
Flows provide data about themselves in additional flow variables.
Data flows start executing all their steps at once. There is no dedicated starting point. If a step has no predecessor step that feeds data into it, it is kickstarted with an empty row as input.
When a step executes, it performs its task and potentially generates output rows, passing them through its output gates to any connected steps. There are no semantic limitations on how many rows a step produces, or which output gates they are sent to.
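The kickstart and output gate behavior can be modeled in a few lines of Python. This is a simplified, sequential sketch under stated assumptions: a step is a generator function that receives one input row and yields `(gate, row)` pairs, and the gate names `out`, `even`, and `odd` are hypothetical.

```python
def generate_numbers(row):
    # Source step with no predecessor: it is kickstarted with an empty row,
    # ignores it, and emits three rows through its "out" gate.
    for n in range(1, 4):
        yield "out", {"n": n}


def split_even_odd(row):
    # A step may choose a different output gate for each row.
    gate = "even" if row["n"] % 2 == 0 else "odd"
    yield gate, row


def run(source, router):
    routed = {}
    for _, row in source({}):              # kickstart with an empty row
        for gate, out in router(row):
            routed.setdefault(gate, []).append(out)
    return routed


routed = run(generate_numbers, split_even_odd)
```

After the run, `routed` groups the emitted rows by the gate they were sent to.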
A data flow finishes once all steps have finished processing rows.
Data rows are dicts that are carried along the execution path. Steps processing them have the opportunity to read, add, remove and replace fields.
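In Python terms, a row behaves much like a dictionary that each step may inspect and modify before handing it on. The sketch below shows the four operations on a hypothetical row; the field names and the `process_row` function are invented for illustration.

```python
def process_row(row):
    out = dict(row)                                         # work on a copy of the incoming row
    full_name = f'{out["first_name"]} {out["last_name"]}'   # read fields
    out["full_name"] = full_name                            # add a field
    out.pop("last_name", None)                              # remove a field
    out["first_name"] = out["first_name"].upper()           # replace a field
    return out


row_in = {"first_name": "Ada", "last_name": "Lovelace"}
row_out = process_row(row_in)
```

The incoming row is left unchanged, and `row_out` carries the modified set of fields to the next step.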
A data flow finishes successfully when all steps finish processing without error.
Data flows that finish successfully can provide a return value called the result. By default the result value is nil. The Set Flow Result step can set the result value explicitly. When running a flow through the Run Flow step, the flow result value is available as one of the step results.
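The result mechanism for a successful flow can be sketched in Python as follows. This is a simplified model, not the product's API: steps run one after another here rather than concurrently, `None` stands in for nil, and `set_flow_result`, `noop_step`, and `run_flow` are hypothetical names that only mimic the Set Flow Result and Run Flow steps.

```python
def set_flow_result(value):
    # Builds a step that sets the flow result explicitly.
    def step(state):
        state["result"] = value
    return step


def noop_step(state):
    # A step that does its work without touching the flow result.
    pass


def run_flow(steps):
    state = {"result": None}   # default result; None stands in for nil
    for step in steps:
        step(state)            # an exception here means the flow failed
    return state["result"]     # only reached when every step succeeded


default_result = run_flow([noop_step])
explicit_result = run_flow([noop_step, set_flow_result({"rows_written": 3})])
```

A flow that never sets a result yields the default, while a flow containing the result-setting step hands its value back to the caller.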
A flow that fails does not provide a result value; it is always