Data flow concepts
Data flows perform tasks in parallel. Rows generated by input steps, such as database query steps or CSV reading steps, are passed on to the connected downstream steps, which process them concurrently.
Data flows start executing all their steps at once. There is no dedicated starting point. If a step has no predecessor step that feeds data into it, it is kickstarted with an empty row as input.
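The start-up behavior can be sketched in plain Python. This is a rough model, not the engine itself. It assumes each step is a function that takes one row (a dict) and returns a list of output rows, and the names generate_numbers, greet and run_all_at_once are hypothetical. Every step starts on its own thread at the same moment, and because neither step has a predecessor, each one is kickstarted with an empty row.

```python
import threading

# Hypothetical model: a step is a function that takes one row (dict)
# and returns a list of output rows.

def generate_numbers(row):
    # An input step: ignores its (empty) input row and produces rows.
    return [{"n": i} for i in range(3)]

def greet(row):
    # Another step with no predecessor; it also starts on an empty row.
    return [{"message": "hello"}]

def run_all_at_once(steps):
    """Start every step at the same time; no step is a designated entry point."""
    results = {}
    lock = threading.Lock()

    def run(name, func):
        out = func({})  # kickstart with an empty row
        with lock:
            results[name] = out

    threads = [threading.Thread(target=run, args=(name, func))
               for name, func in steps.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

results = run_all_at_once({"generate": generate_numbers, "greet": greet})
print(results["generate"])  # [{'n': 0}, {'n': 1}, {'n': 2}]
```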
When a step executes, it performs its task and generates output rows, passing them through its output gates to the input gates of its successor steps.
A data flow finishes once all steps have finished processing rows.
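The hand-off of rows between concurrently running steps can be sketched as follows. This assumes that each connection from an output gate to an input gate behaves like a queue.Queue and that a sentinel object marks the end of a step's output. The step functions and run_flow are hypothetical illustrations. All three steps start together, downstream steps process rows as they arrive, and the flow is finished only when every step has finished.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of a step's output

def source(out_q):
    # Input step: generates rows on its own.
    for i in range(5):
        out_q.put({"n": i})
    out_q.put(DONE)

def double(in_q, out_q):
    # Intermediate step: processes rows as soon as they arrive.
    while (row := in_q.get()) is not DONE:
        out_q.put({**row, "doubled": row["n"] * 2})
    out_q.put(DONE)

def sink(in_q, collected):
    # Final step: consumes rows until its predecessor has finished.
    while (row := in_q.get()) is not DONE:
        collected.append(row)

def run_flow():
    q1, q2 = queue.Queue(), queue.Queue()
    collected = []
    threads = [
        threading.Thread(target=source, args=(q1,)),
        threading.Thread(target=double, args=(q1, q2)),
        threading.Thread(target=sink, args=(q2, collected)),
    ]
    for t in threads:  # all steps start at once
        t.start()
    for t in threads:  # the flow finishes once every step has finished
        t.join()
    return collected

rows = run_flow()
print([r["doubled"] for r in rows])  # [0, 2, 4, 6, 8]
```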
Data rows are dicts that are carried along the execution path. Steps that process them can add, remove, and replace fields. This mechanism allows steps to record pieces of information that subsequent steps can access.
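A minimal Python sketch of how a row changes as it moves through a chain of steps, assuming a row can be modelled as an ordinary dict. The three step functions are hypothetical: one adds a field, one removes fields, and one replaces a field's value.

```python
def add_full_name(row):
    # Add a field derived from existing fields.
    return {**row, "full_name": f"{row['first']} {row['last']}"}

def drop_names(row):
    # Remove fields that downstream steps no longer need.
    return {k: v for k, v in row.items() if k not in ("first", "last")}

def normalize_email(row):
    # Replace a field's value.
    return {**row, "email": row["email"].strip().lower()}

row = {"first": "Ada", "last": "Lovelace", "email": "  ADA@Example.com "}
for step in (add_full_name, drop_names, normalize_email):
    row = step(row)
print(row)  # {'email': 'ada@example.com', 'full_name': 'Ada Lovelace'}
```

Each step returns a new dict rather than mutating its input, which keeps the sketch safe if the same row were ever handed to more than one successor.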
For steps that have a single input gate, the input record is available as the row variable, and individual fields are available as in.<field_name>. Steps that accept multiple input gates provide additional or different variables depending on the semantics of the step.
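To show the two access styles side by side, here is a hypothetical Python analogue of the variables a single-input step sees. Because in is a reserved word in Python, the sketch uses in_ as a stand-in; make_step_scope is an illustrative name, not part of any real API.

```python
from types import SimpleNamespace

def make_step_scope(row):
    """Build the variables a single-input step might see (hypothetical model).

    `row` is the whole input record; `in_` exposes its fields as attributes,
    standing in for in.<field_name> (`in` is a reserved word in Python).
    """
    return {"row": row, "in_": SimpleNamespace(**row)}

scope = make_step_scope({"customer_id": 42, "country": "DE"})
print(scope["row"]["country"])   # DE
print(scope["in_"].customer_id)  # 42
```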
Steps can encounter errors during execution: files can be missing, databases may be unavailable, data may be malformed, and so on. Some errors are recoverable: even though they occur, the flow can maintain its integrity and continue processing. Other errors are not recoverable: further processing is either impossible or likely to produce heavily corrupted results.
To deal with errors, steps define an error gate. If a step encounters a recoverable error, it tries to continue through its error gate. If a step is connected to the error gate, execution continues there and the error field is set on the data record. The error field contains details about what went wrong. If no step is connected to the error gate, the flow as a whole aborts. If a step encounters an error that is not recoverable, no attempt is made to continue through the error gate and the flow aborts immediately.
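The error-gate rules can be modelled as a small Python sketch. The exception classes, run_step, and the shape of the error field ({"message": ...}) are all hypothetical; the sketch only mirrors the three outcomes described above: continue through a connected error gate, abort when nothing is connected, and abort immediately on an unrecoverable error.

```python
class RecoverableError(Exception):
    """An error after which the flow can keep its integrity (e.g. one bad row)."""

class UnrecoverableError(Exception):
    """An error after which further processing is impossible or unsafe."""

class FlowAborted(Exception):
    """Raised when the whole flow stops."""

def run_step(process, row, error_gate=None):
    """Run one step on one row, modelling the error-gate rules (hypothetical).

    Returns ("ok", rows) or ("error", rows), naming the gate the rows leave by.
    """
    try:
        return "ok", process(row)
    except RecoverableError as exc:
        if error_gate is None:
            # Nothing connected to the error gate: the whole flow aborts.
            raise FlowAborted(str(exc)) from exc
        # Continue through the error gate, with details in the error field.
        return "error", error_gate({**row, "error": {"message": str(exc)}})
    except UnrecoverableError as exc:
        # No attempt to use the error gate: abort immediately.
        raise FlowAborted(str(exc)) from exc

def parse_amount(row):
    # Hypothetical step: a malformed value is treated as recoverable.
    try:
        return [{**row, "amount": float(row["raw"])}]
    except ValueError as exc:
        raise RecoverableError(f"bad amount: {row['raw']!r}") from exc

def log_error(row):
    # Hypothetical step connected to the error gate.
    return [row]

print(run_step(parse_amount, {"raw": "12.5"}))
# ('ok', [{'raw': '12.5', 'amount': 12.5}])
print(run_step(parse_amount, {"raw": "abc"}, error_gate=log_error))
# ('error', [{'raw': 'abc', 'error': {'message': "bad amount: 'abc'"}}])
```

Calling run_step(parse_amount, {"raw": "abc"}) without an error gate raises FlowAborted, as does any step that raises UnrecoverableError, even when an error gate is connected.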
A data flow finishes successfully when all steps finish processing without error.
Data flows that finish successfully can provide a return value called the result. By default, the result value is nil. The Set Flow Result step can set the result value explicitly. When running a flow through the Run Flow step, the result value is available as one of the step results.
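The result mechanism for a successful flow can be sketched in Python, using None as a stand-in for nil. FlowContext, set_flow_result, and run_flow are hypothetical analogues of a flow, the Set Flow Result step, and the Run Flow step. The {"result": ...} dict is an assumed shape for the step results, not the real one.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class FlowContext:
    # The flow's result starts out as nil (None in Python).
    result: Any = None

    def set_flow_result(self, value: Any) -> None:
        """Stand-in for the Set Flow Result step."""
        self.result = value

def run_flow(flow: Callable[[FlowContext], None]) -> dict:
    """Stand-in for the Run Flow step: runs a flow and exposes its result
    among the step results (the dict shape here is hypothetical)."""
    ctx = FlowContext()
    flow(ctx)
    return {"result": ctx.result}

def count_rows(ctx: FlowContext) -> None:
    rows = [{"id": 1}, {"id": 2}, {"id": 3}]
    ctx.set_flow_result(len(rows))

def no_result(ctx: FlowContext) -> None:
    pass  # never sets a result, so the default is kept

print(run_flow(count_rows))  # {'result': 3}
print(run_flow(no_result))   # {'result': None}
```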
A flow that fails always has a result value of