This is not an official repo but a summary of this discussion.
Each component consists of a Driver, an Executor, and a Publisher. The Driver and Publisher interact with the metadata store, while the Executor is the entity that actually executes your code. Have a look at the official guide for a detailed explanation.
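The division of labour can be sketched in plain Python. This is only an illustration, not TFX code: `MetadataStore`, `Driver`, `Executor`, `Publisher` and `run_component` below are hypothetical stand-ins, and the assumption that the driver resolves inputs while the publisher records outputs is a simplification.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MetadataStore:
    """Hypothetical in-memory stand-in for the metadata store."""
    records: Dict[str, Dict[str, str]] = field(default_factory=dict)


class Driver:
    """Reads from the metadata store before the executor runs (assumed role)."""

    def __init__(self, store: MetadataStore) -> None:
        self.store = store

    def resolve_inputs(self, upstream: Optional[str]) -> Dict[str, str]:
        if upstream is None:
            return {}
        return dict(self.store.records[upstream])


class Executor:
    """Runs the user code. It never touches the metadata store."""

    def do(self, inputs: Dict[str, str]) -> Dict[str, str]:
        return {"result": inputs.get("text", "").upper()}


class Publisher:
    """Writes the execution's outputs back to the metadata store (assumed role)."""

    def __init__(self, store: MetadataStore) -> None:
        self.store = store

    def publish(self, component_name: str, outputs: Dict[str, str]) -> None:
        self.store.records[component_name] = dict(outputs)


def run_component(name: str, upstream: Optional[str],
                  store: MetadataStore, executor: Executor) -> Dict[str, str]:
    inputs = Driver(store).resolve_inputs(upstream)
    outputs = executor.do(inputs)
    Publisher(store).publish(name, outputs)
    return outputs


store = MetadataStore()
store.records["upstream"] = {"text": "hello"}
outputs = run_component("my_component", "upstream", store, Executor())
```

Keeping the store out of the executor is the point of the split: the user code only sees inputs and returns outputs.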
- Artifacts: inputs / outputs of an execution. Artifacts are produced by upstream components and consumed by downstream components, e.g. an example file, a model, etc.
- Execution properties: other parameters that are used by an execution. The impact of execution properties stays within the execution, and they are used to describe / distinguish an execution.
- Execution: An execution takes input artifacts, processes them according to any execution properties, and produces output artifacts.
- Channel: Channel stands for a collection of Artifacts that share the same Artifact type and (optional) other properties. Thus, any TfxArtifact in a Channel should have the same type.
You can think of an artifact as some kind of file. Your executor takes the input file(s), makes some changes to them and then writes the result to the output file(s).
A head component is the first component in the pipeline, so it has to create the first artifact(s) manually. All downstream components only use the artifacts from their upstream components, so they don't have to create artifacts themselves: those artifacts have already been created upstream.
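A minimal sketch of the "artifact as a file" view, using only the standard library. `FileArtifact` and `execute` are hypothetical names; the three arguments loosely follow the input / output / execution-property split described above and are an assumption, not the TFX API.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class FileArtifact:
    """Hypothetical stand-in for an artifact: a typed pointer to a file."""
    type_name: str
    uri: str


def execute(input_dict: Dict[str, List[FileArtifact]],
            output_dict: Dict[str, List[FileArtifact]],
            exec_properties: Dict[str, Any]) -> None:
    """Reads each input file, transforms it, and writes the output file."""
    suffix = exec_properties.get("suffix", "")
    pairs = zip(input_dict["input_data"], output_dict["output_data"])
    for src, dst in pairs:
        with open(src.uri) as f:
            text = f.read()
        with open(dst.uri, "w") as f:
            f.write(text.upper() + suffix)


workdir = tempfile.mkdtemp()
src = FileArtifact("MyText", os.path.join(workdir, "in.txt"))
dst = FileArtifact("MyText", os.path.join(workdir, "out.txt"))
with open(src.uri, "w") as f:
    f.write("hello")

execute({"input_data": [src]}, {"output_data": [dst]}, {"suffix": "!"})

with open(dst.uri) as f:
    result = f.read()
```

The execution property (`suffix`) changes what the execution does but never leaves it; only the output file is visible downstream.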
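The difference between the two roles can be sketched as follows. The names `PipelineArtifact`, `head_component` and `downstream_component` are hypothetical and the functions are plain Python, not TFX components.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PipelineArtifact:
    """Hypothetical stand-in for an artifact."""
    type_name: str
    uri: str


def head_component(source_uri: str) -> Dict[str, List[PipelineArtifact]]:
    # No upstream component exists, so the first artifact is created by hand.
    return {"examples": [PipelineArtifact("MyExamples", source_uri)]}


def downstream_component(
        upstream_outputs: Dict[str, List[PipelineArtifact]]) -> List[str]:
    # The input artifacts already exist; they are only consumed here.
    return [artifact.uri for artifact in upstream_outputs["examples"]]


head_outputs = head_component("/data/raw")
consumed_uris = downstream_component(head_outputs)
```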
- An artifact is represented as TfxArtifact (tfx.utils.types)
- A channel is represented as Channel (tfx.utils.channel)
- An Executor can be subclassed from BaseExecutor (tfx.components.base.base_executor)
- A Component can be subclassed from BaseComponent (tfx.components.base.base_component)
- Each Component expects a ComponentSpec (tfx.components.base.base_component) consisting of:
  - Inputs of type Channel
  - Execution parameters of no specific type
  - Outputs of type Channel
Every Channel consists of one or more TfxArtifacts. They all have to share the same type_name. The name itself can be chosen freely.
There are two examples in this repo. The first is a head component, for the case where your component is the first or only one to run. The second is a downstream component (in this case a modified version of the Model Validator).
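The type rule and the three-part spec can be illustrated with a small sketch. The `Sketch*` classes are hypothetical stand-ins that do not import TFX, and the type names `MyModelType` and `MyBlessingType` are arbitrary choices.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class SketchArtifact:
    """Hypothetical stand-in for TfxArtifact."""
    type_name: str
    uri: str = ""


class SketchChannel:
    """Hypothetical stand-in for Channel: rejects artifacts of mixed types."""

    def __init__(self, type_name: str, artifacts: List[SketchArtifact]) -> None:
        for artifact in artifacts:
            if artifact.type_name != type_name:
                raise ValueError(
                    f"expected type {type_name!r}, got {artifact.type_name!r}")
        self.type_name = type_name
        self.artifacts = list(artifacts)


@dataclass
class SketchComponentSpec:
    """Hypothetical stand-in for ComponentSpec."""
    inputs: Dict[str, SketchChannel]
    exec_properties: Dict[str, Any]  # no specific type
    outputs: Dict[str, SketchChannel]


model = SketchChannel("MyModelType", [SketchArtifact("MyModelType", "/tmp/model")])
blessing = SketchChannel("MyBlessingType", [SketchArtifact("MyBlessingType")])
spec = SketchComponentSpec(
    inputs={"model": model},
    exec_properties={"threshold": 0.5},
    outputs={"blessing": blessing},
)

# Mixing artifact types inside one channel is rejected.
try:
    SketchChannel("MyModelType", [SketchArtifact("MyBlessingType")])
    mixed_types_rejected = False
except ValueError:
    mixed_types_rejected = True
```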