Make IncrementalDataset's confirms "namespaced"
#4039
Labels
Issue: Feature Request
Description
I have a namespace-based incremental dataset and wish to use the confirms attribute to trigger a CHECKPOINT update further down my pipeline. However, based on discussions on Slack, it seems that incremental datasets are not meant to be used within namespaces, so confirms is not "namespaced" by design.
Following discussion with @noklam on Slack, it seems that my use case could justify having a "namespaced" confirms.
Context
I have many devices that regularly record event files and push them to an S3 bucket. I would like to run a preprocessing pipeline that is different for each device and that would, for each of them:
Then, I use the concatenation of all preprocessed events recorded so far for data science purposes.
The way I achieve this with Kedro is:
1. An IncrementalDataset, and the concatenated dataframe is saved using a versioned ParquetDataset
2. A PartitionedDataset that is able to find all preprocessed recorded events computed so far (with load_args withdirs and max_depth set accordingly)
Those steps are done for each device, so I use namespaces to reuse the same logic for all of them, varying only the S3 bucket path. I need the confirms to be at step 2 because only then can I consider new files to have been processed.
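To illustrate why the timing of the confirm matters, here is a toy model of checkpointing in plain Python. It is not Kedro's implementation: the class ToyIncrementalStore, its methods, and the file names are all hypothetical, and it only assumes that an incremental dataset returns partitions newer than its checkpoint and that confirming advances that checkpoint.

```python
class ToyIncrementalStore:
    """Toy stand-in for an incremental dataset (hypothetical, not Kedro code)."""

    def __init__(self, partitions):
        self.partitions = sorted(partitions)  # partition IDs, e.g. file names
        self.checkpoint = None                # last confirmed partition ID
        self._last_loaded = None              # last partition handed out by load()

    def load(self):
        # Return only the partitions that come strictly after the checkpoint.
        new = [
            p for p in self.partitions
            if self.checkpoint is None or p > self.checkpoint
        ]
        if new:
            self._last_loaded = new[-1]
        return new

    def confirm(self):
        # Advance the checkpoint to the last partition returned by load().
        if self._last_loaded is not None:
            self.checkpoint = self._last_loaded


store = ToyIncrementalStore(["2024-01.csv", "2024-02.csv"])
first_batch = store.load()

# Without a confirm, a second load returns the same files again.
repeated_batch = store.load()

# Confirm only once the downstream step has succeeded.
store.confirm()
store.partitions.append("2024-03.csv")
next_batch = store.load()
```

In this model, a failure between load and confirm leaves the checkpoint untouched, so the same files are picked up on the next run. That is the reason for wanting the confirm at step 2 rather than at step 1.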
Workaround
@noklam suggested putting the namespace in the argument, e.g. confirms=namespace.data, as a workaround, and I can confirm this worked.
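The mismatch and the workaround can be sketched with a small plain-Python model. This is not Kedro code: apply_namespace, missing_confirms, the device_a namespace, and the dataset names are hypothetical, and the model only assumes the behaviour described in this issue, namely that a namespace prefixes inputs and outputs but leaves confirms as written.

```python
def apply_namespace(node_spec, namespace):
    """Prefix inputs and outputs with the namespace; leave confirms untouched."""
    prefixed = dict(node_spec)
    prefixed["inputs"] = [f"{namespace}.{name}" for name in node_spec["inputs"]]
    prefixed["outputs"] = [f"{namespace}.{name}" for name in node_spec["outputs"]]
    return prefixed


def missing_confirms(node_spec, catalog_names):
    """Return the confirms entries that match no dataset name in the catalog."""
    return [name for name in node_spec["confirms"] if name not in catalog_names]


# Dataset names as they would exist once the namespace is applied (hypothetical).
catalog_names = {"device_a.data", "device_a.preprocessed", "device_a.all_events"}

# Step 2 node: it confirms the incremental dataset that was loaded at step 1.
step_two = {"inputs": ["preprocessed"], "outputs": ["all_events"]}

# Naive version: confirms uses the un-namespaced dataset name.
naive = apply_namespace({**step_two, "confirms": ["data"]}, "device_a")

# Workaround from this issue: write the namespace into confirms by hand.
workaround = apply_namespace({**step_two, "confirms": ["device_a.data"]}, "device_a")

naive_missing = missing_confirms(naive, catalog_names)
workaround_missing = missing_confirms(workaround, catalog_names)
```

Under these assumptions, the naive node points at a dataset named data that does not exist in the namespaced catalog, while the hand-prefixed name resolves. The workaround would need revisiting if confirms became namespaced automatically, since the prefix would then be applied twice.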