-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement caching #387
Implement caching #387
Conversation
31c9272
to
0e88ede
Compare
fd3e17e
to
39f6f20
Compare
Thanks @PhilippeMoussalli Still have to look at the code, but already two questions:
|
Yeah indeed, was a mistake. I will add documentation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @PhilippeMoussalli. Looks good to me apart from some small comments.
src/fondant/filesystem.py
Outdated
FSSPEC_SCHEME_DICT = { | ||
"file": "file", | ||
"s3": "s3", | ||
"gs": "gcs", | ||
"abfs": "abfs", | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need this mapping. fsspec.filesystem()
works with the protocol directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're right, adjusted it
src/fondant/executor.py
Outdated
component, | ||
manifest=inner_input_manifest, | ||
) | ||
inner_output_manifest = input_manifest.evolve(component_spec=self.spec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inner_output_manifest = input_manifest.evolve(component_spec=self.spec) | |
inner_output_manifest = inner_input_manifest.evolve(component_spec=self.spec) |
Maybe it's better to move this into a separate method to prevent these kind of issues.
Thanks @PhilippeMoussalli. Enormous effort which improves Fondant a lot! |
PR that implements caching: * The component cache is estimated at compile time based on the images tag, input parameters, component spec and component resources (GPU, nodepool). Caching is enabled by default at the componentOp level and disabled if the image has the `latest` tag * At runtime, the cache key is checked against the base path to check for matching executions based on a the cache key (currently appended to the manifest name). * If multiple matching executions are found, we return the latest one. This makes it easier to develop. For example, if we're working with the `dev` tag and we disable caching to force the execution of a new version of the component (scenario where the spec and parameters are the same). Then when we enable caching again, we would only load the manifest from the latest execution. * A component is cached only if it's enabled at the Op level, a matching manfiest is found and previous component's run is cached. To check whether the previous component is cached or not. We check if the run-id of the previous component (input manifest) has the same run-id as the current component. If they match, it means that the previous component had to execute (manifest not loaded from previous pipeline run). Note that this implies that we don't change the loaded manifest's run-id to that of the current pipeline run. This might not be ideal for lineage tracking but I see no other way of doing this check.
PR that implements caching:
The component cache is estimated at compile time based on the images tag, input parameters, component spec and component resources (GPU, nodepool). Caching is enabled by default at the componentOp level and disabled if the image has the
latest
tagAt runtime, the cache key is checked against the base path to check for matching executions based on a the cache key (currently appended to the manifest name).
If multiple matching executions are found, we return the latest one. This makes it easier to develop. For example, if we're working with the
dev
tag and we disable caching to force the execution of a new version of the component (scenario where the spec and parameters are the same). Then when we enable caching again, we would only load the manifest from the latest execution.A component is cached only if it's enabled at the Op level, a matching manfiest is found and previous component's run is cached. To check whether the previous component is cached or not. We check if the run-id of the previous component (input manifest) has the same run-id as the current component. If they match, it means that the previous component had to execute (manifest not loaded from previous pipeline run). Note that this implies that we don't change the loaded manifest's run-id to that of the current pipeline run. This might not be ideal for lineage tracking but I see no other way of doing this check.