Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement caching #387

Merged
merged 13 commits into from
Aug 29, 2023
Merged

Implement caching #387

merged 13 commits into from
Aug 29, 2023

Conversation

PhilippeMoussalli
Copy link
Contributor

@PhilippeMoussalli PhilippeMoussalli commented Aug 24, 2023

PR that implements caching:

  • The component cache is estimated at compile time based on the images tag, input parameters, component spec and component resources (GPU, nodepool). Caching is enabled by default at the componentOp level and disabled if the image has the latest tag

  • At runtime, the cache key is checked against the base path to check for matching executions based on a the cache key (currently appended to the manifest name).

  • If multiple matching executions are found, we return the latest one. This makes it easier to develop. For example, if we're working with the dev tag and we disable caching to force the execution of a new version of the component (scenario where the spec and parameters are the same). Then when we enable caching again, we would only load the manifest from the latest execution.

  • A component is cached only if it's enabled at the Op level, a matching manfiest is found and previous component's run is cached. To check whether the previous component is cached or not. We check if the run-id of the previous component (input manifest) has the same run-id as the current component. If they match, it means that the previous component had to execute (manifest not loaded from previous pipeline run). Note that this implies that we don't change the loaded manifest's run-id to that of the current pipeline run. This might not be ideal for lineage tracking but I see no other way of doing this check.

@RobbeSneyders
Copy link
Member

Thanks @PhilippeMoussalli

Still have to look at the code, but already two questions:

  • A component is cached only if it's enabled at the Op level, a matching manfiest is found and previous component's run is not cached.

    I assume this "not" is a mistake, and we only cache if the previous component was also cached?

  • Can you add some documentation on the caching as well?

@PhilippeMoussalli
Copy link
Contributor Author

Thanks @PhilippeMoussalli

Still have to look at the code, but already two questions:

  • A component is cached only if it's enabled at the Op level, a matching manfiest is found and previous component's run is not cached.

    I assume this "not" is a mistake, and we only cache if the previous component was also cached?

  • Can you add some documentation on the caching as well?

Yeah indeed, was a mistake.

I will add documentation

Copy link
Member

@RobbeSneyders RobbeSneyders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @PhilippeMoussalli. Looks good to me apart from some small comments.

Comment on lines 8 to 13
FSSPEC_SCHEME_DICT = {
"file": "file",
"s3": "s3",
"gs": "gcs",
"abfs": "abfs",
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this mapping. fsspec.filesystem() works with the protocol directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right, adjusted it

component,
manifest=inner_input_manifest,
)
inner_output_manifest = input_manifest.evolve(component_spec=self.spec)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
inner_output_manifest = input_manifest.evolve(component_spec=self.spec)
inner_output_manifest = inner_input_manifest.evolve(component_spec=self.spec)

Maybe it's better to move this into a separate method to prevent these kind of issues.

@RobbeSneyders RobbeSneyders merged commit fe97740 into main Aug 29, 2023
5 checks passed
@RobbeSneyders RobbeSneyders deleted the implement-caching branch August 29, 2023 09:28
@RobbeSneyders
Copy link
Member

Thanks @PhilippeMoussalli. Enormous effort which improves Fondant a lot!

Hakimovich99 pushed a commit that referenced this pull request Oct 16, 2023
PR that implements caching: 

* The component cache is estimated at compile time based on the images
tag, input parameters, component spec and component resources (GPU,
nodepool). Caching is enabled by default at the componentOp level and
disabled if the image has the `latest` tag

* At runtime, the cache key is checked against the base path to check
for matching executions based on a the cache key (currently appended to
the manifest name).

* If multiple matching executions are found, we return the latest one.
This makes it easier to develop. For example, if we're working with the
`dev` tag and we disable caching to force the execution of a new version
of the component (scenario where the spec and parameters are the same).
Then when we enable caching again, we would only load the manifest from
the latest execution.
 
* A component is cached only if it's enabled at the Op level, a matching
manfiest is found and previous component's run is cached. To check
whether the previous component is cached or not. We check if the run-id
of the previous component (input manifest) has the same run-id as the
current component. If they match, it means that the previous component
had to execute (manifest not loaded from previous pipeline run). Note
that this implies that we don't change the loaded manifest's run-id to
that of the current pipeline run. This might not be ideal for lineage
tracking but I see no other way of doing this check.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants