Partial pipeline execution #292
Thanks @PhilippeMoussalli! I think we first need to think about how we want the partial execution to work:
Full-blown caching is probably too much at the moment, but I'd like to make sure that the steps we take lead towards it.
Thanks for the elaborate feedback.
I think this might be the easiest approach, since I'm not sure it can be done at the runner level (at least for the Docker-based one). We need to check two things in this case:
So the advantage here over the currently implemented approach is that we now have automatic detection of when components have changed, and the user no longer has to specify a
That being said, it's still not clear to me where the comparison of the components should be done. The easy way would be to generate a new
Let me know what you think.
Thanks for the response @PhilippeMoussalli and sorry for the delay, I needed some time to wrap my head around this 🤯 I'm just going to try and summarize my view based on your response: I think there are two high-level ways to tackle this:
So let's continue with that last option. I would not leverage the
The new run should always have a new run id. We should never overwrite any older runs, as that breaks the lineage. We do need the run_id of the previous run to compare against, though, and I'm not yet sure how we should decide which run this is. We can probably start by having the user pass it in manually. We probably also need a flag to disable caching, as you might want to re-run a component because something external changed (e.g. external data you're reading).
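The rule described in this comment (compare against a previous run, never overwrite it, and allow caching to be switched off) could be sketched roughly as follows. This is only an illustration: the function name, the per-component fingerprint dictionaries, and the `disable_cache` flag are hypothetical and do not exist in the code base.

```python
from typing import Dict, List


def components_to_run(
    ordered_components: List[str],
    current_keys: Dict[str, str],
    previous_keys: Dict[str, str],
    disable_cache: bool = False,
) -> List[str]:
    """Return the components that have to execute in the new run.

    `current_keys` and `previous_keys` map a component name to a
    fingerprint of that component (hypothetical layout). The previous
    run is only read, never modified.
    """
    if disable_cache:
        # The user explicitly asked for a full re-run.
        return list(ordered_components)

    for index, name in enumerate(ordered_components):
        if previous_keys.get(name) != current_keys[name]:
            # The first changed (or never executed) component invalidates
            # everything downstream of it.
            return list(ordered_components[index:])

    # Nothing changed: every component can reuse the previous output.
    return []
```

Because the function only returns names, the caller stays free to write the outputs under a fresh run id, which keeps the lineage of older runs intact.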
I still think it needs to be a combination of both compile time and runtime. I don't see how else we would run the pipeline with different arguments without generating a different pipeline spec. I've drawn a diagram to make this a little more concrete. This has the advantage that you don't need to specify which pipeline run to resume from. It is somewhat similar to how Vertex describes their caching approach: they seem to compile again and check for a matching component execution. Some caveats based on your previous comments:
Yes, this is especially relevant if your images get updated and you're running the same specification twice. We could instead move the calculation of the cache key to the component itself and calculate it at runtime.
This can still be equivalent to the hash key, not sure if we need to log a full summary. Otherwise, I would just include it in the manifest as part of the metadata.
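As a rough sketch of what such a hash key could look like when it is calculated at runtime: the inputs below (component spec, arguments, image digest, hash of the input manifest) are an assumption about what should invalidate the cache, and the function name is hypothetical.

```python
import hashlib
import json
from typing import Any, Dict


def cache_key(
    component_spec: Dict[str, Any],
    arguments: Dict[str, Any],
    image_digest: str,
    input_manifest_hash: str,
) -> str:
    """Derive a deterministic key from everything that affects the output."""
    payload = json.dumps(
        {
            "spec": component_spec,
            "arguments": arguments,
            # Assumption: the digest changes whenever the image is rebuilt,
            # even if the tag in the pipeline spec stays the same.
            "image": image_digest,
            "input": input_manifest_hash,
        },
        sort_keys=True,  # make the key independent of dictionary ordering
        default=str,  # fall back to str() for values JSON cannot encode
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

The resulting hex string is small enough to store in the manifest metadata, so a later run can compare keys without reading a full summary of the earlier execution.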
I agree with running it with a new
Yes, I agree, we can introduce this at the
Closed in favor of #313.
This PR enables executing components with the local runner using partial execution (starting from a specific checkpoint). The user can either resume from the last component that has not yet been run, or resume from a specific component. This makes it easier to work on a specific component during development, especially if the pipeline consists of multiple components or if one of the components is difficult to run (e.g. a LAION- or GPU-dependent component).
Usage example:
or, to re-run from an earlier executed component, specify the component to resume from
I changed the structure in which the manifest is written locally from component_name/manifest.json to component_name/run_id/manifest.json to avoid overwriting the manifest on different pipeline runs.
The approach depends on modifying the docker compose file.
If you only want to run the second component, it becomes
I don't think there is a way to work with only the original docker-compose file and just specify the services to run, because the chaining of the components needs to be modified in the file. More details on this here.
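To illustrate why the file itself has to change, here is a minimal sketch that prunes a compose specification (already loaded as a Python dictionary) down to the services from a given component onwards and removes dependencies on the skipped ones. The function name and the assumption that components are chained through `depends_on` are mine; the sketch does not rewrite input manifest paths in the service commands, which the real implementation would also need to handle.

```python
import copy
from typing import Any, Dict, List


def prune_compose(
    spec: Dict[str, Any],
    ordered_services: List[str],
    resume_from: str,
) -> Dict[str, Any]:
    """Return a copy of `spec` that only runs `resume_from` and later services."""
    if resume_from not in ordered_services:
        raise ValueError(f"Unknown component: {resume_from}")

    start = ordered_services.index(resume_from)
    skipped = set(ordered_services[:start])

    # Copy everything except the services, which are rebuilt below.
    pruned = {k: copy.deepcopy(v) for k, v in spec.items() if k != "services"}
    pruned["services"] = {}

    for name in ordered_services[start:]:
        service = copy.deepcopy(spec["services"][name])
        depends_on = service.get("depends_on")
        # Compose allows both the list form and the mapping form.
        if isinstance(depends_on, dict):
            remaining = {k: v for k, v in depends_on.items() if k not in skipped}
        elif isinstance(depends_on, list):
            remaining = [d for d in depends_on if d not in skipped]
        else:
            remaining = None
        if remaining:
            service["depends_on"] = remaining
        else:
            # A service cannot wait on a component that is no longer in the file.
            service.pop("depends_on", None)
        pruned["services"][name] = service

    return pruned
```

The original dictionary is left untouched, so the full pipeline specification can still be written out or compared against later.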