Use placeholder runs to show pipeline runs in the dashboard without delay #2048

Merged
merged 58 commits into from
Jan 3, 2024

@schustmi (Contributor) commented Nov 14, 2023

Describe changes

This PR allows us to create a PipelineRun in the database before actually executing the pipeline using the orchestrator.
With this in place, we can now return a reference to this pipeline run when someone is running a pipeline, and can also show the pipeline run in the dashboard immediately.

Notes regarding the migration

This PR adds a unique constraint on the combination of the deployment_id and orchestrator_run_id columns of the pipeline_run table. These columns were introduced (and have been set to non-null values since) in the following releases:

  • release 0.21.0 for the orchestrator_run_id
  • release 0.34.0 for the deployment_id

For this unique constraint to work, we have to consider these scenarios:

  • pipeline runs that happened before release 0.21.0: For these, both columns are NULL. We solve this by writing a unique dummy value into the orchestrator_run_id column.
  • pipeline runs that happened between releases 0.21.0 and 0.34.0: In this case only the orchestrator_run_id is set. This is only a problem if we assume people run with custom orchestrators that do not generate a globally unique orchestrator_run_id.
  • pipeline runs that happened after release 0.34.0: For these, both deployment_id and orchestrator_run_id are set and the combination of the two is unique; otherwise, running those pipelines would already have failed.
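
The backfill described for the pre-0.21.0 runs can be sketched roughly as follows. This is a minimal illustration using an in-memory SQLite database, not the actual migration in this PR: the simplified table layout, the `dummy_` prefix, and the index name `uq_deployment_orchestrator_run` are all assumptions made for the example.

```python
import sqlite3
import uuid


def backfill_and_constrain(conn: sqlite3.Connection) -> int:
    """Give rows without an orchestrator_run_id a unique dummy value,
    then add the unique constraint. Returns the number of rows backfilled."""
    rows = conn.execute(
        "SELECT id FROM pipeline_run WHERE orchestrator_run_id IS NULL"
    ).fetchall()
    for (run_id,) in rows:
        # The dummy value format is hypothetical; it only needs to be unique.
        conn.execute(
            "UPDATE pipeline_run SET orchestrator_run_id = ? WHERE id = ?",
            (f"dummy_{uuid.uuid4().hex}", run_id),
        )
    # Hypothetical index name standing in for the real constraint.
    conn.execute(
        "CREATE UNIQUE INDEX uq_deployment_orchestrator_run "
        "ON pipeline_run (deployment_id, orchestrator_run_id)"
    )
    conn.commit()
    return len(rows)


# Simplified stand-in for the pipeline_run table (assumed layout).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pipeline_run ("
    "id INTEGER PRIMARY KEY, deployment_id TEXT, orchestrator_run_id TEXT)"
)
conn.executemany(
    "INSERT INTO pipeline_run (deployment_id, orchestrator_run_id) VALUES (?, ?)",
    [
        (None, None),         # run from before 0.21.0
        (None, None),         # run from before 0.21.0
        (None, "orch-1"),     # run from between 0.21.0 and 0.34.0
        ("dep-1", "orch-2"),  # run from after 0.34.0
    ],
)
backfilled = backfill_and_constrain(conn)
```

After the backfill, inserting a second row with the same deployment_id and orchestrator_run_id is rejected by the index.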

What the migration currently does not account for:

  • If users manually modified their database to set/delete values of the orchestrator_run_id column.
  • If users deleted deployments in their database, which sets the deployment_id column to NULL. As above, this is only a problem if we assume people run with custom orchestrators that do not generate a globally unique orchestrator_run_id.
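
One way to check whether a database is affected by the cases above is to look for repeated (deployment_id, orchestrator_run_id) pairs before migrating. The query below is a sketch against an assumed, simplified pipeline_run table and is not part of this PR. Note that GROUP BY puts NULL deployment_id values into one group, whereas many databases treat NULLs as distinct inside a unique constraint, so a hit here is a hint to investigate rather than proof that the migration would fail.

```python
import sqlite3
from typing import List, Optional, Tuple


def find_duplicate_run_keys(
    conn: sqlite3.Connection,
) -> List[Tuple[Optional[str], str, int]]:
    """Return (deployment_id, orchestrator_run_id, count) for every pair
    that occurs more than once."""
    return conn.execute(
        "SELECT deployment_id, orchestrator_run_id, COUNT(*) "
        "FROM pipeline_run "
        "WHERE orchestrator_run_id IS NOT NULL "
        "GROUP BY deployment_id, orchestrator_run_id "
        "HAVING COUNT(*) > 1"
    ).fetchall()


# Simplified stand-in for the pipeline_run table (assumed layout).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pipeline_run ("
    "id INTEGER PRIMARY KEY, deployment_id TEXT, orchestrator_run_id TEXT)"
)
conn.executemany(
    "INSERT INTO pipeline_run (deployment_id, orchestrator_run_id) VALUES (?, ?)",
    [
        (None, "orch-1"),     # deployment deleted, custom orchestrator ID
        (None, "orch-1"),     # same ID reused by another run
        ("dep-1", "orch-2"),  # regular run
        (None, None),         # legacy run without any ID
    ],
)
duplicates = find_duplicate_run_keys(conn)
```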

TODO

  • Add a new icon for the initializing state of a pipeline run in the dashboard.
  • Make sure the dashboard can handle an empty orchestrator_environment in a PipelineRun response. It is optional on the model but might not be handled correctly in the dashboard.
    • This works and just displays nothing.

Pre-requisites

Please ensure you have done the following:

  • I have read the CONTRIBUTING.md document.
  • If my change requires a change to docs, I have updated the documentation accordingly.
  • If I have added an integration, I have updated the integrations table and the corresponding website section.
  • I have added tests to cover my changes.
  • I have based my new branch on develop and the open PR is targeting develop. If your branch wasn't based on develop read Contribution guide on rebasing branch to develop.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Other (add details above)

@schustmi schustmi requested a review from fa9r November 14, 2023 16:48
@schustmi (Contributor, Author) commented

@fa9r Very early/rough draft of an idea I had to show pipeline runs in the dashboard immediately. Let me know if you see any immediate concerns with this, especially when it comes to concurrency. The with_for_update() clause on the select statement IMO should cover this, but maybe I'm missing something.

@github-actions github-actions bot added the internal To filter out internal PRs and issues label Nov 14, 2023
@fa9r (Contributor) left a comment

Fundamentally I don't see a reason why this shouldn't work, but let me summarize to check whether I understand it right:

  • If we run a pipeline with a schedule, we'll do the same as before: create a schedule in the DB, hand over to the orchestrator, create the run when the orchestrator starts the first step and link it to the schedule, then link the other steps to the same run by searching for runs by orchestrator ID
  • If we run a pipeline without a schedule, we directly write a run to the DB with an empty orchestrator ID, hand over to the orchestrator, set the orchestrator ID when the orchestrator starts the first step, then link the other steps to the same run by searching for the same orchestrator ID

Logically this makes sense to me and I think it should work 👍

The only possible failure case I can see is if we try to create a placeholder run for a deployment that already has a placeholder run. But I'm not sure whether that scenario can be reached or not, @schustmi you should know this best 😁
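
The two paths summarized in this review can be sketched with a small in-memory store. This is only an illustration under stated assumptions, not the code in this PR: `Run`, `RunStore`, and the "running" status are hypothetical names, and a `threading.Lock` stands in for the row lock that `with_for_update()` would take in the database.

```python
import threading
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Run:
    deployment_id: str
    orchestrator_run_id: Optional[str] = None  # None marks a placeholder run
    status: str = "initializing"
    id: str = field(default_factory=lambda: uuid.uuid4().hex)


class RunStore:
    """Hypothetical in-memory stand-in for the pipeline_run table."""

    def __init__(self) -> None:
        self._runs: Dict[str, Run] = {}
        self._lock = threading.Lock()

    def create_placeholder(self, deployment_id: str) -> Run:
        """Called before handing over to the orchestrator (unscheduled runs)."""
        with self._lock:
            run = Run(deployment_id=deployment_id)
            self._runs[run.id] = run
            return run

    def get_or_create(self, deployment_id: str, orchestrator_run_id: str) -> Run:
        """Called by every step once the orchestrator is executing."""
        with self._lock:
            placeholder = None
            for run in self._runs.values():
                if run.deployment_id != deployment_id:
                    continue
                if run.orchestrator_run_id == orchestrator_run_id:
                    return run  # a previous step already registered the run
                if run.orchestrator_run_id is None and placeholder is None:
                    placeholder = run
            if placeholder is not None:
                # First step of an unscheduled run: claim the placeholder.
                placeholder.orchestrator_run_id = orchestrator_run_id
                placeholder.status = "running"
                return placeholder
            # Scheduled run: no placeholder exists, so create the run now.
            run = Run(deployment_id, orchestrator_run_id, status="running")
            self._runs[run.id] = run
            return run


store = RunStore()
placeholder = store.create_placeholder("dep-1")
first_step = store.get_or_create("dep-1", "orch-42")
second_step = store.get_or_create("dep-1", "orch-42")
scheduled = store.get_or_create("dep-2", "orch-7")
```

In this sketch, both steps of the unscheduled run resolve to the placeholder created up front, while the scheduled run is only created when its first step starts. The sketch does not guard against two placeholders for the same deployment, which is the failure case raised above.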

Resolved (outdated) review threads:

  • src/zenml/models/pipeline_run_models.py (1 thread)
  • src/zenml/new/pipelines/pipeline.py (4 threads)
  • src/zenml/zen_stores/schemas/pipeline_run_schemas.py (2 threads)
  • src/zenml/zen_stores/sql_zen_store.py (2 threads)

@schustmi schustmi changed the title from "Use placeholder pipeline run to show it in the dashboard immediately" to "Use placeholder runs to show pipeline runs in the dashboard immediately" Nov 15, 2023
@schustmi schustmi changed the title from "Use placeholder runs to show pipeline runs in the dashboard immediately" to "Use placeholder runs to show pipeline runs in the dashboard without delay" Nov 15, 2023
@schustmi schustmi force-pushed the placeholder-pipeline-run-poc branch 4 times, most recently from a7aed3b to b09053a Compare November 25, 2023 16:19
E2E template updates in examples/e2e have been pushed.

coderabbitai bot commented Dec 20, 2023

Important

Auto Review Skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository.

To trigger a single review, invoke the @coderabbitai review command.

@schustmi schustmi merged commit 79d967e into develop Jan 3, 2024
31 of 33 checks passed
@schustmi schustmi deleted the placeholder-pipeline-run-poc branch January 3, 2024 08:44
Labels
internal To filter out internal PRs and issues
5 participants