Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: caching SQLiteMetadataStore.get_run_ids() #1205

Merged
merged 2 commits into from
Oct 24, 2024

Conversation

zilto
Copy link
Collaborator

@zilto zilto commented Oct 24, 2024

This PR follows issue #1204

It includes a few changes to the MetadataStore base class and sets a clearer contract for .get_run() and .get_run_ids().

  • .get_run_ids() should return a list of runs sorted by their start time, i.e., when .initialize() is called. This order may differ from the order of the first recorded node execution. An empty list will be returned if no run were initialized.

  • .get_run() should return a list of dictionaries, where each dict is associated with a particular node execution. Each dict must minimally return the data_version and the cache_key (which can be decoded to retrieve the node_name, code_version, and dependencies_data_versions), but may include more information. An IndexError is raised if run_id doesn't exist. An empty list is returned if the run was initialized, but recorded no node execution.

Tests were added to catch the bug reported in the original issue.

ellipsis-dev[bot]

This comment was marked as off-topic.

Copy link
Collaborator

@elijahbenizzy elijahbenizzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, unclear about a few of the other API changes (**kwargs), but I think we should ship it! Also docs

Comment on lines +112 to +113
node_name: str = None,
code_version: str = None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be backwards incompatible. But if we're the only ones using this, then all good.

Copy link
Collaborator Author

@zilto zilto Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not because the HamiltonCacheAdapter is the only object passing values here. As shown in the updated logic of SQLiteMetadataStore.set() it will resolve the values of node_name and code_version from the cache_key so these values will never be None (for instance, the SQL table still enforce NOT NULL constraints)

For instance, the MetadataStore base class doesn't expose node_name and code_version as arguments because other implementations might no provide them directly. This information is redundant with the content of cache_key.

The SQLiteMetadataStore implementation uses node_name and code_version explicitly to allow indexing and cache management (e.g., delete all instance of this node). These values could be derived by decoding cache_key but that would be a wasteful operation

Copy link
Collaborator

@skrawcz skrawcz Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's more that if people were not using keyword args it would be break due to the change in argument order; fix is easy though as you just update all callers to the new order.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I didn't see any update to callers of this function hence why I asked)

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on e6958eb in 14 seconds

More details
  • Looked at 38 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. hamilton/caching/stores/sqlite.py:210
  • Draft comment:
    Consider implementing a caching mechanism for get_run_ids() to improve performance, especially if this method is called frequently. This could involve storing the result in memory and invalidating the cache when new runs are initialized.
  • Reason this comment was not posted:
    Comment was on unchanged code.
2. hamilton/caching/stores/sqlite.py:210
  • Draft comment:
    This function is complex and would benefit from a comment explaining its purpose and main steps.
  • Reason this comment was not posted:
    Comment did not seem useful.

Workflow ID: wflow_YCg2stvuI4USVCy2


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@elijahbenizzy elijahbenizzy merged commit 2ff4de5 into main Oct 24, 2024
24 checks passed
@elijahbenizzy elijahbenizzy deleted the fix/sqlite-metadata-store branch October 24, 2024 19:42
sikehish pushed a commit to sikehish/hamilton that referenced this pull request Oct 29, 2024
author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100
committer sikehish <hisham0502@gmail.com> 1730182901 +0530

parent 7b9f52a
author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100
committer sikehish <hisham0502@gmail.com> 1730182885 +0530

parent 7b9f52a
author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100
committer sikehish <hisham0502@gmail.com> 1730182883 +0530

parent 7b9f52a
author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100
committer sikehish <hisham0502@gmail.com> 1730182877 +0530

parent 7b9f52a
author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100
committer sikehish <hisham0502@gmail.com> 1730182862 +0530

parent 7b9f52a
author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100
committer sikehish <hisham0502@gmail.com> 1730182842 +0530

parent 7b9f52a
author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100
committer sikehish <hisham0502@gmail.com> 1730182748 +0530

Update materialization.rst

Fix mutate docstring

Docstring raised Pytest escape error due to /* so we wrap it in quotes.

Improve pipe_output first node naming

Current name convention is very prone to name clashes with user naming.

We assign the same naming convention using namespace.raw to indicate the
first node.

Provide dockerx setup for docker builds (DAGWorks-Inc#1194)

This change adds a script for multi-platform docker builds as
well as a github workflow to automatically build them when a new sf-hamilton-ui version
has been published.

Squashed commits:

* Created buildx_and_push.sh script in ui directory to create multi-platform docker builds

* Fixed content related issue in buildx_and_push.sh

* buildx_and_push.sh: Added functionality to fecth the latest version from PyPi

* buildx_and_push.sh: Added check_buildx_installed

* buildx_and_push.sh: Added check_buildx_installed

* buildx_and_push.sh: Enhanced error handling(checking if jq exists, curl response handling and docker buildx error handling)

* Adding build args to buildx_and_push.sh

* buildx_and_push.sh: Changes made to test a new workflow

* Created a new workflow: hamilton-ui-build-and-push

* buildx_and_push.sh: echo statement added to debug

* buildx_and_push.sh: cd'ing to the directory(ui) where the shell script is located to prevent context related errors in workflow.

* hamilton-ui-build-and-push.yml: Added Docker Hub login step

* buildx_and_push.sh: added dagworks dockerhub username; workflow worked on the fork.

* hamilton-ui-build-and-push.yml: Replaced previous version from cache with version tag from Dockerhub image.

* buildx_and_push.sh: Changed dockerhub username for testing

* hamilton-ui-build-and-push.yml: Minor change in the docker registry URL(version)

* hamilton-ui-build-and-push.yml: Minor change in the "Fetch highest version from Docker Hub" step's shell script

* hamilton-ui-build-and-push.yml: Replacing deprecated set-output with GITHUB_OUTPUT

* hamilton-ui-build-and-push.yml: Conditional execution of steps implemented

* Undid change in dockerhub username

* Update ui/buildx_and_push.sh

* Update ui/buildx_and_push.sh

* chore: fix pre-commit whitespace issues

---------

Co-authored-by: Stefan Krawczyk <stefan@dagworks.io>

Fix `keep_dot` propagation in `Driver` display functions

Bumps hamilton version from 1.81.0 to 1.81.1

fix: caching `SQLiteMetadataStore.get_run_ids()` (DAGWorks-Inc#1205)

* fixed .get_run_ids() and standardized .get_run() + tests

* fixed docstrings formatting errors

---------

Co-authored-by: zilto <tjean@DESKTOP-V6JDCS2>

Bumps hamilton version from 1.81.1 to 1.81.2

buildx_and_push.sh: Changes made to test a new workflow

buildx_and_push.sh: echo statement added to debug

buildx_and_push.sh: added dagworks dockerhub username; workflow worked on the fork.

buildx_and_push.sh: Changed dockerhub username for testing

Undid change in dockerhub username

Created a new update external blogs script which updates External Blogs section in learning_resources.md with the latest blogs(with a date cutoff)

Added docstring in update_blogs_in_learning_resources.py

chore: fix pre-commit whitespace issues

update_blogs_in_learning_resources.py: Added a "print to standard out" option (using --print flag)

update_blogs_in_learning_resources.py: Argument parser with expaned help text

update_blogs_in_learning_resources.py: Expanded docstring

Fixes whitespace and small docs typo

buildx_and_push.sh: Changes made to test a new workflow

buildx_and_push.sh: echo statement added to debug

hamilton-ui-build-and-push.yml: Added Docker Hub login step

buildx_and_push.sh: added dagworks dockerhub username; workflow worked on the fork.

buildx_and_push.sh: Changed dockerhub username for testing

hamilton-ui-build-and-push.yml: Minor change in the "Fetch highest version from Docker Hub" step's shell script

hamilton-ui-build-and-push.yml: Replacing deprecated set-output with GITHUB_OUTPUT

chore: fix pre-commit whitespace issues

Update ui/buildx_and_push.sh

Update ui/buildx_and_push.sh

Fixes whitespace and small docs typo
skrawcz pushed a commit to sikehish/hamilton that referenced this pull request Oct 30, 2024
author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100
committer sikehish <hisham0502@gmail.com> 1730182901 +0530

parent 7b9f52a
author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100
committer sikehish <hisham0502@gmail.com> 1730182885 +0530

parent 7b9f52a
author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100
committer sikehish <hisham0502@gmail.com> 1730182883 +0530

parent 7b9f52a
author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100
committer sikehish <hisham0502@gmail.com> 1730182877 +0530

parent 7b9f52a
author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100
committer sikehish <hisham0502@gmail.com> 1730182862 +0530

parent 7b9f52a
author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100
committer sikehish <hisham0502@gmail.com> 1730182842 +0530

parent 7b9f52a
author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100
committer sikehish <hisham0502@gmail.com> 1730182748 +0530

Update materialization.rst

Fix mutate docstring

Docstring raised Pytest escape error due to /* so we wrap it in quotes.

Improve pipe_output first node naming

Current name convention is very prone to name clashes with user naming.

We assign the same naming convention using namespace.raw to indicate the
first node.

Provide dockerx setup for docker builds (DAGWorks-Inc#1194)

This change adds a script for multi-platform docker builds as
well as a github workflow to automatically build them when a new sf-hamilton-ui version
has been published.

Squashed commits:

* Created buildx_and_push.sh script in ui directory to create multi-platform docker builds

* Fixed content related issue in buildx_and_push.sh

* buildx_and_push.sh: Added functionality to fecth the latest version from PyPi

* buildx_and_push.sh: Added check_buildx_installed

* buildx_and_push.sh: Added check_buildx_installed

* buildx_and_push.sh: Enhanced error handling(checking if jq exists, curl response handling and docker buildx error handling)

* Adding build args to buildx_and_push.sh

* buildx_and_push.sh: Changes made to test a new workflow

* Created a new workflow: hamilton-ui-build-and-push

* buildx_and_push.sh: echo statement added to debug

* buildx_and_push.sh: cd'ing to the directory(ui) where the shell script is located to prevent context related errors in workflow.

* hamilton-ui-build-and-push.yml: Added Docker Hub login step

* buildx_and_push.sh: added dagworks dockerhub username; workflow worked on the fork.

* hamilton-ui-build-and-push.yml: Replaced previous version from cache with version tag from Dockerhub image.

* buildx_and_push.sh: Changed dockerhub username for testing

* hamilton-ui-build-and-push.yml: Minor change in the docker registry URL(version)

* hamilton-ui-build-and-push.yml: Minor change in the "Fetch highest version from Docker Hub" step's shell script

* hamilton-ui-build-and-push.yml: Replacing deprecated set-output with GITHUB_OUTPUT

* hamilton-ui-build-and-push.yml: Conditional execution of steps implemented

* Undid change in dockerhub username

* Update ui/buildx_and_push.sh

* Update ui/buildx_and_push.sh

* chore: fix pre-commit whitespace issues

---------

Co-authored-by: Stefan Krawczyk <stefan@dagworks.io>

Fix `keep_dot` propagation in `Driver` display functions

Bumps hamilton version from 1.81.0 to 1.81.1

fix: caching `SQLiteMetadataStore.get_run_ids()` (DAGWorks-Inc#1205)

* fixed .get_run_ids() and standardized .get_run() + tests

* fixed docstrings formatting errors

---------

Co-authored-by: zilto <tjean@DESKTOP-V6JDCS2>

Bumps hamilton version from 1.81.1 to 1.81.2

buildx_and_push.sh: Changes made to test a new workflow

buildx_and_push.sh: echo statement added to debug

buildx_and_push.sh: added dagworks dockerhub username; workflow worked on the fork.

buildx_and_push.sh: Changed dockerhub username for testing

Undid change in dockerhub username

Created a new update external blogs script which updates External Blogs section in learning_resources.md with the latest blogs(with a date cutoff)

Added docstring in update_blogs_in_learning_resources.py

chore: fix pre-commit whitespace issues

update_blogs_in_learning_resources.py: Added a "print to standard out" option (using --print flag)

update_blogs_in_learning_resources.py: Argument parser with expaned help text

update_blogs_in_learning_resources.py: Expanded docstring

Fixes whitespace and small docs typo

buildx_and_push.sh: Changes made to test a new workflow

buildx_and_push.sh: echo statement added to debug

hamilton-ui-build-and-push.yml: Added Docker Hub login step

buildx_and_push.sh: added dagworks dockerhub username; workflow worked on the fork.

buildx_and_push.sh: Changed dockerhub username for testing

hamilton-ui-build-and-push.yml: Minor change in the "Fetch highest version from Docker Hub" step's shell script

hamilton-ui-build-and-push.yml: Replacing deprecated set-output with GITHUB_OUTPUT

chore: fix pre-commit whitespace issues

Update ui/buildx_and_push.sh

Update ui/buildx_and_push.sh

Fixes whitespace and small docs typo
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants