-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: caching SQLiteMetadataStore.get_run_ids()
#1205
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, unclear about a few of the other API changes (**kwargs
), but I think we should ship it! Also docs
node_name: str = None, | ||
code_version: str = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could be backwards incompatible. But if we're the only ones using this, then all good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not because the HamiltonCacheAdapter
is the only object passing values here. As shown in the updated logic of SQLiteMetadataStore.set()
it will resolve the values of node_name
and code_version
from the cache_key
so these values will never be None
(for instance, the SQL table still enforce NOT NULL
constraints)
For instance, the MetadataStore
base class doesn't expose node_name
and code_version
as arguments because other implementations might no provide them directly. This information is redundant with the content of cache_key
.
The SQLiteMetadataStore
implementation uses node_name
and code_version
explicitly to allow indexing and cache management (e.g., delete all instance of this node). These values could be derived by decoding cache_key
but that would be a wasteful operation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's more that if people were not using keyword args it would be break due to the change in argument order; fix is easy though as you just update all callers to the new order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I didn't see any update to callers of this function hence why I asked)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on e6958eb in 14 seconds
More details
- Looked at
38
lines of code in2
files - Skipped
0
files when reviewing. - Skipped posting
2
drafted comments based on config settings.
1. hamilton/caching/stores/sqlite.py:210
- Draft comment:
Consider implementing a caching mechanism forget_run_ids()
to improve performance, especially if this method is called frequently. This could involve storing the result in memory and invalidating the cache when new runs are initialized. - Reason this comment was not posted:
Comment was on unchanged code.
2. hamilton/caching/stores/sqlite.py:210
- Draft comment:
This function is complex and would benefit from a comment explaining its purpose and main steps. - Reason this comment was not posted:
Comment did not seem useful.
Workflow ID: wflow_YCg2stvuI4USVCy2
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100 committer sikehish <hisham0502@gmail.com> 1730182901 +0530 parent 7b9f52a author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100 committer sikehish <hisham0502@gmail.com> 1730182885 +0530 parent 7b9f52a author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100 committer sikehish <hisham0502@gmail.com> 1730182883 +0530 parent 7b9f52a author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100 committer sikehish <hisham0502@gmail.com> 1730182877 +0530 parent 7b9f52a author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100 committer sikehish <hisham0502@gmail.com> 1730182862 +0530 parent 7b9f52a author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100 committer sikehish <hisham0502@gmail.com> 1730182842 +0530 parent 7b9f52a author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100 committer sikehish <hisham0502@gmail.com> 1730182748 +0530 Update materialization.rst Fix mutate docstring Docstring raised Pytest escape error due to /* so we wrap it in quotes. Improve pipe_output first node naming Current name convention is very prone to name clashes with user naming. We assign the same naming convention using namespace.raw to indicate the first node. Provide dockerx setup for docker builds (DAGWorks-Inc#1194) This change adds a script for multi-platform docker builds as well as a github workflow to automatically build them when a new sf-hamilton-ui version has been published. Squashed commits: * Created buildx_and_push.sh script in ui directory to create multi-platform docker builds * Fixed content related issue in buildx_and_push.sh * buildx_and_push.sh: Added functionality to fecth the latest version from PyPi * buildx_and_push.sh: Added check_buildx_installed * buildx_and_push.sh: Added check_buildx_installed * buildx_and_push.sh: Enhanced error handling(checking if jq exists, curl response handling and docker buildx error handling) * Adding build args to buildx_and_push.sh * buildx_and_push.sh: Changes made to test a new workflow * Created a new workflow: hamilton-ui-build-and-push * buildx_and_push.sh: echo statement added to debug * buildx_and_push.sh: cd'ing to the directory(ui) where the shell script is located to prevent context related errors in workflow. * hamilton-ui-build-and-push.yml: Added Docker Hub login step * buildx_and_push.sh: added dagworks dockerhub username; workflow worked on the fork. * hamilton-ui-build-and-push.yml: Replaced previous version from cache with version tag from Dockerhub image. * buildx_and_push.sh: Changed dockerhub username for testing * hamilton-ui-build-and-push.yml: Minor change in the docker registry URL(version) * hamilton-ui-build-and-push.yml: Minor change in the "Fetch highest version from Docker Hub" step's shell script * hamilton-ui-build-and-push.yml: Replacing deprecated set-output with GITHUB_OUTPUT * hamilton-ui-build-and-push.yml: Conditional execution of steps implemented * Undid change in dockerhub username * Update ui/buildx_and_push.sh * Update ui/buildx_and_push.sh * chore: fix pre-commit whitespace issues --------- Co-authored-by: Stefan Krawczyk <stefan@dagworks.io> Fix `keep_dot` propagation in `Driver` display functions Bumps hamilton version from 1.81.0 to 1.81.1 fix: caching `SQLiteMetadataStore.get_run_ids()` (DAGWorks-Inc#1205) * fixed .get_run_ids() and standardized .get_run() + tests * fixed docstrings formatting errors --------- Co-authored-by: zilto <tjean@DESKTOP-V6JDCS2> Bumps hamilton version from 1.81.1 to 1.81.2 buildx_and_push.sh: Changes made to test a new workflow buildx_and_push.sh: echo statement added to debug buildx_and_push.sh: added dagworks dockerhub username; workflow worked on the fork. buildx_and_push.sh: Changed dockerhub username for testing Undid change in dockerhub username Created a new update external blogs script which updates External Blogs section in learning_resources.md with the latest blogs(with a date cutoff) Added docstring in update_blogs_in_learning_resources.py chore: fix pre-commit whitespace issues update_blogs_in_learning_resources.py: Added a "print to standard out" option (using --print flag) update_blogs_in_learning_resources.py: Argument parser with expaned help text update_blogs_in_learning_resources.py: Expanded docstring Fixes whitespace and small docs typo buildx_and_push.sh: Changes made to test a new workflow buildx_and_push.sh: echo statement added to debug hamilton-ui-build-and-push.yml: Added Docker Hub login step buildx_and_push.sh: added dagworks dockerhub username; workflow worked on the fork. buildx_and_push.sh: Changed dockerhub username for testing hamilton-ui-build-and-push.yml: Minor change in the "Fetch highest version from Docker Hub" step's shell script hamilton-ui-build-and-push.yml: Replacing deprecated set-output with GITHUB_OUTPUT chore: fix pre-commit whitespace issues Update ui/buildx_and_push.sh Update ui/buildx_and_push.sh Fixes whitespace and small docs typo
author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100 committer sikehish <hisham0502@gmail.com> 1730182901 +0530 parent 7b9f52a author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100 committer sikehish <hisham0502@gmail.com> 1730182885 +0530 parent 7b9f52a author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100 committer sikehish <hisham0502@gmail.com> 1730182883 +0530 parent 7b9f52a author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100 committer sikehish <hisham0502@gmail.com> 1730182877 +0530 parent 7b9f52a author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100 committer sikehish <hisham0502@gmail.com> 1730182862 +0530 parent 7b9f52a author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100 committer sikehish <hisham0502@gmail.com> 1730182842 +0530 parent 7b9f52a author Dimitry Foures <dimitry.foures@gmail.com> 1729160952 +0100 committer sikehish <hisham0502@gmail.com> 1730182748 +0530 Update materialization.rst Fix mutate docstring Docstring raised Pytest escape error due to /* so we wrap it in quotes. Improve pipe_output first node naming Current name convention is very prone to name clashes with user naming. We assign the same naming convention using namespace.raw to indicate the first node. Provide dockerx setup for docker builds (DAGWorks-Inc#1194) This change adds a script for multi-platform docker builds as well as a github workflow to automatically build them when a new sf-hamilton-ui version has been published. Squashed commits: * Created buildx_and_push.sh script in ui directory to create multi-platform docker builds * Fixed content related issue in buildx_and_push.sh * buildx_and_push.sh: Added functionality to fecth the latest version from PyPi * buildx_and_push.sh: Added check_buildx_installed * buildx_and_push.sh: Added check_buildx_installed * buildx_and_push.sh: Enhanced error handling(checking if jq exists, curl response handling and docker buildx error handling) * Adding build args to buildx_and_push.sh * buildx_and_push.sh: Changes made to test a new workflow * Created a new workflow: hamilton-ui-build-and-push * buildx_and_push.sh: echo statement added to debug * buildx_and_push.sh: cd'ing to the directory(ui) where the shell script is located to prevent context related errors in workflow. * hamilton-ui-build-and-push.yml: Added Docker Hub login step * buildx_and_push.sh: added dagworks dockerhub username; workflow worked on the fork. * hamilton-ui-build-and-push.yml: Replaced previous version from cache with version tag from Dockerhub image. * buildx_and_push.sh: Changed dockerhub username for testing * hamilton-ui-build-and-push.yml: Minor change in the docker registry URL(version) * hamilton-ui-build-and-push.yml: Minor change in the "Fetch highest version from Docker Hub" step's shell script * hamilton-ui-build-and-push.yml: Replacing deprecated set-output with GITHUB_OUTPUT * hamilton-ui-build-and-push.yml: Conditional execution of steps implemented * Undid change in dockerhub username * Update ui/buildx_and_push.sh * Update ui/buildx_and_push.sh * chore: fix pre-commit whitespace issues --------- Co-authored-by: Stefan Krawczyk <stefan@dagworks.io> Fix `keep_dot` propagation in `Driver` display functions Bumps hamilton version from 1.81.0 to 1.81.1 fix: caching `SQLiteMetadataStore.get_run_ids()` (DAGWorks-Inc#1205) * fixed .get_run_ids() and standardized .get_run() + tests * fixed docstrings formatting errors --------- Co-authored-by: zilto <tjean@DESKTOP-V6JDCS2> Bumps hamilton version from 1.81.1 to 1.81.2 buildx_and_push.sh: Changes made to test a new workflow buildx_and_push.sh: echo statement added to debug buildx_and_push.sh: added dagworks dockerhub username; workflow worked on the fork. buildx_and_push.sh: Changed dockerhub username for testing Undid change in dockerhub username Created a new update external blogs script which updates External Blogs section in learning_resources.md with the latest blogs(with a date cutoff) Added docstring in update_blogs_in_learning_resources.py chore: fix pre-commit whitespace issues update_blogs_in_learning_resources.py: Added a "print to standard out" option (using --print flag) update_blogs_in_learning_resources.py: Argument parser with expaned help text update_blogs_in_learning_resources.py: Expanded docstring Fixes whitespace and small docs typo buildx_and_push.sh: Changes made to test a new workflow buildx_and_push.sh: echo statement added to debug hamilton-ui-build-and-push.yml: Added Docker Hub login step buildx_and_push.sh: added dagworks dockerhub username; workflow worked on the fork. buildx_and_push.sh: Changed dockerhub username for testing hamilton-ui-build-and-push.yml: Minor change in the "Fetch highest version from Docker Hub" step's shell script hamilton-ui-build-and-push.yml: Replacing deprecated set-output with GITHUB_OUTPUT chore: fix pre-commit whitespace issues Update ui/buildx_and_push.sh Update ui/buildx_and_push.sh Fixes whitespace and small docs typo
This PR follows issue #1204
It includes a few changes to the
MetadataStore
base class and sets a clearer contract for.get_run()
and.get_run_ids()
..get_run_ids()
should return a list of runs sorted by their start time, i.e., when.initialize()
is called. This order may differ from the order of the first recorded node execution. An empty list will be returned if no run were initialized..get_run()
should return a list of dictionaries, where eachdict
is associated with a particular node execution. Eachdict
must minimally return thedata_version
and thecache_key
(which can be decoded to retrieve thenode_name
,code_version
, anddependencies_data_versions
), but may include more information. AnIndexError
is raised ifrun_id
doesn't exist. An empty list is returned if the run was initialized, but recorded no node execution.Tests were added to catch the bug reported in the original issue.