-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: in-memory caching #1207
feat: in-memory caching #1207
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Reviewed everything up to 1dcf954 in 50 seconds
More details
- Looked at
1126
lines of code in12
files - Skipped
1
files when reviewing. - Skipped posting
6
drafted comments based on config settings.
1. hamilton/caching/stores/memory.py:52
- Draft comment:
Consider handling the case wherecache_key
does not exist to avoidKeyError
. You can useself._data_versions.pop(cache_key, None)
to safely delete without raising an error if the key is not found. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment is suggesting a change to prevent potential runtime errors. This is a valid concern as usingdel
without checking if the key exists can lead to aKeyError
. The suggestion to usepop
is a practical solution to handle this case safely.
The current implementation might be intentional if the author wants to ensure that the key exists before deletion. However, the comment is still valid as it suggests a safer alternative.
Even if the current implementation is intentional, the comment provides a safer alternative that could prevent potential errors.
The comment is useful as it suggests a safer way to handle key deletion, which could prevent runtime errors. It should be kept.
2. hamilton/caching/stores/memory.py:189
- Draft comment:
Consider handling the case wheredata_version
does not exist to avoidKeyError
. You can useself._results.pop(data_version, None)
to safely delete without raising an error if the key is not found. - Reason this comment was not posted:
Marked as duplicate.
3. hamilton/caching/stores/file.py:5
- Draft comment:
The pattern for handling the ImportError for theoverride
decorator is repeated in multiple files. Consider refactoring this into a utility function or a common module to adhere to the DRY principle. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment is about a change made in the diff, specifically the handling of the ImportError for the 'override' decorator. The suggestion to refactor this pattern into a utility function aligns with the DRY principle, which is a valid code quality improvement.
The comment assumes that this pattern is repeated in multiple files, which is not verifiable from the current diff. Without evidence of repetition, the suggestion might not be necessary.
Even if the pattern is not repeated in this diff, the suggestion to refactor for potential reuse is a proactive code quality improvement that aligns with best practices.
Keep the comment as it suggests a valid code quality improvement by adhering to the DRY principle, even if the repetition is not evident in this diff alone.
4. hamilton/caching/stores/memory.py:4
- Draft comment:
The pattern for handling the ImportError for theoverride
decorator is repeated in multiple files. Consider refactoring this into a utility function or a common module to adhere to the DRY principle. This comment also applies tohamilton/caching/stores/file.py
. - Reason this comment was not posted:
Marked as duplicate.
5. hamilton/caching/stores/memory.py:5
- Draft comment:
The pattern for handling the ImportError for theoverride
decorator is repeated in multiple files. Consider refactoring this into a utility function or a common module to adhere to the DRY principle. This comment also applies tohamilton/caching/stores/file.py
. - Reason this comment was not posted:
Marked as duplicate.
6. hamilton/caching/stores/sqlite.py:27
- Draft comment:
The pattern for handling the ImportError for theoverride
decorator is repeated in multiple files. Consider refactoring this into a utility function or a common module to adhere to the DRY principle. This comment also applies tohamilton/caching/stores/file.py
andhamilton/caching/stores/memory.py
. - Reason this comment was not posted:
Marked as duplicate.
Workflow ID: wflow_SiaYX41SEhZlQiVm
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 7305b81 in 5 seconds
More details
- Looked at
11
lines of code in1
files - Skipped
0
files when reviewing. - Skipped posting
2
drafted comments based on config settings.
1. hamilton/caching/stores/memory.py:1
- Draft comment:
The import ofSequence
fromcollections.abc
is replaced by importing it directly fromtyping
. This is a valid change asSequence
is available intyping
. - Reason this comment was not posted:
Confidence changes required:0%
The import ofSequence
fromcollections.abc
is replaced by importing it directly fromtyping
. This is a valid change asSequence
is available intyping
.
2. hamilton/caching/stores/memory.py:1
- Draft comment:
Consider adding this change to thedocs/
to note the import source change forSequence
. - Reason this comment was not posted:
Confidence changes required:33%
The import statement forSequence
was moved fromcollections.abc
totyping
, which is a minor change but should be noted for documentation purposes.
Workflow ID: wflow_E7c8x7XeehMSZSUW
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! A few nits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on e1a7b00 in 29 seconds
More details
- Looked at
274
lines of code in6
files - Skipped
0
files when reviewing. - Skipped posting
4
drafted comments based on config settings.
1. hamilton/caching/stores/sqlite.py:33
- Draft comment:
self._directory
should be converted to a string for serialization.
state["path"] = str(self._directory)
- Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment seems to address a potential issue with serialization ofpathlib.Path
objects. Since__getstate__
is meant for serialization, andpathlib.Path
might not serialize correctly in all contexts, converting it to a string could be a valid suggestion. However, I need to ensure that this is indeed a necessary change and not just a preference.
I might be overestimating the importance of convertingpathlib.Path
to a string. If the current implementation works correctly in all intended contexts, the suggestion might be unnecessary.
Given that__getstate__
is used for serialization, andpathlib.Path
objects might not serialize as expected, converting to a string could prevent potential issues. It's better to ensure compatibility across different serialization contexts.
The comment is likely valid as it addresses a potential serialization issue withpathlib.Path
. Converting to a string ensures compatibility in serialization contexts.
2. tests/caching/metadata_store/test_base.py:54
- Draft comment:
Consider re-addingmetadata_store.delete_all()
to ensure a clean test environment after each test. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The removal ofmetadata_store.delete_all()
could lead to tests not having a clean state, which is important for test reliability. The comment is suggesting a change that could prevent potential test contamination. This seems like a valid concern, as ensuring a clean test environment is crucial for accurate test results.
The comment assumes that the removal ofmetadata_store.delete_all()
will lead to issues, but without more context on how the tests are isolated, it's speculative. The tests might be designed to handle state in another way.
Even if the tests are designed to handle state differently, the removal of a cleanup step is a significant change that could impact test reliability. It's better to err on the side of caution and ensure tests are isolated.
The comment should be kept as it addresses a potential issue with test isolation due to the removal of a cleanup step in the fixture.
3. tests/caching/test_integration.py:1
- Draft comment:
The code in this file is complex and would benefit from comments explaining the purpose and flow of the tests. - Reason this comment was not posted:
Comment was on unchanged code.
4. tests/caching/result_store/test_base.py:28
- Draft comment:
This file is missing tests for shared properties across result store implementations. Consider adding tests to ensure compliance with the Single Responsibility Principle. - Reason this comment was not posted:
Comment did not seem useful.
Workflow ID: wflow_iSWCNrxoyZZRJBPG
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ Changes requested. Incremental review on 29dbe1c in 28 seconds
More details
- Looked at
17
lines of code in1
files - Skipped
0
files when reviewing. - Skipped posting
1
drafted comments based on config settings.
1. hamilton/caching/stores/file.py:23
- Draft comment:
Theset
method is handling both path creation and result saving, which violates the Single Responsibility Principle. Consider refactoring to separate these responsibilities. - Reason this comment was not posted:
Comment was on unchanged code.
Workflow ID: wflow_YTmC6Qa2eF4VEKkg
Want Ellipsis to fix these issues? Tag @ellipsis-dev
in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
@@ -14,6 +20,12 @@ def __init__(self, path: str, create_dir: bool = True) -> None: | |||
if self.create_dir: | |||
self.path.mkdir(exist_ok=True, parents=True) | |||
|
|||
def __getstate__(self) -> dict: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The __getstate__
method only serializes the path
attribute. If create_dir
is also important for the object's state, consider including it in the serialization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, a few more small nits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 13a0ec3 in 14 seconds
More details
- Looked at
87
lines of code in3
files - Skipped
0
files when reviewing. - Skipped posting
3
drafted comments based on config settings.
1. tests/caching/metadata_store/test_memory.py:37
- Draft comment:
InMemoryMetadataStore
does not have asize
attribute. Uselen(in_memory_metadata_store)
andlen(metadata_store)
to compare sizes instead. - Reason this comment was not posted:
Comment was on unchanged code.
2. hamilton/caching/stores/memory.py:276
- Draft comment:
Consider handling the case whereresult
isNone
to avoid ambiguity when adata_version
is not found inresult_store
. - Reason this comment was not posted:
Comment was on unchanged code.
3. hamilton/caching/stores/memory.py:276
- Draft comment:
Consider adding comments to theload_from
method to explain the logic, especially around how data is retrieved fromresult_store
and set inin_memory_result_store
. This will aid in understanding the data flow and the purpose of each step. - Reason this comment was not posted:
Confidence changes required:50%
Theload_from
method inInMemoryResultStore
is complex and involves multiple steps to retrieve and set data. Adding comments would help clarify the process.
Workflow ID: wflow_d2C4hfGZ5MXVLHcU
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
Added in-memory caching. Stores exist in the Python process, but are not persisted to disk after each node execution. The entire stored can be persisted or loaded via new methods. Slight refactor to the `HamiltonCacheAdapter` means that implementations supporting multiprocessing need to implement a `__getstate__` method. Note that for `Python >= 3.11` all objects have a default implementation of `__getstate__` but it's not the case for earlier versions. --------- Co-authored-by: zilto <tjean@DESKTOP-V6JDCS2>
This feature was requested by user. The goal is to enable caching without persisting results to disk. This is useful when using interactive Python sessions, notebooks, or even webservices, and don't want to use disk space.
In-memory caching requires implementing both a metadata and a result store in memory. It was also the opportunity to introduce
.load_from()
and.persist_to()
which allows to load and save stores via other implementations.sidenote: This feature could exist on any implementation actually (e.g.,
S3ResultStore.load_from(FileResultStore(...))
). In the mean time, a similar pattern is currently possible through the in-memory store: