
[WIP] Attempt to continue LRU cache for decoded chunks #1214

Closed
wants to merge 58 commits

Conversation

@croth1 croth1 commented Oct 23, 2022

Continuation attempt of #306. Do not merge yet: the behaviour is very likely still incorrect when chunks are deleted from the store, which #738 introduced.

I still have very limited knowledge of the code base; I will need to dig a bit deeper to gain a better understanding.
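For context, the eviction and invalidation behaviour under discussion (including invalidating entries when a chunk is deleted from the store, cf. #738) can be illustrated with a minimal hypothetical sketch. This is not the PR's actual implementation; the class and method names are invented:

```python
from collections import OrderedDict

class SimpleLRUChunkCache:
    """Hypothetical sketch of an LRU cache keyed by chunk coordinates.

    Not the implementation in this PR; it only illustrates the eviction
    behaviour being discussed, plus explicit invalidation when a chunk
    is deleted from the underlying store (cf. #738).
    """

    def __init__(self, max_size):
        self._max_size = max_size  # maximum number of cached chunks
        self._cache = OrderedDict()

    def get(self, key, decode):
        # On a hit, move the chunk to the most-recently-used position.
        if key in self._cache:
            self._cache.move_to_end(key)
            return self._cache[key]
        # On a miss, decode the chunk and cache it, evicting the
        # least-recently-used entry if the cache is full.
        value = decode(key)
        self._cache[key] = value
        if len(self._cache) > self._max_size:
            self._cache.popitem(last=False)
        return value

    def invalidate(self, key):
        # Must be called when a chunk is deleted from the store,
        # otherwise stale decoded data would be served.
        self._cache.pop(key, None)
```

Without the `invalidate` hook, a deletion in the store would leave stale decoded bytes in the cache, which is exactly the interaction with #738 flagged above.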

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

shikharsg and others added 30 commits August 29, 2018 13:17
… decode round trip for object arrays (with tests)
factoring out mapping code from LRUStoreCache and LRUChunkCache
croth1 commented Oct 23, 2022

Hmm, OK: the diff of the merge commit still looks a bit off. Some code from other PRs seems to have leaked in or been duplicated, or I am just bad at interpreting the diff. I will investigate tomorrow what happened there.

class StoreTests(MutableMappingStoreTests):
    """Abstract store tests."""

@croth1 (Author):

IIRC, I needed to move this one because LRUChunkCache doesn't implement the context manager interface; maybe it should implement it instead?
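If LRUChunkCache were to grow a context manager interface, a minimal hypothetical sketch might look like the following (class name and `close()` behaviour are invented for illustration, not taken from this PR):

```python
class LRUChunkCacheSketch:
    """Hypothetical sketch of adding the context manager protocol to a
    chunk cache so it can be used in ``with`` blocks like other stores.
    The real class in this PR may need different cleanup in close()."""

    def __init__(self):
        self._cache = {}
        self.closed = False

    def close(self):
        # Discard all cached chunks and mark the cache as closed.
        self._cache.clear()
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
```

With `__enter__`/`__exit__` in place, the test helper above would no longer need special-casing for this class.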

@@ -87,6 +87,15 @@ class Array:
read and decompressed when possible.

.. versionadded:: 2.7
@croth1 (Author):

This will need to be updated.

@@ -478,6 +489,16 @@ def open_array(
non-fill-value data are stored, at the expense of overhead associated
with checking the data of each chunk.

.. versionadded:: 2.7
@croth1 (Author):

This needs to be updated.

@joshmoore (Member) commented:

Kicked off the GHA workflows.


codecov bot commented Oct 24, 2022

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.99%. Comparing base (f361631) to head (16793f5).
Report is 298 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff            @@
##             main    #1214    +/-   ##
========================================
  Coverage   99.99%   99.99%            
========================================
  Files          35       35            
  Lines       14136    14366   +230     
========================================
+ Hits        14135    14365   +230     
  Misses          1        1            
Files with missing lines    Coverage            Δ
zarr/__init__.py            100.00% <ø>         (ø)
zarr/core.py                100.00% <100.00%>   (ø)
zarr/creation.py            100.00% <ø>         (ø)
zarr/hierarchy.py           99.78% <100.00%>    (+<0.01%) ⬆️
zarr/storage.py             100.00% <100.00%>   (ø)
zarr/tests/test_core.py     100.00% <100.00%>   (ø)
zarr/tests/test_storage.py  100.00% <100.00%>   (ø)

@joshmoore (Member) commented:

Green. Are there any other commits expected from your side, @croth1?

@croth1 (Author) commented Nov 1, 2022

I am still not entirely sure whether this is the final form, or whether I would rather re-implement it as a caching store on top of other stores, just like LRUStoreCache; that feels a bit more intuitive to me. I also still want to check whether I can do write caching. If I remember correctly, this implementation is write-through, and I am not sure whether cached writes can be achieved in combination with LRUStoreCache. I need to do more research, but I am a bit busy right now.
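The write-through behaviour mentioned here can be sketched as follows; this is an illustrative, hypothetical snippet (class and names invented), not the PR's code. A write-back variant would instead buffer writes and flush them to the store later:

```python
class WriteThroughCache:
    """Hedged sketch of write-through caching: every write goes to the
    underlying store immediately, while the value is also kept in the
    cache, so cache and store never diverge. Names are illustrative,
    not taken from this PR."""

    def __init__(self, store):
        self._store = store  # any dict-like backing store
        self._cache = {}

    def __setitem__(self, key, value):
        self._store[key] = value  # write-through: store updated now
        self._cache[key] = value  # cache kept coherent with the store

    def __getitem__(self, key):
        # Fill the cache lazily on first read of a key.
        if key not in self._cache:
            self._cache[key] = self._store[key]
        return self._cache[key]
```

The trade-off flagged above: write-through keeps the store consistent at all times but gains nothing on write latency, which is why combining it with a separate write cache such as LRUStoreCache is an open question here.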

FarzanT commented May 20, 2023

Hi, any news on when this feature will be merged into main? It seems like a very useful feature!
I want to reduce the overhead of repeatedly sampling from the same chunk, which should already be in RAM. If I'm not mistaken, every time you pass an index to the zarr array, it currently has to decompress the chunk containing that index. I would like an option to keep the entire chunk decompressed in the cache until it is discarded. Specifically, I'm using a large zarr array in my PyTorch dataset module and need to randomly iterate through samples in the same chunk before randomly picking the next chunk. Avoiding the decompression overhead would be quite useful, and unfortunately I can't leave the data fully decompressed on disk, as it would be too large.
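As a stop-gap for this use case, the per-chunk decompression can be memoized in user code. A hypothetical sketch using `functools.lru_cache`; the `compressed_chunks` dict and the zlib codec are stand-ins for a real zarr store and its compressor:

```python
import functools
import zlib

# Stand-in for a chunk store holding compressed chunks (hypothetical).
compressed_chunks = {0: zlib.compress(bytes(range(256)))}

@functools.lru_cache(maxsize=4)  # keep up to 4 decoded chunks in RAM
def decoded_chunk(chunk_id):
    # Decompress a chunk; cached so repeat access skips this step.
    return zlib.decompress(compressed_chunks[chunk_id])

def sample(chunk_id, offset):
    # Repeated samples from the same chunk reuse the cached bytes
    # instead of decompressing the chunk again.
    return decoded_chunk(chunk_id)[offset]
```

This mirrors the PyTorch sampling pattern described above: many reads from one chunk hit the cache, and only switching to a new chunk pays the decompression cost.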

@croth1 (Author) commented Jun 2, 2023

@FarzanT @joshmoore, I have in the meantime changed my approach and no longer need this myself, hence the long silence. If it is useful for other people, we can try finishing it up as is. It would need some rebasing and performance testing, though.

IIRC, last time I checked, all tests were green, although this would need a very thorough review, because I was frequently not 100% sure what I was doing during the quite substantial rebase.

@joshmoore (Member) commented:

I'll leave @FarzanT and others to say how pressing their need is. Happy to help however I can, @croth1, if you'd like to pursue this.

FarzanT commented Jun 6, 2023

Thank you @croth1 and @joshmoore, my primary use case has also been addressed by #278 (comment), so it's not a pressing issue for me at the moment. But I'd say it would be much nicer to just flip a switch and have zarr handle this internally. If this pull request can address this, then it shouldn't be abandoned IMO!

@jhamman (Member) commented Oct 11, 2024

I'm going to close this as stale. Folks should feel free to reopen if there is interest in continuing this work.

@jhamman jhamman closed this Oct 11, 2024