Add VirtualInMemoryArray that keeps small arrays in memory #336

Merged
merged 5 commits into from
Dec 22, 2023


tomwhite

This continues the idea in #247 and #290: there is no need to materialize arrays to Zarr if they can be computed from the block ID (#247 and #290) or if they are small enough to be held in memory anyway (this PR).
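To illustrate what "computed from the block ID" can mean, here is a minimal sketch rather than the code from #247 or #290. The function `arange_block` and its parameters are hypothetical names chosen for this example: each block of a 1-D range is rebuilt on demand from its block index, so nothing has to be stored.

```python
import numpy as np


def arange_block(block_id, chunk_size, length):
    """Hypothetical helper: rebuild one block of np.arange(length) from its block ID."""
    start = block_id * chunk_size
    stop = min(start + chunk_size, length)  # the last block may be shorter
    return np.arange(start, stop)


# Three blocks of size 4 cover a range of length 10.
blocks = [arange_block(i, chunk_size=4, length=10) for i in range(3)]
```

Concatenating the blocks gives back the full range, which is why such an array never needs to be written out.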

I've also added a check that the arrays created in this way (via asarray) are smaller than 1MB. Anything bigger is best written to shared storage (typically as a Zarr file), rather than being serialized in the plan (which is what currently happens).
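The size check could look roughly like the sketch below. It is not the class added in this PR: `InMemoryArraySketch` and `MAX_IN_MEMORY_BYTES` are hypothetical names, and the exact threshold value and error type are assumptions.

```python
import numpy as np

# Assumed threshold, mirroring the 1MB limit described above.
MAX_IN_MEMORY_BYTES = 1_000_000


class InMemoryArraySketch:
    """Hypothetical wrapper that keeps a small array in memory instead of in Zarr."""

    def __init__(self, array):
        array = np.asarray(array)
        if array.nbytes > MAX_IN_MEMORY_BYTES:
            # Larger arrays belong in shared storage, not in the serialized plan.
            raise ValueError(
                f"Array of {array.nbytes} bytes exceeds the "
                f"{MAX_IN_MEMORY_BYTES} byte in-memory limit"
            )
        self.array = array
        self.shape = array.shape
        self.dtype = array.dtype

    def __getitem__(self, key):
        # Reads are served directly from the in-memory NumPy array.
        return self.array[key]
```

A small input such as `InMemoryArraySketch([[1, 2, 3], [4, 5, 6]])` is accepted, while `InMemoryArraySketch(np.zeros(200_000))` (1.6 MB of float64) raises `ValueError`.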

@dcherian

a check that the arrays created in this way (via asarray) are smaller than 1MB.

Perhaps you can avoid this with the "broadcast trick" (dask/dask#9517)

np.full(shape, 1) == np.ones(shape) == np.broadcast_to(np.array([1]), shape)
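A short sketch of the broadcast trick using plain NumPy (this is not the dask implementation from dask/dask#9517, and the shape is arbitrary). The broadcast result is a read-only view with zero strides over a single stored element, so it holds the same values as the materialized array without allocating the full buffer.

```python
import numpy as np

shape = (1000, 1000)

full = np.full(shape, 1)          # allocates one element per entry
scalar = np.array(1)              # a single stored element
trick = np.broadcast_to(scalar, shape)  # view over that one element

# Same shape and values as the materialized array.
same_values = np.array_equal(trick, full)

# Zero strides: every index maps to the same stored element.
strides = trick.strides

# The view is read-only, so it cannot be modified in place.
writeable = trick.flags.writeable
```

Here `full.nbytes` is the full buffer size, whereas `scalar.nbytes` is the size of the one element that actually backs `trick`.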

@tomwhite

That's a clever trick, thanks for pointing it out @dcherian! I don't think it applies in this case, since we have a small array of arbitrary values, not a constant array created by full, but it might be something we can use for constant arrays.

@tomwhite

I just did a comparison of the number of Zarr arrays created during a unit test run:

  • Main: 431 arrays
  • This PR: 296 arrays

This probably isn't representative of real workloads since asarray isn't used much (and if it is, the arrays are small and don't dominate the computation).

@tomwhite tomwhite merged commit 39cf477 into main Dec 22, 2023
7 checks passed
@tomwhite tomwhite deleted the virtual-in-memory branch December 22, 2023 10:47
tomwhite added a commit that referenced this pull request Jan 2, 2024
tomwhite added a commit that referenced this pull request Jan 2, 2024