Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(feat): read_elem_as_dask method #1469

Merged
merged 160 commits into from
Jul 23, 2024
Merged

(feat): read_elem_as_dask method #1469

merged 160 commits into from
Jul 23, 2024

Conversation

ilan-gold
Copy link
Contributor

@ilan-gold ilan-gold commented Apr 11, 2024

My thinking in terms of the naming for this would be to use this for xarray as well but we don't have to.

src/anndata/_io/specs/lazy_methods.py Outdated Show resolved Hide resolved
src/anndata/_io/specs/lazy_methods.py Outdated Show resolved Hide resolved
Copy link

codecov bot commented Apr 11, 2024

Codecov Report

Attention: Patch coverage is 96.80000% with 4 lines in your changes missing coverage. Please review.

Project coverage is 84.17%. Comparing base (d51f84c) to head (cc67a9b).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1469      +/-   ##
==========================================
- Coverage   86.48%   84.17%   -2.32%     
==========================================
  Files          36       37       +1     
  Lines        5823     5932     +109     
==========================================
- Hits         5036     4993      -43     
- Misses        787      939     +152     
Files Coverage Δ
src/anndata/_io/specs/__init__.py 100.00% <100.00%> (ø)
src/anndata/_io/specs/lazy_methods.py 100.00% <100.00%> (ø)
src/anndata/experimental/__init__.py 100.00% <100.00%> (ø)
src/anndata/_core/file_backing.py 91.07% <90.00%> (-0.20%) ⬇️
src/anndata/_types.py 88.09% <90.90%> (-0.48%) ⬇️
src/anndata/_io/specs/registry.py 95.45% <90.90%> (-0.87%) ⬇️

... and 7 files with indirect coverage changes

@ilan-gold ilan-gold changed the title (feat): read_elem_lazy method (feat): read_elem_dask method Apr 17, 2024
@ilan-gold ilan-gold added this to the 0.10.8 milestone Apr 17, 2024
@ilan-gold ilan-gold changed the title (feat): read_elem_dask method (feat): read_elem_as_dask method Apr 17, 2024
@ilan-gold ilan-gold marked this pull request as ready for review April 18, 2024 13:45
@ilan-gold ilan-gold self-assigned this Apr 18, 2024
@ilan-gold ilan-gold requested a review from ivirshup May 8, 2024 12:14
src/anndata/_io/specs/lazy_methods.py Outdated Show resolved Hide resolved
tests/test_io_elementwise.py Outdated Show resolved Hide resolved
src/anndata/_io/specs/lazy_methods.py Outdated Show resolved Hide resolved
Comment on lines 66 to 87
def create_dense_store(store):
X = np.random.randn(SIZE, SIZE)

write_elem(store, "X", X)
return store


def create_sparse_store(sparse_format, store):
import dask.array as da

X = sparse.random(
SIZE,
SIZE,
format=sparse_format,
density=0.01,
random_state=np.random.default_rng(),
)
X_dask = da.from_array(X, chunks=(100, 100))

write_elem(store, "X", X)
write_elem(store, "X_dask", X_dask)
return store
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to use these in more places, I would suggest not making them square. This has definitely caused problems for us in the past. Maybe even make the shape/ chunk size a parameter?

I would also write a short doc string about what is actually being returned. I think it's a little strange that one of these groups has two things in it, while other only has one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just do the chunking along the axis. The shape shouldn't matter. We can reoslve it from sparse_format

Copy link
Member

@ivirshup ivirshup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. I'm actually surprised how easily the subclassing seems to have worked out.

I've left one comment, which I think should be addressed but I hope is simple. One more minor nit which is simple, but can be left.

I think get_elem_name should live next to anndata._core.file_backing.filename. I don't know that file_backing is actually the best module for that, and maybe it would fit better in a util module. Also, get_elem_name seems like an actually implemented version of AnnData._io.specs.registry.elem_key. I think we should just delete elem_key.

tests/test_io_elementwise.py Outdated Show resolved Hide resolved
src/anndata/_io/specs/lazy_methods.py Show resolved Hide resolved
src/anndata/_io/specs/lazy_methods.py Outdated Show resolved Hide resolved
src/anndata/_io/specs/lazy_methods.py Outdated Show resolved Hide resolved
src/anndata/_io/specs/lazy_methods.py Outdated Show resolved Hide resolved
src/anndata/_io/specs/lazy_methods.py Outdated Show resolved Hide resolved
src/anndata/_io/specs/lazy_methods.py Outdated Show resolved Hide resolved
@flying-sheep
Copy link
Member

OK now I’m done lol

@ilan-gold ilan-gold merged commit 6d70535 into main Jul 23, 2024
14 of 15 checks passed
@ilan-gold ilan-gold deleted the ig/read_dask_elem branch July 23, 2024 08:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

read_elem_as_dask Distributed writing for H5ad format due to h5py objects being unserializable
3 participants