(feat): raising errors where backed is not supported #3048

Merged May 21, 2024 (40 commits)
Changes from 30 commits
Commits
87450d0
(feat): first step raising errors where `backed` is not supported
ilan-gold May 8, 2024
b6e13eb
(fix): fix `chunked` with `log1p`
ilan-gold May 8, 2024
75a703b
(fix): complications in `log1p` with `chunked`
ilan-gold May 8, 2024
4a654d1
Merge branch 'main' into ig/backed_not_implemented
flying-sheep May 13, 2024
a80dece
remove duplicated import
flying-sheep May 13, 2024
4c22705
(feat): pca check
ilan-gold May 13, 2024
b3c4bb7
Merge branch 'ig/backed_not_implemented' of github.com:scverse/scanpy…
ilan-gold May 13, 2024
cb6116b
(feat): `ingest` check added
ilan-gold May 13, 2024
5b7a3e9
(feat): add `dendrogram` test
ilan-gold May 13, 2024
53e68c1
(feat): add `tsne` check
ilan-gold May 13, 2024
cfa8d57
(feat): `rank_genes_groups`
ilan-gold May 13, 2024
138c9be
(feat): `score_genes` check
ilan-gold May 13, 2024
d68ee47
(chore): add plotting backed tests
ilan-gold May 13, 2024
6632910
Merge branch 'main' into ig/backed_not_implemented
ilan-gold May 14, 2024
788a4cb
(chore): check type instead of `isbacked`
ilan-gold May 14, 2024
06ab4f1
Merge branch 'ig/backed_not_implemented' of github.com:scverse/scanpy…
ilan-gold May 14, 2024
dafda50
(chore): release note
ilan-gold May 14, 2024
786f096
(fix): string formatting error
ilan-gold May 14, 2024
4eb3487
(fix): `worker_id` default
ilan-gold May 14, 2024
4f78c23
Merge branch 'main' into ig/backed_not_implemented
ilan-gold May 14, 2024
3a36196
(fix): try `SparseDataset`
ilan-gold May 14, 2024
9480b1c
Merge branch 'ig/backed_not_implemented' of github.com:scverse/scanpy…
ilan-gold May 14, 2024
8af759a
(fix): correct fixture creation
ilan-gold May 14, 2024
13633c6
Apply suggestions from code review
ilan-gold May 14, 2024
99968db
(chore): consolidate tests
ilan-gold May 14, 2024
9fc9ab0
(chore): remove erroneous plotting tests
ilan-gold May 14, 2024
27c3d08
(chore): remove other `tempfile` import
ilan-gold May 14, 2024
218067a
Merge branch 'main' into ig/backed_not_implemented
ilan-gold May 15, 2024
01a0bcf
(fix): try moving scanpy install
ilan-gold May 15, 2024
7e161bf
(fix): `np.Inf` -> `np.inf`
ilan-gold May 15, 2024
88d51ab
(fix): fail if pynn is newest
ilan-gold May 16, 2024
e4ebaf7
(fix): name
ilan-gold May 16, 2024
037b77f
(fix): `score_genes` name
ilan-gold May 16, 2024
0d96e89
Merge branch 'main' into ig/backed_not_implemented
flying-sheep May 17, 2024
e2e9252
revert 88d51ab
flying-sheep May 17, 2024
54ee86f
session-scoped backed_adata
flying-sheep May 17, 2024
2e8ba99
Merge branch 'main' into ig/backed_not_implemented
ilan-gold May 17, 2024
6b03ac8
(fix): move umap import back to top
ilan-gold May 17, 2024
352166b
(fix): try block shape
ilan-gold May 17, 2024
b0ca228
(fix): revert chunks fix
ilan-gold May 17, 2024
1 change: 1 addition & 0 deletions docs/release-notes/1.10.2.md
Expand Up @@ -16,6 +16,7 @@
```

* Compatibility with `matplotlib` 3.9 {pr}`2999` {smaller}`I Virshup`
* Add clear errors where `backed` mode-like matrices (i.e., from `sparse_dataset`) are not supported {pr}`3048` {smaller}`I Gold`

```{rubric} Performance
```
Expand Down
19 changes: 19 additions & 0 deletions scanpy/_utils/__init__.py
Expand Up @@ -30,6 +30,7 @@
)
from weakref import WeakSet

import h5py
import numpy as np
from anndata import AnnData
from anndata import __version__ as anndata_version
Expand All @@ -43,6 +44,13 @@
from .._settings import settings
from .compute.is_constant import is_constant # noqa: F401

if Version(anndata_version) >= Version("0.10.0"):
from anndata._core.sparse_dataset import (
BaseCompressedSparseDataset as SparseDataset,
)
else:
from anndata._core.sparse_dataset import SparseDataset

if TYPE_CHECKING:
from collections.abc import Mapping
from pathlib import Path
Expand Down Expand Up @@ -1090,3 +1098,14 @@
if axis in {1, "var"}:
return (1, "var")
raise ValueError(f"`axis` must be either 0, 1, 'obs', or 'var', was {axis!r}")


def is_backed_type(X: object) -> bool:
return isinstance(X, (SparseDataset, h5py.File, h5py.Dataset))


def raise_not_implemented_error_if_backed_type(X: object, method_name: str) -> None:
if is_backed_type(X):
raise NotImplementedError(
f"{method_name} is not implemented for matrices of type {type(X)}"
)
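
The two helpers above form a small guard pattern: a type predicate plus a function that raises a uniformly worded error. The following is a minimal, dependency-free sketch of that pattern, not the scanpy implementation. `FakeBackedMatrix`, `BACKED_TYPES`, and `scale_rows` are hypothetical names introduced only so the example runs without `anndata`, `h5py`, or a file on disk.

```python
class FakeBackedMatrix:
    """Hypothetical stand-in for an on-disk matrix type (assumption, not a real class)."""


# In the PR, the checked types are SparseDataset, h5py.File and h5py.Dataset.
# Here a single stand-in type plays that role.
BACKED_TYPES = (FakeBackedMatrix,)


def is_backed_type(X: object) -> bool:
    """Return True if X is one of the types treated as file-backed."""
    return isinstance(X, BACKED_TYPES)


def raise_not_implemented_error_if_backed_type(X: object, method_name: str) -> None:
    """Fail early with a clear message instead of an obscure error further down."""
    if is_backed_type(X):
        raise NotImplementedError(
            f"{method_name} is not implemented for matrices of type {type(X)}"
        )


def scale_rows(X, factor: float) -> list:
    """Hypothetical in-memory function, guarded the way the PR guards `scale`."""
    raise_not_implemented_error_if_backed_type(X, "scale_rows")
    return [[value * factor for value in row] for row in X]
```

Calling `scale_rows([[1.0, 2.0]], 2.0)` returns the scaled rows, while `scale_rows(FakeBackedMatrix(), 2.0)` raises `NotImplementedError` before any computation starts. Placing the guard on the first line of each public function keeps the error message consistent across call sites.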
4 changes: 2 additions & 2 deletions scanpy/experimental/_docs.py
Expand Up @@ -12,14 +12,14 @@
theta
The negative binomial overdispersion parameter `theta` for Pearson residuals.
Higher values correspond to less overdispersion \
(`var = mean + mean^2/theta`), and `theta=np.Inf` corresponds to a Poisson model.
(`var = mean + mean^2/theta`), and `theta=np.inf` corresponds to a Poisson model.
clip
Determines if and how residuals are clipped:

* If `None`, residuals are clipped to the interval \
`[-sqrt(n_obs), sqrt(n_obs)]`, where `n_obs` is the number of cells in the dataset (default behavior).
* If any scalar `c`, residuals are clipped to the interval `[-c, c]`. Set \
`clip=np.Inf` for no clipping.
`clip=np.inf` for no clipping.
"""

doc_check_values = """\
Expand Down
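
The variance formula and the default clipping interval in the docstring above can be checked numerically. The sketch below uses only the standard library, with `math.inf` standing in for `np.inf` (the `np.Inf` alias was removed in NumPy 2.0, which is presumably why the spelling changes here). `nb_variance` and `default_clip_interval` are illustrative names, not scanpy functions.

```python
import math


def nb_variance(mean: float, theta: float) -> float:
    """Negative binomial variance: var = mean + mean^2 / theta."""
    return mean + mean**2 / theta


def default_clip_interval(n_obs: int) -> tuple:
    """Interval used when `clip=None`: [-sqrt(n_obs), sqrt(n_obs)]."""
    bound = math.sqrt(n_obs)
    return (-bound, bound)


# A finite theta adds overdispersion on top of the Poisson variance.
print(nb_variance(4.0, 8.0))  # 6.0
# An infinite theta removes the quadratic term, leaving var == mean (Poisson).
print(nb_variance(4.0, math.inf))  # 4.0
print(default_clip_interval(2500))  # (-50.0, 50.0)
```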
4 changes: 2 additions & 2 deletions scanpy/plotting/_tools/__init__.py
Expand Up @@ -407,8 +407,8 @@
gs = gridspec.GridSpec(nrows=n_panels_y, ncols=n_panels_x, wspace=0.22, hspace=0.3)

ax0 = None
ymin = np.Inf
ymax = -np.Inf
ymin = np.inf
ymax = -np.inf
for count, group_name in enumerate(group_names):
gene_names = adata.uns[key]["names"][group_name][:n_genes]
scores = adata.uns[key]["scores"][group_name][:n_genes]
Expand Down
11 changes: 9 additions & 2 deletions scanpy/preprocessing/_pca.py
Expand Up @@ -16,7 +16,7 @@
from .. import logging as logg
from .._compat import DaskArray, pkg_version
from .._settings import settings
from .._utils import AnyRandom, Empty, _doc_params, _empty
from .._utils import AnyRandom, Empty, _doc_params, _empty, is_backed_type
from ..get import _check_mask, _get_obs_rep
from ._docs import doc_mask_var_hvg
from ._utils import _get_mean_var
Expand Down Expand Up @@ -170,6 +170,10 @@
)
data_is_AnnData = isinstance(data, AnnData)
if data_is_AnnData:
if layer is None and not chunked and is_backed_type(data.X):
raise NotImplementedError(
f"PCA is not implemented for matrices of type {type(data.X)} with chunked as False"
)
adata = data.copy() if copy else data
else:
if pkg_version("anndata") < Version("0.8.0rc1"):
Expand All @@ -192,7 +196,10 @@
logg.info(f" with n_comps={n_comps}")

X = _get_obs_rep(adata_comp, layer=layer)

if is_backed_type(X) and layer is not None:
raise NotImplementedError(
f"PCA is not implemented for matrices of type {type(X)} from layers"
)
# See: https://github.com/scverse/scanpy/pull/2816#issuecomment-1932650529
if (
Version(ad.__version__) < Version("0.9")
Expand Down
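
Taken together, the two PCA checks in this file behave like a small decision table. The sketch below is one reading of the diff rather than code from scanpy. `pca_backed_error_suffix` is a hypothetical helper, and `x_is_backed` stands for the result of `is_backed_type(...)` on whichever matrix PCA would read (`X` or the chosen layer).

```python
from typing import Optional


def pca_backed_error_suffix(
    *, layer: Optional[str], chunked: bool, x_is_backed: bool
) -> Optional[str]:
    """Return the error-message suffix, or None if PCA may proceed."""
    if not x_is_backed:
        # In-memory matrices are never rejected by these checks.
        return None
    if layer is not None:
        # Backed matrices taken from a layer are rejected, chunked or not.
        return " from layers"
    if not chunked:
        # Backed X is only accepted together with chunked=True.
        return " with chunked as False"
    return None


print(pca_backed_error_suffix(layer=None, chunked=True, x_is_backed=True))  # None
```

Under this reading, the only combination that accepts a backed matrix is `layer=None` with `chunked=True`, which matches `test_scatter_backed` calling `sc.pp.pca(backed_adata, chunked=True)`.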
2 changes: 2 additions & 0 deletions scanpy/preprocessing/_scale.py
Expand Up @@ -15,6 +15,7 @@
from .._utils import (
_check_array_function_arguments,
axis_mul_or_truediv,
raise_not_implemented_error_if_backed_type,
renamed_arg,
view_to_actual,
)
Expand Down Expand Up @@ -298,6 +299,7 @@
mask_obs = _check_mask(adata, mask_obs, "obs")
view_to_actual(adata)
X = _get_obs_rep(adata, layer=layer, obsm=obsm)
raise_not_implemented_error_if_backed_type(X, "scale")
X, adata.var[str_mean_std[0]], adata.var[str_mean_std[1]] = scale(
X,
zero_center=zero_center,
Expand Down
15 changes: 15 additions & 0 deletions scanpy/preprocessing/_simple.py
Expand Up @@ -24,6 +24,8 @@
AnyRandom,
_check_array_function_arguments,
axis_sum,
is_backed_type,
raise_not_implemented_error_if_backed_type,
renamed_arg,
sanitize_anndata,
view_to_actual,
Expand Down Expand Up @@ -142,6 +144,7 @@
"`min_genes`, `max_counts`, `max_genes` per call."
)
if isinstance(data, AnnData):
raise_not_implemented_error_if_backed_type(data.X, "filter_cells")
adata = data.copy() if copy else data
cell_subset, number = materialize_as_ndarray(
filter_cells(
Expand Down Expand Up @@ -257,6 +260,7 @@
)

if isinstance(data, AnnData):
raise_not_implemented_error_if_backed_type(data.X, "filter_genes")
adata = data.copy() if copy else data
gene_subset, number = materialize_as_ndarray(
filter_genes(
Expand Down Expand Up @@ -402,10 +406,19 @@
raise NotImplementedError(
"Currently cannot perform chunked operations on arrays not stored in X."
)
if adata.isbacked and adata.file._filemode != "r+":
raise NotImplementedError(
"log1p is not implemented for backed AnnData with backed mode not r+"
)
for chunk, start, end in adata.chunked_X(chunk_size):
adata.X[start:end] = log1p(chunk, base=base, copy=False)
else:
X = _get_obs_rep(adata, layer=layer, obsm=obsm)
if is_backed_type(X):
msg = f"log1p is not implemented for matrices of type {type(X)}"
if layer is not None:
raise NotImplementedError(f"{msg} from layers")
raise NotImplementedError(f"{msg} without `chunked=True`")
X = log1p(X, copy=False, base=base)
_set_obs_rep(adata, X, layer=layer, obsm=obsm)
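
The chunked branch above writes each transformed block back into `adata.X`, which is why a backed file opened read-only is rejected there. Below is a rough, standard-library sketch of that loop shape on a plain list of rows. `chunked_rows`, `log1p_chunk`, and `log1p_in_place` are hypothetical stand-ins, not anndata or scanpy APIs.

```python
import math
from typing import Iterator, List, Optional, Tuple

Matrix = List[List[float]]


def chunked_rows(X: Matrix, chunk_size: int) -> Iterator[Tuple[Matrix, int, int]]:
    """Yield (chunk, start, end) row blocks, mimicking the loop over `chunked_X`."""
    for start in range(0, len(X), chunk_size):
        end = min(start + chunk_size, len(X))
        yield X[start:end], start, end


def log1p_chunk(chunk: Matrix, base: Optional[float] = None) -> Matrix:
    """Compute log(1 + x), optionally rescaled to another logarithm base."""
    scale = 1.0 if base is None else math.log(base)
    return [[math.log1p(value) / scale for value in row] for row in chunk]


def log1p_in_place(X: Matrix, chunk_size: int, base: Optional[float] = None) -> None:
    """Overwrite X block by block, so the target must be writable."""
    for chunk, start, end in chunked_rows(X, chunk_size):
        X[start:end] = log1p_chunk(chunk, base=base)
```

Only one block is transformed at a time, so peak memory stays near `chunk_size` rows. The assignment `X[start:end] = ...` is the step that needs write access, corresponding to the `r+` requirement in the diff.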

Expand Down Expand Up @@ -644,6 +657,7 @@
keys = [keys]

X = _get_obs_rep(adata, layer=layer)
raise_not_implemented_error_if_backed_type(X, "regress_out")

if issparse(X):
logg.info(" sparse input is densified and may " "lead to high memory use")
Expand Down Expand Up @@ -852,6 +866,7 @@
`adata.X` : :class:`numpy.ndarray` | :class:`scipy.sparse.spmatrix` (dtype `float`)
Downsampled counts matrix.
"""
raise_not_implemented_error_if_backed_type(adata.X, "downsample_counts")
# This logic is all dispatch
total_counts_call = total_counts is not None
counts_per_cell_call = counts_per_cell is not None
Expand Down
98 changes: 98 additions & 0 deletions scanpy/tests/test_backed.py
@@ -0,0 +1,98 @@
from __future__ import annotations

from functools import partial

import pytest
from anndata import read_h5ad

import scanpy as sc


@pytest.mark.parametrize(
("name", "func", "msg"),
[
pytest.param("PCA", sc.pp.pca, " with chunked as False", id="pca"),
pytest.param(
"PCA", partial(sc.pp.pca, layer="X_copy"), " from layers", id="pca_layer"
),
pytest.param(
"regress_out",
partial(sc.pp.regress_out, keys=["n_counts", "percent_mito"]),
"",
id="regress_out",
),
pytest.param(
"dendrogram", partial(sc.tl.dendrogram, groupby="cat"), "", id="dendrogram"
),
pytest.param("tsne", sc.tl.tsne, "", id="tsne"),
pytest.param("scale", sc.pp.scale, "", id="scale"),
pytest.param(
"downsample_counts",
partial(sc.pp.downsample_counts, counts_per_cell=1000),
"",
id="downsample_counts",
),
pytest.param(
"filter_genes",
partial(sc.pp.filter_genes, max_cells=1000),
"",
id="filter_genes",
),
pytest.param(
"filter_cells",
partial(sc.pp.filter_cells, max_genes=1000),
"",
id="filter_cells",
),
pytest.param(
"rank_genes_groups",
partial(sc.tl.rank_genes_groups, groupby="cat"),
"",
id="rank_genes_groups",
),
pytest.param(
"rank_genes_groups",
partial(sc.tl.score_genes, gene_list=map(str, range(100))),
"",
id="score_genes",
),
],
)
def test_backed_error(backed_adata, name, func, msg):
with pytest.raises(
NotImplementedError,
match=f"{name} is not implemented for matrices of type {type(backed_adata.X)}{msg}",
):
func(backed_adata)


def test_log1p_backed_errors(backed_adata):
with pytest.raises(
NotImplementedError,
match="log1p is not implemented for backed AnnData with backed mode not r+",
):
sc.pp.log1p(backed_adata, chunked=True)
backed_adata.file.close()
backed_adata = read_h5ad(backed_adata.filename, backed="r+")
with pytest.raises(
NotImplementedError,
match=f"log1p is not implemented for matrices of type {type(backed_adata.X)} without `chunked=True`",
):
sc.pp.log1p(backed_adata)
backed_adata.layers["X_copy"] = backed_adata.X
layer_type = type(backed_adata.layers["X_copy"])
with pytest.raises(
NotImplementedError,
match=f"log1p is not implemented for matrices of type {layer_type} from layers",
):
sc.pp.log1p(backed_adata, layer="X_copy")
backed_adata.file.close()


def test_scatter_backed(backed_adata):
sc.pp.pca(backed_adata, chunked=True)
sc.pl.scatter(backed_adata, color="0", basis="pca")


def test_dotplot_backed(backed_adata):
sc.pl.dotplot(backed_adata, ["0", "1", "2", "3"], groupby="cat")
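
One caveat about the `match=` arguments in these tests: `pytest.raises` treats `match` as a regular expression and searches for it in the error text. Messages containing `r+` or a dotted type name therefore match more loosely than they appear to. The standard-library sketch below illustrates the effect with `re.search` directly. Using `re.escape` is a suggestion, not something this PR does.

```python
import re

message = "log1p is not implemented for backed AnnData with backed mode not r+"
other = "log1p is not implemented for backed AnnData with backed mode not rrr"

# Used as a pattern, `r+` means "one or more r", so both strings match.
unescaped_hits = [bool(re.search(message, text)) for text in (message, other)]

# Escaping makes every character literal, so only the exact message matches.
escaped = re.escape(message)
escaped_hits = [bool(re.search(escaped, text)) for text in (message, other)]

print(unescaped_hits)  # [True, True]
print(escaped_hits)  # [True, False]
```

The tests in this file still pass with unescaped patterns because the real messages happen to satisfy them. Escaping would only matter if a stricter comparison were wanted.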
4 changes: 2 additions & 2 deletions scanpy/tests/test_highly_variable_genes.py
Expand Up @@ -172,9 +172,9 @@ def test_pearson_residuals_inputchecks(pbmc3k_parametrized_small):

@pytest.mark.parametrize("subset", [True, False], ids=["subset", "full"])
@pytest.mark.parametrize(
"clip", [None, np.Inf, 30], ids=["noclip", "infclip", "30clip"]
"clip", [None, np.inf, 30], ids=["noclip", "infclip", "30clip"]
)
@pytest.mark.parametrize("theta", [100, np.Inf], ids=["100theta", "inftheta"])
@pytest.mark.parametrize("theta", [100, np.inf], ids=["100theta", "inftheta"])
@pytest.mark.parametrize("n_top_genes", [100, 200], ids=["100n", "200n"])
def test_pearson_residuals_general(
pbmc3k_parametrized_small, subset, clip, theta, n_top_genes
Expand Down
19 changes: 19 additions & 0 deletions scanpy/tests/test_ingest.py
@@ -1,5 +1,6 @@
from __future__ import annotations

import anndata
import numpy as np
import pytest
from sklearn.neighbors import KDTree
Expand Down Expand Up @@ -153,3 +154,21 @@ def test_ingest_map_embedding_umap():
umap_transformed_t = reducer.transform(T)

assert np.allclose(ing._obsm["X_umap"], umap_transformed_t)


def test_ingest_backed(adatas, tmp_path):
adata_ref = adatas[0].copy()
adata_new = adatas[1].copy()

tmp_path = tmp_path

adata_new.write_h5ad(f"{tmp_path}/new.h5ad")

adata_new = anndata.read_h5ad(f"{tmp_path}/new.h5ad", backed="r")

ing = sc.tl.Ingest(adata_ref)
with pytest.raises(
NotImplementedError,
match=f"Ingest.fit is not implemented for matrices of type {type(adata_new.X)}",
):
ing.fit(adata_new)
4 changes: 2 additions & 2 deletions scanpy/tests/test_normalization.py
Expand Up @@ -143,8 +143,8 @@ def test_normalize_pearson_residuals_errors(pbmc3k_parametrized, params, match):
"sparsity_func", [np.array, csr_matrix], ids=lambda x: x.__name__
)
@pytest.mark.parametrize("dtype", ["float32", "int64"])
@pytest.mark.parametrize("theta", [0.01, 1, 100, np.Inf])
@pytest.mark.parametrize("clip", [None, 1, np.Inf])
@pytest.mark.parametrize("theta", [0.01, 1, 100, np.inf])
@pytest.mark.parametrize("clip", [None, 1, np.inf])
def test_normalize_pearson_residuals_values(sparsity_func, dtype, theta, clip):
# toy data
X = np.array([[3, 6], [2, 4], [1, 0]])
Expand Down
4 changes: 3 additions & 1 deletion scanpy/tools/_dendrogram.py
Expand Up @@ -11,7 +11,7 @@

from .. import logging as logg
from .._compat import old_positionals
from .._utils import _doc_params
from .._utils import _doc_params, raise_not_implemented_error_if_backed_type
from ..neighbors._doc import doc_n_pcs, doc_use_rep
from ._utils import _choose_representation

Expand Down Expand Up @@ -116,6 +116,8 @@
>>> markers = ['C1QA', 'PSAP', 'CD79A', 'CD79B', 'CST3', 'LYZ']
>>> sc.pl.dotplot(adata, markers, groupby='bulk_labels', dendrogram=True)
"""

raise_not_implemented_error_if_backed_type(adata.X, "dendrogram")
if isinstance(groupby, str):
# if not a list, turn into a list
groupby = [groupby]
Expand Down
3 changes: 2 additions & 1 deletion scanpy/tools/_ingest.py
Expand Up @@ -12,7 +12,7 @@
from .. import logging as logg
from .._compat import old_positionals, pkg_version
from .._settings import settings
from .._utils import NeighborsView
from .._utils import NeighborsView, raise_not_implemented_error_if_backed_type
from .._utils._doctests import doctest_skip
from ..neighbors import FlatTree, RPForestDict

Expand Down Expand Up @@ -388,6 +388,7 @@
`adata` refers to the :class:`~anndata.AnnData` object
that is passed during the initialization of an Ingest instance.
"""
raise_not_implemented_error_if_backed_type(adata_new.X, "Ingest.fit")
ref_var_names = self._adata_ref.var_names.str.upper()
new_var_names = adata_new.var_names.str.upper()

Expand Down
7 changes: 5 additions & 2 deletions scanpy/tools/_rank_genes_groups.py
Expand Up @@ -12,7 +12,10 @@
from .. import _utils
from .. import logging as logg
from .._compat import old_positionals
from .._utils import check_nonnegative_integers
from .._utils import (
check_nonnegative_integers,
raise_not_implemented_error_if_backed_type,
)
from ..get import _check_mask
from ..preprocessing._utils import _get_mean_var

Expand Down Expand Up @@ -132,6 +135,7 @@
if use_raw and adata.raw is not None:
adata_comp = adata.raw
X = adata_comp.X
raise_not_implemented_error_if_backed_type(X, "rank_genes_groups")

# for correct getnnz calculation
if issparse(X):
Expand Down Expand Up @@ -592,7 +596,6 @@
>>> # to visualize the results
>>> sc.pl.rank_genes_groups(adata)
"""

if mask_var is not None:
mask_var = _check_mask(adata, mask_var, "var")

Expand Down