Read/write errors when / is in var names #1834

Open
2 of 3 tasks
LucaMarconato opened this issue Jan 20, 2025 · 2 comments

Comments

@LucaMarconato
Member

LucaMarconato commented Jan 20, 2025

Please make sure these conditions are met

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of anndata.
  • (optional) I have confirmed this bug exists on the master branch of anndata.

Report

Related to #321.

When a var name contains a slash, for instance a/b, it is written to disk as a folder a with b as a subfolder of a (see screenshot). This causes problems such as those in the code below. We previously also observed a related OS-dependent problem in spatialdata (it appeared on Windows for some users, but we couldn't reproduce it): #1447.

Image

Code:

import anndata as ad
import pandas as pd

adata0 = ad.AnnData(var=pd.DataFrame([[1, 2], [3, 4]], columns=['a/b', 'a']))
adata1 = ad.AnnData(var=pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'a/b']))
adata2 = ad.AnnData(var=pd.DataFrame([1, 2], columns=['a/b']))

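# ok: writing adata0 (columns 'a/b', then 'a') succeeds
adata0.write_h5ad('temp/adata.h5ad')
adata0.write_zarr('temp/adata.zarr')

# error: reading the h5ad file back fails
_ = ad.read_h5ad('temp/adata.h5ad')

# error: reading the Zarr store back fails
_ = ad.read_zarr('temp/adata.zarr')

# error: writing adata1 (columns 'a', then 'a/b') to h5ad fails
adata1.write_h5ad('temp/adata.h5ad')

# error: writing adata1 to Zarr fails
adata1.write_zarr('temp/adata.zarr')

# ok: adata2 (only 'a/b', no 'a') writes and reads without errors
adata2.write_h5ad('temp/adata.h5ad')
_ = ad.read_h5ad('temp/adata.h5ad')
adata2.write_zarr('temp/adata.zarr')
_ = ad.read_zarr('temp/adata.zarr')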

The reason for the errors is that on disk a/b is saved as a folder a with b as a subfolder.

In the first two errors (reading adata0 back), b is not present in a, since the var name a overwrites the previously written a folder.

In the last two errors (writing adata1), the a/b folder cannot be written because a was written first.

In the last part of the script (no error), you can reproduce b being written as a subfolder of a.
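
To make the three cases above concrete, here is a minimal toy model of a hierarchical store in plain Python. It is only a sketch: `ToyStore` is a hypothetical class written for this illustration, and it assumes (based on the behavior described above) that writing a leaf replaces whatever already sits at that name, while nesting under an existing array is refused. It is not how anndata, h5py, or zarr are actually implemented.

```python
class ToyStore:
    """Hypothetical toy store: '/' in a key means "nest one level deeper"."""

    def __init__(self):
        # A dict stands for a folder; any other value stands for an array.
        self.root = {}

    def write(self, key, value):
        *folders, leaf = key.split("/")
        node = self.root
        for name in folders:
            child = node.setdefault(name, {})
            if not isinstance(child, dict):
                raise ValueError(f"cannot create {key!r}: {name!r} is already an array")
            node = child
        # Assumption: a leaf write silently replaces an existing folder or array.
        node[leaf] = value

    def read(self, key):
        node = self.root
        for name in key.split("/"):
            if not isinstance(node, dict) or name not in node:
                raise KeyError(key)
            node = node[name]
        return node


# Like adata0: 'a/b' first, then 'a'. Writing works, reading 'a/b' fails.
store0 = ToyStore()
store0.write("a/b", [1, 3])
store0.write("a", [2, 4])  # replaces folder 'a', so 'b' is lost
try:
    store0.read("a/b")
    read_error = None
except KeyError as exc:
    read_error = exc

# Like adata1: 'a' first, then 'a/b'. The second write fails.
store1 = ToyStore()
store1.write("a", [1, 3])
try:
    store1.write("a/b", [2, 4])
    write_error = None
except ValueError as exc:
    write_error = exc

# Like adata2: only 'a/b'. No error, but 'b' ends up nested inside 'a'.
store2 = ToyStore()
store2.write("a/b", [1, 2])
```

In this toy model the write order alone decides whether the failure shows up at write time or only later at read time, which matches the pattern in the script above.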

Proposed solution

I propose disallowing the / character in column names of obs, var, obsm, varm, obsp, varp, and uns.
See also #321 (comment)
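
As a rough sketch of what such a check could look like, the snippet below scans a mapping from attribute name to its keys and reports any key containing /. The helpers `find_slash_keys` and `check_no_slash` are hypothetical names made up for this example; they are not part of the anndata API, and where such a check would live in anndata is left open.

```python
def find_slash_keys(keys_by_attr):
    """Return {attribute: [keys containing '/']} for the given mapping.

    Hypothetical helper for illustration; not part of anndata.
    """
    offending = {}
    for attr, keys in keys_by_attr.items():
        bad = [k for k in keys if isinstance(k, str) and "/" in k]
        if bad:
            offending[attr] = bad
    return offending


def check_no_slash(keys_by_attr):
    """Raise ValueError if any key contains '/'."""
    offending = find_slash_keys(keys_by_attr)
    if offending:
        raise ValueError(f"'/' is not allowed in keys: {offending}")


# Plain lists stand in for the real column names and keys.
example = {"var": ["a/b", "a"], "obs": ["cell_type"], "uns": ["params"]}
bad = find_slash_keys(example)

try:
    check_no_slash(example)
    rejected = False
except ValueError:
    rejected = True
```

With a real object, one could presumably pass something like `{"var": adata.var.columns, "obs": adata.obs.columns}` before calling `write_h5ad` or `write_zarr`, so the problem surfaces early instead of at read time.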

Versions

/opt/miniconda3/envs/ome311/lib/python3.11/site-packages/session_info2/__init__.py:124: UserWarning: The '__version__' attribute is deprecated and will be removed in MarkupSafe 3.1. Use feature detection, or `importlib.metadata.version("markupsafe")`, instead.
  and (v := getattr(pkg, "__version__", None))
anndata	0.11.3
----	----
scipy	1.15.0
session-info2	0.1.2
typing_extensions	4.12.2
MarkupSafe	3.0.2
Jinja2	3.1.4 (3.1.5)
tblib	3.0.0
natsort	8.4.0
zarr	2.18.3
PyYAML	6.0.2
zipp	3.21.0
sphinxcontrib-qthelp	2.0.0
sphinxcontrib-jsmath	1.0.1
asciitree	0.3.3
charset-normalizer	3.4.1
packaging	24.2
torch	2.5.1
numcodecs	0.14.1
importlib_metadata	8.5.0
pyarrow	18.1.0
sphinxcontrib-bibtex	2.6.3
tqdm	4.67.1
h5py	3.12.1
sphinxcontrib-applehelp	2.0.0
psutil	6.1.0
dask	2024.11.2
pytz	2024.1
msgpack	1.1.0
numpy	1.26.4 (2.0.0)
sphinxcontrib-serializinghtml	2.0.0
six	1.16.0 (1.17.0)
python-dateutil	2.9.0.post0
sphinxcontrib-htmlhelp	2.1.0
pandas	2.2.3
cloudpickle	3.1.0
toolz	1.0.0
setuptools	75.6.0
sphinxcontrib-devhelp	2.0.0
----	----
Python	3.11.0 | packaged by conda-forge | (main, Jan 14 2023, 12:26:40) [Clang 14.0.6 ]
OS	macOS-14.6.1-arm64-arm-64bit
Updated	2025-01-20 11:18
@LucaMarconato LucaMarconato changed the title from "Problem with / in var names" to "Read/write errors when / is in var names" Jan 20, 2025
@flying-sheep
Member

strange, why is that only a problem on windows?

@LucaMarconato
Member Author

The bug that I am reporting now occurs on macOS. The linked bug was a problem on Windows: on macOS, if I write only a/b and not also a, reading and writing work, but when such an anndata Zarr store (on disk) is moved to a Windows machine, I guess something gets messed up because / is a path separator on macOS but not on Windows.

@ilan-gold ilan-gold self-assigned this Jan 23, 2025