Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Scanpy to 1.10, preparing for GTA #6162

Merged
merged 57 commits into from
Sep 14, 2024

Conversation

pavanvidem
Copy link
Member

FOR CONTRIBUTOR:

  • I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.
  • License permits unrestricted use (educational + commercial)
  • This PR adds a new tool or tool collection
  • This PR updates an existing tool or tool collection
  • This PR does something else (explain below)

@bgruening
Copy link
Member

@pavanvidem not sure if you have seen it, but we also need a rebase here.

@nilchia
Copy link
Contributor

nilchia commented Aug 22, 2024

mnnpy conda recipe needs update

@bgruening
Copy link
Member

Please rebase this PR

@bgruening
Copy link
Member

What do you mean with mnnoy update? What is broken?

@nilchia
Copy link
Contributor

nilchia commented Aug 24, 2024

All tests pass now :D
I'll rebase the PR and resolve the conflicts next.

@nilchia
Copy link
Contributor

nilchia commented Aug 24, 2024

I removed sc.pl.dpt_groups_pseudotime() because it has problem in upstream source code.
It is incompatible with pandas version I think. The way scanpy tries to index a pandas.series object is not supported by pandas.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/nilchia/miniconda3/envs/scanpy/lib/python3.12/site-packages/legacy_api_wrap/__init__.py", line 80, in fn_compatible
    return fn(*args_all, **kw)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/nilchia/miniconda3/envs/scanpy/lib/python3.12/site-packages/scanpy/plotting/_tools/__init__.py", line 276, in dpt_groups_pseudotime
    timeseries_subplot(
  File "/home/nilchia/miniconda3/envs/scanpy/lib/python3.12/site-packages/scanpy/plotting/_utils.py", line 140, in timeseries_subplot
    X = X[:, None]
        ~^^^^^^^^^
  File "/home/nilchia/miniconda3/envs/scanpy/lib/python3.12/site-packages/pandas/core/series.py", line 1153, in __getitem__
    return self._get_with(key)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/nilchia/miniconda3/envs/scanpy/lib/python3.12/site-packages/pandas/core/series.py", line 1163, in _get_with
    return self._get_values_tuple(key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nilchia/miniconda3/envs/scanpy/lib/python3.12/site-packages/pandas/core/series.py", line 1203, in _get_values_tuple
    disallow_ndim_indexing(result)
  File "/home/nilchia/miniconda3/envs/scanpy/lib/python3.12/site-packages/pandas/core/indexers/utils.py", line 341, in disallow_ndim_indexing
    raise ValueError(
ValueError: Multi-dimensional indexing (e.g. obj[:, None]) is no longer supported. Convert to a numpy array before indexing instead.
>>> 

@bgruening
Copy link
Member

Try using numpy <2.0. Maybe that helps.

@nilchia
Copy link
Contributor

nilchia commented Aug 24, 2024

my package versions are these:

numpy                     1.26.4          py312heda63a1_0    conda-forge
pandas                    2.2.2           py312h1d6d2e6_1    conda-forge
scanpy                    1.10.2             pyhd8ed1ab_0    conda-forge

@nilchia
Copy link
Contributor

nilchia commented Aug 24, 2024

I did rebase and resolved conflicts locally but still, here it said there was a conflict :(

I reset back to my last commit.

@nilchia
Copy link
Contributor

nilchia commented Aug 25, 2024

The tests added from #6113 need a proper input file I guess.
error:

KeyError: "Could not find keys '['C1orf54', 'CLEC9A', 'CPNE3', 'DNASE1L3', 'HSP90B1', 'ID2', 'IGLL1', 'IRF8', 'LGALS1', 'NASP', 'PCNA', 'PPIB', 'RNASE6', 'SEC11C', 'SNX3', 'TK1']' in columns of `adata.obs` or in adata.var_names."

@nilchia
Copy link
Contributor

nilchia commented Sep 6, 2024

Please don't merge. I'll add h5 assert.

tools/scanpy/macros.xml Outdated Show resolved Hide resolved
nilchia and others added 3 commits September 6, 2024 14:14
Co-authored-by: Pavankumar Videm <pavanvidem@gmail.com>
@nilchia
Copy link
Contributor

nilchia commented Sep 6, 2024

I added the h5 assert.
However, for some functions, this assertion is not a good test because the function does not add any layer or key to the h5ad file but only changes the contents of the keys.
Functions:
pp.downsample_counts
filter_marker
pp.sqrt
external.pp.magic 'all_genes'
pp.regress_out
pp.combat

@nilchia
Copy link
Contributor

nilchia commented Sep 6, 2024

in filter.xml, the real output of filter_marker is the tabular file and not the adata, right?

@nilchia
Copy link
Contributor

nilchia commented Sep 7, 2024

.. ERROR: Error '403 Client Error: Forbidden for url: https://www.biorxiv.org/content/10.1101/298430v1' accessing https://www.biorxiv.org/content/10.1101/298430v1

What is wrong with the link? It was the same error with the previous link.

@bgruening
Copy link
Member

No idea, but the preprint is published by now and you should link the final paper https://www.nature.com/articles/nbt.4314

@pavanvidem
Copy link
Member Author

in filter.xml, the real output of filter_marker is the tabular file and not the adata, right?

Yes, it is is. Good catch! It also updates the anndata but tabular file is the actual output to test. Thanks!

@pavanvidem
Copy link
Member Author

pavanvidem commented Sep 7, 2024

I added the h5 assert. However, for some functions, this assertion is not a good test because the function does not add any layer or key to the h5ad file but only changes the contents of the keys. Functions: pp.downsample_counts filter_marker pp.sqrt external.pp.magic with t=-1 pp.regress_out pp.combat

Yes, all those functions only update the count matrix, not the annotations. If we don't want to save the test data then check whether .X exists which should be there. I don't know what is a good test here.

@nilchia
Copy link
Contributor

nilchia commented Sep 7, 2024

I added test for those. Is it good?

<token name="@CMD_ANNDATA_WRITE_OUTPUTS@"><![CDATA[
adata.write_h5ad('anndata.h5ad', compression='gzip')
with open('anndata_info.txt','w', encoding='utf-8') as ainfo:
print(adata, file=ainfo)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are writing anndata info to a file and then to stdout. So for all tests where there are no changes in the annotations, you can also check the matrix size in stdout. Something like:

<assert_stdout>
    <has_text_matching expression="336 × 11"/>
</assert_stdout>

Copy link
Contributor

@nilchia nilchia Sep 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean instead of the min() max() ... ?
But for those functions the matrix size does not change.
Only the values of adata.X is changed.

I used this assert for filter_cells/genes and downsample that the data size changes

@nilchia
Copy link
Contributor

nilchia commented Sep 13, 2024

@pavanvidem @bgruening
Should I add/modify anything here? or is it ready to merge?

@pavanvidem
Copy link
Member Author

Can you please check if this was fixed? #6302

Otherwise, good to merge.

@nilchia
Copy link
Contributor

nilchia commented Sep 13, 2024

Actually, I didn't understand what was the issue there. Did the config file have a problem? or the adata was not defined?

@bgruening
Copy link
Member

NameError: name 'adata' is not defined, can you check if you still have this variable somewhere and if you have a test for this?

So many things have changed, I guess we can proceed.

@nilchia
Copy link
Contributor

nilchia commented Sep 13, 2024

Almost all functions use this variable. I guess we don't have any problems with that.

@pavanvidem
Copy link
Member Author

pavanvidem commented Sep 14, 2024

Agree. Let's proceed. The error was in the plot tool. None of the functions in the plot tool change the anndata object. So it should not affect the analysis. The worst thing that can happen is that no plot is generated which we can investigate later.

@bgruening
Copy link
Member

Awesome!

@bgruening bgruening merged commit 91121b1 into galaxyproject:main Sep 14, 2024
14 checks passed
@pavanvidem
Copy link
Member Author

Thanks, @nilchia and @bgruening for helping out!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants