Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix subgary indexing in cnary.by_gene() #580

Merged
merged 1 commit into from
Apr 5, 2021

Conversation

tskir
Copy link
Collaborator

@tskir tskir commented Mar 18, 2021

Closes #573. Closes #576. Closes #579. Thank you @diushiguzhi @eriktoo @hrkemp @drmrgd @HYan-lei for reporting and initial investigations!

cnary.by_gene() iterates over chromosomes, which are represented as dataframe slices which preserve the original indexes. Because of this, accessing them by .iloc is not going to be possible, and .loc needs to be used instead. (An alternative solution would be to keep .iloc but reset the sub-dataframe index somewhere along the way, but I don't see a good reason to do this.)

The autotests don't seem to be working but I've ran the full suite locally and everything's fine.

cnary.by_gene() iterates over chromosomes, which are represented as
dataframe slices which preserve the original indexes. Because of this,
accessing them by .iloc is not going to be possible, and .loc needs to
be used instead.
@@ -83,7 +83,6 @@ def by_gene(self, ignore=params.IGNORE_GENE_NAMES):
Pairs of: (gene name, CNA of rows with same name)
"""
ignore += params.ANTITARGET_ALIASES
start_idx = end_idx = None
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this line because it's never used—both start_idx and end_idx are initialised again below

@tskir
Copy link
Collaborator Author

tskir commented Mar 18, 2021

P. S. It also solves the problem with the chromosome diagrams in #579 because the underlying reason is exactly the same.

@etal
Copy link
Owner

etal commented Apr 5, 2021

Do you see any other related code where iloc or loc has problems on original vs. reset index differences? I remember addressing something similar in the modules skgenome/intersect.py and skgenome/merge.py.

The fixes in by_gene() look correct, so I'll merge this PR and keep an eye out for any similar issues elsewhere in the codebase that haven't already been addressed.

@etal etal merged commit 27570f0 into etal:master Apr 5, 2021
@tskir tskir deleted the fix-genemetrics-indexing branch May 9, 2021 21:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants