
scatter plot: avoid pandas positional index error #393

Merged: 1 commit into etal:master, Nov 3, 2018


chapmanb (Contributor)

When creating a scatter plot of a single chromosome (`scatter -c`), it can error out with:
```
  File "build/bdist.linux-x86_64/egg/cnvlib/commands.py", line 914, in _cmd_scatter

  File "build/bdist.linux-x86_64/egg/cnvlib/scatter.py", line 49, in do_scatter
  File "build/bdist.linux-x86_64/egg/cnvlib/scatter.py", line 216, in chromosome_scatter
  File "build/bdist.linux-x86_64/egg/cnvlib/scatter.py", line 340, in select_range_genes
  File "build/bdist.linux-x86_64/egg/skgenome/gary.py", line 370, in in_range
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/future/builtins/newnext.py", line 59, in newnext
    return iterator.next()
  File "build/bdist.linux-x86_64/egg/skgenome/intersect.py", line 92, in iter_ranges
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py", line 1373, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py", line 1819, in _getitem_axis
    return self._get_list_axis(key, axis=axis)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py", line 1797, in _get_list_axis
    raise IndexError("positional indexers are out-of-bounds")
IndexError: positional indexers are out-of-bounds
```
A similar issue was reported on Biostars (https://www.biostars.org/p/326154/).

Using `.loc` instead of `.iloc` in `skgenome.intersect.iter_ranges`, inspired by this Stack Overflow discussion (https://stackoverflow.com/questions/44123056/indexerror-positional-indexers-are-out-of-bounds-when-theyre-demonstrably-no), fixes the issue for my failing cases.
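The difference between the two indexers is easier to see on a small table. A minimal sketch with a hypothetical toy DataFrame (not CNVkit data): `.iloc` selects rows by position and `.loc` by index label. After filtering to one chromosome, the surviving rows keep their original labels, so indices like `[3, 4]` are valid labels but out-of-bounds positions in a two-row frame. It is an assumption here that `region_idx` held labels from the untrimmed table; the traceback alone does not confirm that.

```python
import pandas as pd

# Hypothetical toy table: three bins on chr1, two on chr2, default index 0..4
table = pd.DataFrame({
    "chromosome": ["1", "1", "1", "2", "2"],
    "start": [0, 100, 200, 0, 100],
})

# Filtering to chr2 keeps the original index labels 3 and 4
chr2 = table[table["chromosome"] == "2"]
print(chr2.index.tolist())  # [3, 4]

# Assumption: these indices are labels taken from the untrimmed table
region_idx = [3, 4]

# .loc treats them as labels, so both chr2 rows are found
print(chr2.loc[region_idx, "start"].tolist())  # [0, 100]

# .iloc treats them as positions 3 and 4 in a 2-row frame and raises
try:
    chr2.iloc[region_idx]
except IndexError as exc:
    print("iloc failed:", exc)
```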
etal merged commit c66679a into etal:master on Nov 3, 2018
etal (Owner) commented Nov 3, 2018

Ack. Thanks!

etal mentioned this pull request on Nov 21, 2018
etal (Owner) commented Jan 5, 2019

I'm not sure about this. On my end, `scatter -c chr21` works as expected with the original `.iloc[region_idx]` in place, and while `.loc[region_idx]` doesn't crash, the unit tests related to intersection fail.

  • Do you see the same unit test failures with .loc?
  • Was there anything else special about the way scatter was run?
  • At one point pre-v0.9.5, from around early June to July 7, the slice wasn't updated properly for chromosomes beyond the first one, which would lead to errors like the one reported on Biostars. (That report didn't state a release or development version, but the date is plausible for someone using the development version during the buggy period.) Was bcbio also using a pre-release from before the most recent update for this?
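Swapping `.iloc` for `.loc` is not behavior-preserving in general, which is consistent with the intersection tests failing. A minimal sketch on a hypothetical toy table: once a DataFrame's index is no longer the default 0..n-1 (here after `sort_values`), the same list of integers selects different rows by position than by label, and no error is raised.

```python
import pandas as pd

# Hypothetical toy table, deliberately out of order
table = pd.DataFrame({
    "chromosome": ["1", "1", "2", "2"],
    "start": [200, 0, 100, 0],
})

# Sorting keeps the old labels, so the index is no longer 0..n-1
sorted_table = table.sort_values(["chromosome", "start"])
print(sorted_table.index.tolist())  # [1, 0, 3, 2]

positions = [1, 2]

# By position: the 2nd and 3rd rows of the sorted table
by_position = sorted_table.iloc[positions]["start"].tolist()
# By label: the rows labelled 1 and 2, which are different rows
by_label = sorted_table.loc[positions]["start"].tolist()

print(by_position)  # [200, 0]
print(by_label)     # [0, 100]
```

If code upstream computes true row positions, `.loc` silently picks the wrong rows instead of crashing.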

chapmanb (Contributor, Author) commented Jan 7, 2019

Eric;
Sorry this caused issues, and thanks for looking into it. In bcbio we should have been using 0.9.5 since Sept 14th, when we updated it in bioconda. If this is causing failures, I'm happy to revert for now and work on a proper test case, which I should have submitted last time. I was really just digging around for fixes when I hit the problem and don't have a full picture of the ramifications of swapping `iloc` and `loc`, so I don't want to break other cases. Sorry again about the problem.

etal (Owner) commented Jan 8, 2019

No worries, I've rolled it back pending a full investigation. It could have something to do with the versions of NumPy and pandas; I know they've both been fine-tuning their slicing and indexing behavior to be more rigorous.

aarslank

Hi,

Thank you for CNVkit! I'm using your Docker container like this:
`docker run -v /some/local/path/cnvkitTesting/:/home/ -it etal/cnvkit`

I need some help with the plots. Here's what I'm seeing:

  • `cnvkit.py scatter mycnr.cnr -s mycns.cns -o mypng.png`: works. The plot includes all chromosomes.
  • `cnvkit.py scatter mycnr.cnr -s mycns.cns -c 1 -o mypng.png`: works. The plot shows only chromosome 1.
  • `cnvkit.py scatter mycnr.cnr -s mycns.cns -c 2 -o mypng.png`: IndexError: positional indexers are out-of-bounds.
  • `cnvkit.py scatter mycnr.cnr -s mycns.cns -c 3 -o mypng.png`: the same IndexError.

And so on for the remaining chromosomes.

Here's that index error I mentioned above:

Traceback (most recent call last):
  File "/usr/local/bin/cnvkit.py", line 13, in <module>
    args.func(args)
  File "/usr/local/lib/python2.7/dist-packages/cnvlib/commands.py", line 908, in _cmd_scatter
    **scatter_opts)
  File "/usr/local/lib/python2.7/dist-packages/cnvlib/scatter.py", line 49, in do_scatter
    y_min, y_max, title, segment_color)
  File "/usr/local/lib/python2.7/dist-packages/cnvlib/scatter.py", line 216, in chromosome_scatter
    show_gene, window_width)
  File "/usr/local/lib/python2.7/dist-packages/cnvlib/scatter.py", line 341, in select_range_genes
    if cnarr else CNA([]))
  File "/usr/local/lib/python2.7/dist-packages/skgenome/gary.py", line 369, in in_range
    return self.as_dataframe(next(results))
  File "/usr/local/lib/python2.7/dist-packages/future/builtins/newnext.py", line 59, in newnext
    return iterator.next()
  File "/usr/local/lib/python2.7/dist-packages/skgenome/intersect.py", line 92, in iter_ranges
    subtable = table.iloc[region_idx]
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1478, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.py", line 2091, in _getitem_axis
    return self._get_list_axis(key, axis=axis)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.py", line 2073, in _get_list_axis
    raise IndexError("positional indexers are out-of-bounds")
IndexError: positional indexers are out-of-bounds

Is there a workaround to get single chromosomes to work?

I also see a lot of RuntimeWarnings (even with just `cnvkit.py -h`), but I saw on another thread that they can probably be safely ignored.

etal added a commit that referenced this pull request Mar 20, 2019
In skgenome.intersect, if a chromosome but no start/end coordinates are
given, both iter_ranges and idx_ranges were trimming the table to the
specified chromosome. Apparently that was messing up the dataframe's
index. Now, iter_ranges leaves that work to idx_ranges.
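A simplified sketch of the bug pattern the commit message describes, under stated assumptions: `positions_for_chrom`, `select_double_trim`, and `select_single_trim` are hypothetical stand-ins, not the actual `skgenome.intersect` functions, and the helper is assumed to return the index labels of matching rows. Trimming in both places makes those labels disagree with row positions for every chromosome except the first, which would match the "chr1 works, chr2 fails" pattern in the report above.

```python
import pandas as pd


def positions_for_chrom(table, chrom):
    """Hypothetical helper: return the index labels of rows on `chrom`.

    The labels equal row positions only while `table` keeps its default
    0..n-1 index, i.e. if nobody has trimmed it beforehand.
    """
    return table.loc[table["chromosome"] == chrom].index.tolist()


def select_double_trim(table, chrom):
    # Bug pattern: the caller also trims to `chrom`, so for any chromosome
    # after the first, the returned labels (e.g. [3, 4]) exceed the trimmed
    # frame's length when used as positions.
    subtable = table[table["chromosome"] == chrom]
    return subtable.iloc[positions_for_chrom(subtable, chrom)]


def select_single_trim(table, chrom):
    # Fix pattern: leave the trimming to the helper and apply .iloc to the
    # untrimmed table, where labels and positions coincide.
    return table.iloc[positions_for_chrom(table, chrom)]


# Hypothetical toy table: three bins on chr1, two on chr2
table = pd.DataFrame({
    "chromosome": ["1", "1", "1", "2", "2"],
    "start": [0, 100, 200, 0, 100],
})

print(select_double_trim(table, "1")["start"].tolist())  # [0, 100, 200]
print(select_single_trim(table, "2")["start"].tolist())  # [0, 100]
try:
    select_double_trim(table, "2")
except IndexError as exc:
    print("chr2 with double trim:", exc)
```

Under this assumption, dropping the caller-side trim fixes every chromosome while keeping `.iloc`, so no switch to `.loc` is needed.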