Add "intersect" method to GenomicArray #340

Closed
etal opened this issue Apr 8, 2018 · 1 comment

Owner

etal commented Apr 8, 2018

Just collect the results of "iter_ranges" into a single DataFrame and return a new GenomicArray.

This was an odd oversight, but apparently CNVkit has been using iter_ranges instead to do what it needs.
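
A rough sketch of that idea in plain pandas, not the actual skgenome code. `iter_overlaps` is a hypothetical stand-in for iter_ranges, the column names `chromosome`/`start`/`end` are assumed, and wrapping the result back into a GenomicArray is left out:

```python
import pandas as pd

def iter_overlaps(table, regions):
    """Hypothetical stand-in for iter_ranges: for each region, yield the
    rows of `table` that overlap it (half-open coordinates assumed)."""
    for chrom, start, end in regions:
        mask = ((table["chromosome"] == chrom)
                & (table["start"] < end)
                & (table["end"] > start))
        yield table[mask]

def intersect(table, regions):
    """Collect the per-region results into a single DataFrame."""
    chunks = list(iter_overlaps(table, regions))
    if not chunks:
        return table.iloc[:0]
    combined = pd.concat(chunks)
    # A bin that overlaps two regions appears twice; keep one copy.
    combined = combined[~combined.index.duplicated()]
    return combined.sort_index()

bins = pd.DataFrame({
    "chromosome": ["chr1", "chr1", "chr1", "chr2"],
    "start": [0, 100, 200, 0],
    "end": [100, 200, 300, 100],
})
regions = [("chr1", 50, 150), ("chr1", 120, 250)]
result = intersect(bins, regions)
```

Here the middle chr1 bin overlaps both regions, so the duplicate-index filter is what keeps it from being returned twice.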

etal added the skgenome label Apr 8, 2018

Owner Author

etal commented May 2, 2018

For speed: Collect the matched row indices and do a single slice operation on the original dataframe -- should be much faster than the current approach of extracting each row into a tuple and collecting the tuples into a new dataframe. (See #346)
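
Something along these lines, as a minimal sketch in plain pandas/NumPy rather than the real implementation. The function name `intersect_by_index` and the column names are assumptions for illustration:

```python
import numpy as np
import pandas as pd

def intersect_by_index(table, regions):
    """Mark matching rows in a boolean array, then slice the table once."""
    hits = np.zeros(len(table), dtype=bool)
    for chrom, start, end in regions:
        # Accumulate which rows overlap any region; no rows are copied here.
        hits |= ((table["chromosome"] == chrom)
                 & (table["start"] < end)
                 & (table["end"] > start)).to_numpy()
    # Single slice operation on the original dataframe.
    return table[hits]

bins = pd.DataFrame({
    "chromosome": ["chr1", "chr1", "chr1", "chr2"],
    "start": [0, 100, 200, 0],
    "end": [100, 200, 300, 100],
})
regions = [("chr1", 50, 150), ("chr1", 120, 250)]
selected = intersect_by_index(bins, regions)
```

Because rows are only flagged, not extracted, a bin overlapping several regions is still returned once and the original row order is preserved.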

That won't work on its own if mode='trim' is specified, because trimmed bins get new start or end coordinates and cannot be produced by slicing alone. In that case, iter_ranges provides a simple solution. It may still be significantly faster to take the index-only approach, as with mode='inner', for the bins that need no trimming, then generate just the trimmed bins separately and concatenate the two dataframes.
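
A sketch of how the two-part approach for mode='trim' might look, again in plain pandas with assumed column names. `intersect_trim` is a hypothetical name, and the sketch assumes the query regions do not overlap each other:

```python
import pandas as pd

def intersect_trim(table, regions):
    """Slice fully contained bins by index; clip only the edge bins."""
    inner = pd.Series(False, index=table.index)
    trimmed = []
    for chrom, start, end in regions:
        overlap = ((table["chromosome"] == chrom)
                   & (table["start"] < end)
                   & (table["end"] > start))
        contained = overlap & (table["start"] >= start) & (table["end"] <= end)
        inner |= contained
        # Bins that straddle a region boundary need new coordinates.
        edge = table[overlap & ~contained].copy()
        if len(edge):
            edge["start"] = edge["start"].clip(lower=start)
            edge["end"] = edge["end"].clip(upper=end)
            trimmed.append(edge)
    # One slice for the untouched bins, plus the few rebuilt edge bins.
    parts = [table[inner]] + trimmed
    return pd.concat(parts).sort_values(["chromosome", "start", "end"])

bins = pd.DataFrame({
    "chromosome": ["chr1", "chr1", "chr1", "chr2"],
    "start": [0, 100, 200, 0],
    "end": [100, 200, 300, 100],
})
regions = [("chr1", 50, 200), ("chr2", 0, 100)]
trimmed_bins = intersect_trim(bins, regions)
```

Only the first chr1 bin straddles a region boundary in this example, so it is the only row that gets copied and modified; the rest come from the single slice.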
