Mouse Genome Support #52

Merged: 36 commits merged on Nov 4, 2019

@apeltzer (Member) commented Oct 21, 2019

This starts adding support for mouse genome data to Sarek.

PR checklist

  • This comment contains a description of changes (with reason)
  • If you've fixed a bug or added code that should be tested, add tests!
  • If necessary, also make a PR on the nf-core/sarek branch on the nf-core/test-datasets repo
  • Ensure the test suite passes (nextflow run . -profile test,docker).
  • Make sure your code lints (nf-core lint .).
  • Documentation in docs is updated
  • CHANGELOG.md is updated
  • README.md is updated

I will work on this ASAP, test it locally, and then provide both test data and the required reference data for iGenomes wherever possible or necessary.

conf/igenomes.config: 3 review threads (outdated, resolved)
@apeltzer marked this pull request as ready for review on October 31, 2019 09:51
@apeltzer (Member, Author) commented:

This now just needs tests for the mouse data; beyond that, we need to make the reference data available via iGenomes.

@apeltzer changed the title from "WIP: Mouse Genome Support" to "Mouse Genome Support" on Oct 31, 2019
@maxulysse requested a review from a team on October 31, 2019 10:15
@maxulysse added the enhancement (New feature or request) label on Oct 31, 2019
docs/reference.md: 2 review threads (outdated, resolved)
Add changes by Maxime

Co-Authored-By: Maxime Garcia <max.u.garcia@gmail.com>
@maxulysse (Member) left a comment:

Amazing job, thanks a lot for this PR.

@maxulysse requested a review from a team on October 31, 2019 12:05
CHANGELOG.md: 2 review threads (outdated, resolved)
Co-Authored-By: Maxime Garcia <max.u.garcia@gmail.com>
@drpatelh (Member) left a comment:

Looks good to me 👍

If you haven't already, it would be worth checking that the .fai files match iGenomes in terms of chromosome sizes and ordering; otherwise things may break.
ftp://ftp-mouse.sanger.ac.uk/ref/GRCm38_68.fa.fai
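One way to act on this suggestion is to compare the two .fai indexes entry by entry. The following is a minimal Python sketch, not part of Sarek or iGenomes; it assumes both files are standard samtools faidx indexes (sequence name in column 1, length in column 2), and the function names read_fai and compare_fai are hypothetical.

```python
from typing import List, Tuple


def read_fai(path: str) -> List[Tuple[str, int]]:
    """Return (sequence name, length) pairs from a samtools .fai index, in file order."""
    entries = []
    with open(path) as handle:
        for line in handle:
            if not line.strip():
                continue  # ignore blank lines
            # .fai columns: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH
            fields = line.rstrip("\n").split("\t")
            entries.append((fields[0], int(fields[1])))
    return entries


def compare_fai(a: List[Tuple[str, int]], b: List[Tuple[str, int]]) -> List[str]:
    """List every difference in sequence count, order/names, or lengths."""
    problems = []
    if len(a) != len(b):
        problems.append(f"sequence count differs: {len(a)} vs {len(b)}")
    # Compare position by position so that a reordering is reported too
    for i, (left, right) in enumerate(zip(a, b)):
        if left[0] != right[0]:
            problems.append(f"position {i}: name {left[0]} vs {right[0]}")
        elif left[1] != right[1]:
            problems.append(f"{left[0]}: length {left[1]} vs {right[1]}")
    return problems
```

For example, compare_fai(read_fai("GRCm38_68.fa.fai"), read_fai("genome.fa.fai")) would compare the Sanger index linked above with a local copy of the iGenomes one (both local file names are assumptions); an empty list means names, lengths and ordering all agree. The comparison is positional rather than set-based because downstream tools such as GATK can reject inputs whose contig order differs from the reference.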

bwaIndex = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BWAIndex/genome.fa.{amb,ann,bwt,pac,sa}"
chrDir = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/Chromosomes"
chrLength = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/Length/GRCm38.len"
dbsnp = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/MouseGenomeProject/Annotation/mgp.v5.merged.snps_all.dbSNP142.vcf.gz"
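The bwaIndex value above is a glob: the {amb,ann,bwt,pac,sa} group stands for the five index files that bwa index writes next to genome.fa. As a minimal Python sketch (not how Nextflow resolves globs internally), the following expands such brace groups and reports which of the expanded paths are missing from a local copy of the reference tree; expand_braces and missing_files are hypothetical names, and nested braces are not handled.

```python
import os
import re
from typing import List


def expand_braces(pattern: str) -> List[str]:
    """Expand non-nested {a,b,c} alternation groups, left to right."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]  # nothing left to expand
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    # Combine each alternative with every expansion of the remaining suffix
    return [prefix + alt + rest
            for alt in match.group(1).split(",")
            for rest in expand_braces(suffix)]


def missing_files(pattern: str) -> List[str]:
    """Return the expanded paths that do not exist on the local filesystem."""
    return [path for path in expand_braces(pattern) if not os.path.exists(path)]
```

For instance, missing_files("/data/igenomes/Mus_musculus/Ensembl/GRCm38/Sequence/BWAIndex/genome.fa.{amb,ann,bwt,pac,sa}") returns an empty list when all five index files are present; /data/igenomes stands in for a hypothetical local igenomes_base.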
Review comment (Member):

Where did you download these files from, @apeltzer? They look like the latest versions, but it may be worth documenting that somewhere, e.g.:

dbSNP files for GRCm38 were downloaded from ftp://ftp-mouse.sanger.ac.uk/REL-1505-SNPs_Indels/ on [date].

Review comment (Member):

These files are quite difficult to find, so it's worth being a bit more transparent whilst things are fresh!

@apeltzer (Member, Author):

I've already opened a PR to AWS-iGenomes that adds these files:

ewels/AWS-iGenomes#7

I didn't know exactly where this information should go, but that felt like the logical place: others not using Sarek might use the files too, and they can then find out there where I got them from.

The .fai and .dict files match; I double-checked that and also ran local test data (~50 GB) with these changes to confirm that everything works 👍
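A .dict/.fai consistency check like the one described here can be sketched in a few lines. This minimal Python example assumes the .dict is a Picard/GATK-style sequence dictionary, i.e. a SAM header whose @SQ lines carry SN (name) and LN (length) tags, and that the .fai is a samtools index; the helper names are hypothetical.

```python
from typing import Iterable, List, Tuple


def parse_dict_lines(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Extract (SN, LN) pairs from the @SQ lines of a sequence dictionary."""
    entries = []
    for line in lines:
        if not line.startswith("@SQ"):
            continue  # skip @HD and any other header records
        # Each tab-separated field is TAG:VALUE; split once so values like UR:file:/x survive
        tags = dict(field.split(":", 1) for field in line.rstrip("\n").split("\t")[1:])
        entries.append((tags["SN"], int(tags["LN"])))
    return entries


def parse_fai_lines(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Extract (name, length) pairs from the first two columns of .fai lines."""
    rows = (line.split("\t") for line in lines if line.strip())
    return [(row[0], int(row[1])) for row in rows]


def dict_matches_fai(dict_lines: Iterable[str], fai_lines: Iterable[str]) -> bool:
    """True if both indexes list the same sequences with the same lengths, in the same order."""
    return parse_dict_lines(dict_lines) == parse_fai_lines(fai_lines)
```

Because open file objects iterate line by line, dict_matches_fai(open("genome.dict"), open("genome.fa.fai")) works directly on real files (the file names are assumptions).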


3 participants