Mouse Genome Support #52
This now just needs tests for the mouse data; apart from that, we need to make the reference data available via iGenomes.
Add changes by Maxime Co-Authored-By: Maxime Garcia <max.u.garcia@gmail.com>
Amazing job, thanks a lot for this PR.
Co-Authored-By: Maxime Garcia <max.u.garcia@gmail.com>
Looks good to me 👍
If you haven't already, it would be worth checking that the .fai files match iGenomes in terms of chromosome size and ordering; otherwise it may break things.
ftp://ftp-mouse.sanger.ac.uk/ref/GRCm38_68.fa.fai
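The check suggested above could be sketched roughly as follows. This is a minimal illustration, not part of the PR: it assumes the standard tab-separated .fai layout (contig name in the first column, length in the second), and the helper names `read_fai` and `compare_fai` are hypothetical.

```python
def read_fai(text):
    """Parse .fai content into an ordered list of (contig name, length) pairs."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        records.append((fields[0], int(fields[1])))
    return records


def compare_fai(a, b):
    """Return human-readable differences between two parsed .fai indexes."""
    problems = []
    # Ordering matters: compare the sequence of contig names as-is.
    if [name for name, _ in a] != [name for name, _ in b]:
        problems.append("contig names or ordering differ")
    # Sizes: compare lengths for contigs present in both indexes.
    lengths_b = dict(b)
    for name, length in a:
        if name in lengths_b and lengths_b[name] != length:
            problems.append(f"{name}: length {length} != {lengths_b[name]}")
    return problems
```

To use it, read both index files (for example the Sanger .fai and the iGenomes one) as text and pass the parsed results to `compare_fai`; an empty list means that names, order, and lengths agree.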
bwaIndex = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BWAIndex/genome.fa.{amb,ann,bwt,pac,sa}"
chrDir = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/Chromosomes"
chrLength = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/Length/GRCm38.len"
dbsnp = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/MouseGenomeProject/Annotation/mgp.v5.merged.snps_all.dbSNP142.vcf.gz"
Where did you download these files from, @apeltzer? They look like the latest, but it may be worth documenting that somewhere, e.g.:
"dbSNP files for GRCm38 were downloaded from ftp://ftp-mouse.sanger.ac.uk/REL-1505-SNPs_Indels/ on [date]."
These files are quite difficult to find so worth being a bit more transparent whilst things are fresh!
I've added a PR to AWS-iGenomes where I'm adding these already:
I didn't know exactly where this information should go, but that felt like the logical place: others who don't use Sarek might use the files too, and they can then find out there where I got them from.
The .fai and .dict files match. I double-checked that and also ran local test data (~50GB) with these changes to confirm that everything works 👍
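For reference, a .dict versus .fai consistency check of the kind mentioned above might look like the sketch below. It is an assumption-laden illustration, not the check that was actually run: it assumes the .dict is a SAM-style header with `@SQ` lines carrying `SN` and `LN` tags, and the function names are hypothetical.

```python
def read_dict(text):
    """Extract ordered (contig name, length) pairs from the @SQ lines of a .dict file."""
    records = []
    for line in text.splitlines():
        if not line.startswith("@SQ"):
            continue  # ignore @HD and any other header lines
        # Each field after the record type is TAG:VALUE; values may contain ':'.
        tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
        records.append((tags["SN"], int(tags["LN"])))
    return records


def read_fai_pairs(text):
    """Extract ordered (contig name, length) pairs from .fai content."""
    rows = (line.split("\t") for line in text.splitlines() if line.strip())
    return [(row[0], int(row[1])) for row in rows]


def dict_matches_fai(dict_text, fai_text):
    """True when both files list the same contigs, in the same order, with the same lengths."""
    return read_dict(dict_text) == read_fai_pairs(fai_text)
```

Comparing the full ordered lists, rather than sets, also catches the case where both files contain the same contigs but in a different order.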
This PR starts adding support for mouse genome data to Sarek.
PR checklist
- nextflow run . -profile test,docker
- nf-core lint .
- docs is updated
- CHANGELOG.md is updated
- README.md is updated

I will work on this ASAP, test locally, and then provide both testing data and the required reference data for iGenomes wherever possible or necessary.