[ENH] Create Docker file for v1.0 #13
Comments
Progress done today:
Next Goal:
Possible hurdles:
|
Progress done today:
Possible hurdles:
Next Goal:
|
that's good progress! |
Progress done today: Possible hurdles: Next Goal:
@drmjc What should be outputted by the program once the program is run successfully on this bam file NA12878.grch38.subsampled.bam? |
@sflodin, ^ it might create an xlsx file instead, I don't recall |
Hi @sflodin, @J-Bradlee, the webserver is back up and running |
Progress done today:
this was fixed more permanently this time by this command. found here
• This led to a new error that we were unable to fix
• Compared v0.9 and v1.0 for discrepancies to try to solve the error. Pretty sure it is a dependency issue. Seems like samtools isn't working on our docker image.
We think this is because the sampleInfo.txt file is not being created properly, since samtools is not working. Possible hurdles:
Next steps:
|
Thanks gents. Looks like this issue has been seen by the conda devs - bioconda/bioconda-recipes#16529 Since samtools is one of the most widely used and tested bioinformatics tools, I recommend trying with the latest version, which may be easier to install. If the major version number used by clinsv and the latest are the same then the cli should be the same. If the major version number has changed, then you may need to update the syntax of any samtools commands that may have changed. I expect there are a modest number of changes that may need to be made. |
PS if you upgrade the samtools version, then also upgrade bcftools and htslib to match, as they work in sync |
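A minimal sketch of the version check suggested above, assuming the version strings have already been collected (for example from each tool's `--version` output). The helper names `parse_version`, `same_major` and `versions_in_sync` are made up for illustration and are not part of ClinSV.

```python
import re


def parse_version(text):
    """Extract (major, minor, patch) from a string such as 'samtools 1.9' or '1.15.1'."""
    match = re.search(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    # Missing components (e.g. the patch level in '1.9') default to 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def same_major(pinned, latest):
    """True when both versions share a major number, so the CLI is likely unchanged."""
    return parse_version(pinned)[0] == parse_version(latest)[0]


def versions_in_sync(*versions):
    """True when every tool reports the same major.minor release."""
    return len({parse_version(v)[:2] for v in versions}) == 1
```

For example, `same_major("samtools 1.9", "samtools 1.15.1")` is true, while `versions_in_sync("samtools 1.15", "bcftools 1.9", "htslib 1.15")` is false and would flag the mismatched bcftools.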
Progress done today:
A quick cat of the error file revealed:
This should be an easy fix, when we build the docker container again. Next Goal:
Possible hurdles:
|
Progress Done Today:
This seemed to fix the "refdata-bX not matching" error we were having before. However, we are still getting the same error described in the last post. We then tried running that command within the docker container for v0.9, and we get a similar error:
Note that we are able to get v0.9 running when we execute ClinSV from outside its docker container. We are just not able to get it to work whilst inside the container. This makes us think we are passing the -ref and -p options incorrectly into ClinSV itself, or we have some path issues.
We have a feeling that the "cd" command is causing the script to be unable to locate the ref data and other file paths, and thus fail and give us the error message described in the previous dot point. We have found that the script is generated between lines 304-340 of clinsv.pl, as shown below. Note the command
Next Steps
Possible hurdles
|
Progress Done Today
The directory (called "test" in our case) for the clinSV v1.0 docker container should look as follows:
With the environment variables in the host machine defined as:
And the Docker Command as:
Where DOCKER_CONTAINER_NAME is the pulled/built docker container from our v1.0 dockerfile. Also notice that the -ref command is slightly different, as we added "/refdata-b38" to the end of it.
Next Steps
Possible Hurdles
|
Progress Done Today Got our docker file to run on a google cloud VM with 16 vCPUs and 64gb of ram. It takes roughly 1 and a half hours for the first job to be done for the sub sampled bam. Then roughly 30 minutes each for each script created to run. I did run into this error about 5 hours in:
Looks like the start of error is at
Running that command does give the md5 hash error as above. A quick google around made me believe this is an openssl issue that happens when you use an older version of python (v2.7), which ended its support in 2020, with a newer version of openssl (which on centOS 8 in the docker file is Therefore I installed python3, and when I ran "pairend_distro-a1.py" with python3 I no longer had any errors. I did the following:
Got output:
Alternatively, I tried to downgrade to openssl v1.0.0 from 2010, which I did by installing the binaries following this guide. However, running the pairend script still ran into the same error. Next Steps:
Possible Hurdles:
|
excellent progress @J-Bradlee. I agree with switching to python3 & that was on the roadmap already. Downgrading dependencies will just cause problems later... |
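As a quick sanity check related to the md5 error discussed above, here is a minimal sketch that reports the interpreter and OpenSSL versions and confirms that `hashlib` can compute an md5 digest. It only assumes a stock Python 3 install; the function names are made up for illustration and nothing here is taken from pairend_distro-a1.py.

```python
import hashlib
import sys

try:
    import ssl
    OPENSSL_VERSION = ssl.OPENSSL_VERSION
except ImportError:  # some minimal Python builds ship without the ssl module
    OPENSSL_VERSION = "unknown"


def md5_hexdigest(data):
    """Return the md5 hex digest of a bytes object."""
    return hashlib.md5(data).hexdigest()


def environment_report():
    """Collect the details that matter when an md5 call fails inside a container."""
    return {
        "python": "%d.%d" % sys.version_info[:2],
        "openssl": OPENSSL_VERSION,
        "md5_available": "md5" in hashlib.algorithms_available,
    }


if __name__ == "__main__":
    print(environment_report())
    print(md5_hexdigest(b"abc"))
```

Running this inside the container before the lumpy step should show quickly whether the interpreter on `$PATH` can hash at all.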
Strangely enough, changing all the other python 2 commands leads to another crash. Changing just the command mentioned above to python3 works and allows the "lumpy" section of the script to continue. I did run into a new dependency issue with bigwig being used in clinSV/scripts/add-depth-to-PE-SR-calls.pl: it is unable to load the module despite it being there. I'm going to figure out how to reinstall bigwig for perl and see if that does the trick. I'll work on it over the weekend. |
Progress Done Today Specific things we had to do:
Carrying over from the steps previously done by the last docker image, we were able to finish running clinsv without any errors. We get the following output:
I assume that no report was made because we are using a subsampled BAM, and is the reason for the @drmjc Could you please confirm if this is the desired output? Or at least that it makes sense? Next Steps
Possible Hurdles
|
Excellent progress @J-Bradlee, i totally agree with using the most recent Ubuntu LTS. What files are in the
If the first 2 files are missing, then I suspect that's a bug/feature that SV-CNV.vcf may not get created. Can you please run a full BAM file through ClinSV to assess this? |
I get none of those 3 outputs. The last thing that gets created is |
I've updated the 'apps' to run the clinsv pipeline on DNAnexus, and ran it successfully on a patient sample overnight. I'll run the subset BAM file today and see what results are generated by the pipeline. FWIW, in the DNAnexus implementation, it's the 'igv' creation step that writes out the txt and xlsx result files. I'm still digging into how much this is due to the DNAnexus implementation, vs the raw clinsv implementation... |
have you had much success with the 2to3 script? It fixes some obvious
things, but is far from perfect...
|
The root cause of the previous dependency issues was that some of the dependencies were installed in the wrong place. We also added the incorrect directory to $PATH, which caused issues when linking the dependencies. Progress Done Today Began solving the issue of dependencies being placed in the wrong directories. Through a bit of experimentation we were able to create a system that made all the symbolic links work; however, we have not transferred that to the docker image yet. At the moment this seems doable, we just ran out of time for the day. Next step: |
Progress Done Today
Just some other things to note:
Next steps
Possible hurdles
|
Currently running into this error during the lumpy step @/app/project_folder/SVs/joined/lumpy/sh/lumpy.depth.joined.e:
The VCF files that generated the error messages: project_folder.MQ20.OpreProc.f1.vcf.gz Any idea what's causing this @drmjc? It's weird, as I didn't seem to have this issue with the centOS 8 docker container previously. This is the Ubuntu OS btw. |
Progress done last few days
this is different to how it was done in install md 38:
the extra "." in the first line creates a broken link which just refers to itself. Doing this allowed us to run the first step, which is still currently running at the time I am writing this. Next steps
Possible hurdles
the latest release of the speedseq library repo was 2017. We could try and use the most up to date CNVnator (which has GRch38 support) for the speedseq install, hopefully speedseq would still be compatible. Update 1 Update 2
Looking over the output file, it seems only chr1 - chr22 and chrX, chrY and chrM are being found by the program. Not sure if this is intended behavior, or how filterCNVNator.pl is attempting to get this data... |
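The self-referencing link mentioned earlier in this comment is easy to miss by eye. A minimal sketch for spotting such links, assuming only the Python standard library; `find_broken_symlinks` is a made-up helper name and the directory you point it at is up to you.

```python
import os


def find_broken_symlinks(root):
    """Return the sorted paths of symlinks under root whose targets cannot be resolved.

    This catches dangling links as well as links that point at themselves,
    because os.path.exists() follows the link and returns False for both.
    """
    broken = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) and not os.path.exists(path):
                broken.append(path)
    return sorted(broken)
```

Calling `find_broken_symlinks` on the directory that holds the linked binaries would list every link that needs to be recreated.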
Thanks for persevering James. i agree that using a v old tabix is a poor solution. i didn't think that it had changed that much over the years though, perhaps just the cli syntax? shame speedseq doesn't support x38; I think this was chosen as a convenient way to get CNVnator, so this suggests that we may not need speedseq anymore. It's completely ok that only chr1-22,X,T,MT are analysed, as these are the only chromosomes that we care about. Doing so shouldn't raise an error though... All the noise in the log file suggests we need a way to filter out superfluous contigs in a way that is flexible wrt the reference genome used. I'd suggest selecting chromosomes |
There was an issue with the cli syntax with the old tabix, we have changed it to version 1.4 as in the original clinsv docker image. This works fine.
Perhaps it would be best if we just install CNVnator directly from this repo. However, its README does not document some of the CNVnator options that ClinSV uses, such as '--threads', which is set to 14, and '-w'/'--window', which is set to 100.
We just attempted to filter out all the superfluous contigs by adding the --exclude command to the cnvnator wrapper which ensured we only would process chr1 - chr22 and chrX, chrY and chrM. There is no T or MT found in the FR05812606.bam file, or at least cnvnator wrapper is not finding anything with its |
ok - T was a typo. There are 22 autosomes, 2 allosomes/sex chromosomes (X, Y) and 1 mitochondrial genome (MT or chrM), with or without the chr prefix |
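One reference-agnostic way to select those 25 sequences is a name pattern. This is a minimal sketch, not the filter ClinSV or the cnvnator wrapper actually uses; `PRIMARY_CONTIG`, `is_primary_contig` and `split_contigs` are made-up names.

```python
import re

# 1-22, X, Y and the mitochondrial genome (M or MT), with or without a "chr" prefix.
PRIMARY_CONTIG = re.compile(r"(?:chr)?(?:[1-9]|1[0-9]|2[0-2]|X|Y|M|MT)")


def is_primary_contig(name):
    """True for the autosomes, sex chromosomes and mitochondrial genome only."""
    return PRIMARY_CONTIG.fullmatch(name) is not None


def split_contigs(names):
    """Split contig names into (keep, exclude), preserving the input order."""
    keep = [n for n in names if is_primary_contig(n)]
    exclude = [n for n in names if not is_primary_contig(n)]
    return keep, exclude
```

The `exclude` list could then be handed to whatever exclusion option the caller supports, so the same code works for hs37d5-style names (`1`, `MT`) and GRCh38-style names (`chr1`, `chrM`).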
Progress done today
Next Steps
Possible hurdles
Just a question: has v1.0 ever been successfully run? The reason I ask is that we are wondering whether errors like this have occurred before. If they haven't, then perhaps we have not set up ClinSV properly, or there could be an issue with using Ubuntu 20.04 LTS. |
Progress done today
Next steps
Possible Hurdles
|
Damn good job
…On Thu, 27 Jan 2022, 12:35 pm James Bradley, ***@***.***> wrote:
*Progress done today*
- Following from the last update, we ran into an error with pdflatex not being installed; it is used to convert the report tex file to a pdf. Installing TeX Live easily fixed this.
- Did not run into any more errors after that, and the job finished. It successfully outputted a report and igv session file. See pdf here: FR05812606.QC_report.pdf <https://github.com/KCCG/ClinSV/files/7946629/FR05812606.QC_report.pdf>. I have also attached the full project folder, where the igv session file can be found under the igv folder.
- You can pull this current docker image with the command docker pull mrbradley2/clinsv:v2.6
*Next steps*
- Do a complete rerun of the program to make sure no errors have been skipped over, since we have not been re-running previously successful steps after changing anything.
- The docker image is oversized; there is a lot of redundancy of programs installed (in wrong places) that we can get rid of. This should reduce the size by roughly half, from around 9gb to 4gb.
- Once this is done, we will refactor the original docker file so it can directly build all the changes we have made from scratch. (Not sure if this is too high priority, as we can just use the image hosted in the docker repo and commit any changes there.)
- Upload all the code to this repo.
- Start building out a CI/CD pipeline connected to this repo that can automate testing and deployment to the docker repository.
*Possible Hurdles*
- Running into more errors on the rerun.
- Recreating the docker build file might take a while, as installing some of the programs takes a long time (especially root). This makes for tedious debugging.
- The CI/CD pipeline will require a bit of tinkering to get set up. It will also cost money if we were to use a cloud platform to do a quick smoke test for certain steps.
|
Progress Done Today
Next steps
Possible Hurdles
|
Progress Done over last month
Next Steps
Possible Hurdles
|
excellent progress @J-Bradlee This public data (from the platinum genomes project) is aligned to hg19, so please test ClinSV against this NA12878 file.
is it time for a pull request? |
James, do you see the need to run ClinSV twice, or is this now resolved? see #22 (comment) |
I have never required it to run twice; it either fails trying to find the ref data or continues through. I have set up the ref data folder system a little bit differently than suggested in the README. I'll add it to the to-do list to see if I can recreate that bug. I have also not touched the singularity container at all, and have been working mainly on the docker container. I have also run into a crash when trying to run grch37 with V1.0 clinSV. Can you confirm if clinSV V1.0 is backwards compatible? I just assumed it was... Otherwise I'll dig into the error logs and see if I can fix it. |
It's not backwards compatible... V0.9 was only for hs37d5, v1.0 was only
for x38.
The next enhancement is an obvious one... Allow the reference to be
specified on the cli and grab the right resource files on the fly.
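A minimal sketch of what that enhancement could look like, written in Python purely for illustration (ClinSV itself is driven by clinsv.pl). The option names `--reference` and `--ref-root`, the helper names, and the `refdata-b37` directory are all assumptions; only `refdata-b38` appears earlier in this thread.

```python
import argparse
import os

# Hypothetical mapping from a reference name to its resource directory.
# "refdata-b38" is the directory used earlier in this thread; "refdata-b37" is a guess.
REFERENCE_DIRS = {
    "hs37d5": "refdata-b37",
    "grch38": "refdata-b38",
}


def resolve_ref_dir(ref_root, reference):
    """Return the resource directory for the requested reference build."""
    try:
        subdir = REFERENCE_DIRS[reference.lower()]
    except KeyError:
        known = ", ".join(sorted(REFERENCE_DIRS))
        raise ValueError(f"unknown reference {reference!r}; expected one of: {known}")
    return os.path.join(ref_root, subdir)


def build_parser():
    """Command-line interface with the reference selectable at run time."""
    parser = argparse.ArgumentParser(description="Pick resource files by reference build")
    parser.add_argument("--reference", choices=sorted(REFERENCE_DIRS), required=True)
    parser.add_argument("--ref-root", required=True, help="directory holding the refdata-* folders")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(resolve_ref_dir(args.ref_root, args.reference))
```

Keeping the mapping in one table means an unsupported build fails immediately with a clear message instead of crashing part-way through the pipeline.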
|
great job @J-Bradlee |
Thanks @J-Bradlee, this is looking good. I see now why you wanted to drop the v0.9 / hs37d5 documentation from the README.md - I agree with you. Probably a moot point if v1.1.0 is soon to come. |
Hello, I have a little problem with ClinSV. With both docker versions, 0.9 and 1.0, the analysis stops some time after processing starts, with the following message:
The lumpy.caller.joined.e file says that there is no bam file in the directory. This file should probably be generated during the analysis itself. Below is the content of the file:
Thanks in advance |
Create a docker file based on the install requirements found on install38.md .
So far here are some of the dependencies involved:
Perl package manager:
Perl Modules:
Python Module
CNVnator multi
R package: