multi-threading not working? #74
Both for my own projects, and to do testing so as to provide guidelines for our cluster users, I've been running a handful of analyses with angsd on a fairly big data set, and allocating 16 threads (-P 16). Every time I've checked running processes with htop, angsd is only using one core. Not sure why this would be the case.
Comments
Dear Adam, the bottleneck is most likely the disk IO then. In angsd, only the analysis part is threaded, not the file reading.
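A quick way to check whether a run really is disk-bound rather than CPU-bound is to watch per-process CPU and disk utilisation while angsd is running; a rough sketch, assuming the sysstat tools (pidstat, iostat) are available on the node:
pidstat -p $(pgrep -d, -x angsd) 5   # one core pegged near 100% suggests a single busy thread
iostat -x 5                          # sustained high %util on the data disk suggests file reading is the bottleneck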
Hello, I'm pretty new to angsd. I never get more than one process running, so this takes 5 hours for 207 smallish bam files! Is this likely to be related to my system, or is it expected angsd behaviour? Are there other analyses that should correctly run multi-threaded, so I can test?
Hi, I encountered the same problem last year. The bottleneck is NOT just disk IO, since the HWE/MAF/SAF calculations can be considerably sped up with the following workaround:
split your regions file into a few dozen files with the Unix split command
spawn a separate angsd process for each of these small regions files with GNU parallel
combine the resulting output files
Example:
split -l 500 keep.rf SPLIT_RF/
ls SPLIT_RF/* | parallel -j 12 "angsd -rf {} -bam bamfile.list -ref Big_Data_ref.fa \
    -out PCA/GlobalMAFprior/EryPar.{/} -only_proper_pairs 0 -sites keep.sites -minMapQ 5 \
    -baq 1 -doCounts 1 -GL 1 -domajorminor 1 -doMaf 1 -skipTriallelic 1 \
    -SNP_pval 1e-3 -doGeno 32 -doPost 1"
combine output files with either Unix cat (*geno and *maf files) or realSFS cat (*saf files, but see issue #60)
claudius
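For the combine step, a minimal sketch assuming the EryPar.* output prefixes produced by the parallel command above; the output file extensions and the realSFS cat flags are written as recalled and may differ between angsd versions:
cd PCA/GlobalMAFprior
# the .mafs.gz chunks each start with a header line; keep only the first one
set -- EryPar.*.mafs.gz
zcat "$1" | head -n 1 > combined.mafs
for f in EryPar.*.mafs.gz; do zcat "$f" | tail -n +2 >> combined.mafs; done
# the geno output has no header, so the gzipped chunks can simply be concatenated
cat EryPar.*.geno.gz > combined.geno.gz
# SAF output has to go through realSFS cat instead (see issue #60), roughly:
# realSFS cat EryPar.*.saf.idx -outnames combined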
Hi Claudius, it is only the analysis part in angsd that is threaded, not the file reading. Internally, the main process is the one that does the file reading; whenever a chunk has been read across all files, it spawns a thread that does the analysis followed by the printing.
I completely agree that a huge speedup could be achieved by parallelizing the file reading itself, and this is something we are planning to do at some point.
Thanks for writing to us, user input is always appreciated.
Best
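Since only the analysis stage is threaded, the benefit of -P depends on how much analysis work there is per chunk; a rough way to gauge it on a given system is to time the same analysis-heavy run at a few thread counts. A sketch, assuming GNU time at /usr/bin/time and with placeholder file names:
for t in 1 4 16; do
    /usr/bin/time -o time_P${t}.txt angsd -P $t -bam bamfile.list -ref Big_Data_ref.fa \
        -GL 1 -doMajorMinor 1 -doMaf 1 -SNP_pval 1e-3 -out bench_P${t}
done
cat time_P*.txt   # little change in wall-clock time suggests the run is IO-bound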
Hi ANGSD,
Sure. Since I don't know how to submit a feature request on github: phasing and LD estimation from genotype likelihoods would be awesome. claudius
I'm closing this issue, feel free to reopen if needed.
For those looking for an equivalent to @claudiuskerth's solution for beagle likelihoods, the following worked for me (e.g. using 40 threads):
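One way the same regions-split approach could be adapted for beagle output; a sketch assuming -GL 1 with -doGlf 2 for beagle-format genotype likelihoods, with placeholder file names and filters rather than the commenter's actual settings:
mkdir -p SPLIT_RF BEAGLE
split -l 500 keep.rf SPLIT_RF/
ls SPLIT_RF/* | parallel -j 40 "angsd -rf {} -bam bamfile.list -ref Big_Data_ref.fa \
    -GL 1 -doGlf 2 -doMajorMinor 1 -doMaf 1 -SNP_pval 1e-3 -out BEAGLE/chunk.{/}"
# combine, keeping the 'marker allele1 allele2 ...' header from the first chunk only
set -- BEAGLE/chunk.*.beagle.gz
{ zcat "$1" | head -n 1; for f in BEAGLE/chunk.*.beagle.gz; do zcat "$f" | tail -n +2; done; } | gzip > BEAGLE/combined.beagle.gz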