run-trust4 is very slow; what can I do to speed it up? My data is bulk BCR-seq #328
Which version are you using? What was your running command?
This is my script: ./run-trust4
Which version of TRUST4 is this? You can add the option "--repseq", which should improve the running time. Just to confirm, is this UMI-based BCR-seq?
Yes, this is UMI-based BCR-seq.
./run-trust4 --version
If it is UMI-based and you know the read format, that is, where the UMI sequence is located in the read, you can use the options "--barcodeLevel molecule --barcode xxx --readFormat XXX", so that TRUST4 treats the UMI as a barcode to speed up data processing. Since your file is very large and you are using the GitHub version v1.1.5, you can also use "--skipReadRealign" and "--contigMinCov 3" to speed things up further. "--contigMinCov" filters out UMIs with fewer than the specified number of reads, so "--contigMinCov 3" will only generate results for UMIs with at least 3 reads.
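To make the effect of "--contigMinCov 3" concrete, here is a minimal Python sketch of the filtering rule as described above: count reads per UMI and keep only UMIs with at least the requested number of reads. This is an illustration of the rule only, not TRUST4's actual implementation; the function name `umis_with_min_reads` and the toy UMI strings are hypothetical.

```python
from collections import Counter


def umis_with_min_reads(read_umis, min_cov=3):
    """Return the set of UMIs supported by at least `min_cov` reads.

    `read_umis` holds one UMI string per read. This mirrors the rule
    described for --contigMinCov; it is NOT TRUST4's own code.
    """
    counts = Counter(read_umis)
    return {umi for umi, n in counts.items() if n >= min_cov}


# Toy input (hypothetical UMIs): AAC has 3 reads, GGT has 2, TTA has 1.
reads = ["AAC", "AAC", "AAC", "GGT", "GGT", "TTA"]
kept = umis_with_min_reads(reads, min_cov=3)
print(kept)
```

With a threshold of 3, only the UMI seen in three reads survives, so low-coverage UMIs never reach the assembly step and cost no running time.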
./run-trust4
You don't need the "--repseq" option for UMI-based BCR-seq. The --barcode option specifies the file containing the barcode, and --readFormat describes where the barcode and read sequence are located in the "-1, -2, --barcode" files. Examples can be found in the README: https://github.com/liulab-dfci/TRUST4?tab=readme-ov-file#10x-genomics-data-and-barcode-based-single-cell-data
Running the TRUST4 command: ./run-trust4 ...
Yes, this is too slow. What is the command for the "trust4" main program in the running log?
I used nohup bash run_trust4_job.sh. This is the script:
[Sun Nov 17 16:11:04 2024] Found 109640387 reads.
As you can see, many options were not passed to the "trust4" main program, such as --barcode and the thread number (-t). I guess there might be a typo in the bash script, or you still ran it with the old parameter settings.
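One way to run this check mechanically is to take the "trust4" command line from the running log and list which expected options are absent. The sketch below is a hypothetical helper, not part of TRUST4; the logged command string is a placeholder (`OTHER_ARGS` stands in for whatever the real log shows), and it only assumes that options appear as separate whitespace-delimited tokens.

```python
import shlex


def missing_options(logged_command, expected):
    """Return the options from `expected` that do not appear in the command."""
    tokens = set(shlex.split(logged_command))
    return [opt for opt in expected if opt not in tokens]


# Placeholder command, NOT copied from a real log.
logged = "trust4 OTHER_ARGS -t 120"
print(missing_options(logged, ["-t", "--barcode"]))
```

An empty list means every expected option reached the main program; anything returned points to a typo in the wrapper script or a stale run with old parameters.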
If there were syntax errors in the script, it would not run at all. I had syntax errors in an earlier version of my script, and the program could not run.
I tried a new script and checked that it has no problems:
I found that no matter how many threads I set, it ultimately runs in a single thread.
I think the program may not be using multithreading correctly. In the logs, both fastq-extractor and trust4 did receive the -t 120 parameter, but whether they parse and apply it correctly internally needs to be checked in their documentation or source code.
What is the "trust4" command in the running log now?
[Tue Nov 5 13:52:15 2024] Read in and count kmers for 112200000 reads.
[Tue Nov 5 15:05:27 2024] Found 109640387 reads.
[Tue Nov 5 15:07:42 2024] Finish sorting the reads.
[Tue Nov 5 15:15:52 2024] Finish rough annotations.
[Tue Nov 5 15:17:52 2024] Processed 100000 reads (96984 are used for assembly).
[Tue Nov 5 15:18:08 2024] Processed 200000 reads (116845 are used for assembly).
[Tue Nov 5 15:18:09 2024] Processed 300000 reads (164767 are used for assembly).
[Tue Nov 5 15:18:09 2024] Processed 400000 reads (242688 are used for assembly).
...
[Sun Nov 10 01:47:45 2024] Processed 6800000 reads (5988662 are used for assembly).
[Sun Nov 10 07:57:21 2024] Processed 6900000 reads (6088107 are used for assembly).
[Sun Nov 10 12:22:23 2024] Processed 7000000 reads (6173861 are used for assembly).
[Sun Nov 10 20:38:25 2024] Processed 7100000 reads (6273293 are used for assembly).
[Mon Nov 11 03:37:36 2024] Processed 7200000 reads (6367994 are used for assembly).
[Mon Nov 11 08:49:15 2024] Processed 7300000 reads (6456989 are used for assembly).
[Mon Nov 11 15:24:26 2024] Processed 7400000 reads (6556401 are used for assembly).
[Mon Nov 11 20:34:51 2024] Processed 7500000 reads (6639358 are used for assembly).
[Tue Nov 12 04:52:51 2024] Processed 7600000 reads (6739051 are used for assembly).
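To quantify the slowdown visible in this log, the "Processed N reads" lines can be parsed and turned into hours per 100,000 reads. The following is a rough Python sketch that assumes the log keeps exactly the timestamp format shown above; the function names are hypothetical.

```python
import re
from datetime import datetime

# Matches lines such as:
# [Tue Nov 12 04:52:51 2024] Processed 7600000 reads (... are used for assembly).
PROGRESS = re.compile(r"\[(?P<ts>[^\]]+)\] Processed (?P<n>\d+) reads")


def parse_progress(lines):
    """Extract (timestamp, reads_processed) pairs from log lines."""
    points = []
    for line in lines:
        m = PROGRESS.search(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y")
            points.append((ts, int(m.group("n"))))
    return points


def hours_per_100k(points):
    """Hours spent per 100,000 reads between consecutive progress lines."""
    rates = []
    for (t0, n0), (t1, n1) in zip(points, points[1:]):
        hours = (t1 - t0).total_seconds() / 3600
        rates.append(hours / ((n1 - n0) / 100_000))
    return rates


# The last three progress lines from the log above.
sample = [
    "[Mon Nov 11 15:24:26 2024] Processed 7400000 reads (6556401 are used for assembly).",
    "[Mon Nov 11 20:34:51 2024] Processed 7500000 reads (6639358 are used for assembly).",
    "[Tue Nov 12 04:52:51 2024] Processed 7600000 reads (6739051 are used for assembly).",
]
for rate in hours_per_100k(parse_progress(sample)):
    print(f"{rate:.1f} hours per 100,000 reads")
```

For comparison, the first few hundred thousand reads in the same log were processed within seconds of each other, while the final interval shown took 8 hours 18 minutes for 100,000 reads.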