Error: Cannot open input sequence files. output/clean.main.unmapped_1.fastq output/clean.main.unmapped_2.fastq #24

JianGuoZhou3 · 2020-08-09T21:44:45Z

cd ~/biosoft
cd NCLscan

./NCLscan.py -c ./NCLscan.config -pj clean -o output --fq1 EGAR00001653004_1.fastq.gz --fq2 EGAR00001653004_2.fastq.gz
Unknown command "view".
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[E::main_mem] fail to open file `EGAR00001653004_1.fastq.gz'.
Unknown command "view".
gzip: EGAR00001653004_1.fastq.gz: No such file or directory
gzip: EGAR00001653004_2.fastq.gz: No such file or directory
Sequence file is empty: output/clean.main.bwa.unmapped_1.fastq
Error: Cannot open input sequence files. output/clean.main.bwa.unmapped_1.fastq output/clean.main.bwa.unmapped_2.fastq
Unknown command "view".
End of file reading 4 bytes
Total time cost = 0.00259613990784 sec
PslChimeraFilter v0.4

End of file reading 4 bytes
Total time cost = 0.00266909599304 sec
End of file reading 4 bytes
Total time cost = 0.00257301330566 sec
End of file reading 4 bytes
Total time cost = 0.0025110244751 sec
JunctionSite2BED v0.3

Reading annotations on chr1.
Reading annotations on chr2.
Reading annotations on chr3.
Reading annotations on chr4.
Reading annotations on chr5.
Reading annotations on chr6.
Reading annotations on chr7.
Reading annotations on chr8.
Reading annotations on chr9.
Reading annotations on chr10.
Reading annotations on chr11.
Reading annotations on chr12.
Reading annotations on chr13.
Reading annotations on chr14.
Reading annotations on chr15.
Reading annotations on chr16.
Reading annotations on chr17.
Reading annotations on chr18.
Reading annotations on chr19.
Reading annotations on chr20.
Reading annotations on chr21.
Reading annotations on chr22.
Reading annotations on chrX.
Reading annotations on chrY.
Reading annotations on chrM.
Read 60669 genes, 228048 transcripts and 1378888 exons from the gtf file.

novoindex (4.2) - Universal k-mer index constructor.

(C) 2008 - 2011 NovoCraft Technologies Sdn Bhd

novoindex output/clean.JS.ndx output/clean.JS.fa

Creating 23 indexing threads.

Building with 9-mer and step of 1 bp.

novoindex construction dT = 0.0s

Index memory size 0.001Gbyte.

Done.

Sequence file is empty: output/clean.main.unmapped_1.fastq
Error: Cannot open input sequence files. output/clean.main.unmapped_1.fastq
Sequence file is empty: output/clean.main.unmapped_2.fastq
Error: Cannot open input sequence files. output/clean.main.unmapped_2.fastq
End of file reading 4 bytes
Total time cost = 0.00268507003784 sec
End of file reading 4 bytes
Total time cost = 0.00269103050232 sec
PslChimeraFilter v0.4

JunctionSite2BED v0.3

Reading annotations on chr1.
Reading annotations on chr2.
Reading annotations on chr3.
Reading annotations on chr4.
Reading annotations on chr5.
Reading annotations on chr6.
Reading annotations on chr7.
Reading annotations on chr8.
Reading annotations on chr9.
Reading annotations on chr10.
Reading annotations on chr11.
Reading annotations on chr12.
Reading annotations on chr13.
Reading annotations on chr14.
Reading annotations on chr15.
Reading annotations on chr16.
Reading annotations on chr17.
Reading annotations on chr18.
Reading annotations on chr19.
Reading annotations on chr20.
Reading annotations on chr21.
Reading annotations on chr22.
Reading annotations on chrX.
Reading annotations on chrY.
Reading annotations on chrM.
Read 60669 genes, 228048 transcripts and 1378888 exons from the gtf file.

novoindex (4.2) - Universal k-mer index constructor.

(C) 2008 - 2011 NovoCraft Technologies Sdn Bhd

novoindex output/clean.JS2.ndx output/clean.JS2.fa

Creating 23 indexing threads.

Building with 9-mer and step of 1 bp.

novoindex construction dT = 0.0s

Index memory size 0.001Gbyte.

Done.

Sequence file is empty: output/clean.main.unmapped_1.fastq
Error: Cannot open input sequence files. output/clean.main.unmapped_1.fastq output/clean.main.unmapped_2.fastq
End of file reading 4 bytes
Total time cost = 0.00274515151978 sec
End of file reading 4 bytes
Total time cost = 0.00262308120728 sec
End of file reading 4 bytes
Total time cost = 0.00268697738647 sec
End of file reading 4 bytes
Total time cost = 0.00296902656555 sec
Traceback (most recent call last):
  File "/home/rstudio/biosoft/NCLscan/bin/Add_read_count.py", line 118, in <module>
    add_read_count(args.result_tmp_file, args.result_sam_file, args.output, args.JSParser_bin)
  File "/home/rstudio/biosoft/NCLscan/bin/Add_read_count.py", line 13, in add_read_count
    all_junc_read_with_ref = get_junc_read(result_sam_data, JSParser_bin)
  File "/home/rstudio/biosoft/NCLscan/bin/Add_read_count.py", line 66, in get_junc_read
    junc_read_data = get_read_with_ref(junc_read_sam_data)
  File "/home/rstudio/biosoft/NCLscan/bin/Add_read_count.py", line 46, in get_read_with_ref
    ref_id = re.sub(".[0-9]*$", "", line[2])
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/rstudio/biosoft/NCLscan/bin/get_gene_name.py", line 91, in <module>
    add_gene_name(args.result_tmp_file, args.gene_anno, args.output)
  File "/home/rstudio/biosoft/NCLscan/bin/get_gene_name.py", line 8, in add_gene_name
    result_tmp_data = read_TSV(result_tmp_file)
  File "/home/rstudio/biosoft/NCLscan/bin/get_gene_name.py", line 64, in read_TSV
    with open(tsv_file) as data_reader:
IOError: [Errno 2] No such file or directory: 'output/clean.result.tmp2'
Traceback (most recent call last):
  File "./NCLscan.py", line 448, in <module>
    NCL_Scan4(config, datasets_list, args.project_name, args.output_dir)
  File "./NCLscan.py", line 255, in NCL_Scan4
    final_tmp = read_TSV("{prefix}.result.tmp3".format(**config_options))
  File "./NCLscan.py", line 279, in read_TSV
    with open(tsv_file) as data_reader:
IOError: [Errno 2] No such file or directory: 'output/clean.result.tmp3'

Please help me check those errors.
Best,

JianGuoZhou3 · 2020-08-09T21:45:19Z

@TreesLab @chiangtw

chiangtw · 2020-08-10T05:45:15Z

Hi,

Please check whether the samtools path in NCLscan.config is set correctly.

From the error messages, two more issues need attention:

Make sure the paths to the input FASTQ files are correct.
Use novoalign V3 instead.
(V4 is not yet supported by NCLscan.)

Thanks for reporting,
tw
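
The checks suggested above can be run before launching the pipeline. The following is only a minimal sketch, assuming NCLscan.config consists of plain `key = value` lines with `#` comments, as in the config posted in this thread. The function names `read_config` and `preflight` are hypothetical and are not part of NCLscan.

```python
import os


def read_config(path):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    options = {}
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            options[key.strip()] = value.strip()
    return options


def preflight(config_path, fastq_files):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    options = read_config(config_path)
    for key, value in sorted(options.items()):
        # Only external tools given as absolute paths are checked. Entries such
        # as '{NCLscan_bin}/JSParser' contain placeholders and are skipped.
        if key.endswith("_bin") and os.path.isabs(value):
            if not (os.path.isfile(value) and os.access(value, os.X_OK)):
                problems.append("%s is not an executable file: %s" % (key, value))
    for fastq in fastq_files:
        if not os.path.isfile(fastq):
            problems.append("missing input file: %s" % fastq)
        elif os.path.getsize(fastq) == 0:
            problems.append("empty input file: %s" % fastq)
    return problems
```

For example, `preflight("NCLscan.config", ["EGAR00001653004_1.fastq.gz", "EGAR00001653004_2.fastq.gz"])` would list the FASTQ files if they cannot be found from the current directory. It does not verify that a tool path points to the right program or version.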

JianGuoZhou3 · 2020-08-10T12:53:00Z

Thanks for your suggestions.
Those issues may be fixed now.
However, I then rebuilt the reference index:

   cd ~/biosoft 
   cd NCLscan
 ./bin/create_reference.py -c NCLscan.config

But I still get an error:

(kn) [rstudio@Zhou-EU-2 NCLscan]$ ./bin/create_reference.py -c NCLscan.config
[bwa_index] Pack FASTA... 34.57 sec
[bwa_index] Construct BWT for the packed sequence...
[BWTIncCreate] textLength=6739091948, availableWord=486186984
[BWTIncConstructFromPacked] 10 iterations done. 99999996 characters processed.
[BWTIncConstructFromPacked] 20 iterations done. 199999996 characters processed.
[BWTIncConstructFromPacked] 30 iterations done. 299999996 characters processed.
[BWTIncConstructFromPacked] 40 iterations done. 399999996 characters processed.
[BWTIncConstructFromPacked] 50 iterations done. 499999996 characters processed.
[BWTIncConstructFromPacked] 60 iterations done. 599999996 characters processed.
[BWTIncConstructFromPacked] 70 iterations done. 699999996 characters processed.
[BWTIncConstructFromPacked] 80 iterations done. 799999996 characters processed.
[BWTIncConstructFromPacked] 90 iterations done. 899999996 characters processed.
[BWTIncConstructFromPacked] 100 iterations done. 999999996 characters processed.
[BWTIncConstructFromPacked] 110 iterations done. 1099999996 characters processed.
[BWTIncConstructFromPacked] 120 iterations done. 1199999996 characters processed.
[BWTIncConstructFromPacked] 130 iterations done. 1299999996 characters processed.
[BWTIncConstructFromPacked] 140 iterations done. 1399999996 characters processed.
[BWTIncConstructFromPacked] 150 iterations done. 1499999996 characters processed.
[BWTIncConstructFromPacked] 160 iterations done. 1599999996 characters processed.
[BWTIncConstructFromPacked] 170 iterations done. 1699999996 characters processed.
[BWTIncConstructFromPacked] 180 iterations done. 1799999996 characters processed.
[BWTIncConstructFromPacked] 190 iterations done. 1899999996 characters processed.
[BWTIncConstructFromPacked] 200 iterations done. 1999999996 characters processed.
[BWTIncConstructFromPacked] 210 iterations done. 2099999996 characters processed.
[BWTIncConstructFromPacked] 220 iterations done. 2199999996 characters processed.
[BWTIncConstructFromPacked] 230 iterations done. 2299999996 characters processed.
[BWTIncConstructFromPacked] 240 iterations done. 2399999996 characters processed.
[BWTIncConstructFromPacked] 250 iterations done. 2499999996 characters processed.
[BWTIncConstructFromPacked] 260 iterations done. 2599999996 characters processed.
[BWTIncConstructFromPacked] 270 iterations done. 2699999996 characters processed.
[BWTIncConstructFromPacked] 280 iterations done. 2799999996 characters processed.
[BWTIncConstructFromPacked] 290 iterations done. 2899999996 characters processed.
[BWTIncConstructFromPacked] 300 iterations done. 2999999996 characters processed.
[BWTIncConstructFromPacked] 310 iterations done. 3099999996 characters processed.
[BWTIncConstructFromPacked] 320 iterations done. 3199999996 characters processed.
[BWTIncConstructFromPacked] 330 iterations done. 3299999996 characters processed.
[BWTIncConstructFromPacked] 340 iterations done. 3399999996 characters processed.
[BWTIncConstructFromPacked] 350 iterations done. 3499999996 characters processed.
[BWTIncConstructFromPacked] 360 iterations done. 3599999996 characters processed.
[BWTIncConstructFromPacked] 370 iterations done. 3699999996 characters processed.
 [BWTIncConstructFromPacked] 380 iterations done. 3799999996 characters processed.
[BWTIncConstructFromPacked] 390 iterations done. 3899999996 characters processed.
[BWTIncConstructFromPacked] 400 iterations done. 3999999996 characters processed.
[BWTIncConstructFromPacked] 410 iterations done. 4099999996 characters processed.
[BWTIncConstructFromPacked] 420 iterations done. 4199999996 characters processed.
[BWTIncConstructFromPacked] 430 iterations done. 4299999996 characters processed.
[BWTIncConstructFromPacked] 440 iterations done. 4399999996 characters processed.
[BWTIncConstructFromPacked] 450 iterations done. 4499999996 characters processed.
[BWTIncConstructFromPacked] 460 iterations done. 4599999996 characters processed.
[BWTIncConstructFromPacked] 470 iterations done. 4699999996 characters processed.
[BWTIncConstructFromPacked] 480 iterations done. 4799999996 characters processed.
[BWTIncConstructFromPacked] 490 iterations done. 4899999996 characters processed.
[BWTIncConstructFromPacked] 500 iterations done. 4999999996 characters processed.
[BWTIncConstructFromPacked] 510 iterations done. 5099999996 characters processed.
[BWTIncConstructFromPacked] 520 iterations done. 5199999996 characters processed.
[BWTIncConstructFromPacked] 530 iterations done. 5299999996 characters processed.
[BWTIncConstructFromPacked] 540 iterations done. 5399999996 characters processed.
[BWTIncConstructFromPacked] 550 iterations done. 5499999996 characters processed.
[BWTIncConstructFromPacked] 560 iterations done. 5599999996 characters processed.
[BWTIncConstructFromPacked] 570 iterations done. 5699999996 characters processed.
[BWTIncConstructFromPacked] 580 iterations done. 5799999996 characters processed.
[BWTIncConstructFromPacked] 590 iterations done. 5899999996 characters processed.
[BWTIncConstructFromPacked] 600 iterations done. 5999999996 characters processed.
[BWTIncConstructFromPacked] 610 iterations done. 6099261676 characters processed.
[BWTIncConstructFromPacked] 620 iterations done. 6189682860 characters processed.
[BWTIncConstructFromPacked] 630 iterations done. 6270045420 characters processed.
[BWTIncConstructFromPacked] 640 iterations done. 6341467836 characters processed.
[BWTIncConstructFromPacked] 650 iterations done. 6404944284 characters processed.
[BWTIncConstructFromPacked] 660 iterations done. 6461358300 characters processed.
[BWTIncConstructFromPacked] 670 iterations done. 6511495228 characters processed.
[BWTIncConstructFromPacked] 680 iterations done. 6556052988 characters processed.
[BWTIncConstructFromPacked] 690 iterations done. 6595652028 characters processed.
[BWTIncConstructFromPacked] 700 iterations done. 6630843740 characters processed.
[BWTIncConstructFromPacked] 710 iterations done. 6662118188 characters processed.
[BWTIncConstructFromPacked] 720 iterations done. 6689910956 characters processed.
[BWTIncConstructFromPacked] 730 iterations done. 6714609244 characters processed.
[BWTIncConstructFromPacked] 740 iterations done. 6736557052 characters processed.
[bwt_gen] Finished constructing BWT in 742 iterations.
[bwa_index] 2590.57 seconds elapse.
[bwa_index] Update BWT... 39.07 sec
[bwa_index] Pack forward-only FASTA...
25.69 sec
[bwa_index] Construct SA from BWT and Occ... 1185.27 sec
[main] Version: 0.7.17-r1188
[main] CMD: /home/rstudio/miniconda2/envs/kn/bin/bwa index /home/rstudio/hdd/reference/hg38_ek12/AllRef.fa
[main] Real time: 4754.727 sec; CPU: 3875.171 sec
# novoindex (3.9) - Universal k-mer index constructor.
# (C) 2008 - 2011 NovoCraft Technologies Sdn Bhd
# novoindex /home/rstudio/hdd/reference/hg38_ek12/AllRef.ndx /home/rstudio/hdd/reference/hg38_ek12/AllRef.fa
# Creating 23 indexing threads.
# Building with 14-mer and step of 2 bp.
tcmalloc: large alloc 1073750016 bytes == 0x1954000 @  0x4008e4 0x56ca89 0x40408c 0x40127b 0x4d21bb 0x402845
tcmalloc: large alloc 9131360256 bytes == 0x419de000 @  0x4008e4 0x56d7d3 0x40423a 0x40127b 0x4d21bb 0x402845
Error: Invalid NA code in /home/rstudio/hdd/reference/hg38_ek12/AllRef.fa at line 54452646.

JianGuoZhou3 · 2020-08-11T21:01:35Z

NCLscan.config

#############################
### NCLscan Configuration ###
#############################

## The directory of NCLscan
NCLscan_dir = /home/rstudio/biosoft/NCLscan


## The directory of references and indices
## The script "create_reference.py" would create the needed references and indices here.
NCLscan_ref_dir = /home/rstudio/hdd/reference/hg38_ek12


## The following four reference files can be downloaded from the GENCODE website (http://www.gencodegenes.org/).

## The reference genome sequence, eg. /path/to/GRCh37.p13.genome.fa
Reference_genome = /home/rstudio/hdd/reference/hg38_ek12/GRCh38.p13.genome.fa

## The gene annotation file, eg. /path/to/gencode.v19.annotation.gtf
Gene_annotation = /home/rstudio/hdd/reference/hg38_ek12/gencode.v34.annotation.gtf

## The protein-coding transcript sequences, eg. /path/to/gencode.v19.pc_transcripts.fa
Protein_coding_transcripts =  /home/rstudio/hdd/reference/hg38_ek12/gencode.v34.pc_translations.fa

## The long non-coding RNA transcript sequences, eg. /path/to/gencode.v19.lncRNA_transcripts.fa
lncRNA_transcripts =  /home/rstudio/hdd/reference/hg38_ek12/gencode.v34.lncRNA_transcripts.fa


## External tools
bedtools_bin      = /home/rstudio/miniconda2/envs/kn/bin/bedtools
blat_bin          = /home/rstudio/miniconda2/envs/kn/bin/blat
bwa_bin           = /home/rstudio/miniconda2/envs/kn/bin/bwa
samtools_bin      = /home/rstudio/miniconda2/envs/kn/bin/samtools
novoalign_bin     = /home/rstudio/biosoft/novocraft/novoalign
novoindex_bin     = /home/rstudio/biosoft/novocraft/novoindex


## Bin
NCLscan_bin = {NCLscan_dir}/bin

Add_read_count_bin      = {NCLscan_bin}/Add_read_count.py
AssembleExons_bin       = {NCLscan_bin}/AssembleExons
AssembleFastq_bin       = {NCLscan_bin}/AssembleFastq
AssembleJSeq_bin        = {NCLscan_bin}/AssembleJSeq.py
FastqOut_bin            = {NCLscan_bin}/FastqOut
get_gene_name_bin       = {NCLscan_bin}/get_gene_name.py
GetInfo_bin             = {NCLscan_bin}/GetInfo
GetKey_bin              = {NCLscan_bin}/GetKey
GetNameB4Dot_bin        = {NCLscan_bin}/GetNameB4Dot
InsertInList_bin        = {NCLscan_bin}/InsertInList
JSFilter_bin            = {NCLscan_bin}/JSFilter
JSParser_bin            = {NCLscan_bin}/JSParser
JunctionSite2BED_bin    = {NCLscan_bin}/JunctionSite2BED
mp_blat_bin             = {NCLscan_bin}/mp_blat.py
PslChimeraFilter_bin    = {NCLscan_bin}/PslChimeraFilter
RemoveInList_bin        = {NCLscan_bin}/RemoveInList
RetainInList_bin        = {NCLscan_bin}/RetainInList
RmBadMapping_bin        = {NCLscan_bin}/RmBadMapping
RmColinearPairInSam_bin = {NCLscan_bin}/RmColinearPairInSam
RmRedundance_bin        = {NCLscan_bin}/RmRedundance
SeqOut_bin              = {NCLscan_bin}/SeqOut



###########################
### Advanced parameters ###
###########################

## The following two parameters indicate the maximal read length (L) and fragment size of the used paired-end RNA-seq data (FASTQ files), where fragment size = 2L + insert size. 
## If L > 151, the users should change these two parameters to (L, 2L + insert size).
max_read_len      = 151
max_fragment_size = 500


## The base quality threshold. The value should be a non-negative integer.
quality_score = 20

## The collection of the supporting reads must span the NCL junction boundary by the setting size of span range on both sides of the junction site.
span_range = 50


###################
### Performance ###
###################

## Parameters for bwa mem
## The number of threads
bwa-mem-t = 10

## Parameters for mp_blat.py
## The number of processes for running blat
##
## NOTE: The memory usage of each blat process would be up to 4 GB!
##
mp_blat_process = 1

chiangtw · 2020-08-12T02:35:52Z

Hi,

...
## The protein-coding transcript sequences, eg. /path/to/gencode.v19.pc_transcripts.fa
Protein_coding_transcripts = /home/rstudio/hdd/reference/hg38_ek12/gencode.v34.pc_translations.fa
...

Please use "gencode.v34.pc_transcripts.fa" instead.

tw
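
The novoindex message `Invalid NA code` is consistent with a protein FASTA (`pc_translations`) being supplied where nucleotide sequences are expected. Below is a minimal sketch of a check that could catch this before indexing. It assumes the valid sequence characters are the IUPAC nucleotide codes plus the gap character `-`; the alphabet novoindex actually accepts may differ, and the function name `first_non_nucleotide_line` is hypothetical.

```python
# IUPAC nucleotide codes (upper case); lower case is handled via .upper().
IUPAC_NUCLEOTIDES = set("ACGTUNRYSWKMBDHV-")


def first_non_nucleotide_line(fasta_path):
    """Return (line_number, line) for the first sequence line that contains a
    character outside the IUPAC nucleotide alphabet, or None if all lines pass."""
    with open(fasta_path) as handle:
        for number, line in enumerate(handle, start=1):
            line = line.rstrip("\r\n")
            if not line or line.startswith(">"):
                continue  # skip blank lines and FASTA header lines
            if not set(line.upper()) <= IUPAC_NUCLEOTIDES:
                return number, line
    return None
```

Running this on each FASTA file named in NCLscan.config before calling `create_reference.py` would flag a protein file at its first sequence line, instead of after a long bwa indexing step.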

JianGuoZhou3 · 2020-08-12T11:36:28Z

Hi tw, thanks,
Now it works.
Jian-Guo

JianGuoZhou3 · 2020-08-12T19:54:32Z

Hi tw, could you please share a protocol for annotating circRNAs? I found the package "circRNAprofiler: An R-Based Computational Framework for the Downstream Analysis of Circular RNAs", but for NCLscan it supports only hg19, whereas my work is based on hg38. Best, Jian-Guo

chiangtw · 2020-08-13T06:42:08Z

Ah...
I'm not really sure what you need.

If you need to convert coordinates between genome assemblies, you can use liftOver.

But if it is about circRNAprofiler, maybe you should just open an issue on their GitHub page,
I think they can help you better.
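
If coordinate conversion is what is needed, the UCSC `liftOver` command-line tool takes four positional arguments: `liftOver oldFile map.chain newFile unMapped`. The sketch below only wraps that call from Python. It assumes `liftOver` is installed and on the PATH, that the junction coordinates have already been written to a BED file, and that a suitable chain file has been downloaded from UCSC. The function names and all file names are hypothetical.

```python
import subprocess


def build_liftover_cmd(bed_in, chain_file, bed_out, unmapped_out,
                       liftover_bin="liftOver"):
    """Build the argument list: liftOver oldFile map.chain newFile unMapped."""
    return [liftover_bin, bed_in, chain_file, bed_out, unmapped_out]


def run_liftover(bed_in, chain_file, bed_out, unmapped_out,
                 liftover_bin="liftOver"):
    """Run liftOver; raises CalledProcessError on a non-zero exit status."""
    cmd = build_liftover_cmd(bed_in, chain_file, bed_out, unmapped_out,
                             liftover_bin)
    subprocess.check_call(cmd)
```

A call might look like `run_liftover("junctions.hg38.bed", "hg38ToHg19.over.chain.gz", "junctions.hg19.bed", "junctions.unmapped.bed")`. Records that cannot be converted are written to the last file, so it is worth inspecting afterwards.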

JianGuoZhou3 · 2020-08-13T08:07:33Z

Thanks, I will contact the circRNAprofiler developers.
Best,

JianGuoZhou3 · 2020-08-13T08:18:48Z

I ran NCLscan:

 ./NCLscan.py -c ./NCLscan.config -pj clean -o output --fq1 simu_5X_100PE_1.fastq --fq2 simu_5X_100PE_2.fastq

After the run, the output directory contains:

 cd output
(base) [rstudio@Zhou-EU-2 output]$ ls
all.clean.1.result          clean.3b.info          clean.JS2.Idx           clean.chi.bed                   clean.main.bwa.unmapped_1.fastq  clean.preJS.seq2
all.clean.1.sam             clean.4.info           clean.JS2.cleaned.info  clean.chi0.bed                  clean.main.bwa.unmapped_2.fastq  clean.preResult
all.clean.1b.result         clean.4b.info          clean.JS2.fa            clean.chi2.bed                  clean.main.sam                   clean.rG.non_un.psl
all.clean.1b.sam            clean.JS.GRCh37.2.psl  clean.JS2.info          clean.chi3.bed                  clean.main.um3.fa                clean.rG.psl
all.clean.JS.Parsered.sam   clean.JS.GRCh37.psl    clean.JS2.info_1        clean.chrM.2.psl                clean.main.unmapped.fa           clean.rG.um3.psl
all.clean.JS.sam            clean.JS.cleaned.info  clean.JS2.info_2        clean.chrM.um3.psl              clean.main.unmapped.sam          clean.result
all.clean.JS2.sam           clean.JS.clear.fa      clean.JS2.ndx           clean.coding.2.psl              clean.main.unmapped.sam.id       clean.result.info
all.clean.JS2b.sam          clean.JS.fa            clean.JS2.preIdx        clean.coding.um3.psl            clean.main.unmapped_1.fastq      clean.result.sam
all.clean.b.result          clean.JS.info          clean.JS2.prefa         clean.colinear.psl              clean.main.unmapped_2.fastq      clean.result.tmp
all.clean.result            clean.JS.info_1        clean.JS2.result.info   clean.info                      clean.main_1.JS.sam              clean.result.tmp2
all.clean.um3.colinear.psl  clean.JS.info_2        clean.JS2.seq           clean.linearJS                  clean.main_1.um3.fastq           clean.result.tmp3
all.clean.um3.fa            clean.JS.ndx           clean.JS2.seq_1         clean.lncRNA.2.psl              clean.main_2.JS.sam              clean.tmp.info
clean.2.chi.bed             clean.JS.seq           clean.JS2.seq_2         clean.lncRNA.um3.psl            clean.main_2.um3.fastq           clean.tmp2.info
clean.2.info                clean.JS.seq12         clean.PreJS2.bed        clean.main.JS2.sam              clean.ncl.sam                    clean.un.psl
clean.2.info.GRCh37.psl     clean.JS.seq_1         clean.PreJS2.info       clean.main.JS2b.sam             clean.preJS.bed                  clean.unmapped.2.fa
clean.2.info.fa             clean.JS.seq_2         clean.PreJS2.info2      clean.main.bwa.bam              clean.preJS.info                 clean.unmapped.fa
clean.2b.info               clean.JS1.Idx          clean.PreJS2.seq        clean.main.bwa.unmapped.sam     clean.preJS.info2                temp.list
clean.3.info                clean.JS1.preIdx       clean.PreJS2.seq2       clean.main.bwa.unmapped.sam.id  clean.preJS.seq                  tmp.info

But I don't know which of these files to keep and which to use as input for circRNAprofiler.
Best,

JianGuoZhou3 · 2020-08-13T16:37:04Z

Hi tw,
Could you please share some advice on how to speed up this program?
One test sample took more than 6 h (9:43 am to 4:29 pm).
One idea of mine is to run NCL_Scan1-4 separately:

    NCL_Scan1(config, datasets_list, args.output_dir)
    NCL_Scan2(config, datasets_list, args.project_name, args.output_dir)
    NCL_Scan3(config, datasets_list, args.project_name, args.output_dir)
    NCL_Scan4(config, datasets_list, args.project_name, args.output_dir)

In NCLscan.config I set:

###################
### Performance ###
###################

## Parameters for bwa mem
## The number of threads
bwa-mem-t = 10

## Parameters for mp_blat.py
## The number of processes for running blat
##
## NOTE: The memory usage of each blat process would be up to 4 GB!
##
mp_blat_process = 10

However, my CPU only used 20% when running novoalign...
Best,

chiangtw · 2020-08-17T02:13:57Z

Hi,

However, my CPU only used 20% when running novoalign...

Is your novoalign license file in the same directory as the novoalign program?

Thanks,
tw

JianGuoZhou3 · 2020-08-17T06:07:20Z

Hi tw,
I already have a novoalign license, and I can use novoalign V4.
Best,

chiangtw · 2020-08-18T11:24:45Z

Hi,

You're right! There is no "non-profit" mode for novoalign V4; it must be run with a license.

But normally novoalign should use as many CPUs as are available for its mapping job.

To be honest, I don't have any idea what is causing this right now.

Best,
tw

JianGuoZhou3 · 2020-08-18T11:30:39Z

I mean that I have a license for novoalign V4, but you suggested using novoalign V3.
Colin Hercus, who works at Novocraft, said: "That's excellent. V4 should also work with the license file and is 3 to 4 times faster".
But you recommend V3.
So, does V4 work with NCLscan?
Best,

chiangtw · 2020-08-19T03:02:20Z

Hi,
The behaviour of the "--pechimera" parameter changed in novoalign V4,
and this change causes an error in the NCLscan pipeline.

So, for now, we still suggest that users run NCLscan with novoalign V3.

Please refer to issue #21.

Thanks,
tw
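
Because the installed Novocraft version matters here, it can help to check it from the tool's banner line. The logs in this thread show banners such as `novoindex (4.2) - Universal k-mer index constructor.` and `# novoindex (3.9) - ...`. The sketch below parses that format. It assumes the version always appears as `(major.minor)` in the banner; the function names are hypothetical and are not part of NCLscan or Novocraft.

```python
import re


def novocraft_major_version(banner_line):
    """Extract the major version from a banner such as
    'novoindex (4.2) - Universal k-mer index constructor.'.
    Returns None when no '(major.minor)' pattern is present."""
    match = re.search(r"\((\d+)\.(\d+)\)", banner_line)
    return int(match.group(1)) if match else None


def is_recommended_version(banner_line):
    """True only when the banner reports major version 3, the version
    recommended for NCLscan in this thread."""
    return novocraft_major_version(banner_line) == 3
```

With the two banners above, the 4.2 line is rejected and the 3.9 line is accepted, which matches the switch made earlier in this thread.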

JianGuoZhou3 · 2020-08-19T12:12:13Z

Sure, I understand.
But if you have some free time, please try novoalign V4, because it is much faster.
Right now each sample takes me more than 4 h.
Best,

JianGuoZhou3 closed this as completed Aug 12, 2020

JianGuoZhou3 reopened this Aug 13, 2020

JianGuoZhou3 closed this as completed Mar 28, 2021
