Question about circFilter.py #78

JunmingH · 2020-03-16T13:36:12Z

Hi,

I am trying to run DCC, but it always gets stuck at circFilter.py.
The error message is below.
Traceback (most recent call last):
  File "/circu_RNA/DCC-0.4.8/DCC/main.py", line 842, in <module>
    main()
  File "/circu_RNA/DCC-0.4.8/DCC/main.py", line 375, in main
    filt.filter_nonrep(rep_file, indx0, count0)
  File "/circu_RNA/DCC-0.4.8/DCC/circFilter.py", line 110, in filter_nonrep
    nonrep = np.column_stack((indx0, count0))
  File "/share/pkg.7/python2/2.7.16/install/lib/python2.7/site-packages/numpy/lib/shape_base.py", line 640, in column_stack
    return _nx.concatenate(arrays, 1)
MemoryError

Could you please give me some ideas about this?

BTW, is it possible that it spends two days combining the results? Each time, the combining step takes a very long time and still ends with a memory error. I only use 4 threads and request 252 GB.

Best

tjakobi · 2020-03-19T22:17:09Z

Dear @JunmingH,

would you have the complete command line that triggered this error?

Regarding the time constraints - two days seems pretty long. But the time required depends on data and command line used.

Cheers,
Tobias

JunmingH · 2020-03-19T22:21:35Z

Hi Tobias,

Attached is the script. I ran it with around 300 samples, but it never runs to completion.

/DCC-0.4.7/DCC/main.py @DCC_InputFiles/samplesheet_BM10 -T 2 -D -N -R /ref/GRCh38_Repeats_simpleRepeats_RepeatMasker.gtf -an /ref/gencode.v26.primary_assembly.annotation.gtf -F -M -Nr 1 1 -fg -G -A /ref/Homo_sapiens.GRCh38.dna.primary_assembly.fa -O /MSBB_rep/BM10 -B @DCC_InputFiles/bam_files_BM10 -t MSBB_rep/tmp_dir/BM10

tjakobi · 2020-03-20T09:13:28Z

Hi @JunmingH,

I don't think I ever ran DCC on such a large number of samples - therefore the merging might indeed take a lot of time.

However, you might tune some of the parameters to reduce the time:

-T does not apply to the merging step, which is single-threaded, but you might want to increase the number of threads to speed up the rest of the computation.
-Nr 1 1 is a very low threshold that will yield a very high number of false positives, i.e. it is enough if a circle is seen once, with one back-splice junction (BSJ) read, in one sample. I would recommend something like -Nr 2 5 or even -Nr 5 10, meaning 2 reads in at least 5 samples or 5 reads in at least 10. This should also speed up the merging.
The memory error is probably related to the extremely large sparse matrix holding the circRNA counts, which contains mostly zeros.
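To illustrate what a threshold of "N reads in at least M samples" means, here is a minimal NumPy sketch. It is not DCC's actual filtering code; the function name `filter_by_reads_and_samples` and the toy count matrix are assumptions made up for illustration only.

```python
import numpy as np

def filter_by_reads_and_samples(counts, min_reads, min_samples):
    """Return a boolean mask over rows (circRNA candidates) to keep.

    A candidate is kept if at least `min_samples` samples (columns)
    have at least `min_reads` reads for it.
    """
    counts = np.asarray(counts)
    # Per candidate: how many samples reach the read threshold.
    samples_passing = (counts >= min_reads).sum(axis=1)
    return samples_passing >= min_samples

# Toy matrix (made up): rows are candidates, columns are samples.
counts = np.array([
    [1, 0, 0, 0, 0, 0],  # one read in a single sample
    [2, 3, 2, 5, 2, 0],  # at least 2 reads in 5 samples
    [9, 0, 0, 0, 0, 0],  # many reads, but only in one sample
])

keep_loose = filter_by_reads_and_samples(counts, min_reads=1, min_samples=1)
keep_strict = filter_by_reads_and_samples(counts, min_reads=2, min_samples=5)

print(keep_loose.tolist())   # [True, True, True]
print(keep_strict.tolist())  # [False, True, False]
```

With the loose threshold every candidate survives, so the count table keeps all rows; the stricter threshold drops candidates supported by only a single sample, which is why it should shrink the data that has to be merged.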

Let me know if the new parameters can address your issue.

Cheers,
Tobias

JunmingH · 2020-03-20T14:19:42Z

Hi Tobias,

One of my scripts got through successfully; the filter function used 400 GB of memory. But it still gets stuck at one point.

I am not sure which part it is.

The error message is:
"[E::faidx_adjust_position] The sequence "chr1" not found
[E::faidx_adjust_position] The sequence "chr1" not found
[E::faidx_adjust_position] The sequence "chr1" not found
[E::faidx_adjust_position] The sequence "chr1" not found
[E::faidx_adjust_position] The sequence "chr1" not found
........."
Have you ever seen this before?

JunmingH · 2020-03-20T14:19:59Z

BTW, I really appreciate your help!

tjakobi · 2020-03-21T10:27:03Z

Dear @JunmingH,

You may want to create a new index for your genome fasta file.

Try samtools faidx genome.fasta in the folder of the genome file. Maybe the index is corrupted or incomplete.
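After re-indexing, it can also help to look at which sequence names the index actually contains, since the error says that "chr1" was not found. The sketch below reads the first tab-separated column of a .fai file, which holds the sequence names. The helper names and the tiny demo index are made up for illustration; the offsets in the demo are not real. One thing worth ruling out is that the FASTA names the sequence "1" while other inputs use "chr1".

```python
import os
import tempfile

def fai_sequence_names(fai_path):
    """Read sequence names (first tab-separated column) from a .fai index."""
    names = []
    with open(fai_path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:
                names.append(line.split("\t")[0])
    return names

def missing_sequences(fai_path, wanted):
    """Return the names from `wanted` that are absent from the index."""
    present = set(fai_sequence_names(fai_path))
    return [name for name in wanted if name not in present]

# Demo with a tiny, made-up index whose sequences are named "1" and "2".
with tempfile.TemporaryDirectory() as tmp:
    demo_fai = os.path.join(tmp, "genome.fasta.fai")
    with open(demo_fai, "w") as handle:
        handle.write("1\t1000\t3\t60\t61\n")
        handle.write("2\t2000\t1023\t60\t61\n")
    demo_missing = missing_sequences(demo_fai, ["chr1", "1"])

print(demo_missing)  # ['chr1']
```

For a real run, point `missing_sequences` at the .fai file that samtools faidx writes next to the genome FASTA and pass the chromosome names that appear in the error messages.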

Cheers,
Tobias

JunmingH · 2020-03-21T11:37:11Z

Do you mean this file: -A /ref/Homo_sapiens.GRCh38.dna.primary_assembly.fa?

tjakobi · 2020-03-21T18:38:21Z

Yes, exactly.

JunmingH · 2020-03-24T14:12:03Z

Hi Tobias,

Is this warning normal? I got it during linear counting:

2020-03-24 09:52:35,105 WARNING: circRNA start position ('chr18', '45196357') does not have mapped read counts, treated as 0

gnilihzeux · 2020-04-13T07:27:47Z

Hi @tjakobi, I'm stuck at the same part. My error output is as follows:

Filtering by read counts
Count CircSkip junctions
Traceback (most recent call last):
  File "/usr/bin/DCC", line 11, in <module>
    load_entry_point('DCC==0.4.7', 'console_scripts', 'DCC')()
  File "build/bdist.linux-x86_64/egg/DCC/main.py", line 472, in main
  File "build/bdist.linux-x86_64/egg/DCC/main.py", line 669, in findCircSkipJunction
  File "build/bdist.linux-x86_64/egg/DCC/Circ_nonCirc_Exon_Match.py", line 281, in findcircAdjacent
  File "build/bdist.linux-x86_64/egg/DCC/Circ_nonCirc_Exon_Match.py", line 224, in getAdjacent
ValueError: invalid literal for int() with base 10: '1_3182'
started circRNA detection from file

I am not sure whether my GTF annotation, which was downloaded from NCBI, is incorrect. My GTF:

##gff-version 3
##source-version rtracklayer 1.46.0
##date 2020-04-13
NC_003977.2     RefSeq  region  1       3182    .       +       .       ID=NC_003977.2:1_3182;Dbxref=taxon:10407;Is_circular=true;gbkey=Src;genome=genomic;mol_type=genomic DNA;strain=ayw;
NC_003977.2     RefSeq  gene    1376    1840    .       +       .       ID=gene-HBVgp3;Dbxref=GeneID:944566;gbkey=Gene;Name=X;gene=X;gene_biotype=protein_coding;locus_tag=HBVgp3;
NC_003977.2     RefSeq  CDS     1376    1840    .       +       0       ID=cds-YP_009173867.1;Dbxref=Genbank:YP_009173867.1,GeneID:944566;gbkey=CDS;Name=YP_009173867.1;gene=X;locus_tag=HBVgp3;Parent=gene-HBVgp3;product=X protein;protein_id=YP_009173867.1;
NC_003977.2     RefSeq  regulatory_region       1592    1602    .       +       .       ID=id-NC_003977.2:1592_1602;gbkey=regulatory;Note=direct repeat sequence 2 (DR2)%3b cis-acting sequence which participates in synthesis of minus-strand DNA
NC_003977.2     RefSeq  regulatory_region       1775    1794    .       +       .       ID=id-NC_003977.2:1775_1794;gbkey=regulatory;Note=phi%3b cis-acting sequence which participates in synthesis of minus-strand DNA
NC_003977.2     RefSeq  sequence_difference     1775    1775    .       -       .       ID=id-NC_003977.2:1775_1775;gbkey=misc_difference;Note=G is A in ref[2]%3b~conflict
NC_003977.2     RefSeq  gene    1816    2454    .       +       .       ID=gene-HBVgp4;Dbxref=GeneID:944568;gbkey=Gene;Name=C;gene=C;gene_biotype=protein_coding;locus_tag=HBVgp4;

But CircCoordinates and CircCount were still generated. Are these two separate steps that do not affect each other?

Thanks for your time.

JunmingH · 2020-04-13T13:20:31Z

Try the UCSC annotation file. This one does not have chromosome numbers.
Junming

gnilihzeux · 2020-04-14T00:38:44Z

@JunmingH Thanks. It's a genome of Hepatitis B virus and there is no information in UCSC.

tjakobi · 2020-04-17T09:36:46Z

Hi, sorry for the delayed response.

There might be an issue with the format of the 9th column, and DCC running into problems gathering data from it. DCC normally uses GTF files and you are providing a GFF3 file, which might result in problems. You might want to try using a GTF formatted file just to rule this possibility out. See https://www.biostars.org/p/99462/ for details on the differences.
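To make the difference in the 9th column concrete, here is a minimal sketch of how the two attribute styles can be parsed and told apart. It does not reproduce how DCC reads annotations; the function names are made up, the GTF example string is hypothetical, and the GFF3 string is shortened from the gene line posted above. The simple split on ";" assumes that no attribute value contains a semicolon.

```python
def parse_gtf_attributes(field):
    """Parse a GTF-style 9th column: key "value"; key "value";"""
    attrs = {}
    for part in field.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs

def parse_gff3_attributes(field):
    """Parse a GFF3-style 9th column: key=value;key=value;"""
    attrs = {}
    for part in field.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition("=")
        attrs[key] = value
    return attrs

def looks_like_gff3(field):
    """Crude heuristic: GFF3 uses key=value, GTF uses key "value"."""
    first = field.strip().split(";")[0]
    return "=" in first and '"' not in first

# Hypothetical GTF attributes for the same gene (not taken from a real file).
gtf_field = 'gene_id "HBVgp3"; transcript_id "HBVgp3.1";'
# Shortened from the GFF3 gene line shown earlier in this thread.
gff3_field = "ID=gene-HBVgp3;Dbxref=GeneID:944566;gbkey=Gene;Name=X;"

print(looks_like_gff3(gtf_field))    # False
print(looks_like_gff3(gff3_field))   # True
print(parse_gtf_attributes(gtf_field)["gene_id"])   # HBVgp3
print(parse_gff3_attributes(gff3_field)["ID"])      # gene-HBVgp3
```

A parser written for one style will generally misread the other, so checking the first data line of an annotation file with something like `looks_like_gff3` is a quick way to spot a GFF3 file before running a tool that expects GTF.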

gnilihzeux · 2020-04-23T00:43:53Z

Thanks, it is indeed a problem of GTF.

tjakobi self-assigned this Mar 19, 2020

tjakobi added bug question labels Mar 19, 2020

kopardev mentioned this issue Dec 15, 2022

DCC error CCBR/CHARLIE#43

Closed
