Combine the results #72

Closed
JunmingH opened this issue Nov 18, 2019 · 11 comments

@JunmingH

Hi, I am trying to combine my results. I ran each subject separately because I ran into issues when processing all the BAM files together. Right now I have four files for each subject: CircCoordinates, CircRNACount, CircSkipJunctions, and LinearCount. How can I combine the subjects, and which column should I use to match them?
Thanks!

@tjakobi
Contributor

tjakobi commented Nov 24, 2019

Hi @JunmingH,

you would have to combine the different output files into one set of files with one column per sample. However, the rows will differ too, since not every circRNA will be detected in every sample. I would recommend getting all samples processed by DCC in a single run, possibly with a low thread count such as -T 2 to avoid creating too much CPU and memory load.

Cheers,
Tobias
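
If a single run is really not an option, the merge described above amounts to an outer join on the circRNA coordinates. Below is a minimal sketch, not part of DCC. It assumes each per-sample CircRNACount row starts with the coordinate columns (chromosome, start, end; set n_key_cols=4 if your DCC version also writes a strand column) and ends with that sample's count. The function name and the example rows are made up for illustration.

```python
def merge_count_tables(tables, n_key_cols=3):
    """Outer-join per-sample count rows on their leading coordinate columns.

    tables: dict mapping sample name -> list of rows (lists of strings),
            header already removed; the last field of a row is the count.
    Returns (sample_names, merged_rows), where each merged row is the
    coordinate fields followed by one count per sample.
    """
    samples = list(tables)
    merged = {}
    for idx, sample in enumerate(samples):
        for row in tables[sample]:
            key = tuple(row[:n_key_cols])
            # Assumption: a circRNA missing from a sample gets a count of 0.
            counts = merged.setdefault(key, [0] * len(samples))
            counts[idx] = int(row[-1])
    # Sort by chromosome name, then numerically by start and end.
    ordered = sorted(merged, key=lambda k: (k[0], int(k[1]), int(k[2])))
    return samples, [list(key) + merged[key] for key in ordered]


# Hypothetical rows standing in for two per-sample CircRNACount files.
sample_a = [["1", "100", "500", "7"], ["2", "300", "900", "3"]]
sample_b = [["1", "100", "500", "2"], ["X", "50", "80", "5"]]

names, rows = merge_count_tables({"sampleA": sample_a, "sampleB": sample_b})
for row in rows:
    print("\t".join(str(field) for field in row))
```

Filling gaps with 0 is only reasonable for the circRNA counts. For LinearCount, a missing row probably means the position was never counted in that sample rather than that it has no linear reads, which is one more reason to prefer a single DCC run over all samples.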

@JunmingH
Author

Hi @tjakobi Tobias,

I tried processing the data on a server, but it still gave me this error:
Traceback (most recent call last):
  File "/restricted/projectnb/casa/jmh/RNA-seq/circu_RNA/DCC-0.4.7/DCC/main.py", line 818, in <module>
    main()
  File "DCC-0.4.7/DCC/main.py", line 254, in main
    minL=options.min, strand=False, pairdendindependent=False, same=same), Input)
  File "/share/pkg.7/python2/2.7.16/install/lib/python2.7/multiprocessing/pool.py", line 253, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/share/pkg.7/python2/2.7.16/install/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
IndexError: list index out of range

@tjakobi
Contributor

tjakobi commented Nov 25, 2019

Could you please attach the log file of that DCC run?

@JunmingH
Author

@tjakobi Do you have any idea what could be causing this?

@tjakobi
Contributor

tjakobi commented Nov 26, 2019

Hi @JunmingH,

I am fairly sure that your command line is not correct; see the following warnings:

DDC2_o.txt:     => locating circRNAs (unstranded mode) [/restricted/projectnb/casa/jmh/RNA-seq/circu_RNA/script/samplesheet]
DDC2_o.txt:WARNING: File /restricted/projectnb/casa/jmh/RNA-seq/circu_RNA/script/samplesheet, line 2 does not contain all features.
DDC2_o.txt:WARNING: /restricted/projectnb/casa/jmh/RNA-seq/circu_RNA/script/samplesheet is probably corrupt.

DCC should be scanning the junction files here, not the samplesheet itself.

Can you please provide your complete command line?

Cheers,
Tobias
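
A quick pre-flight check can catch this kind of mix-up before DCC starts. The sketch below is not part of DCC. It assumes that rows in STAR's Chimeric.out.junction files carry at least 14 tab-separated fields, whereas a samplesheet holds one path per line. The function name and the example lines are hypothetical.

```python
def looks_like_junction_file(lines, min_fields=14):
    """Return True if every data line has at least min_fields tab-separated fields.

    min_fields=14 is an assumption about STAR's chimeric junction format;
    blank lines and lines starting with '#' are ignored.
    """
    checked = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        if len(line.split("\t")) < min_fields:
            return False
        checked += 1
    return checked > 0


# Made-up contents: a samplesheet lists paths, a junction file lists records.
samplesheet_lines = ["sampleA/Chimeric.out.junction\n",
                     "sampleB/Chimeric.out.junction\n"]
junction_lines = ["\t".join(["chr1", "1000", "+", "chr1", "900", "+", "1",
                             "0", "0", "read1", "950", "50M", "900", "50M"]) + "\n"]

print(looks_like_junction_file(samplesheet_lines))  # False
print(looks_like_junction_file(junction_lines))     # True
```

For a real file, pass an open file handle instead of a list, for example looks_like_junction_file(open(path)).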

@JunmingH
Author

@tjakobi Hi Tobias,

Here is my command line:
python2 ${app_dir}/main.py @samplesheet \
  -D -N -R ${gtf_dir}/GRCh38_Repeats_simpleRepeats_RepeatMasker.gtf \
  -an ref/GRCh38/annotation/Homo_sapiens.GRCh38.95.gtf \
  -F -M -Nr 1 1 -fg -G -A ref/Homo_sapiens.GRCh38.dna.primary_assembly.fa \
  -T 2 -O /dcc_all_results/ \
  -B @bam_files

@tjakobi
Contributor

tjakobi commented Nov 26, 2019

The samplesheet and the command line look okay. However, DCC seems to think there is only one input file, called samplesheet.

The command line you posted is not the one that DCC itself prints out. Do you have the complete DCC log, i.e. DCC-2019***.log? That log file contains the actual command line DCC sees.

Cheers,
Tobias

@JunmingH
Author

@tjakobi Sure,
Here is the log:
2019-11-24 14:54:23,207 DCC 0.4.7 started
2019-11-24 14:54:23,207 DCC command line: /jmh/RNA-seq/circu_RNA/DCC-0.4.7/DCC/main.py /jmh/RNA-seq/circu_RNA/script/samplesheet -D -N -R jmh/RNA-seq/circu_RNA/script/ref/GRCh38_Repeats_simpleRepeats_RepeatMasker.gtf -an /jmh/ref/GRCh38/annotation/Homo_sapiens.GRCh38.95.gtf -F -M -Nr 1 1 -fg -G -A /bu_brain_rnaseq/hjm_test/step_by_step/ref_RSEM/ref/Homo_sapiens.GRCh38.dna.primary_assembly.fa -T 2 -O /jmh/RNA-seq/circu_RNA/dcc_all_results/ -B /jmh/RNA-seq/circu_RNA/script/bam_files
2019-11-24 14:54:23,422 Starting to detect circRNAs
2019-11-24 14:54:23,422 Non-stranded data, the strand of circRNAs guessed from the strand of host genes
2019-11-24 14:54:23,423 started circRNA detection from file /jmh/RNA-seq/circu_RNA/script/samplesheet
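
To compare what DCC actually received with what the wrapper script was meant to pass, the logged command line can be pulled out of the log and split into arguments. This is a minimal sketch that relies only on the "DCC command line: " text visible in the log above. The helper names are made up, and the example log line is shortened.

```python
import shlex

MARKER = "DCC command line: "


def logged_arguments(log_lines):
    """Return the argument list from the first 'DCC command line:' log entry."""
    for line in log_lines:
        if MARKER in line:
            return shlex.split(line.split(MARKER, 1)[1])
    return []


def input_arguments(args):
    """Collect the tokens between the script path and the first option flag."""
    inputs = []
    for token in args[1:]:
        if token.startswith("-"):
            break
        inputs.append(token)
    return inputs


# Shortened version of the log entry above, for illustration only.
log = [
    "2019-11-24 14:54:23,207 DCC 0.4.7 started",
    "2019-11-24 14:54:23,207 DCC command line: /jmh/RNA-seq/circu_RNA/DCC-0.4.7/DCC/main.py "
    "/jmh/RNA-seq/circu_RNA/script/samplesheet -D -N -T 2",
]

inputs = input_arguments(logged_arguments(log))
print(inputs)
```

Each listed input should either be a junction file or start with @ when it is a file that lists junction files.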

@tjakobi
Contributor

tjakobi commented Nov 26, 2019

Hi @JunmingH,

from the log file you can see that the actual command line is

jmh/RNA-seq/circu_RNA/DCC-0.4.7/DCC/main.py /jmh/RNA-seq/circu_RNA/script/samplesheet

While it should be

jmh/RNA-seq/circu_RNA/DCC-0.4.7/DCC/main.py @/jmh/RNA-seq/circu_RNA/script/samplesheet

The @ for the input is missing.

Cheers,
Tobias
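
The effect of the @ prefix can be reproduced with Python's standard argparse module, which expands @file arguments into the lines of that file when fromfile_prefix_chars is set. The sketch below assumes DCC relies on this mechanism, which the behavior in this thread suggests but which is not confirmed here. The parser is a hypothetical stand-in, not DCC's own code, and the file names are made up.

```python
import argparse
import os
import tempfile


def build_parser():
    # Hypothetical stand-in for a DCC-like parser; not DCC's actual code.
    parser = argparse.ArgumentParser(fromfile_prefix_chars="@")
    parser.add_argument("Input", nargs="+")
    parser.add_argument("-T", type=int, default=1)
    return parser


with tempfile.TemporaryDirectory() as tmp:
    sheet = os.path.join(tmp, "samplesheet")
    with open(sheet, "w") as handle:
        handle.write("sampleA.Chimeric.out.junction\n"
                     "sampleB.Chimeric.out.junction\n")

    parser = build_parser()
    # With '@', each line of the samplesheet becomes one input argument.
    with_at = parser.parse_args(["@" + sheet, "-T", "2"]).Input
    # Without '@', the samplesheet path itself is treated as the only input.
    without_at = parser.parse_args([sheet, "-T", "2"]).Input

print(with_at)
print(without_at)
```

In the second case the parser hands back the samplesheet path as if it were a data file, which matches the log entry "started circRNA detection from file .../samplesheet".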

@JunmingH
Author

Hi @tjakobi Tobias

Thanks for your help! It's working now!

@tjakobi tjakobi closed this as completed Nov 27, 2019