DMRfind generates thousands of chunk.tsv files #94

Open
MeganAdler opened this issue Aug 26, 2024 · 1 comment

@MeganAdler

Hi Yupeng,

I'm running into some issues with the results generated by DMRfind. I have EM-seq reads from two different organisms to use in my DMR analysis. I've successfully run DMRfind (methylpy 1.4.7) on Arabidopsis with the simplified command below:

methylpy DMRfind \
	--allc-files /allc_Arabidopsis_1_merged.tsv.gz /allc_Arabidopsis_2_merged.tsv.gz \
	--samples A1 A2 \
	--mc-type "CGN" \
	--chroms 1 2 3 4 5 \
	--num-procs 10 \
	--output-prefix /results/CGN_DMR_A1_A2_merged

Results:

  • CGN_DMR_A1_A2_merged_rms_results.tsv.gz
  • CGN_DMR_A1_A2_merged_rms_results_collapsed.tsv
  • CGN_DMR_A1_A2_merged_rms_results_collapsed.tsv.DMR.bed
  • CGN_DMR_A1_A2_merged_rms_results_collapsed.tsv.DMS.bed

However, running a similar command on another organism (which has a scaffold-level genome assembly) generated thousands of scaffold#_chunk#.tsv files in the results directory:

methylpy DMRfind \
	--allc-files /allc_Sample_1_merged.tsv.gz /allc_Sample_2_merged.tsv.gz \
	--samples S1 S2 \
	--mc-type "CGN" \
	--num-procs 10 \
	--output-prefix /results/CGN_DMR_S1_S2_merged

Results:

  • CGN_DMR_S1_S2_merged_rms_results.tsv.gz
  • CGN_DMR_S1_S2_merged_rms_results_collapsed.tsv
  • CGN_DMR_S1_S2_merged_rms_results_collapsed.tsv.DMR.bed
  • CGN_DMR_S1_S2_merged_rms_results_collapsed.tsv.DMS.bed
  • CGN_DMR_S1_S2_merged_rms_results_for_organism_scaffold1_chunk_0.tsv
  • CGN_DMR_S1_S2_merged_rms_results_for_organism_scaffold1_chunk_1.tsv
  • CGN_DMR_S1_S2_merged_rms_results_for_organism_scaffold1_chunk_2.tsv
  • thousands more rms chunk files
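A quick way to quantify what was left behind is to count the chunk files for a given output prefix and list which scaffolds they belong to. This is a minimal bash sketch, not part of methylpy: the helper names `count_chunk_files` and `list_chunk_scaffolds` are hypothetical, and the file-name pattern `<prefix>_rms_results_for_<sequence>_chunk_<n>.tsv` is an assumption inferred from the listing above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of methylpy). They assume leftover chunk
# files are named <prefix>_rms_results_for_<sequence>_chunk_<n>.tsv,
# as in the file listing above.

# Print how many chunk files exist for the given output prefix.
count_chunk_files() {
    local prefix="$1" f n=0
    for f in "${prefix}"_rms_results_for_*_chunk_*.tsv; do
        # An unmatched glob stays literal, so check the file really exists
        if [ -e "$f" ]; then n=$((n + 1)); fi
    done
    echo "$n"
}

# Print each sequence (scaffold) name that still has chunk files, once.
list_chunk_scaffolds() {
    local prefix="$1" f name
    for f in "${prefix}"_rms_results_for_*_chunk_*.tsv; do
        if [ ! -e "$f" ]; then continue; fi
        name=${f#"${prefix}_rms_results_for_"}  # drop the leading prefix
        name=${name%_chunk_*.tsv}               # drop the trailing _chunk_<n>.tsv
        printf '%s\n' "$name"
    done | sort -u
}

# Example:
#   count_chunk_files /results/CGN_DMR_S1_S2_merged
#   list_chunk_scaffolds /results/CGN_DMR_S1_S2_merged | head
```

Looping over the glob instead of passing it to `ls` avoids "argument list too long" errors when there are thousands of matches.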

Interestingly, there was no obvious error in the output file, and at least some of the DMRs seem to have been compiled into the first four files.

Thank you for your help, and please let me know if you need any more information for troubleshooting this issue.

@yupenghe
Owner

yupenghe commented Sep 4, 2024

Yes, generating thousands of chunk files is expected behavior. However, the chunk files should be removed at the end of the run, and that does not seem to have happened here. I don't know what went wrong; one possible explanation is that the program died without printing an error message. If you run it again, do you still see the chunk files?
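One way to test the "died without an error message" hypothesis on a rerun is to record the command's exit status alongside the number of leftover chunk files. The sketch below is a minimal bash wrapper under that assumption; `run_and_check_chunks` is a hypothetical helper, not part of methylpy, and the chunk-file pattern is inferred from the listing in the original report. In bash, an exit status above 128 usually means the process was killed by a signal (128 + N for signal N, e.g. 137 for SIGKILL, which is what the Linux out-of-memory killer sends), which would fit a silent death.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of methylpy): run a command, keep its exit
# status, then count chunk files left behind for the given output prefix.
# Assumes the <prefix>_rms_results_for_<sequence>_chunk_<n>.tsv naming.
run_and_check_chunks() {
    local prefix="$1" status f n=0
    shift
    # Capture the status without tripping `set -e` in the calling script
    "$@" && status=0 || status=$?
    for f in "${prefix}"_rms_results_for_*_chunk_*.tsv; do
        if [ -e "$f" ]; then n=$((n + 1)); fi
    done
    echo "exit status: ${status}; leftover chunk files: ${n}" >&2
    if [ "$status" -gt 128 ]; then
        # 128 + N usually means the process was killed by signal N (137 = SIGKILL)
        echo "process was likely killed by signal $((status - 128))" >&2
    fi
    # Succeed only if the command exited 0 and no chunk files remain
    [ "$status" -eq 0 ] && [ "$n" -eq 0 ]
}

# Example, reusing the command from the original report:
#   run_and_check_chunks /results/CGN_DMR_S1_S2_merged \
#       methylpy DMRfind \
#       --allc-files /allc_Sample_1_merged.tsv.gz /allc_Sample_2_merged.tsv.gz \
#       --samples S1 S2 \
#       --mc-type "CGN" \
#       --num-procs 10 \
#       --output-prefix /results/CGN_DMR_S1_S2_merged
```

A run that exits 0 but still leaves chunk files would point at the cleanup step rather than at a crash.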
