Niklas Birth edited this page Mar 11, 2024 · 2 revisions

CoCoPyE has three output modes: standard, extended and full. standard is largely self-explanatory. extended adds a few columns that are occasionally useful. full requires some understanding of CoCoPyE's operating principles and is mainly intended for debugging.

You can set the output mode with the -v/--verbosity argument (e.g. cocopye run -i ... -o ... -v extended). The output format is always CSV.

The following sections describe the columns of the output file in each mode.

Standard output

  • bin: Name of the input bin (filename excluding file extension)
  • completeness: Completeness value between 0 and 1
  • contamination: Contamination value between 0 and 1
  • method: Either markers or markers + neural network
  • taxonomy: Taxonomy estimate based on a consensus between the nearest neighbors
  • taxonomy_level: Rank of the taxonomy estimate
  • notes: Additional notes (currently this is always empty, but we might add some notes in the future)
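Because the output is plain CSV, it can be post-processed with standard tools. The following is a minimal sketch using Python's csv module. It assumes that the file has a header row with the column names listed above. The sample rows, the taxonomy placeholders and the thresholds (0.9 and 0.05) are made up for illustration and are not CoCoPyE recommendations.

```python
import csv
import io

# Made-up sample in the standard output layout (assumption: header row
# uses the column names documented above). Real files are written by
# `cocopye run` to the path given with -o.
SAMPLE = """bin,completeness,contamination,method,taxonomy,taxonomy_level,notes
bin_001,0.97,0.01,markers + neural network,example_taxon,example_rank,
bin_002,0.55,0.12,markers,example_taxon,example_rank,
"""


def load_results(handle):
    """Read all rows and convert the two estimates to floats."""
    rows = []
    for row in csv.DictReader(handle):
        row["completeness"] = float(row["completeness"])
        row["contamination"] = float(row["contamination"])
        rows.append(row)
    return rows


def high_quality(rows, min_completeness=0.9, max_contamination=0.05):
    """Return the names of bins that pass both (hypothetical) thresholds."""
    return [
        r["bin"]
        for r in rows
        if r["completeness"] >= min_completeness
        and r["contamination"] <= max_contamination
    ]


rows = load_results(io.StringIO(SAMPLE))
good_bins = high_quality(rows)
print(good_bins)
```

To use this on a real result, replace io.StringIO(SAMPLE) with an open file handle, e.g. open(path, newline="").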

Extended output

This extends the standard output by the following columns:

  • stage: Value between 1 and 3. CoCoPyE has three stages; a higher stage gives a more accurate result, but not all input bins are suitable for all stages.
  • num_markers: Number of markers that were used for the estimate in stage 2. A higher value could indicate a more precise prediction (needs verification).
  • coding_density: Total Pfam count divided by the bin size (value between 0 and 1)
  • knn_score: Similarity to the nearest neighbors (value between 0 and 1)

Full output

  • bin: same as above
  • stage: same as above
  • method: same as above
  • 1_completeness_arc: Stage 1 completeness estimate based on archaeal markers
  • 1_contamination_arc: Stage 1 contamination estimate based on archaeal markers
  • 1_completeness_bac: Stage 1 completeness estimate based on bacterial markers
  • 1_contamination_bac: Stage 1 contamination estimate based on bacterial markers
  • 2_completeness: Stage 2 completeness estimate
  • 2_contamination: Stage 2 contamination estimate
  • 2_num_markers: same as num_markers in extended output
  • 3_completeness: Stage 3 completeness estimate
  • 3_contamination: Stage 3 contamination estimate
  • coding_density: same as above
  • knn_score: same as above
  • taxonomy: same as above
  • taxonomy_level: same as above
  • notes: same as above
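
When debugging, it can help to view the per-stage estimates of a bin side by side. The sketch below collects them from one row of the full output. It assumes that the estimates of a stage that was not reached are left empty in the CSV; this is an assumption, not documented behaviour, so adjust the check if your files use a different placeholder. The labels and sample values are invented.

```python
import csv
import io

# Column pairs (completeness, contamination) per stage, as listed above.
# The dictionary keys are labels chosen for this sketch only.
STAGE_COLUMNS = {
    "stage 1 (archaea)": ("1_completeness_arc", "1_contamination_arc"),
    "stage 1 (bacteria)": ("1_completeness_bac", "1_contamination_bac"),
    "stage 2": ("2_completeness", "2_contamination"),
    "stage 3": ("3_completeness", "3_contamination"),
}

# Invented sample row: a bin that stopped at stage 2, with the stage 3
# columns empty (assumption about how unreached stages are written).
SAMPLE_FULL = """bin,stage,method,1_completeness_arc,1_contamination_arc,1_completeness_bac,1_contamination_bac,2_completeness,2_contamination,2_num_markers,3_completeness,3_contamination,coding_density,knn_score,taxonomy,taxonomy_level,notes
bin_x,2,markers,0.10,0.00,0.80,0.05,0.85,0.04,40,,,0.9,0.7,example_taxon,example_rank,
"""


def stage_estimates(row):
    """Return {label: (completeness, contamination)} for non-empty stages."""
    estimates = {}
    for label, (comp_col, cont_col) in STAGE_COLUMNS.items():
        comp = row.get(comp_col)
        cont = row.get(cont_col)
        if comp and cont:  # skip stages whose columns are empty or missing
            estimates[label] = (float(comp), float(cont))
    return estimates


full_rows = list(csv.DictReader(io.StringIO(SAMPLE_FULL)))
estimates = stage_estimates(full_rows[0])
for label, (comp, cont) in estimates.items():
    print(f"{label}: completeness={comp}, contamination={cont}")
```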