Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error generating CE flat file - Empty Divergent Region Bed #8

Open
mckeowr1 opened this issue Apr 7, 2023 · 0 comments
Open

Error generating CE flat file - Empty Divergent Region Bed #8

mckeowr1 opened this issue Apr 7, 2023 · 0 comments
Assignees
Labels
bug Something isn't working

Comments

@mckeowr1
Copy link
Contributor

mckeowr1 commented Apr 7, 2023

The nf pipeline terminated during the make_flat_file process. The Rscript failed and threw this erorr

  Joining, by = c("CHROM", "POS", "REF", "ALT", "ANNOTATION")
  Warning message:
  Expected 7 pieces. Missing pieces filled with `NA` in 427215 rows [3, 4, 5, 6, 67, 72, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 199, 200, 201, 356, ...]. 
  Error in `[.data.frame`(x, i, j, drop) : undefined columns selected
  Calls: %>% ... <Anonymous> -> fuzzy_join -> [ -> [.data.table -> [.data.frame
  In addition: Warning message:
  In data.table::fread(args[4], col.names = c("chrom", "start", "end")) :
    File 'div.bed' has size 0. Returning a NULL data.table.
  Execution halted

It looks like its happening here when we add the divergent region status for CE. This sense that this failure is CE specific.

if(args[5] == "c_elegans") {
    divergent <- data.table::fread(args[4], 
                                   col.names = c("chrom", "start", "end"))
    
    #Condense Overlapping Regions (If portion of regions shared by >1 strian
    condensed <- divergent %>%
      dplyr::mutate(DIVERGENT = "D") # Add marker to Divergent Region
    
    #Join the data - if a position is within a divergent region Divergent tag is added
    
    test <- table_ready %>%
      dplyr::mutate(start = POS, end = POS)
    join <- fuzzyjoin::genome_join(test, condensed, by = c(
      "CHROM" = "chrom", 
      "start" = "start", 
      "end" = "end"), 
      mode = "left") %>%
      dplyr::select(-start.x, -start.y, -end.x, -end.y, -chrom)
    
    # join <- fuzzyjoin::genome_join(table_ready, condensed,  by = c(
    #   "CHROM" = "chrom",
    #   "POS" = "start",
    #   "POS" = "end"),
    #   mode = "left") %>%
    #   select(-chrom, -start, -end)
} else {
    join <- table_ready %>%
        dplyr::mutate(DIVERGENT = NA)
}

It seems that we are just tring to join a NULL data.table (whatever that is) with our annotation and that just doens't work. This is probably a problem with the channel-inputs.

Input tuple for make_flat_file:

tuple file(bcsq_samples_parsed), file(div_file), file(bcsq_scores), file(wbgene_names)

Current channel wrangling

bcsq_parse_samples.out
.combine(bcsq_parse_scores.out)
.combine(Channel.fromPath("${reference_dir}/csq/${params.species}.gff")) | make_flat_file

output for bcsq_parse_samples

output:
    tuple file("BCSQ_samples_parsed.tsv"), file("div.bed")

output for bcsq_parse_scores

output:
    path "BCSQ_scores_parsed.tsv"

Our current input tuple should be:

file("BCSQ_samples_parsed.tsv"), file("div.bed") path "BCSQ_scores_parsed.tsv" Channel.fromPath("${reference_dir}/csq/${params.species}.gff"

So it looks like the input cardinality matches.

The other potential source of the issue is that we are not loading the divergent region file. Lets check if it was the correct file in the bcsq_parse_samples process....

Its also blank. Well thats because I didn't specify one. This is a bug that we should fix. The make_flat_file script automatically tries to join the divergent region bed, even if the file is blank when the species ID is C. elegans. We should just have the script check if it can load the data or not. If not then print an erorr message and append NA for the divergent column.

@mckeowr1 mckeowr1 added the bug Something isn't working label Apr 7, 2023
@mckeowr1 mckeowr1 self-assigned this Apr 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant