Some TADs are offset/don't align visually #13

Open
Naveen-Ahuja opened this issue Jan 8, 2024 · 5 comments

@Naveen-Ahuja

Naveen-Ahuja commented Jan 8, 2024

Hi,

SpectralTAD works really well, but in some regions the TADs are offset or don't align with what one would expect from visual inspection (see attached image). Do you have any advice on fixing this, or is there a particular parameter I should adjust?

[Image: Screenshot 2024-01-08 at 6 29 34 PM]

Here is how I called the TADs:

for (chr in chromosomes) {
  out_path <- paste0("ctrl_p14_", chr, ".bedpe")
  ctrl_p14 <- strawr::straw("NONE", "/Users", chr, chr, "BP", 50000)

  ctrl_p14_tad <- SpectralTAD(ctrl_p14, chr = chr, resolution = 50000, levels = 3,
                              qual_filter = FALSE, z_clust = FALSE,
                              out_format = "juicebox", out_path = out_path)
}

Thank you

@mdozmorov
Contributor

Hi @Naveen-Ahuja, thanks for reporting. This may happen due to data sparsity. I suggest two solutions:

  1. Set levels = 1. First-level TADs are typically the most robust.
  2. Set qual_filter = TRUE, z_clust = TRUE. This puts additional checks on the quality of TADs. In fact, we now use these settings by default; we still need to change the defaults on Bioconductor.
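
To check whether sparsity is plausible in a region, one could look at how many near-diagonal contacts are empty for each bin. The sketch below is only an illustration and is not SpectralTAD's internal computation: `bin_zero_fraction` is a hypothetical helper, the toy data are made up, and the input is assumed to be a sparse contact table with columns `x`, `y` (bin start coordinates in bp) and `counts`, as in the data frame returned by `strawr::straw()`.

```r
# Hypothetical helper (not part of SpectralTAD): for each bin, the fraction
# of contacts within `window_size` bins of the diagonal that are zero.
bin_zero_fraction <- function(contacts, resolution, window_size) {
  # Convert bp coordinates to 1-based bin indices
  offset <- min(contacts$x, contacts$y)
  i <- (contacts$x - offset) / resolution + 1
  j <- (contacts$y - offset) / resolution + 1
  n <- max(i, j)

  # Dense symmetric matrix; acceptable for one chromosome at coarse resolution
  m <- matrix(0, n, n)
  m[cbind(i, j)] <- contacts$counts
  m[cbind(j, i)] <- contacts$counts

  # For each bin, inspect only partners within `window_size` bins
  vapply(seq_len(n), function(b) {
    partners <- max(1, b - window_size):min(n, b + window_size)
    mean(m[b, partners] == 0)
  }, numeric(1))
}

# Toy data (made up): 5 bins at 50 kb, with the third bin completely empty
toy <- data.frame(
  x      = c(0, 0, 50000, 150000, 150000, 200000),
  y      = c(0, 50000, 50000, 150000, 200000, 200000),
  counts = c(10, 4, 12, 9, 3, 11)
)
zero_frac   <- bin_zero_fraction(toy, resolution = 50000, window_size = 2)
sparse_bins <- which(zero_frac >= 0.8)  # bins that are at least 80% empty
print(sparse_bins)
```

If the misaligned TADs coincide with bins flagged this way, sparsity is a reasonable suspect; if not, the cause is probably elsewhere.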

If you would like to run a more thorough test, here is the code we use to check the effect of various parameter combinations. As noted, qual_filter = TRUE, z_clust = TRUE should work best.

library(SpectralTAD)
library(data.table) # provides fread()

# Not defined in this snippet (set them before running): save_dir, data_dir,
# sample, s, normalization, resolution, chromosomes

# SpectralTAD settings
qual_filter = TRUE; z_clust = TRUE # Silhouette score filtering
# qual_filter = FALSE; z_clust = TRUE # Z-score filtering
# qual_filter = FALSE; z_clust = FALSE # Mixed filtering
max_tad_size <- 2000000; window_size <- max_tad_size / resolution
gap_threshold <- 0.8
# Save directory, named after the parameter combination
save_dir_tads <- file.path(save_dir, paste0(s, "_", normalization, "_", resolution, "_", ifelse(qual_filter, "qualT", "qualF"), "_", ifelse(z_clust, "zT", "zF"), "_", window_size, "_", gap_threshold))
if (!dir.exists(save_dir_tads)) {
  dir.create(save_dir_tads, recursive = TRUE)
}

for (chr in chromosomes) {
  # Input file
  fileNameIn <- file.path(data_dir, sample, paste0(s, "_", chr, "_", normalization, "_", resolution, ".txt.gz"))
  mtx <- fread(fileNameIn)
  # Output file
  fileNameOut <- file.path(save_dir_tads, paste0(chr, ".bedpe"))
  # SpectralTAD run depending on parameters
  SpectralTAD(mtx, chr = chr, levels = 1, qual_filter = qual_filter, z_clust = z_clust,
              window_size = window_size, resolution = resolution, gap_threshold = gap_threshold,
              grange = FALSE, out_format = "bedpe", out_path = fileNameOut)
}
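
The three filter settings listed in the comments of the code above could also be enumerated in a small table instead of being toggled by hand. The following is a minimal sketch under assumptions: the `settings` data frame and `dir_name` column are illustrative names, and `resolution` is set to 50000 only because that is the resolution used in the original report.

```r
# Illustrative sketch: one row per filter combination from the comments above
settings <- data.frame(
  qual_filter = c(TRUE, FALSE, FALSE),
  z_clust     = c(TRUE, TRUE, FALSE)
)

resolution    <- 50000                      # assumed, as in the original report
max_tad_size  <- 2000000
window_size   <- max_tad_size / resolution  # window size in bins
gap_threshold <- 0.8

# Directory suffix encoding each parameter combination
settings$dir_name <- paste0(
  ifelse(settings$qual_filter, "qualT", "qualF"), "_",
  ifelse(settings$z_clust, "zT", "zF"), "_",
  window_size, "_", gap_threshold
)
print(settings$dir_name)

# Each row could then drive one SpectralTAD() run per chromosome, passing
# settings$qual_filter[k] and settings$z_clust[k] for row k.
```

Keeping the combinations in one table makes it easier to compare the resulting TAD calls side by side in a viewer such as Juicebox.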

mdozmorov mentioned this issue Aug 6, 2024
@mdozmorov
Contributor

@Naveen-Ahuja, can you make a minimal reproducible example? You can share a data subset privately. Your results do look confusing; we have never encountered anything like them, and I would like to dig deeper.
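
One possible way to prepare such a subset is to keep only the contacts inside the problematic region and save them to a single file. This is a sketch under assumptions: `subset_contacts` is a hypothetical helper, the toy table is made up, and the input is assumed to have columns `x`, `y` (bin start coordinates in bp) and `counts`, as in the data frame returned by `strawr::straw()`.

```r
# Hypothetical helper: keep contacts where both bins fall inside [start, end)
subset_contacts <- function(contacts, start, end) {
  keep <- contacts$x >= start & contacts$x < end &
          contacts$y >= start & contacts$y < end
  contacts[keep, , drop = FALSE]
}

# Toy contact table (made up) standing in for one chromosome
toy <- data.frame(
  x      = c(0, 50000, 100000, 150000),
  y      = c(50000, 100000, 150000, 200000),
  counts = c(5, 7, 2, 9)
)

region <- subset_contacts(toy, start = 50000, end = 200000)

# Write the subset to a single file that can be shared
out_file <- file.path(tempdir(), "region_subset.rds")
saveRDS(region, out_file)
```

The recipient can load the file with `readRDS()` and pass the table to `SpectralTAD()` with the same `chr` and `resolution` used in the original call.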

@gorliver

gorliver commented Aug 7, 2024

Hi @mdozmorov, thanks for the code. Would you please explain the parameter "normalization"? From what I understand, this parameter is used for matrix generation, not by SpectralTAD?

@mdozmorov
Contributor

This code is meant as inspiration and shouldn't be used verbatim. Yes, "normalization" refers to our data preprocessing and is not related to SpectralTAD. The most important settings are qual_filter = TRUE; z_clust = TRUE; max_tad_size <- 2000000; window_size <- max_tad_size / resolution; gap_threshold <- 0.8
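
Because window_size is derived from the resolution, the same 2 Mb maximum TAD size translates into a different number of bins for each dataset. A small sketch of that arithmetic follows; `tad_window_size` is a hypothetical helper name, and the 25 kb and 10 kb resolutions are chosen only for illustration.

```r
# Hypothetical helper: window size in bins for a given maximum TAD size (bp)
tad_window_size <- function(max_tad_size, resolution) {
  stopifnot(resolution > 0, max_tad_size >= resolution)
  max_tad_size / resolution
}

max_tad_size <- 2000000
resolutions  <- c(50000, 25000, 10000)

# One window size per resolution
window_sizes <- vapply(resolutions, function(r) tad_window_size(max_tad_size, r), numeric(1))
names(window_sizes) <- resolutions
print(window_sizes)
```

At the 50 kb resolution used in the original report this gives a window of 40 bins.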

@Naveen-Ahuja
Author

@mdozmorov sorry for the delay, I had to get permission to share the data. Where can I privately share my Hi-C matrix with you?
