This is related to #499. In short, the length-scaled gene counts appear to be lost somewhere along the way: the salmon.merged.gene_counts_length_scaled.rds file is practically identical to the salmon.merged.gene_counts.rds (or tsv) file, apart from minor rounding differences. These counts are then also used in the eventual DESeq2 dds output.
This can be seen with the full test output from the docs:
This bug is also present at the sample-specific level, so it is not introduced when the samples are merged:
The same issue is present in the salmon.merged.gene_tpm and salmon.merged.gene_tpm_length_scaled files:
Expected behaviour
The raw salmon output is fine, and importing via tximport(files, countsFromAbundance = "lengthScaledTPM", tx2gene = tx2gene) results in slightly different output as expected. The tx2gene file isn't available on the website for the test results, but I have confirmed this with my own data:
countsFromAbundance
character, either "no" (default), "scaledTPM", or "lengthScaledTPM", for whether to generate estimated counts using abundance estimates scaled up to library size (scaledTPM) or additionally scaled using the average transcript length over samples and the library size (lengthScaledTPM). if using scaledTPM or lengthScaledTPM, then the counts are no longer correlated with average transcript length, and so the length offset matrix should not be used.
So the lengthScaledTPM values depend on the other samples and should not be read in individually as is currently being done.
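To illustrate why per-sample import makes the length-scaled counts collapse back to the plain counts, here is a minimal sketch of the arithmetic as described in the documentation quoted above. It is plain Python, not tximport's actual code; the function names (`tpm_from`, `length_scaled_counts`, `column`) and the toy numbers are assumptions made for illustration only. Within a single sample, TPM multiplied by that sample's own length is proportional to the original count, so rescaling to the library size simply returns the original counts.

```python
def tpm_from(counts, lengths):
    """TPM per sample from features x samples counts and effective lengths."""
    n_feat, n_samp = len(counts), len(counts[0])
    tpm = [[0.0] * n_samp for _ in range(n_feat)]
    for j in range(n_samp):
        rates = [counts[i][j] / lengths[i][j] for i in range(n_feat)]
        total = sum(rates)
        for i in range(n_feat):
            tpm[i][j] = rates[i] / total * 1e6
    return tpm


def length_scaled_counts(counts, tpm, lengths):
    """Sketch of 'lengthScaledTPM': TPM times the average feature length over
    the samples passed in, rescaled to each sample's original library size."""
    n_feat, n_samp = len(counts), len(counts[0])
    # Average length of each feature across ALL samples in this call.
    mean_len = [sum(row) / n_samp for row in lengths]
    new = [[tpm[i][j] * mean_len[i] for j in range(n_samp)] for i in range(n_feat)]
    scaled = [[0.0] * n_samp for _ in range(n_feat)]
    for j in range(n_samp):
        lib = sum(counts[i][j] for i in range(n_feat))  # original library size
        tot = sum(new[i][j] for i in range(n_feat))
        for i in range(n_feat):
            scaled[i][j] = new[i][j] * lib / tot
    return scaled


def column(matrix, j):
    """Extract sample j as a features x 1 matrix."""
    return [[row[j]] for row in matrix]


# Hypothetical toy data: 2 features x 2 samples; feature 0 has a different
# effective length in each sample.
counts = [[100.0, 100.0], [200.0, 200.0]]
lengths = [[1000.0, 2000.0], [1000.0, 1000.0]]
tpm = tpm_from(counts, lengths)

# All samples together: the across-sample mean length changes the counts.
joint = length_scaled_counts(counts, tpm, lengths)

# One sample at a time: the "mean" length is the sample's own length.
solo = [
    length_scaled_counts(column(counts, j), column(tpm, j), column(lengths, j))
    for j in range(2)
]
```

In this toy case `solo` reproduces the original counts for each sample (up to floating-point error), while `joint` shifts counts between the two features. That matches the pattern reported here: output that differs from the plain counts only by rounding when each sample is read in on its own.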
j-andrews7 changed the title on Jan 24, 2021
from: salmon.merged.gene_counts and salmon.merged.gene_counts_length_scaled output practically identical
to: salmon output must not be read in individually when countsFromAbundance = "lengthScaledTPM"
System
Nextflow Installation
nextflow-20.12.0-edge-all
Container engine