Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mosdepth missing values imputed #1810

Merged
merged 6 commits into from
Dec 30, 2022

Conversation

adamrtalbot
Copy link
Contributor

@adamrtalbot adamrtalbot commented Dec 7, 2022

Changes:

  • Mosdepth output has missing values in *.{region,global}.dist.txt
  • This change to module fills any missing values with the next value
  • e.g., if there is 100% at 100X and 80% at 80X, the value at 90X will be recorded as 80X
  • Previously, this would have been recorded at 0%.
  • This may underestimate coverage slightly but it's not clear from MosDepth docs how it should be handled.
  • See .mosdepth.region.dist.txt missing values? brentp/mosdepth#190

  • This comment contains a description of changes (with reason)
  • CHANGELOG.md has been updated

Changes:
 - Mosdepth output has missing values in *.{region,global}.dist.txt
 - This change to module fills any missing values with the next value
 - e.g., if there is 100% at 100X and 80% at 80X, the value at 90X will be recorded as 80X
 - This may underestimate coverage slightly but it's not clear from MosDepth docs how it should be handled.
 - See brentp/mosdepth#190
@adamrtalbot
Copy link
Contributor Author

Previous general stats table:
image

New general stats table:
image

@ewels
Copy link
Member

ewels commented Dec 7, 2022

Thanks! Speed thought: generally I try to avoid computing new data within MultiQC (plenty of exceptions abound). I agree that reporting 0.0% is just plain wrong, but what do you think about using None instead? (An empty table cell).

Won't be as pretty, but would be more true to the original data. This data is essentially the same as the plot lower in the report though, right? So prettiness shouldn't be super important? Or have I got that wrong?

mosdepth-cumcoverage-dist-id

@adamrtalbot
Copy link
Contributor Author

Yours looks like that, but mine looks like this:
image

So perhaps we should fix at the source isntead? It's interpreting missing values as zero somewhere, we should stop this?

@ewels
Copy link
Member

ewels commented Dec 7, 2022

aha! Yes, if we remove the 0s so that they're not there (or are None, might work) then the plot will not have those points and the line will be smooth.

@adamrtalbot
Copy link
Contributor Author

Oh hold on, different plots. I think the data in general stats is coming from my plot though.

@adamrtalbot
Copy link
Contributor Author

I added some code where it evaluates every integer in the coverage values, but I haven't pushed it because while it parses the data fine and solves the problem, it takes forever to write the file.

Unfortunately using None instead of zero creates a coverage of None%.

@ewels
Copy link
Member

ewels commented Dec 15, 2022

Unfortunately using None instead of zero creates a coverage of None%.

Hmm, how annoying. Can we just remove the keys if None then please. Should work fine if they keys are just absent.

@ewels ewels added the waiting: changes Issue / PR is on hold, waiting for requested changes label Dec 15, 2022
@adamrtalbot
Copy link
Contributor Author

Before and after comparison:
multiqc_mosdepth_update.zip

@ewels ewels removed the waiting: changes Issue / PR is on hold, waiting for requested changes label Dec 24, 2022
Copy link
Member

@ewels ewels left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Just moved the log statement out of the loop as I was worried that there might be a lot per sample and it could make the debug log file huge.

Thanks for working on this and accepting feedback 👍🏻

@ewels
Copy link
Member

ewels commented Dec 24, 2022

Just before I merge - does this definitely fix the plot as well? The test data isn't affected like your screenshot above.. From a skim read of the code it now looks like it's only affecting values in the general stats table. Just want to check that the plot is good as well.

@adamrtalbot
Copy link
Contributor Author

What was the problem again?

I think the main problem was the general stats table. The other problem is the coverage plot which isn’t fixed but I can look into that in the new year (in a new PR?)

@ewels
Copy link
Member

ewels commented Dec 25, 2022

Sorry, had a typo - I was talking about the plot. If you have some example data kicking around I can take a quick look for a fix.. I'd prefer to do both in one PR if possible, just so that we don't forget.

@ewels
Copy link
Member

ewels commented Dec 30, 2022

Ok, let's merge - I'm hoping to get a release out in 2022 and it would be good to include this 😅

@ewels
Copy link
Member

ewels commented Dec 30, 2022

I dropped the log message, hope that's ok. Figured that it's fairly self explanatory behaviour..

@ewels ewels merged commit 90fffec into MultiQC:master Dec 30, 2022
@adamrtalbot
Copy link
Contributor Author

Thank you! I don’t have access to my laptop until next weekend so wouldn’t have been able to get this done for a while.

@ewels
Copy link
Member

ewels commented Dec 31, 2022

Very sensible! 👍🏻

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants