Picard HsMetrics module to show usable bases #831

Closed
pancheto opened this issue Sep 11, 2018 · 5 comments

Comments

@pancheto

pancheto commented Sep 11, 2018

Is your feature request related to a problem? Please describe.
Only 25 of the 57 Picard HsMetrics columns are parsed by MultiQC. I see the need to keep the table narrow, but I would like to decide which columns are parsed, if not all of them, rather than only being able to show or hide them. In particular, I am interested in 2 columns (PCT_USABLE_BASES_ON_BAIT and PCT_USABLE_BASES_ON_TARGET), which are very informative about how well a sequencing experiment performed.

By the way, the MAX_TARGET_COVERAGE column is being parsed although its header description (The maximum coverage of reads that mapped to target regions of an experiment.) is not being displayed.

Describe the solution you'd like
I would like to have a configuration option to add or remove columns from the Picard HsMetrics module.

Describe alternatives you've considered
I have not found a way to show these 2 columns (or any other) other than editing the HsMetrics.py module directly.

Additional context
Because these columns are percentages printed as fractions, there is an additional issue when displaying them in MultiQC: there is only room for 1 decimal place. I have manually multiplied their values by 100, but being able to configure this would also be great.

Let me just add a big thank you for such a great tool.

@ewels
Member

ewels commented Sep 11, 2018

Hi @pancheto,

Thanks for the suggestion! MultiQC is already parsing all of these fields but just not displaying them as you say. It should be possible to add a new user config option to overwrite the defaults and display the requested fields. I think we can refactor the code so that all fractional fields are automatically displayed properly as percentages.

Phil
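
The fraction-to-percentage idea mentioned above could look roughly like the following. This is a minimal sketch, not MultiQC's actual code: it assumes that fractional HsMetrics fields can be recognised by a `PCT_` prefix, the helper name `fractions_to_percent` is hypothetical, and the sample values are made up for illustration.

```python
def fractions_to_percent(metrics, prefix="PCT_"):
    """Return a copy of `metrics` with fractional fields scaled to 0-100.

    Assumption: any field whose name starts with `prefix` holds a
    fraction between 0 and 1. Other fields are copied unchanged.
    """
    converted = {}
    for name, value in metrics.items():
        if name.startswith(prefix) and isinstance(value, (int, float)):
            converted[name] = value * 100.0
        else:
            converted[name] = value
    return converted


# Made-up sample values for one sample's HsMetrics row.
sample = {"PCT_USABLE_BASES_ON_BAIT": 0.25, "ON_BAIT_BASES": 1200}
print(fractions_to_percent(sample))
```

Scaling the values once, before the table is built, would avoid the problem of a fraction such as 0.25 being squeezed into a single decimal place.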

@chadisaad
Contributor

I'm not sure that they are used. They are removed here:
https://github.com/ewels/MultiQC/blob/master/multiqc/modules/picard/HsMetrics.py#L203

@ewels
Member

ewels commented Apr 20, 2020

Hi both,

Thanks again for your issue / work on this. I've just refactored the code that generates this table so that it can be altered in the MultiQC config. I also added the two columns you wanted @pancheto, so that they are now shown by default.

The code should basically work as before, but now there are two new config options documented here: https://multiqc.info/docs/#hsmetrics

You can customise the columns shown in the HsMetrics table with the config keys HsMetrics_table_cols and HsMetrics_table_cols_hidden. For example:

picard_config:
    HsMetrics_table_cols:
        - NEAR_BAIT_BASES
        - OFF_BAIT_BASES
        - ON_BAIT_BASES
    HsMetrics_table_cols_hidden:
        - MAX_TARGET_COVERAGE
        - MEAN_BAIT_COVERAGE
        - MEAN_TARGET_COVERAGE

Only values listed in HsMetrics_table_cols will be included in the table.
Anything listed in HsMetrics_table_cols_hidden will be hidden by default.
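
Read literally, the two rules above can be sketched as follows. This is an illustration under stated assumptions, not the code in HsMetrics.py: the function name `select_columns` and the shape of the returned dictionary are hypothetical.

```python
def select_columns(available, table_cols, table_cols_hidden):
    """Build per-column display settings from the two config lists.

    Only columns named in `table_cols` (and present in the parsed data)
    are kept. A kept column is flagged as hidden when it also appears
    in `table_cols_hidden`.
    """
    hidden = set(table_cols_hidden)
    headers = {}
    for col in table_cols:
        if col in available:
            headers[col] = {"hidden": col in hidden}
    return headers


# Hypothetical set of fields found in the parsed reports.
available = {"ON_BAIT_BASES", "MAX_TARGET_COVERAGE", "PF_READS"}
headers = select_columns(
    available,
    table_cols=["ON_BAIT_BASES", "MAX_TARGET_COVERAGE"],
    table_cols_hidden=["MAX_TARGET_COVERAGE"],
)
print(headers)
```

Under this reading, a column that appears only in the hidden list would not be in the table at all. Whether MultiQC behaves that way is not stated in this thread.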

Note: Loads of columns are still shown by default. It would be great if we could trim this list of defaults down quite a lot. Any suggestions for a minimum list of the most useful columns? At the moment the table is so wide that it is not very usable.

Phil

@jamigo

jamigo commented Apr 20, 2020

thank you for the update! we'll be glad to use it in the next release.

regarding which columns to show, I agree that the table is too wide, but it's difficult to narrow it, since it helps both upstream and downstream analysts and they do require different table columns. we've reordered them to show the ones we're more interested in first, and that can be more than enough for most users.

@giorgiagandolfi

Hi,
I created a custom config to select only specific columns to be parsed by MultiQC when running over HsMetrics output files. The custom config is written like this:

picard_config:
  HsMetrics_table_cols:
    - PCT_OFF_BAIT
    - PCT_SELECTED_BASES
    - PCT_PF_READS
    - PCT_PF_UQ_READS
    - PCT_PF_UQ_READS_ALIGNED
  HsMetrics_table_cols_hidden:
    - HET_SNP_SENSITIVITY
    - HET_SNP_Q
    - BAIT_DESIGN_EFFICIENCY
    - ON_BAIT_BASES
    - NEAR_BAIT_BASES
    - OFF_BAIT_BASES
    - ON_BAIT_VS_SELECTED
    - PF_READS
    - PF_BASES
    - PF_UNIQUE_READS
    - PF_UQ_READS_ALIGNED
    - PF_BASES_ALIGNED
    - PF_UQ_BASES_ALIGNED
    - ON_TARGET_BASES

table_cond_formatting_rules:
  mqc-picardhsmetrics-PCT_OFF_BAIT:
    pass:
      - lt: 0.20
    warn:
      - gt: 0.20
    fail:
      - gt: 0.30

And launched as follows:

multiqc hsmetrics/ --config myConfig.yaml

This was the error:

Module picard raised an exception: Traceback (most recent call last):
 File "/opt/common/tools/miniconda3/envs/my_env/lib/python3.6/site-packages/multiqc/multiqc.py", line 569, in run
   output = mod()
 File "/opt/common/tools/miniconda3/envs/my_env/lib/python3.6/site-packages/multiqc/modules/picard/picard.py", line 65, in __init__
   n['HsMetrics'] = HsMetrics.parse_reports(self)
 File "/opt/common/tools/miniconda3/envs/my_env/lib/python3.6/site-packages/multiqc/modules/picard/HsMetrics.py", line 156, in parse_reports
   covs = config.picard_config['general_stats_target_coverage']
KeyError: 'general_stats_target_coverage'

Thanks for help,
Giorgia
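
One plausible reading of the traceback above is that the user-supplied `picard_config` replaced the built-in defaults wholesale, so the `general_stats_target_coverage` key was no longer present when HsMetrics.py indexed it. The sketch below reproduces that failure mode and shows a defensive merge. It is an assumption-laden illustration, not a confirmed diagnosis: `DEFAULT_PICARD_CONFIG`, its value `[30]`, and `merge_with_defaults` are all hypothetical.

```python
# Hypothetical defaults; the real default values are not given in this thread.
DEFAULT_PICARD_CONFIG = {"general_stats_target_coverage": [30]}


def merge_with_defaults(user_config, defaults):
    """Return the defaults overlaid with the user's keys (shallow merge)."""
    merged = dict(defaults)
    merged.update(user_config)
    return merged


# A user config that sets only the table columns, as in the report above.
user_picard_config = {"HsMetrics_table_cols": ["PCT_OFF_BAIT"]}

# Indexing the user config directly fails, mirroring the reported KeyError.
try:
    user_picard_config["general_stats_target_coverage"]
except KeyError as err:
    print("KeyError:", err)

# After merging, the key falls back to the (hypothetical) default.
merged = merge_with_defaults(user_picard_config, DEFAULT_PICARD_CONFIG)
print(merged["general_stats_target_coverage"])
```

If this reading is right, explicitly adding a `general_stats_target_coverage` list under `picard_config` in the custom config may work around the error until the defaults are merged properly. This has not been tested here.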


5 participants