Presenting cluster information in tables and plots #655

Closed
SarahAlidoost opened this issue Jun 2, 2023 · 4 comments
Labels: analysis/postprocessing (related to the analysis/postprocessing of a haddock3 run), bug (Something isn't working), community (contributions from people outside the haddock team)

Comments

@SarahAlidoost
Contributor

The dataframes used to create the tables, scatter plots and box plots have three columns: Cluster-id, capri_rank and cluster-ranking. Here are two examples where the dataframes contain Unclustered and Other groups:

  Cluster-id  capri_rank  cluster-ranking
0          -           1                -
1          -           1                -
2          -           1                -
3          -           1                -
4          -           1                -

    Cluster-id  capri_rank  cluster-ranking
125      Other          11               11
129      Other          11               11
131      Other          11               11
92       Other          11               13
108      Other          11               13
109      Other          11               13
119      Other          11               13
111      Other          11               14
130      Other          11               14

The representation of these groups in plots and tables is not consistent. For example, a cluster with Cluster-id = "-" is called "Unclustered" in tables and scatter plots, whereas in box plots it is labelled "-" and shown as capri_rank=1 on the x-axis.
As another example, a cluster with Cluster-id = "Other" is called "Other" in the legends of scatter plots and box plots, but in tables it is shown under cluster-ranking=11, 13, 14, and on the x-axis of box plots it is shown as capri_rank=11.

See more:

[Five screenshots of caprieval analysis reports: three from docking-protein-protein run1-test-branch (analysis/4_caprieval_analysis) and two from docking-antibody-antigen run1-CDR-NMR-CSP-test (analysis/04_caprieval_analysis)]

@mgiulini
Contributor

mgiulini commented Jun 2, 2023

Hey @SarahAlidoost, can you be a bit more specific? Is there anything inconsistent on the analysis side?
Unclustered is the label assigned to the cluster_id of a model when there is no clustering data (cluster_id = -), while Other refers to all the clusters (combined together) whose cluster rank is higher than a threshold (default = 10). These two labels have very different meanings.
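
The rule described above can be sketched as a small function. This is only an illustration under stated assumptions, not haddock3's actual code: the name `display_label`, the parameter `top_clusters`, and the `Cluster N` label format are all hypothetical.

```python
def display_label(cluster_id, cluster_rank, top_clusters=10):
    """Return a display label for one model (hypothetical helper, not haddock3 API)."""
    # No clustering data: the cluster id is the placeholder "-".
    if cluster_id == "-":
        return "Unclustered"
    # Clusters ranked beyond the threshold are pooled into one group.
    if int(cluster_rank) > top_clusters:
        return "Other"
    # Assumed label format for the top-ranked clusters.
    return f"Cluster {cluster_rank}"
```

With the default threshold, a model in a cluster ranked 11 would be labelled "Other", while a model with cluster_id "-" would be labelled "Unclustered" regardless of its rank value.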

@SarahAlidoost
Contributor Author

> Hey @SarahAlidoost, can you be a bit more specific? Is there anything inconsistent on the analysis side?

Only in plotting the results, not in running the analysis. The labels used in the tables, scatter plots and box plots are inconsistent with each other.

> Unclustered is the label assigned to the cluster_id of a model when there is no clustering data (cluster_id = -),

The label "Unclustered" is used as a header in the table and in the legend of the scatter plots, whereas the label "-" is used in the legend of the box plots. In the table, the value of Cluster Rank is "-", while the x-axis of the box plots shows capri_rank = 1.

> while Other refers to all the clusters (combined together) with cluster rank higher than a threshold (default = 10). These two labels have a very different meaning.

This is another example of inconsistent labels. The label "Other" is used in the legends of the scatter plots and box plots, but there is no "Other" column in the table. Instead, the table has columns whose headers follow cluster-ranking = 11, 13, 14. There is also no "Other" on the x-axis of the box plots; those clusters are all shown as capri_rank = 11 (as an example).
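
One possible way to avoid this kind of mismatch is to derive a single display label per row and let the table headers, legends and plot axes all key on it. The sketch below groups a few rows shaped like the dataframes above by that label. The helper name `group_by_label` and the tuple layout are hypothetical, and this is not how haddock3 is implemented.

```python
from collections import defaultdict

# Rows as (Cluster-id, capri_rank, cluster-ranking), taken from the examples above.
ROWS = [
    ("-", 1, "-"),
    ("-", 1, "-"),
    ("Other", 11, 11),
    ("Other", 11, 13),
    ("Other", 11, 14),
]


def group_by_label(rows):
    """Group rows under one display label (hypothetical helper)."""
    groups = defaultdict(list)
    for cluster_id, capri_rank, cluster_ranking in rows:
        # The label depends only on Cluster-id, never on either rank column.
        label = "Unclustered" if cluster_id == "-" else str(cluster_id)
        groups[label].append((capri_rank, cluster_ranking))
    return dict(groups)


groups = group_by_label(ROWS)
print(sorted(groups))  # ['Other', 'Unclustered']
```

Because the key comes from Cluster-id alone, the rows with cluster-ranking 11, 13 and 14 all fall under one "Other" group instead of appearing as separate columns or as capri_rank = 11.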

Please let me know if it is still unclear.

@amjjbonvin
Member

amjjbonvin commented Jun 2, 2023 via email

@amjjbonvin
Member

amjjbonvin commented Jun 2, 2023 via email

@rvhonorato added the bug and analysis/postprocessing labels on Jun 13, 2023
@rvhonorato added the community label on Jun 13, 2023