implementing a leaderboard #63

sgbaird · 2022-08-24T04:40:18Z

Would probably be good to have a somewhat temporary leaderboard (either in the README or as a separate markdown file that gets displayed prominently on the documentation page) and then in the long-term add it to Matbench. materialsproject/matbench#150 (comment)

For the short-term implementation, maybe just tables with bolding applied to the best values within some tolerance (5% perhaps). Might be nice to be able to see composition vs. structure vs. composition + structure metrics, with composition + structure being the default or most prominently displayed. In other words, display the combined metric, where both the structure and composition conditions need to be met. The other two (where the only requirement is meeting the composition condition, or the only requirement is meeting the structure condition) are instructive and help us understand where certain algorithms are lacking. Right now, I don't think the API tracks the composition and structure conditions independently of each other.
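As a rough illustration of the "bold the best values within some tolerance" idea, here is a minimal sketch that renders a Markdown table. The function names, the `composition` / `structure` / `combined` column names, the model names, and all numbers are placeholders invented for illustration (they are not real results or part of any existing API), and it assumes higher values are better by default.

```python
def bold_best(values, rel_tol=0.05, higher_is_better=True):
    """Format values, bolding those within rel_tol (relative) of the best one."""
    best = max(values) if higher_is_better else min(values)
    formatted = []
    for v in values:
        text = f"{v:.3f}"
        # Relative distance to the best value; best itself always qualifies.
        near_best = abs(v - best) <= rel_tol * abs(best)
        formatted.append(f"**{text}**" if near_best else text)
    return formatted


def leaderboard_table(results, metrics, rel_tol=0.05):
    """Build a Markdown table from {model: {metric: value}} (hypothetical layout)."""
    models = list(results)
    # Bold per column, so each metric gets its own set of "best" entries.
    columns = {
        m: bold_best([results[name][m] for name in models], rel_tol)
        for m in metrics
    }
    lines = [
        "| model | " + " | ".join(metrics) + " |",
        "| --- | " + " | ".join("---" for _ in metrics) + " |",
    ]
    for i, name in enumerate(models):
        cells = [columns[m][i] for m in metrics]
        lines.append(f"| {name} | " + " | ".join(cells) + " |")
    return "\n".join(lines)


# Placeholder numbers only, to show the formatting.
example = {
    "model_a": {"composition": 0.80, "structure": 0.40, "combined": 0.30},
    "model_b": {"composition": 0.78, "structure": 0.60, "combined": 0.45},
}
print(leaderboard_table(example, ["combined", "composition", "structure"]))
```

Putting `combined` first in the metric list is one simple way to make it the most prominent column.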

kjappelbaum · 2022-10-21T07:45:37Z

Happy to factor out code from https://github.com/kjappelbaum/mofdscribe/blob/main/dev_scripts/update_bench.py if it helps. The idea I use in mofdscribe is that there is one json that is produced.

These docs are compiled to RST, which can then be used by a sphinx extension to create the tables. (You can also embed an interactive plot as HTML.)
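To make the JSON-to-RST step concrete, here is a minimal sketch that turns a JSON document into a docutils `list-table` directive. The JSON layout (a list of entries with `model` and `metrics` keys) and the function name are assumptions made for illustration; this is not the schema or code used in mofdscribe, and it writes the table directly rather than going through a sphinx extension.

```python
import json


def json_to_rst_table(json_text, title="Leaderboard"):
    """Render a list of {"model": ..., "metrics": {...}} entries as an RST list-table.

    The input layout is hypothetical; adapt the keys to the real result files.
    """
    entries = json.loads(json_text)
    # Union of all metric names, so entries with missing metrics still line up.
    metric_names = sorted({name for e in entries for name in e["metrics"]})

    rows = [["model"] + metric_names]
    for e in entries:
        cells = [e["model"]]
        for name in metric_names:
            value = e["metrics"].get(name)
            cells.append("n/a" if value is None else f"{value:.3f}")
        rows.append(cells)

    lines = [f".. list-table:: {title}", "   :header-rows: 1", ""]
    for row in rows:
        lines.append(f"   * - {row[0]}")
        lines.extend(f"     - {cell}" for cell in row[1:])
    return "\n".join(lines) + "\n"
```

The returned string could be written to an `.rst` file inside the documentation source tree and pulled in with an `include` directive.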

sgbaird · 2022-10-21T16:34:46Z

> Happy to factor out code from kjappelbaum/mofdscribe@main/dev_scripts/update_bench.py if it helps. The idea I use in mofdscribe is that there is one json that is produced.

@kjappelbaum ooh, interesting! I'd like to dig into this some more. mofdscribe really is a batteries-included package 😄 @ardunn + co. and I have been discussing integrating matbench-genmetrics into the Matbench leaderboard in a Matbench 2.0. @ardunn mentioned that it should be pretty straightforward. Right now I'm working on getting some leaderboard results before deciding on the final architecture of the leaderboard, since it might make sense to change some of the metrics depending on how the results turn out. At minimum I'd like to have a couple of models with xtal2png, one model with FTCP, and one with CDVAE. So far I have one model trained using imagen-pytorch (ElucidatedImagen) + xtal2png with the default hyperparameters. It ran on an A100 for a few days, so it's compute-heavy.

sgbaird · 2023-06-17T19:52:02Z

Hi @kjappelbaum, I'm circling back to this and planning to submit another JOSS manuscript. I think implementing a simple leaderboard within the repository would be best, rather than trying to incorporate it elsewhere. Would you still be willing to factor out the leaderboard code from mofdscribe like you mentioned?

kjappelbaum · 2023-06-30T05:54:30Z

Ah, somehow I didn't get the notification from this issue.

From our email thread:

I think we might have a simple config file for such a benchmarking package in which you can set:

- dir in which results are stored
- dir into which the leaderboard pages will be written
- path to conf.py for sphinx
- metrics to be logged
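A minimal sketch of what such a config could look like, assuming a JSON file and a small dataclass. The field names (`results_dir`, `pages_dir`, `sphinx_conf`, `metrics`), the class name, and the loader are all hypothetical; nothing here has been agreed on or implemented.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class LeaderboardConfig:
    """Hypothetical container for the four settings listed above."""

    results_dir: Path   # dir in which results are stored
    pages_dir: Path     # dir into which the leaderboard pages will be written
    sphinx_conf: Path   # path to conf.py for sphinx
    metrics: list = field(default_factory=list)  # names of metrics to be logged


def load_config(json_text):
    """Parse a JSON string into a LeaderboardConfig, converting paths."""
    raw = json.loads(json_text)
    return LeaderboardConfig(
        results_dir=Path(raw["results_dir"]),
        pages_dir=Path(raw["pages_dir"]),
        sphinx_conf=Path(raw["sphinx_conf"]),
        metrics=list(raw.get("metrics", [])),
    )


# Example values are placeholders.
cfg = load_config(
    '{"results_dir": "results", "pages_dir": "docs/leaderboard",'
    ' "sphinx_conf": "docs/conf.py", "metrics": ["metric_x", "metric_y"]}'
)
print(cfg.pages_dir, cfg.metrics)
```

JSON is used here only to stay within the standard library; YAML or TOML would work equally well for the same fields.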

For the abstractions, I think I’d implement it as a workflow in which we have the option for users to add various callbacks, e.g. if they want to customize plotting.
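One way the workflow-with-callbacks idea could look, as a minimal sketch. The class name, method names, and the shape of the results dictionary are assumptions for illustration only, not an existing API.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical result layout: {model_name: {metric_name: value}}
Results = Dict[str, Dict[str, float]]
Callback = Callable[[Results], None]


class LeaderboardWorkflow:
    """Runs the leaderboard build, then hands the results to user callbacks."""

    def __init__(self, callbacks: Optional[List[Callback]] = None):
        self.callbacks: List[Callback] = list(callbacks or [])

    def add_callback(self, callback: Callback) -> None:
        """Register an extra step, e.g. custom plotting."""
        self.callbacks.append(callback)

    def run(self, results: Results) -> Results:
        # A real workflow would collect results and write pages here;
        # this sketch only invokes the callbacks in registration order.
        for callback in self.callbacks:
            callback(results)
        return results


# Usage: a stand-in for a plotting callback that just records model names.
plotted = []
workflow = LeaderboardWorkflow()
workflow.add_callback(lambda results: plotted.extend(sorted(results)))
workflow.run({"model_b": {"metric_x": 0.2}, "model_a": {"metric_x": 0.1}})
print(plotted)
```

Keeping callbacks as plain callables means users can customize plotting without subclassing the workflow.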

Besides, there are some reusable utils, such as the watermark (https://github.com/kjappelbaum/mofdscribe/blob/main/src/mofdscribe/bench/watermark.py), that we could also ship in such a package.

There would perhaps also be a “start” command/CLI that adds the required configuration to the sphinx configuration file.

If we both agree on this setup, I'll make some time end of next week to do it.

sgbaird · 2023-07-01T05:58:33Z

@kjappelbaum no worries, I think I sent this message concurrently with the email thread. This sounds great to me. Thank you!

This was referenced Aug 24, 2022:

- allow for tracking composition and structure metrics independently #64 (Open)
- Convert StructureMatcher inline code to HTML for partial link #59 (Merged)

sgbaird mentioned this issue Jun 23, 2023:

- Public leaderboard and submission system sparks-baird/matsci-opt-benchmarks#42 (Open)
