Project 6: BioModelsML: Building a FAIR and reproducible collection of machine learning models in life science and medicine for easy reuse
Machine Learning (ML) models are scattered across various resources without sufficient metadata, making them difficult to find, access, and reuse. Additionally, the lack of standards and interoperability in ML formats and programming frameworks, along with dependencies on multiple software libraries, makes reproducing ML models a resource-intensive task. The BioModelsML project aims to develop a comprehensive collection of FAIR and reproducible ML models for reuse in life science and medical research.
BioHackEU22#9 and continuous team effort have paved the way for a FAIR-ML checklist and workflows covering aspects such as training code, trained model sharing, dataset linking, figure reproduction, evaluation metrics, Docker for building and applying trained ML models, model metadata, and dissemination via BioModels. We are currently working on several reference ML models to demonstrate the application of the checklist and workflows.
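As a rough illustration of how the checklist aspects listed above could be applied in practice, the following minimal Python sketch checks a model entry for completeness. The item names, the `missing_items` helper, and the example entry are all hypothetical and paraphrased from the aspects named in this proposal; they are not the actual FAIR-ML checklist or a BioModels submission format.

```python
# Hypothetical item names, paraphrased from the checklist aspects in the text.
# The real FAIR-ML checklist may use different names and granularity.
CHECKLIST_ITEMS = (
    "training_code",
    "trained_model",
    "dataset_links",
    "figure_reproduction",
    "evaluation_metrics",
    "docker_image",
    "model_metadata",
    "biomodels_submission",
)


def missing_items(entry: dict) -> list[str]:
    """Return the checklist items that are absent or empty in a model entry."""
    return [item for item in CHECKLIST_ITEMS if not entry.get(item)]


# Placeholder entry for illustration only; URLs and values are made up.
example_entry = {
    "training_code": "https://example.org/repo",
    "trained_model": "model.onnx",
    "dataset_links": ["https://example.org/dataset"],
    "evaluation_metrics": {"accuracy": 0.9},
}

if __name__ == "__main__":
    for item in missing_items(example_entry):
        print(f"missing: {item}")
```

A check of this kind could help a curator see at a glance which aspects of a submitted model still need attention before dissemination.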
At BioHackathon2023, we aim to invite ML modellers and software developers (both on-site and virtual) to curate and disseminate ML models via BioModels using our FAIR-ML checklist. We will use interactive hacking sessions to improve our checklist and prepare documentation for community curation. Through interaction with the EDAM and BioSchemas teams, we will improve metadata annotation aspects and align our checklist with the DOME guidelines.
This proposal addresses a grand challenge, making ML models findable, reusable, and reproducible, and has the potential to have a substantial impact on the field. BioHackathon2023 will help us finalize key aspects of FAIR ML model curation and dissemination via BioModels and provide enough pilot work to apply for the large national or international funding needed to drive this project.
Our short-term goal is to refine our current FAIR-ML checklist and workflow and to share a small collection of metadata-rich, reproducible ML models via BioModels within six months. Our long-term goal is to build an open and free collection of FAIR, reproducible ML models in BioModels through internal and community curation.
At BH2023, we will revise the FAIR-ML checklist and workflows through community engagement with ML modellers, gathering their feedback and supporting them in disseminating their models through BioModels.
We will engage with the BioCuration, BioSchemas, EDAM, DOME, APICURON, and ML communities to align our project with their efforts. Specifically, we will work with members of the ELIXIR ML group to establish the minimal metadata and the ontologies required to annotate ML models.
The team consists of eight on-site and remote members, including leads. We will try to recruit at least three additional ML/domain experts.
Please join the 06_fair-ml-biomodels channel in the BioHackEU Slack workspace. We will coordinate meetings and activities via Slack to keep our on-site and remote participants in sync. If you have any questions, please contact one of the project leads below.
Rahuman Sheriff, Nils Hoffmann, Sumukh Deshpande