Supplementary Material for "Autonomous LLM-driven research from data to human-verifiable research papers"
This repository contains the supplementary material for the pre-print arXiv:2404.17605, "Autonomous LLM-driven research from data to human-verifiable research papers", in which we describe data-to-paper, a framework for systematically navigating the power of AI to perform complete end-to-end data-science research, starting from raw data and concluding with comprehensive, reproducible, and correct scientific papers. The data-to-paper software can be found here.
For the V0 version of the supplementary material, click here.
In four case studies, we showed that data-to-paper can perform full run cycles, from data alone to complete research papers, across different datasets and fields:
- Health Indicators (open goal). A clean unweighted subset of CDC’s Behavioral Risk Factor Surveillance System (BRFSS) 2015 annual dataset (Kaggle). Example paper created by data-to-paper.
- Social Network (open goal). A directed graph of Twitter interactions among the 117th Congress members (Fink et al.). Example paper created by data-to-paper.
- Treatment Policy (fixed-goal). A dataset on treatment and outcomes of non-vigorous infants admitted to the Neonatal Intensive Care Unit (NICU), before and after a change to treatment guidelines was implemented (Saint-Fleur et al.). Example paper created by data-to-paper.
- Treatment Optimization (fixed-goal). A dataset of pediatric patients who received mechanical ventilation after undergoing surgery, including an x-ray-based determination of the optimal tracheal tube intubation depth and a set of personalized patient attributes to be used in machine-learning and formula-based models to predict this optimal depth (Shim et al.). Example paper created by data-to-paper.