
AutoRA in Research Tab in Documentation #699

Open · chadcwilliams opened this issue Jan 14, 2025 · 2 comments
Assignees: musslick
Labels: documentation (Improvements or additions to documentation.) · priority 1 - needed (These are highly desirable to be fixed, ideally within 2 weeks.)

Comments

chadcwilliams (Collaborator)

I would like to suggest adding a tab in the AutoRA documentation that highlights AutoRA in published research. My thought would be to include it in the nav and have it lead to a page that outlines all of the published research that uses AutoRA (or at least the published research that has come from our group).

The reason this came up is that I wanted to open the latest publication and went to the documentation to find it, but a quick search turned up nothing. Maybe I just overlooked it, but I didn't see anything obvious pointing our users to this or other research using AutoRA.

Ideally, the page would include a list of publications, each with a few sentences (or the abstract) so that people can scan the page and see what AutoRA has produced without needing to open each publication explicitly.
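For concreteness, here is a minimal sketch of what that nav entry could look like, assuming the docs are built with MkDocs; the page title and file path below are placeholders, not actual files in the repo:

```yaml
# mkdocs.yml (hypothetical excerpt): add a top-level nav entry that
# leads to a new page listing published research using AutoRA.
nav:
  - Home: index.md                    # existing entries kept as-is
  - AutoRA in Research: research.md   # new page; title and path are placeholders
```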

chadcwilliams added the documentation and priority 3 - optional labels on Jan 14, 2025
musslick (Collaborator) commented Jan 14, 2025

@chadcwilliams Thank you for bringing this up! It's funny that you opened this issue just now; I had the same thought today. I was wondering whether we should dedicate an extra page (or even an extra section in the README.md) to listing references that use AutoRA, particularly those where AutoRA facilitated novel discoveries. Great idea! I will bring this up in the maintenance meeting with Younes. I am also changing the priority to priority 1 - needed because I think we should add it sooner rather than later.

musslick added the priority 1 - needed label and removed the priority 3 - optional label on Jan 14, 2025
musslick self-assigned this on Jan 14, 2025
musslick (Collaborator)

Adding @chadcwilliams' suggestion here:

AutoRA in Research

White Paper

AutoRA: Automated Research Assistant for Closed-Loop Empirical Research

Sebastian Musslick, Benjamin Andrew, Chad C Williams, Sida Li, Ioana Marinescu, Marina Dubova, George T Dang, Younes Strittmatter, & John G Holland

Automated Research Assistant (autora) is a Python package for automating and integrating empirical research processes, such as experimental design, data collection, and model discovery. With this package, users can define an empirical research problem and specify the methods they want to employ for solving it. autora is designed as a declarative language in that it provides a vocabulary and set of abstractions to describe and execute scientific processes and to integrate them into a closed-loop system for scientific discovery. The package interfaces with other tools for automating scientific practices, such as scikit-learn for model discovery, sweetpea and sweetbean for experimental design, firebase_admin for executing web-based experiments, and autodoc for documenting the empirical research process. While initially developed for the behavioral sciences, autora is designed as a general framework for closed-loop scientific discovery, with applications in other empirical disciplines. Use cases of autora include the execution of closed-loop empirical studies (Musslick et al., 2024), the benchmarking of scientific discovery algorithms (Hewson et al., 2023; Weinhardt et al., 2024), and the implementation of metascientific studies (Musslick et al., 2023).

Musslick, S., Andrew, B., Williams, C. C., Li, S., Marinescu, I., Dubova, M., ... & Holland, J. G. (2024). AutoRA: Automated Research Assistant for Closed-Loop Empirical Research. Journal of Open Source Software, 9(104), 6839.
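To make the closed-loop pattern described in the abstract concrete, below is a minimal, self-contained sketch of the design-collect-model cycle. It deliberately avoids autora's actual API; the experimentalist and runner functions are illustrative stand-ins, and scikit-learn plays the theorist role:

```python
# Sketch of a closed loop: experimentalist -> runner -> theorist.
# Illustrative only; this does not use autora's actual API.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

def experimentalist(n=5):
    """Propose new experimental conditions (here: random sampling)."""
    return rng.uniform(0, 10, size=(n, 1))

def experiment_runner(conditions):
    """Collect (synthetic) observations for the proposed conditions."""
    return 2.0 * conditions[:, 0] + 1.0 + rng.normal(0, 0.1, len(conditions))

theorist = LinearRegression()  # model-discovery step
X, y = np.empty((0, 1)), np.empty(0)

for cycle in range(3):  # closed loop: design -> collect -> model
    conditions = experimentalist()
    observations = experiment_runner(conditions)
    X = np.vstack([X, conditions])
    y = np.concatenate([y, observations])
    theorist.fit(X, y)
    print(f"cycle {cycle}: slope={theorist.coef_[0]:.2f}, "
          f"intercept={theorist.intercept_:.2f}")
```

In an actual autora workflow, each of these roles is a pluggable component rather than a plain function; the sketch only shows how the three roles hand data to one another across cycles.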

Manuscripts Using AutoRA

Bayesian Machine Scientist for Model Discovery in Psychology

Joshua Tomas Sealth Hewson, Younes Strittmatter, Ioana Marinescu, Chad C Williams, & Sebastian Musslick

The rapid growth in complex datasets within the field of psychology poses challenges for integrating observations into quantitative models of human information processing. Other fields of research, such as physics, have proposed equation discovery techniques as a way of automating the data-driven discovery of interpretable models. One such approach is the Bayesian Machine Scientist (BMS), which employs Bayesian inference to derive mathematical equations linking input variables to an output variable. While BMS has shown promise, its application has been limited to a small subset of scientific domains. This study examines the utility of BMS for model discovery in psychology. In Experiment 1, we compare BMS in recovering four models of human information processing against two common psychological benchmark models—linear/logit regression and a black-box neural network—across a spectrum of noise levels. BMS outperformed the benchmark models on the majority of noise levels and demonstrated at least equivalent performance at higher levels of noise. These findings demonstrate BMS’s potential for discovering psychological models of human information processing. In Experiment 2, we investigated the impact of informed priors on BMS recovery, comparing domain-specific function priors against a benchmark uniform prior. Specifically, we investigated four priors across research domains spanning their specificity to psychology. We observe that informed priors robustly enhanced BMS performance for only one of the four models of human information processing. In summary, our findings demonstrate the effectiveness of BMS in recovering computational models of human information processing across a range of noise levels; however, whether integrating expert knowledge into the BMS framework improves performance remains a subject of further inquiry.

Hewson, J. T. S., Strittmatter, Y., Marinescu, I., Williams, C. C., & Musslick, S. (2023). Bayesian Machine Scientist for Model Discovery in Psychology. In NeurIPS 2023 AI for Science Workshop.

Expression Sampler as a Dynamic Benchmark for Symbolic Regression

Ioana Marinescu, Younes Strittmatter, Chad C Williams, & Sebastian Musslick

Equation discovery, the problem of identifying mathematical expressions from data, has witnessed the emergence of symbolic regression (SR) techniques aided by benchmarking systems like SRbench. However, these systems are limited by their reliance on static expressions and datasets, which, in turn, provides limited insight into the circumstances under which SR algorithms perform well versus fail. To address this issue, we introduce an open-source method for generating comprehensive SR datasets via random sampling of mathematical expressions. This method enables dynamic expression sampling while controlling for various expression characteristics pertaining to expression complexity. The method also allows for using prior information about expression distributions, for example, to simulate expression distributions for a specific scientific domain. Using this dynamic benchmark, we demonstrate that the overall performance of established SR algorithms decreases with expression complexity and provide insight into which equation features are best recovered. Our results suggest that most SR algorithms overestimate the number of expression tree nodes and trigonometric functions and underestimate the number of input variables present in the ground truth.

Marinescu, I., Strittmatter, Y., Williams, C. C., & Musslick, S. (2023). Expression Sampler as a Dynamic Benchmark for Symbolic Regression. In NeurIPS 2023 AI for Science Workshop.
