Berkeley AI Research project on RL modeling of exploration and exploitation trade-offs between children and adults.

KataTech/explore-vs-exploit-gopniklab

explore-vs-exploit-gopniklab

Project: Computational Modeling of Approach-Avoid Task with Reinforcement Learning Frameworks

Overview

Repository for Kai Hung's research project under UC Berkeley NSF SUPERB REU for summer 2022. Kai was fortunate to be mentored by Eunice Yiu and Dr. Alison Gopnik. Credits to Fei Dai for contributing her code and ideas for capturing the 1-D and 2-D learning preference tradeoffs.

Abstract: State-of-the-art deep and reinforcement learning algorithms have made incredible progress on pattern recognition and decision-making problems at the cost of large computing power and/or processed data, but their ability to generalize quickly and reliably remains poor relative to that of an average human child. To understand how children are able to gather information and learn so much from so little, we focus on computationally modeling children’s decision-making in an approach-avoid paradigm: children can opt to approach a certain stimulus, which may be rewarding or punishing, or they can opt to avoid it and learn nothing about whether the stimulus is rewarding or punishing. Specifically, we perform parameter estimation by fitting experimental data with variants of a standard reinforcement learning model that include parameters such as learning rate and inverse temperature. Contrasting children’s best-fit model parameters with those of adults, we find that children are more exploratory (lower inverse temperature) and less affected by external negative reward factors (smaller negative learning rate), yet more capable of inferring from experimental results the correct two-dimensional decision rule for maximizing net external reward gains.

The experimental data for this project originated from a study conducted by Dr. Emily Liquin and Dr. Alison Gopnik: https://www.sciencedirect.com/science/article/pii/S0010027721003632

Instructions

This repository is organized around the core pipeline illustrated in the poster and technical talk files; the pipeline itself lives in reinforcement_learning.ipynb. Model-related functions are stored in the models folder, split into generative_models.py and likelihood_models.py. The former holds the functions used to generate data given a model and its parameters; the latter holds the functions used to estimate parameters given a model and data.
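To make the generative/likelihood split concrete, here is a minimal sketch of such a pair for a single stimulus. It is an illustration only, not the code in the models folder: the function names, the starting value of 0, the choice rule (a logistic function of the stimulus value, with avoiding worth 0), and the separate positive and negative learning rates are all assumptions chosen to match the parameters named in the abstract.

```python
import math
import random


def p_approach(q, beta):
    """Probability of approaching: softmax over approach (worth q) and avoid (worth 0)."""
    p = 1.0 / (1.0 + math.exp(-beta * q))
    # Clamp so that log(p) and log(1 - p) stay finite.
    return min(max(p, 1e-9), 1.0 - 1e-9)


def update(q, reward, alpha_pos, alpha_neg):
    """Move q toward the reward, with a separate rate for negative prediction errors."""
    delta = reward - q
    alpha = alpha_pos if delta >= 0 else alpha_neg
    return q + alpha * delta


def generate(rewards, alpha_pos, alpha_neg, beta, rng):
    """Generative direction: model + parameters -> simulated choices (1 = approach)."""
    q, choices = 0.0, []
    for r in rewards:
        approach = rng.random() < p_approach(q, beta)
        choices.append(1 if approach else 0)
        if approach:  # avoiding reveals nothing, so q is left unchanged
            q = update(q, r, alpha_pos, alpha_neg)
    return choices


def neg_log_likelihood(choices, rewards, alpha_pos, alpha_neg, beta):
    """Likelihood direction: model + data -> score for one candidate parameter set."""
    q, nll = 0.0, 0.0
    for c, r in zip(choices, rewards):
        p = p_approach(q, beta)
        nll -= math.log(p if c == 1 else 1.0 - p)
        if c == 1:
            q = update(q, r, alpha_pos, alpha_neg)
    return nll
```

Both functions replay the same trial loop. The generative one samples each choice, while the likelihood one scores the choices that were actually observed, so minimizing `neg_log_likelihood` over the parameters is what parameter estimation amounts to in this sketch.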

Contributors are advised to read through reinforcement_learning.ipynb to understand the overall pipeline before adding or modifying generative and parameter-estimation (likelihood) functions. To avoid confusion, start by ignoring everything except reinforcement_learning.ipynb, the models folder, and helpers.py.

Here is a detailed breakdown of the files:

  • models/
    • generating_models.py - a script storing functions to generate data
    • likelihood_models.py - a script storing functions to estimate parameters
  • Computational Modeling for Approach-Avoid Task with Reinforcement Learning Frameworks .pptx - final poster for this project
  • Study3_AAData_Adults.csv - the dataset for adults
  • Study3_AAData_Kids.csv - the dataset for kids
  • Technical Talk - Kai Hung.pptx - final slide presentation for this project
  • Variable_Key.xlsx - a key for the variable labels in the above two datasets
  • additional.py - a script containing commented-out code for additional analysis; copy and paste it into reinforcement_learning.ipynb cells to run it
  • code_optimization.ipynb - a script used to debug inefficient code
  • data_exploration.ipynb - a script used to perform exploratory data analysis
  • helpers.py - a script containing non-model helper methods for reinforcement_learning.ipynb
  • modeling_tutorial.ipynb - a script modeled after Dr. Anne Collins' computational modeling workflow
  • reinforcement_learning.ipynb - the main script of this project, where the entire project workflow is conducted
  • rl_model.ipynb - a script from Fei Dai, containing attempts to model the data with a Bayesian framework; largely incomplete

The overall workflow within reinforcement_learning.ipynb is as follows:
  (1) Scroll to the third code cell, whose first line is "Initialize a vector to store the bestllh...", and confirm the number of models you want to use.
  (2) Scroll to the "Experiments: Parameter Recovery" section and perform parameter recovery for all of the models. Their generative and likelihood functions should already be in the "models" folder.
  (3) Scroll to the "Model-Fitting on Experimental Data" section and perform model fitting using the fit_model() function.
  (4) Scroll to the "Model Comparisons" section and follow its workflow to check, via the confusion matrix, that each model is individually powerful (that is, data generated by a model is best fit by that same model). Make sure that "save = True" is set for the fit_model() calls in the previous section, or else the global model_info variable won't have the correct values for this section.
  (5) Scroll to the "Model Simulation" section and manually enter the specs associated with the best-fit models for each age group (hint: search "TODO").

WARNING: All the plot functions in the "Model Simulation" section take optional save and save_path parameters, both of which must be removed from the function call if you do not wish to save the resulting plots. With saving on, you may also need to create an "outputs" folder in this directory for the section to run properly.
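The parameter recovery step can be sketched as follows: simulate choices with known parameters, fit the same model to the simulated choices, and compare the fitted values with the true ones. This is a simplified stand-in for what the notebook does, under stated assumptions: a single learning rate, a grid search instead of whatever optimizer the notebook uses, and four hypothetical stimuli with fixed outcomes. The names `run`, `fit_grid`, and `recover` are invented for this example and are not functions from the repository.

```python
import math
import random


def p_approach(q, beta):
    """Logistic choice rule: approach is worth q, avoid is worth 0."""
    p = 1.0 / (1.0 + math.exp(-beta * q))
    return min(max(p, 1e-9), 1.0 - 1e-9)


def run(trials, alpha, beta, choices=None, rng=None):
    """Simulate choices (choices is None) or score observed ones.

    trials is a list of (stimulus_id, reward) pairs.
    Returns (choices, negative log-likelihood of those choices).
    """
    q, out, nll = {}, [], 0.0
    for i, (stim, reward) in enumerate(trials):
        value = q.get(stim, 0.0)
        p = p_approach(value, beta)
        c = choices[i] if choices is not None else int(rng.random() < p)
        nll -= math.log(p if c == 1 else 1.0 - p)
        if c == 1:  # outcomes are only observed after approaching
            q[stim] = value + alpha * (reward - value)
        out.append(c)
    return out, nll


def fit_grid(trials, choices, alphas, betas):
    """Return the (alpha, beta, nll) triple with the lowest nll on the grid."""
    best = None
    for a in alphas:
        for b in betas:
            _, nll = run(trials, a, b, choices=choices)
            if best is None or nll < best[2]:
                best = (a, b, nll)
    return best


def recover(true_alpha, true_beta, n_trials=400, seed=0):
    """Simulate with known parameters, then fit the simulated choices."""
    rng = random.Random(seed)
    # Hypothetical task: stimuli 0 and 1 are rewarding (+1), 2 and 3 punishing (-1).
    stims = [rng.randrange(4) for _ in range(n_trials)]
    trials = [(s, 1.0 if s < 2 else -1.0) for s in stims]
    choices, _ = run(trials, true_alpha, true_beta, rng=rng)
    alphas = [i / 10 for i in range(1, 10)]
    betas = [0.5 * j for j in range(1, 21)]
    return fit_grid(trials, choices, alphas, betas)
```

Calling `recover(0.3, 3.0)` returns the fitted `(alpha, beta, nll)` triple. Repeating this over many true parameter values and plotting true against fitted values is the usual way to judge whether a model's parameters are recoverable before fitting it to the experimental data.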

Future Directions and Ideas

  • The "discount" factor (which is really a tuning parameter on reward perception, not a discount in the sense typically used in RL and economics) showed a promising fit. So, it is very plausible that kids are not treating initial exposure to negative stimuli as purely negative. In fact, they may well be curious (hence there is an intrinsic reward to better understanding the reward distribution). I imagine that this could be modeled either as a flat intrinsic reward that is a function of observation, or through a much more complicated procedure.
  • It may also be interesting to examine how closely children and adults conform to a one-dimensional rule vs. a two-dimensional rule. We could potentially draw inspiration from the concept of "interaction" in classical linear regression to construct a model with similar components, accounting for the Q function's input space of patterns, colors, and the object identities themselves.
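Both ideas above can be sketched in a few lines. This is a speculative illustration rather than anything implemented in the repository: the function names, the choice to multiply the external reward by the "discount" factor, the constant bonus, and the dictionary-based weights are all assumptions.

```python
def perceived_reward(reward, discount, intrinsic_bonus=0.0):
    """Scale the external reward and add a flat bonus for having observed an outcome.

    Assumption: the "discount" factor multiplies the external reward, and the
    bonus is a constant paid on every observation.
    """
    return discount * reward + intrinsic_bonus


def q_value(pattern, color, w_pattern, w_color, w_interaction):
    """Additive pattern and color values plus a pattern-by-color interaction term.

    With all interaction weights at 0 the value is a sum of one-dimensional
    contributions; nonzero interaction weights let specific pattern/color
    combinations carry their own value, as a two-dimensional rule requires.
    """
    return (w_pattern.get(pattern, 0.0)
            + w_color.get(color, 0.0)
            + w_interaction.get((pattern, color), 0.0))
```

In such a model, the relative size of the fitted interaction weights compared with the additive weights would be one way to quantify how much a participant relies on the two-dimensional rule.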
