Repository for Kai Hung's research project under the UC Berkeley NSF SUPERB REU, summer 2022. Kai was fortunate to be mentored by Eunice Yiu and Dr. Alison Gopnik. Credit to Fei Dai for contributing her code and ideas for capturing the 1-D vs. 2-D learning preference tradeoffs.
Abstract: State-of-the-art deep and reinforcement learning algorithms have achieved incredible progress on pattern recognition and decision-making problems at the cost of large amounts of computing power and/or processed data, but their ability to generalize quickly and reliably remains poor relative to an average human child. To understand how children are able to gather information and learn so much from so little, we focus on computationally modeling children's decision-making in an approach-avoid paradigm: children can opt to approach a certain stimulus, which may be rewarding or punishing, or they can opt to avoid it and learn nothing about whether the stimulus is rewarding or punishing. Specifically, we perform parameter estimation by fitting experimental data with variants of a standard reinforcement learning model whose parameters include learning rate and inverse temperature. Contrasting children's best-fit model parameters with adults', we find from the experimental results that children are more exploratory (lower inverse temperature) and less affected by external negative reward factors (smaller negative learning rate), yet more capable of inferring the correct two-dimensional decision rule for maximizing net external reward gains.
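To make the named parameters concrete, here is a minimal, hedged sketch of this kind of model (illustrative names, not the exact variant fit in the notebook): the inverse temperature controls how deterministic the approach/avoid choice is, and separate learning rates weight positive and negative prediction errors.

```python
import numpy as np

def approach_probability(q_value, beta):
    """Softmax over {approach (value q_value), avoid (value 0)}.
    beta is the inverse temperature: lower beta -> more exploratory choices."""
    return 1.0 / (1.0 + np.exp(-beta * q_value))

def update_value(q_value, reward, alpha_pos, alpha_neg):
    """Rescorla-Wagner-style update with separate learning rates for
    positive and negative prediction errors (only approach trials are
    informative in the approach-avoid paradigm)."""
    delta = reward - q_value
    alpha = alpha_pos if delta >= 0 else alpha_neg
    return q_value + alpha * delta
```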
The experimental data for this project originated from a study conducted by Dr. Emily Liquin and Dr. Alison Gopnik: https://www.sciencedirect.com/science/article/pii/S0010027721003632
This repository is organized around the core pipeline illustrated in the poster and technical talk files, which is implemented in `reinforcement_learning.ipynb`. Model-related functions are stored in the `models` folder, where they are split into `generative_models.py` and `likelihood_models.py`: the former holds the functions used to generate data given a model and parameters, and the latter holds the functions used to estimate parameters given a model and data. Contributors are advised to read through `reinforcement_learning.ipynb` to understand the overall pipeline before adding or modifying generative or parameter-estimation (likelihood) functions. To avoid confusion, at the start one should largely ignore all scripts other than `reinforcement_learning.ipynb`, the `models` folder, and `helpers.py`.
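As a hedged illustration of that split (hypothetical signatures and a simplified model, not the actual contents of the `models` folder), a generative function simulates choices and rewards given parameters, while a likelihood function scores observed data given parameters:

```python
import numpy as np

def generate_data(params, true_values, n_trials, rng=None):
    """Simulate approach/avoid choices and rewards for one agent.
    params = (alpha_pos, alpha_neg, beta); true_values[s] is the mean
    reward of stimulus s. Illustrative of the role of generative_models.py."""
    alpha_pos, alpha_neg, beta = params
    rng = rng if rng is not None else np.random.default_rng()
    q = np.zeros(len(true_values))
    stimuli, choices, rewards = [], [], []
    for _ in range(n_trials):
        s = rng.integers(len(true_values))            # stimulus shown this trial
        p_approach = 1.0 / (1.0 + np.exp(-beta * q[s]))
        approach = rng.random() < p_approach
        r = rng.normal(true_values[s], 1.0) if approach else 0.0
        if approach:                                   # learning only after approaching
            delta = r - q[s]
            q[s] += (alpha_pos if delta >= 0 else alpha_neg) * delta
        stimuli.append(s); choices.append(int(approach)); rewards.append(r)
    return np.array(stimuli), np.array(choices), np.array(rewards)

def neg_log_likelihood(params, stimuli, choices, rewards):
    """Negative log-likelihood of observed choices under the same model.
    Illustrative of the role of likelihood_models.py."""
    alpha_pos, alpha_neg, beta = params
    q = np.zeros(stimuli.max() + 1)
    nll = 0.0
    for s, c, r in zip(stimuli, choices, rewards):
        p_approach = 1.0 / (1.0 + np.exp(-beta * q[s]))
        nll -= np.log(max(p_approach if c == 1 else 1.0 - p_approach, 1e-12))
        if c == 1:                                     # update only on approach trials
            delta = r - q[s]
            q[s] += (alpha_pos if delta >= 0 else alpha_neg) * delta
    return nll
```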
Here is a detailed breakdown of the files...
- `models/`
  - `generating_models.py` - a script storing functions to generate data
  - `likelihood_models.py` - a script storing functions to estimate parameters
- `Computational Modeling for Approach-Avoid Task with Reinforcement Learning Frameworks .pptx` - final poster for this project
- `Study3_AAData_Adults.csv` - the dataset for adults
- `Study3_AAData_Kids.csv` - the dataset for kids
- `Technical Talk - Kai Hung.pptx` - final slide presentation for this project
- `Variable_Key.xlsx` - a key for the variable labels in the above two datasets
- `additional.py` - a script containing commented-out code for additional analysis; copy and paste it into `reinforcement_learning.ipynb` cells to run it
- `code_optimization.ipynb` - a notebook used to debug inefficient code
- `data_exploration.ipynb` - a notebook used to perform exploratory data analysis
- `helpers.py` - a script containing non-model helper methods for `reinforcement_learning.ipynb`
- `modeling_tutorial.ipynb` - a notebook modeled after Dr. Anne Collins' computational modeling workflow
- `reinforcement_learning.ipynb` - the main notebook of this project, where the entire project workflow is conducted
- `rl_model.ipynb` - a notebook from Fei Dai containing attempts to model the data with a Bayesian framework; largely incomplete
The overall workflow within `reinforcement_learning.ipynb` is as follows:
(1) Scroll to the third code cell, whose first line reads "Initialize a vector to store the bestllh...", and confirm the number of models you want to use.
(2) Scroll to the "Experiments: Parameter Recovery" section and perform parameter recovery for all of the models; their generative and likelihood functions should already be in the corresponding `models` folder. (A compact sketch of the parameter-recovery idea appears after this list.)
(3) Scroll to the "Model-Fitting on Experimental Data" section. Perform model fitting using the `fit_model()` function.
(4) Scroll to the "Model Comparisons" section and follow its workflow to confirm, via the confusion matrix, that each model is individually identifiable. Make sure that `save=True` is set for the `fit_model()` calls in the previous section, or else the global `model_info` variable won't have the correct values for this section.
(5) Scroll to the "Model Simulation" section and manually enter the specs associated with the best-fit models for each age group (hint: search "TODO"). WARNING: all the plot functions in this section contain `save` and `save_path` optional parameters, which must both be removed from the function call if you do not wish to save the resulting plots. You may also need to create an "outputs" folder in this directory for the notebook to run properly with `save` on.
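As referenced in step (2), here is a compact sketch of the parameter-recovery idea, reusing the illustrative `generate_data` / `neg_log_likelihood` functions sketched earlier (hypothetical names and values; the notebook's real functions live in the `models` folder):

```python
import numpy as np
from scipy.optimize import minimize

# Simulate an agent with known parameters, then try to recover them by
# minimizing the negative log-likelihood of the simulated choices.
true_params = (0.4, 0.1, 3.0)                       # alpha_pos, alpha_neg, beta
true_values = np.array([1.0, -1.0, 0.5, -0.5])      # hypothetical stimulus values
stimuli, choices, rewards = generate_data(true_params, true_values, n_trials=400)

result = minimize(
    neg_log_likelihood,
    x0=[0.5, 0.5, 1.0],                             # starting guess
    args=(stimuli, choices, rewards),
    bounds=[(0.0, 1.0), (0.0, 1.0), (0.1, 20.0)],   # keep parameters in plausible ranges
    method="L-BFGS-B",
)
print("true:", true_params, "recovered:", result.x)
```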
- The "discount" factor (which really is a tuning parameter on reward perception, and not how the phrase "discount" is typically used in RL and economics) showed promising fit. So, it is very plausible that kids are not treating initial exposure to negative stimuli as purely negative. In fact, it is likely that they may be curious (hence there is an intrinsic reward to better understanding the reward distribution). I imagine that this could both be modeled in a flat intrinsic reward as a function of observation, or through much complicated procedure.
- It may also be interesting to examine the extent to which children and adults conform to a one-dimensional vs. a two-dimensional rule. We could potentially draw inspiration from the concept of "interaction" in classical linear regression to construct a model whose Q-function input space covers patterns, colors, and the object identities themselves, with interaction-like components (a sketch of this idea also follows below).
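A minimal sketch of the first idea above, with hypothetical names not taken from the notebook: negative outcomes are scaled down before the value update, and the first observation of a stimulus earns a flat intrinsic "curiosity" bonus.

```python
def perceived_reward(reward, discount, curiosity_bonus, already_observed):
    """Illustrative reward-perception transform (hypothetical, not in the repo).
    `discount` scales how negative outcomes feel (a perception-tuning parameter,
    not an RL/economics discount factor), and a flat curiosity bonus is added
    the first time a stimulus is observed."""
    r = reward if reward >= 0 else discount * reward
    if not already_observed:
        r += curiosity_bonus
    return r
```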
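And a sketch of the second idea (also hypothetical, not in the current codebase): additive weights over color and pattern can only express one-dimensional rules, while a color-by-pattern interaction term lets the learner represent the two-dimensional rule.

```python
import numpy as np

class FeatureQ:
    """Illustrative Q-function with color and pattern "main effects" plus an
    interaction term, analogous to interaction terms in linear regression."""
    def __init__(self, n_colors, n_patterns):
        self.w_color = np.zeros(n_colors)
        self.w_pattern = np.zeros(n_patterns)
        self.w_interaction = np.zeros((n_colors, n_patterns))

    def value(self, color, pattern):
        # One-dimensional rules live in the additive terms; the 2-D rule
        # requires the interaction weights to deviate from zero.
        return (self.w_color[color]
                + self.w_pattern[pattern]
                + self.w_interaction[color, pattern])

    def update(self, color, pattern, reward, alpha):
        delta = reward - self.value(color, pattern)
        self.w_color[color] += alpha * delta
        self.w_pattern[pattern] += alpha * delta
        self.w_interaction[color, pattern] += alpha * delta
```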