Publications
Publications by category in reverse chronological order. Generated by jekyll-scholar.
2024
- Understanding Human Meta-Control and Its Pathologies Using Deep Neural Networks. Kai Jappe Sandbrink, Laurence Hunt, and Christopher Summerfield. Sep 2024.
In mammals, neurons in the medial prefrontal cortex respond to action prediction errors (APEs). Here, using computational simulations with deep neural networks, we show that this error-monitoring process is crucial for inferring how controllable an environment is, and thus for estimating the value of control processes (meta-control). We trained both humans and deep reinforcement learning (RL) agents to perform a reward-guided learning task that required adaptation to changes in environmental controllability. Deep RL agents could only solve the task when designed to explicitly predict APEs, and when trained this way, they displayed signatures of meta-control that closely resembled those observed in humans. Moreover, when deep RL agents were trained to over- or under-estimate controllability, they developed behavioural pathologies matching those of humans who reported depressive, anxious, or compulsive traits on transdiagnostic questionnaires. These findings open up new avenues for studying both healthy and pathological meta-control using deep neural networks.
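The controllability-inference step lends itself to a small worked example. The sketch below is ours, not the paper's architecture: it operationalizes the APE idea as the error of an action-conditioned outcome predictor in a toy two-armed bandit, and reads out controllability as the fraction of outcome variance that knowing one's own action explains. All names and numbers are illustrative assumptions.

```python
# Illustrative sketch (not the paper's model): infer controllability by
# comparing an action-conditioned outcome predictor against an
# action-agnostic one. If conditioning on the action reduces error,
# the environment is controllable.
import numpy as np

rng = np.random.default_rng(0)

def estimate_controllability(controllable, n_trials=5000, lr=0.05):
    q = np.full(2, 0.5)   # action-conditioned outcome predictions
    m = 0.5               # marginal (action-agnostic) prediction
    mse_q = mse_m = 0.25  # running mean squared errors
    for _ in range(n_trials):
        a = rng.integers(2)                        # sample both actions
        if controllable:
            r = float(rng.random() < (0.9 if a == 0 else 0.1))
        else:
            r = float(rng.random() < 0.5)          # outcome ignores the action
        ape = r - q[a]                             # action-conditioned error
        q[a] += lr * ape
        m += lr * (r - m)
        mse_q += lr * (ape ** 2 - mse_q)
        mse_m += lr * ((r - m) ** 2 - mse_m)
    # Fraction of outcome variance explained by knowing one's action:
    return max(0.0, 1.0 - mse_q / (mse_m + 1e-9))

for env in (True, False):
    print(f"controllable={env}: inferred controllability "
          f"= {estimate_controllability(env):.2f}")
```

In the controllable environment the action-conditioned predictor wins and the estimate comes out well above zero; when outcomes ignore the action, it falls to roughly zero. A meta-controller could then use this quantity to decide how much effortful control to invest.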
- Can Reinforcement Learning Model Learning across Development? Online Lifelong Learning through Adaptive Intrinsic Motivation. Kai J. Sandbrink, Brian Christian, Linas M. Nasvytis, and 2 more authors. Proceedings of the Annual Meeting of the Cognitive Science Society, Sep 2024.
Reinforcement learning is a powerful model of animal learning in brief, controlled experimental conditions, but does not readily explain the development of behavior over an animal’s whole lifetime. In this paper, we describe a framework to address this shortcoming by introducing the single-life reinforcement learning setting to cognitive science. We construct an agent with two learning systems: an extrinsic learner that learns within a single lifetime, and an intrinsic learner that learns across lifetimes, equipping the agent with intrinsic motivation. We show that this model outperforms heuristic benchmarks and recapitulates a transition from exploratory to habit-driven behavior, while allowing the agent to learn an interpretable value function. We formulate a precise definition of intrinsic motivation and discuss the philosophical implications of using reinforcement learning as a model of behavior in the real world.
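The two-system architecture can be sketched compactly. Below is a toy version under assumed details (a stationary bandit, a count-based novelty bonus, hill climbing across lifetimes); it is not the paper's implementation, only the shape of the idea: the extrinsic learner lives and learns within one lifetime, while the intrinsic learner shapes its reward function across lifetimes using only lifetime extrinsic return as fitness.

```python
# Toy single-life RL sketch (assumed details, not the paper's agent):
# inner loop = one lifetime of greedy value learning with an intrinsic
# novelty bonus; outer loop = tuning the bonus weight across lifetimes.
import numpy as np

rng = np.random.default_rng(1)
N_ARMS, LIFETIME = 10, 500
TRUE_P = np.linspace(0.1, 0.8, N_ARMS)   # the best arm must be discovered

def live_one_life(bonus_weight):
    q = np.zeros(N_ARMS)
    counts = np.zeros(N_ARMS)
    extrinsic = 0.0
    for _ in range(LIFETIME):
        novelty = bonus_weight / np.sqrt(counts + 1)  # intrinsic reward
        a = int(np.argmax(q + novelty))               # greedy on combined value
        r = float(rng.random() < TRUE_P[a])
        counts[a] += 1
        q[a] += (r - q[a]) / counts[a]
        extrinsic += r
    return extrinsic

# Outer, across-lifetime learner: hill climbing on the intrinsic bonus.
beta, best = 0.0, -np.inf
for _ in range(50):
    cand = abs(beta + rng.normal(0, 0.2))
    fitness = np.mean([live_one_life(cand) for _ in range(20)])
    if fitness > best:
        beta, best = cand, fitness
print(f"selected intrinsic bonus ~ {beta:.2f}, lifetime reward ~ {best:.0f}")
```

Because the novelty bonus decays with visit counts, behavior within a lifetime shifts from exploratory to habit-like, loosely mirroring the transition described in the abstract.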
- Flexible Task Abstractions Emerge in Linear Networks with Fast and Bounded Units. Kai Jappe Sandbrink, Jan Philipp Bauer, Alexandra Maria Proca, and 3 more authors. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, Nov 2024.
Animals survive in dynamic environments changing at arbitrary timescales, but such data distribution shifts are a challenge to neural networks. To adapt to change, neural systems may change a large number of parameters, which is a slow process involving forgetting past information. In contrast, animals leverage distribution changes to segment their stream of experience into tasks and associate them with internal task abstractions. Animals can then respond flexibly by selecting the appropriate task abstraction. However, how such flexible task abstractions may arise in neural systems remains unknown. Here, we analyze a linear gated network where the weights and gates are jointly optimized via gradient descent, but with neuron-like constraints on the gates, including a faster timescale, non-negativity, and bounded activity. We observe that the weights self-organize into modules specialized for the tasks or sub-tasks encountered, while the gating layer forms unique representations that switch in the appropriate weight modules (task abstractions). We analytically reduce the learning dynamics to an effective eigenspace, revealing a virtuous cycle: fast-adapting gates drive weight specialization by protecting previous knowledge, while weight specialization in turn increases the update rate of the gating layer. Task switching in the gating layer accelerates as a function of curriculum block size and task training, mirroring key findings in cognitive neuroscience. We show that the discovered task abstractions support generalization through both task and subtask composition, and we extend our findings to a non-linear network switching between two tasks. Overall, our work offers a theory of cognitive flexibility in animals as arising from joint gradient descent on synaptic weights and neural gates in a neural network architecture.
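A minimal sketch of the setup helps fix ideas: a sum of gated weight modules trained by joint gradient descent, with the gates given a faster learning rate and clipped to a bounded, non-negative range. The teachers, learning rates, and the small gate floor below are our assumptions, and whether clean module specialization emerges depends on initialization and hyperparameters; the sketch shows the mechanism, not the paper's results.

```python
# Gated linear network sketch: y = g0*W0@x + g1*W1@x, squared loss,
# joint SGD with fast, bounded, non-negative gates (assumed details).
import numpy as np

rng = np.random.default_rng(2)
d = 10
A = rng.normal(size=(d, d)) / np.sqrt(d)                 # teacher, task A
B = rng.normal(size=(d, d)) / np.sqrt(d)                 # teacher, task B
W = [0.01 * rng.normal(size=(d, d)) for _ in range(2)]   # slow weight modules
g = np.array([0.6, 0.4])                                 # fast gates
LR_W, LR_G, G_MIN = 1e-3, 1e-2, 0.05                     # gates adapt faster

def train_block(teacher, steps=3000):
    global g
    for _ in range(steps):
        x = rng.normal(size=d)
        h = [Wk @ x for Wk in W]
        err = g[0] * h[0] + g[1] * h[1] - teacher @ x
        for k in range(2):                                # slow synaptic step
            W[k] -= LR_W * g[k] * np.outer(err, x)
        grad_g = np.array([err @ h[0], err @ h[1]])
        g = np.clip(g - LR_G * grad_g, G_MIN, 1.0)        # fast, bounded gates

for block in range(6):                                    # blocked curriculum
    task, teacher = ("A", A) if block % 2 == 0 else ("B", B)
    train_block(teacher)
    print(f"block {block} (task {task}): gates = {np.round(g, 2)}")
```

In the regime where specialization takes hold, the gate on the module serving the current task rises while the other is suppressed, concentrating weight updates in the active module; this is the virtuous cycle described above.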
2023
- Contrasting Action and Posture Coding with Hierarchical Deep Neural Network Models of Proprioception. Kai J Sandbrink, Pranav Mamidanna, Claudio Michaelis, and 3 more authors. eLife, May 2023.
Biological motor control is versatile, efficient, and depends on proprioceptive feedback. Muscles are flexible and undergo continuous changes, requiring distributed adaptive control mechanisms that continuously account for the body’s state. The canonical role of proprioception is representing the body state. We hypothesize that the proprioceptive system could also be critical for high-level tasks such as action recognition. To test this theory, we pursued a task-driven modeling approach, which allowed us to isolate the study of proprioception. We generated a large synthetic dataset of human arm trajectories tracing characters of the Latin alphabet in 3D space, together with muscle activities obtained from a musculoskeletal model and model-based muscle spindle activity. Next, we compared two classes of tasks: trajectory decoding and action recognition, which allowed us to train hierarchical models to decode either the position and velocity of the end-effector (posture) or the character (action) identity from the spindle firing patterns. We found that artificial neural networks could robustly solve both tasks, and the networks’ units show tuning properties similar to neurons in the primate somatosensory cortex and the brainstem. Remarkably, we found uniformly distributed directionally selective units only with the action-recognition-trained models and not the trajectory-decoding-trained models. This suggests that proprioceptive encoding is additionally associated with higher-level functions such as action recognition and therefore provides new, experimentally testable hypotheses of how proprioception aids in adaptive motor control.
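To make the model-based spindle input concrete, here is a small illustration (ours, not the paper's musculoskeletal pipeline) of idealized spindle-like rates computed from muscle length trajectories: firing tracks stretch and stretch velocity and is rectified to non-negative rates. The gains k_l and k_v and the toy antagonist pair are assumptions.

```python
# Idealized muscle-spindle sketch: rate ~ stretch + stretch velocity,
# rectified. This stands in for the model-based spindle activity that
# the hierarchical networks take as input (assumed, simplified form).
import numpy as np

def spindle_rates(lengths, dt, k_l=1.0, k_v=0.5):
    """lengths: (timesteps, muscles) array of muscle lengths."""
    vel = np.gradient(lengths, dt, axis=0)
    drive = k_l * (lengths - lengths.mean(axis=0)) + k_v * vel
    return np.maximum(drive, 0.0)          # firing rates are non-negative

# Toy example: an antagonist muscle pair during a periodic stroke.
dt = 0.01
t = np.arange(0, 2, dt)
lengths = np.stack([1 + 0.1 * np.sin(2 * np.pi * t),
                    1 - 0.1 * np.sin(2 * np.pi * t)], axis=1)
rates = spindle_rates(lengths, dt)
print(rates.shape)   # (timesteps, muscles): the networks' input format
```

From inputs like these, one model class would regress end-effector position and velocity (posture) and the other would classify the character being traced (action), sharing the same input statistics.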
- Learning the Value of Control with Deep RL. Kai Sandbrink and Christopher Summerfield. In 2023 Conference on Cognitive Computational Neuroscience, May 2023.
- Children Prioritize Purely Exploratory Actions in Observe-vs.-Bet Tasks. Eunice Yiu, Kai Sandbrink, and Alison Gopnik. In Intrinsically-Motivated and Open-Ended Learning Workshop @NeurIPS2023, Nov 2023.
In reinforcement learning, agents often need to decide between selecting actions that are familiar and have previously yielded positive results (exploitation), and seeking new information that could allow them to uncover more effective actions (exploration). Understanding the specific kinds of heuristics and strategies that humans employ to solve this problem over the course of their development remains an open question in cognitive science and AI. In this study we develop an "observe or bet" task that separates "pure exploration" from "pure exploitation." Participants have the option to either observe an instance of an outcome and receive no reward, or to bet on an action that is eventually rewarding but offers no immediate feedback. We collected data from 56 five-to-seven-year-old children who completed the task at one of three different probability levels. We compared how children performed against both approximate solutions to the partially observable Markov decision process and meta-RL models that were meta-trained on the same decision-making task across different probability levels. We found that the children observe significantly more than the two classes of algorithms. We then quantified how children's policies differ between the probability levels by fitting probabilistic programming models and by calculating the likelihood of the children's actions under the task-driven model. The fitted parameters of the behavioral model, as well as the direction of the deviation from the neural network policies, demonstrate that the primary way children differ is in the frequency with which they bet on the door for which they have less evidence. This suggests both that children model the causal structure of the environment and that they produce a "hedging behavior" that would be impossible to detect in standard bandit tasks and that reduces variance in overall rewards. The results shed light on how children reason about reward and information, providing a developmental benchmark that can help shape our understanding of both human behavior and RL neural network models.
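The task structure is easy to pin down in code. Below is a minimal environment plus a simple evidence-threshold agent, an approximation in the spirit of the POMDP solutions mentioned above; it is not the children's policy, the probabilistic-programming model, or the meta-RL agent, and the threshold and payoff details are assumptions.

```python
# Observe-or-bet sketch: observing reveals this trial's outcome but pays
# nothing; betting banks a hidden reward when the guess is correct.
import numpy as np

rng = np.random.default_rng(3)

def play(p_good=0.8, n_trials=50, threshold=3):
    good = rng.integers(2)        # which door pays off more often
    evidence = np.zeros(2)        # observed outcomes per door
    score = n_observes = 0
    for _ in range(n_trials):
        side = good if rng.random() < p_good else 1 - good  # trial outcome
        if abs(evidence[0] - evidence[1]) < threshold:
            evidence[side] += 1   # observe: information, no reward
            n_observes += 1
        else:
            bet = int(np.argmax(evidence))
            score += int(bet == side)   # bet: hidden reward, no feedback
    return n_observes, score

obs, score = play()
print(f"observed {obs} times, banked {score} hidden rewards in 50 trials")
```

An agent that sometimes bets on the low-evidence door, as the children did, trades expected reward for lower variance; that hedging is invisible in standard bandit tasks, where every choice yields feedback.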
2020
- DLC2Kinematics: A Post-DeepLabCut Module for Kinematic Analysis. Mackenzie Mathis, Jessy Lauer, Tanmay Nath, and 5 more authors. Feb 2020.
Kinematic analysis is crucial in biomedical and biomechanical research, the life sciences, and medicine. Here, we present a Python toolbox for the analysis of markerless motion capture data collected with DeepLabCut. This toolbox represents the contributions of members of the Mathis Lab of Adaptive Motor Control from 2017-2023. Please see https://github.com/AdaptiveMotorControlLab/DLC2Kinematics for up-to-date versions. We kindly ask that you cite the software if you use this code.
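As a generic illustration of the core computation (this is not the DLC2Kinematics API; see the repository above for the actual functions), the sketch below differentiates tracked keypoint coordinates to obtain velocity and speed, the kind of quantity the toolbox derives from DeepLabCut pose estimates.

```python
# Generic kinematics sketch: finite-difference velocity and speed from
# tracked 2D keypoints (illustrative; not the DLC2Kinematics interface).
import numpy as np

def velocity(xy, fps):
    """xy: (frames, 2) keypoint coordinates; returns per-frame velocity."""
    return np.gradient(xy, 1.0 / fps, axis=0)   # central differences

fps = 100.0
t = np.arange(0, 1, 1 / fps)
xy = np.stack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)], axis=1)
v = velocity(xy, fps)
speed = np.linalg.norm(v, axis=1)
print(f"mean speed = {speed.mean():.2f} (circle at 1 Hz: expect {2 * np.pi:.2f})")
```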
2018
- Time Series Forecasting of Air Quality Based on Regional Numerical Modeling in Hong Kong. Tong Liu, Alexis K. H. Lau, Kai Sandbrink, and 1 more author. Journal of Geophysical Research: Atmospheres, Feb 2018.
Based on prevailing numerical forecasting models (the Community Multiscale Air Quality [CMAQ] model, the Comprehensive Air Quality Model with Extensions, and the Nested Air Quality Prediction Modeling System) and observations from monitoring stations in Hong Kong, we employ a set of autoregressive integrated moving average (ARIMA) models with numerical forecasts as exogenous inputs (ARIMAX) to improve the forecast of air pollutants including PM2.5, NO2, and O3. The results show significant improvements in multiple evaluation metrics for daily (1–3 days) and hourly (1–72 hr) forecasts. Forecasts of daily 1-hr and 8-hr maximum O3 are also improved. For instance, compared with CMAQ, applying CMAQ-ARIMA reduces average root-mean-square errors (RMSEs) at all stations for daily average PM2.5, NO2, and O3 in the next 3 days by 14.3–21.0%, 41.2–46.3%, and 47.8–49.7%, respectively. For hourly forecasts in the next 72 hr, the reductions in RMSEs brought by ARIMAX using CMAQ are 18.2% for PM2.5, 32.1% for NO2, and 36.7% for O3. Large improvements in RMSEs are achieved for nonrural PM2.5 and rural NO2 using ARIMAX with all three numerical models. Dynamic hourly forecasting shows that ARIMAX can be applied to forecasts of 7- to 72-hr PM2.5, 4- to 72-hr NO2, and 4- to 6-hr O3. Besides applying ARIMAX for NO2, we recommend a mixed forecast strategy: apply ARIMAX for normal values of PM2.5 and O3, and employ numerical models for outputs above the 75th percentile of historical observations. Our hybrid ARIMAX method combines the advantages of ARIMA and numerical modeling to assist real-time air quality forecasting efficiently and consistently.
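The ARIMAX idea reduces to fitting an ARIMA model with the numerical model's forecast as an exogenous regressor. Here is a minimal sketch using statsmodels' SARIMAX on synthetic data; the series below merely stands in for station observations and CMAQ output, and the AR(1) error structure, biases, and model order are assumptions for illustration.

```python
# ARIMAX sketch: ARIMA(1,0,0) with the numerical forecast as an exogenous
# regressor, evaluated on a held-out 72-hour window (synthetic data).
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(4)
n = 500
cmaq = 40 + 10 * np.sin(np.arange(n) / 24 * 2 * np.pi) + rng.normal(0, 3, n)
err = np.zeros(n)                  # AR(1) error the numerical model misses
for t in range(1, n):
    err[t] = 0.8 * err[t - 1] + rng.normal(0, 2)
obs = 0.7 * cmaq + 15 + err        # "observed" hourly PM2.5 with model bias

split = n - 72                     # hold out the final 72 hours
model = SARIMAX(obs[:split], exog=cmaq[:split, None],
                order=(1, 0, 0), trend="c")
res = model.fit(disp=False)
fc = res.forecast(steps=72, exog=cmaq[split:, None])

rmse_raw = np.sqrt(np.mean((obs[split:] - cmaq[split:]) ** 2))
rmse_arimax = np.sqrt(np.mean((obs[split:] - fc) ** 2))
print(f"RMSE, numerical model alone: {rmse_raw:.2f}; ARIMAX: {rmse_arimax:.2f}")
```

The exogenous term carries the physics-based signal while the ARIMA terms absorb systematic bias and short-range autocorrelation, consistent with the horizon-dependent gains reported above.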