Learning Adaptive Safety for Multi-Agent Systems
Preprint and supplementary material available online.
Overview
Ensuring safety in dynamic multi-agent systems is challenging due to limited information about the other agents. Control Barrier Functions (CBFs) show promise for safety assurance, but current methods make strong assumptions about other agents and often rely on manual tuning to balance safety, feasibility, and performance. In this work, we study adaptive safe learning for multi-agent systems with CBFs. We show how emergent behavior can be profoundly influenced by the CBF configuration, highlighting the need for a responsive and dynamic approach to CBF design. To this end, we developed ASRL, a novel adaptive safe RL framework that fully automates the optimization of the policy and the CBF coefficients, enhancing safety and long-term performance through reinforcement learning. By directly interacting with the other agents, ASRL learns to cope with diverse agent behaviors and keeps cost violations below a desired limit. We evaluate ASRL in a multi-robot system and a competitive multi-agent racing scenario, against learning-based and control-theoretic approaches. As future work, we plan to build upon CBF-based control to formulate a theory for safe control synthesis for hybrid dynamical systems.
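To make the role of the CBF coefficients concrete, below is a minimal sketch of a CBF-based safety filter for a single-integrator robot avoiding a circular obstacle. The dynamics, barrier, and function names are illustrative assumptions, not the ASRL implementation; in ASRL the class-K coefficient (`gamma` here) is adapted online by the learned policy rather than fixed by hand.

```python
import numpy as np

def cbf_safety_filter(x, u_ref, obstacle, radius, gamma):
    """Minimally invasive safety filter for a single-integrator robot.

    Solves  min_u ||u - u_ref||^2  s.t.  dh/dx . u >= -gamma * h(x)
    in closed form (a single linear constraint). `gamma` is the class-K
    coefficient that ASRL adapts online; here it is a plain scalar.
    """
    h = np.sum((x - obstacle) ** 2) - radius ** 2  # barrier: h >= 0 means safe
    grad_h = 2.0 * (x - obstacle)                  # dh/dx (dynamics: x_dot = u)
    slack = grad_h @ u_ref + gamma * h             # constraint value at u_ref
    if slack >= 0.0:                               # nominal input is already safe
        return u_ref
    # otherwise, project u_ref onto the constraint boundary
    return u_ref - slack * grad_h / (grad_h @ grad_h)

# Example: a conservative vs. an aggressive coefficient
x = np.array([1.0, 0.0])                # robot position
u_ref = np.array([-1.0, 0.0])           # nominal input, heading into the obstacle
obstacle, radius = np.zeros(2), 0.5
print(cbf_safety_filter(x, u_ref, obstacle, radius, gamma=0.1))  # strongly damped
print(cbf_safety_filter(x, u_ref, obstacle, radius, gamma=5.0))  # unchanged
```

A small `gamma` brakes early and behaves conservatively; a large `gamma` allows faster approaches at the price of tighter margins. This is exactly the trade-off that fixed, hand-tuned coefficients struggle with and that ASRL optimizes from interaction.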
Video Overview
Installation
The implementation has been tested with Python 3.8 under Ubuntu 20.04.
Steps
- Clone this repository.
- Install requirements:
```bash
pip install -r requirements.txt
```
Docker
For better reproducibility, we will soon release a Dockerfile to build a container with all the necessary dependencies. :construction_worker:
Reproducing the Results
We assume that all the experiments are run from the project directory
and that the project directory is added to the PYTHONPATH environment variable as follows:
```bash
export PYTHONPATH=$PYTHONPATH:$(pwd)
```
Experiment 1 - End-to-End Training

- For the multi-robot environment, run from the project directory:
```bash
./scripts/run_exp_baselines.sh [0-6]
```
where the exp-id [0-6] denotes runs with PPOPID, PPOLag, CPO, IPO, DDPGLag, TD3Lag, and PPOSaute, respectively.
- Similarly, for the racing environment, run:
```bash
./scripts/run_exp_baselines.sh [7-13]
```
The results will be saved in the logs/baselines folder.
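To aggregate results across runs, a short helper like the following may be handy. The log layout assumed here (a `progress.csv` per run with `return` and `cost` columns) is hypothetical; adjust the glob pattern and column names to whatever the training scripts actually write under `logs/baselines`.

```python
import pandas as pd
from pathlib import Path

# Hypothetical layout: one directory per run under logs/baselines, each
# with a progress.csv containing per-episode `return` and `cost` columns.
# Adjust the glob pattern and column names to the actual log format.
rows = []
for csv_file in Path("logs/baselines").glob("**/progress.csv"):
    df = pd.read_csv(csv_file)
    rows.append({"run": csv_file.parent.name,
                 "avg_return": df["return"].mean(),
                 "avg_cost": df["cost"].mean()})
print(pd.DataFrame(rows).to_string(index=False))
```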
Experiment 2 - Ablation Study

We provide a couple of ablated models that augment built-in controllers with adaptive safety in the checkpoints folder.
To play with the trained models with adaptive safety, run:
```bash
./scripts/run_checkpoint_eval.sh [0-1]
```
where the exp-id [0-1] denotes runs for the particle-env and racing environments, respectively.
Publications
Contributors
Citation
```bibtex
@misc{berducci2023learning,
      title={Learning Adaptive Safety for Multi-Agent Systems},
      author={Luigi Berducci and Shuo Yang and Rahul Mangharam and Radu Grosu},
      year={2023},
      eprint={2309.10657},
      archivePrefix={arXiv},
      primaryClass={cs.RO}
}
```