
Seminars and Events at Automatic Control

All seminars are held at the Department of Automatic Control, in the seminar room M 3170-73 on the third floor in the M-building, unless stated otherwise.

 

Seminar by Yulong Gao: Policy Evaluation in Distributional LQR

Seminar

From: 2025-04-07 14:15 to 15:00
Place: Seminar Room M 3170-73 at Dept. of Automatic Control, LTH
Contact: emma [dot] tegling [at] control [dot] lth [dot] se


Date & Time: April 7th, 14:15-15:00
Location: Seminar Room M 3170-73 at Dept. of Automatic Control, LTH
Speaker: Yulong Gao, Imperial College London
Title: Policy Evaluation in Distributional LQR

Abstract: Distributional reinforcement learning (DRL) enhances the understanding of the effects of randomness in the environment by letting agents learn the distribution of a random return, rather than its expected value as in standard reinforcement learning. A challenge in DRL, however, is that policy evaluation typically relies on a representation of the return distribution, which needs to be carefully designed. In this talk, we will discuss a special class of DRL problems based on the discounted linear quadratic regulator (LQR), which we call distributional LQR. Specifically, we provide a closed-form expression for the distribution of the random return, which applies to any exogenous disturbance as long as it is independent and identically distributed (i.i.d.). While the proposed exact random return consists of infinitely many random variables, we show that its distribution can be well approximated by a finite number of random variables, and the associated approximation error can be analytically bounded under mild assumptions. We further show that this truncated random return can be represented as a positive definite quadratic form of random variables, which makes it possible to exactly characterize the probability density function in the Gaussian case. When the model is unknown, we propose a model-free approach for estimating the return distribution, supported by sample complexity guarantees. Finally, we extend our approach to partially observable linear systems.
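
For readers unfamiliar with the setup, the following sketch (not material from the talk) illustrates what the truncated random return of a discounted LQR looks like: it simulates the system under a fixed linear state-feedback policy with i.i.d. Gaussian disturbances and approximates the return distribution by Monte Carlo sampling, whereas the talk characterizes this distribution analytically. All matrices, the gain K, the discount factor, and the truncation horizon below are hypothetical placeholders.

# Illustrative sketch: Monte Carlo approximation of the truncated return
# distribution for a discounted LQR under a fixed policy u_t = K x_t.
# All numerical values are hypothetical and chosen only for illustration.
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # hypothetical dynamics matrix
B = np.array([[0.0], [0.1]])             # hypothetical input matrix
Q = np.eye(2)                            # state cost weight
R = np.array([[0.1]])                    # input cost weight
K = np.array([[-0.5, -1.0]])             # fixed linear feedback gain
gamma = 0.95                             # discount factor
T = 200                                  # truncation horizon of the random return
n_samples = 10_000                       # Monte Carlo sample size
x0 = np.array([1.0, 0.0])                # initial state

def sampled_return(x0):
    """One sample of the truncated discounted return sum_{t<T} gamma^t (x'Qx + u'Ru)."""
    x, g = x0.copy(), 0.0
    for t in range(T):
        u = K @ x
        g += gamma**t * (x @ Q @ x + u @ R @ u)
        w = rng.normal(scale=0.1, size=2)  # i.i.d. exogenous disturbance
        x = A @ x + B @ u + w
    return g

returns = np.array([sampled_return(x0) for _ in range(n_samples)])
print(f"mean return ~ {returns.mean():.3f}, std ~ {returns.std():.3f}")
# A histogram of `returns` approximates the return distribution that the talk
# describes in closed form (and exactly, in the Gaussian case).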

Bio: Yulong Gao is a Lecturer (Assistant Professor) at the Department of Electrical and Electronic Engineering, Imperial College London. He received the B.E. degree in Automation in 2013, the M.E. degree in Control Science and Engineering in 2016, both from Beijing Institute of Technology, and the joint Ph.D. degree in Electrical Engineering in 2021 from KTH Royal Institute of Technology and Nanyang Technological University. He was a Researcher at KTH from 2021 to 2022 and a postdoctoral researcher at Oxford from 2022 to 2023. His research interests include formal verification and control, machine learning, and applications to safety-critical systems.