I am David Yu-Tung Hui / 許宇同, a 2023 MSc graduate from Mila, University of Montreal.
I'm interested in creating algorithms that learn through interactions with an environment. I hope that these algorithms will eventually be used to discover new knowledge about our world.
To achieve this dream, I research how to train deep neural networks with reinforcement learning (RL). RL algorithms formalize learning through interaction as an optimization problem, and over the past decade deep neural networks have proven highly effective at large-scale numerical optimization. My research uses linear algebra and probability theory to design principled loss functions that keep optimization stable across a variety of settings.
I've written two works toward this goal:
- Stabilizing Q-Learning for Continuous Control (MSc thesis) showed that critic networks with LayerNorm have convergent semi-gradient updates of the mean-squared temporal-difference error. This enabled learning high-dimensional continuous-control tasks such as dog-run in the DeepMind Control Suite.
- Double Gumbel Q-Learning (Spotlight @ NeurIPS 2023) showed that Maximum-Entropy RL contains two heteroscedastic Gumbel noise sources. Accounting for them improved the aggregate performance of SAC by 2× at 1M training timesteps.
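To give a flavor of the first result, here is a minimal sketch of a critic network with LayerNorm applied to its hidden layer. The single-hidden-layer architecture, layer sizes, and random weights below are illustrative assumptions for exposition, not the exact network from the thesis.

```python
import numpy as np

def layer_norm(h, eps=1e-5):
    """Normalize activations to zero mean and unit variance per example."""
    mean = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mean) / np.sqrt(var + eps)

def critic(state, action, W1, W2):
    """Q(s, a): LayerNorm is applied to the hidden pre-activations,
    bounding the scale of the hidden features."""
    x = np.concatenate([state, action], axis=-1)
    h = layer_norm(x @ W1)
    h = np.maximum(h, 0.0)  # ReLU
    return h @ W2           # one scalar Q-value per input row

rng = np.random.default_rng(0)
W1 = rng.normal(size=(6, 32)) * 0.1  # toy dimensions: 4-dim state, 2-dim action
W2 = rng.normal(size=(32, 1)) * 0.1
q = critic(rng.normal(size=(4, 4)), rng.normal(size=(4, 2)), W1, W2)
print(q.shape)  # (4, 1)
```

The normalization keeps the hidden activations at a fixed scale regardless of how large the incoming weights grow, which is the property the thesis connects to convergence of the semi-gradient TD update.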
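For the second result, the sketch below shows what "heteroscedastic Gumbel noise" means operationally: Gumbel samples whose scale depends on the state, drawn by inverse-CDF sampling. The two scale functions are made-up placeholders for illustration, not the learned scales from the paper.

```python
import numpy as np

def sample_gumbel(rng, scale, size):
    """Inverse-CDF sampling: G = -scale * log(-log(U)), U ~ Uniform(0, 1)."""
    u = rng.uniform(low=1e-12, high=1.0, size=size)
    return -scale * np.log(-np.log(u))

rng = np.random.default_rng(0)
states = rng.normal(size=(1000, 3))

# Heteroscedastic: each noise source's scale varies with the state
# (placeholder functional forms).
scale_a = 0.5 + 0.1 * np.abs(states[:, :1])
scale_b = 1.0 + 0.2 * np.abs(states[:, 1:2])

# Two independent Gumbel sources entering with opposite signs.
noise = sample_gumbel(rng, scale_a, (1000, 1)) - sample_gumbel(rng, scale_b, (1000, 1))
print(noise.shape)  # (1000, 1)
```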
I'm currently looking for opportunities where I can continue my research.
For more information about me, please see: