Masters Theses
Date of Award
8-2024
Degree Type
Thesis
Degree Name
Master of Science
Major
Computer Science
Major Professor
Hairong Qi
Committee Members
Hairong Qi, Catherine Schuman, Dan Wilson, Amir Sadovnik
Abstract
Reinforcement Learning (RL) has made significant strides in various domains, yet developing effective control policies for environments with complex, nonlinear dynamics remains a challenge, particularly for policy gradient methods. These methods often struggle because of high variance in gradient estimates, non-convex optimization landscapes, and sample inefficiency, which result in unstable learning, suboptimal policies, and trade-offs between performance and reproducibility. The search for more robust, stable, and effective methods has driven numerous innovations and remains a critical area of research. Proximal Policy Optimization (PPO) has gained popularity in recent years because it balances performance, training stability, and computational efficiency. Compared with their nonlinear counterparts, linear systems are simpler, more predictable, and easier to analyze. Koopman theory has emerged as a powerful framework for studying nonlinear systems through a globally linear operator that acts on a higher-dimensional space of measurement functions. Combining these two ideas, Koopman-Inspired Proximal Policy Optimization (KIPPO) extends PPO to learn a simplifying representation of the underlying system's dynamics while retaining the features essential for effective policy learning. This is achieved through a Koopman-approximation auxiliary network and carefully designed constraints that make it possible to balance the complexity of the latent dynamics. Evaluated on diverse continuous control tasks, KIPPO improves performance over the PPO baseline by 8-60% while reducing variability by up to 91%. The study also examines the effects and interactions of key hyperparameters and, through an ablation study, the impact of individual loss components, providing a comprehensive analysis of the approach.
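The abstract does not spell out KIPPO's loss terms. As a rough illustration only, the sketch below shows the two generic ingredients it combines: PPO's standard clipped surrogate objective, and a Koopman-style penalty asking a learned latent state z_t to evolve linearly, z_{t+1} ≈ K z_t, with a finite matrix K standing in for the Koopman operator. The encoder that produces z_t is out of scope, and the helper names, the squared-error form of the linearity penalty, and the weight `koopman_weight` are all hypothetical assumptions, not the thesis's implementation.

```python
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def ppo_clip_loss(ratios: Vector, advantages: Vector, eps: float = 0.2) -> float:
    """Negated PPO clipped surrogate objective, averaged over a batch.

    ratios[i] = pi_new(a_i | s_i) / pi_old(a_i | s_i); advantages[i] is the
    advantage estimate for the same sample. Negated so it can be minimized.
    """
    total = 0.0
    for r, adv in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - eps), 1.0 + eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        total += min(r * adv, clipped * adv)
    return -total / len(ratios)


def matvec(K: Matrix, z: Vector) -> List[float]:
    """Multiply matrix K (given as a list of rows) by vector z."""
    return [sum(k * x for k, x in zip(row, z)) for row in K]


def koopman_linear_loss(z_t: Sequence[Vector], z_next: Sequence[Vector], K: Matrix) -> float:
    """Hypothetical penalty: mean squared error between K z_t and z_{t+1}.

    Small values mean one linear operator K advances the latent states well.
    """
    total = 0.0
    for zt, zn in zip(z_t, z_next):
        pred = matvec(K, zt)
        total += sum((p - q) ** 2 for p, q in zip(pred, zn)) / len(zn)
    return total / len(z_t)


def combined_loss(ratios, advantages, z_t, z_next, K, koopman_weight: float = 0.1) -> float:
    """Hypothetical weighted sum of both terms; value and entropy terms are omitted."""
    return ppo_clip_loss(ratios, advantages) + koopman_weight * koopman_linear_loss(z_t, z_next, K)


if __name__ == "__main__":
    K = [[0.9, 0.0], [0.0, 0.5]]
    print(ppo_clip_loss([1.5, 0.5], [1.0, -1.0]))  # about -0.2
    print(koopman_linear_loss([[1.0, 2.0]], [[0.9, 1.0]], K))  # 0.0
```

Plain Python is used here only to make the arithmetic visible; in practice both terms would be written in an automatic-differentiation framework so that gradients flow into the policy, the encoder, and K together.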
Recommended Citation
Cozma, Andrei, "Koopman-Inspired Proximal Policy Optimization (KIPPO)." Master's Thesis, University of Tennessee, 2024.
https://trace.tennessee.edu/utk_gradthes/11783