Orcid ID

https://orcid.org/0009-0001-3670-0813

Date of Award

8-2024

Degree Type

Thesis

Degree Name

Master of Science

Major

Computer Science

Major Professor

Hairong Qi

Committee Members

Hairong Qi, Catherine Schuman, Dan Wilson, Amir Sadovnik

Abstract

Reinforcement Learning (RL) has made significant strides in various domains, yet developing effective control policies for environments with complex, nonlinear dynamics remains a challenge, particularly for policy gradient methods. These methods often struggle with high variance in gradient estimates, non-convex optimization landscapes, and sample inefficiency, resulting in unstable learning, suboptimal policies, and trade-offs between performance and reproducibility. The quest for more robust, stable, and effective methods has led to numerous innovations and remains a critical area of research. Proximal Policy Optimization (PPO) has gained popularity in recent years due to its balance of performance, training stability, and computational efficiency. In contrast with their nonlinear counterparts, linear systems are simpler, more predictable, and easier to analyze. Koopman Theory has emerged as a powerful framework for studying nonlinear systems through a globally linear operator that acts on a higher-dimensional space of measurement functions. Combining these two ideas, Koopman-Inspired Proximal Policy Optimization (KIPPO) extends PPO to learn a simplifying representation of the underlying system's dynamics while retaining essential features for effective policy learning. This is achieved through a Koopman-approximation auxiliary network and carefully designed constraints that balance the complexity of the latent dynamics. Results demonstrate improvements over the PPO baseline, with performance gains of 8-60% and variability reduced by up to 91%, when evaluated on diverse continuous control tasks. The study also examines the effects and interactions of key hyperparameters and the impacts of individual loss components through an ablation study, providing a comprehensive analysis of the approach.

Recommended Citation

Cozma, Andrei, "Koopman-Inspired Proximal Policy Optimization (KIPPO)." Master's Thesis, University of Tennessee, 2024.
https://trace.tennessee.edu/utk_gradthes/11783

Included in

Artificial Intelligence and Robotics Commons, Theory and Algorithms Commons

Masters Theses

Koopman-Inspired Proximal Policy Optimization (KIPPO)
