
Masters Theses
Date of Award
12-2024
Degree Type
Thesis
Degree Name
Master of Science
Major
Computer Science
Major Professor
Catherine Schuman
Committee Members
Piotr Luszczek, James Plank
Abstract
In this modern era of AI revolution, there have been massive and rapid investments in data-driven, large-scale AI systems. However, the high-performance computing techniques that power these rapidly growing AI systems consume a staggering amount of energy and resources. The proliferation of AI thus brings new optimization challenges: achieving sustainability without sacrificing scalability or performance. This thesis aims to tackle these challenges and provide a way forward for scalable and sustainable AI. Among the energy-efficient alternatives to the traditional von Neumann architecture, neuromorphic computing and its Spiking Neural Networks (SNNs) are a promising choice due to their inherent energy efficiency. In some real-world application scenarios, however, such as complex continuous control tasks, SNNs often lack the performance optimizations of traditional artificial neural networks. To address this issue, researchers have combined SNNs with Deep Reinforcement Learning (DeepRL) algorithms to leverage their optimization techniques. Although this integration manages to accomplish these complex tasks, the question of scalability remains unexplored. Hence, this thesis presents a novel model called SpikeRL, a scalable and efficient framework for DeepRL-based SNNs on complex continuous control tasks. The SpikeRL framework consists of three major components. First, a DeepRL-based SNN model that utilizes population encoding. Second, distributed computing across models and environments, implemented with PyTorch's distributed package using both the Message Passing Interface (MPI) and NVIDIA Collective Communications Library (NCCL) backends. Third, mixed-precision parameter updates that further optimize model training.
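The population encoding mentioned above can be illustrated with a minimal sketch: each continuous input dimension is represented by a group of neurons with Gaussian receptive fields, whose activations can serve as spike probabilities for the SNN. The function name, neuron count, and receptive-field width below are illustrative assumptions, not the thesis's actual implementation.

```python
import math

def population_encode(x, x_min, x_max, n_neurons=10, sigma=None):
    """Encode a scalar observation as activations of a neuron population.

    Each neuron has a Gaussian receptive field centered at an evenly
    spaced point in [x_min, x_max]; its activation in [0, 1] can be
    interpreted as a spike probability or injected current.
    (Illustrative sketch; parameters are assumed, not from the thesis.)
    """
    if sigma is None:
        sigma = (x_max - x_min) / n_neurons  # assumed default width
    centers = [x_min + i * (x_max - x_min) / (n_neurons - 1)
               for i in range(n_neurons)]
    return [math.exp(-((x - c) ** 2) / (2 * sigma ** 2)) for c in centers]

# A continuous state vector is encoded dimension by dimension:
state = [0.3, -0.7]
encoded = [population_encode(s, -1.0, 1.0) for s in state]
```

A neuron whose center matches the input fires with activation 1, and activity falls off smoothly for neighboring neurons, giving the SNN a distributed, noise-tolerant representation of the continuous state.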
Comparative analyses with state-of-the-art SNN methods demonstrated that the SpikeRL model achieves an overall performance increase of 40%, a 39% improvement in energy efficiency, and a 28% reduction in carbon emissions. SpikeRL was also tested on neuromorphic hardware at TENNLab, using the Reduced Instruction Spiking Processor (RISP) simulator to run model inference. Although the deployment of SpikeRL on neuromorphic hardware is still a work in progress, the research findings presented in this thesis demonstrate the scalability and energy efficiency of SpikeRL for training complex continuous control agents, advancing the domain of scalable and sustainable AI.
Recommended Citation
Tahmid, Tokey, "Energy-Efficient Computing for Scalable and Sustainable AI." Master's Thesis, University of Tennessee, 2024.
https://trace.tennessee.edu/utk_gradthes/12870
Included in
Computer and Systems Architecture Commons, Other Computer Engineering Commons, Robotics Commons