Leveraging parallelism for training navigation policies through Reinforcement Learning

Modern approaches to training RL agents employ massive parallelism to reduce training times.

Master's Thesis Proposal in Robotics and AI

This Master's thesis aims to explore how parallelism can be used to accelerate the training of navigation policies for aerial robots using reinforcement learning (RL).

As the complexity of real-world environments increases, traditional RL methods face challenges in computational efficiency, particularly when training agents for fast and reliable navigation. By leveraging parallelism through the simultaneous execution of multiple agents or environments, the Master's student will focus on improving learning speed and policy convergence. The student will work with simulation environments such as Aerial Gym and PyBullet to train aerial robots on a challenging navigation task, in which an aerial robot must reach a goal in dynamic 3D environments using onboard sensors. These tasks are computationally intensive, and parallelization offers a way to distribute the workload across multiple cores or machines, allowing faster and more efficient policy learning.
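As a concrete illustration of this idea, the sketch below steps several environments simultaneously using Gymnasium's vector API. It is only a minimal example under stated assumptions: "CartPole-v1" stands in for the aerial navigation task, and the number of environments is arbitrary; the thesis itself would use Aerial Gym or PyBullet environments instead.

```python
# Minimal sketch of parallel environment execution with Gymnasium's vector API.
# "CartPole-v1" is a stand-in; an aerial navigation environment would replace it.
import gymnasium as gym

NUM_ENVS = 8  # number of environments simulated simultaneously (arbitrary choice)

if __name__ == "__main__":
    # AsyncVectorEnv places each environment in its own worker process,
    # so a single call to step() advances all NUM_ENVS simulations in parallel.
    envs = gym.vector.AsyncVectorEnv(
        [lambda: gym.make("CartPole-v1") for _ in range(NUM_ENVS)]
    )

    obs, info = envs.reset(seed=0)
    for _ in range(100):
        # One batched action per environment; a trained policy would act here.
        actions = envs.action_space.sample()
        obs, rewards, terminations, truncations, infos = envs.step(actions)
    envs.close()
```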

The work will involve the implementation and evaluation of RL algorithms, such as Proximal Policy Optimization (PPO), in parallelized frameworks. Specifically, the focus will be on how parallelism can enhance learning performance, reduce training time, and improve scalability in aerial robot navigation tasks. Using Aerial Gym and PyBullet as testbeds, the student will assess how these parallelized RL approaches perform compared to traditional sequential methods, focusing on metrics such as speed of convergence, navigation accuracy, and computational resource utilization. The thesis aims to yield practical insights into developing fast and reliable navigation policies for autonomous aerial robots, with potential applications in search and rescue, surveillance, and autonomous delivery.
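Off-the-shelf libraries such as Stable-Baselines3 already support training PPO on vectorized environments, which gives a sense of the pattern the thesis would build on. The following is a hedged sketch of that pattern; the environment id and the hyperparameters are placeholders, not the actual thesis setup.

```python
# Sketch: training PPO on parallel environments with Stable-Baselines3.
# Placeholder environment and hyperparameters; not the thesis configuration.
import time
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv

if __name__ == "__main__":
    # 8 copies of the environment, each stepped in its own subprocess.
    env = make_vec_env("CartPole-v1", n_envs=8, vec_env_cls=SubprocVecEnv)

    model = PPO("MlpPolicy", env, n_steps=256, verbose=0)

    start = time.perf_counter()
    model.learn(total_timesteps=100_000)
    print(f"Wall-clock training time: {time.perf_counter() - start:.1f} s")
    env.close()
```

Timing the learn() call as above, while varying n_envs, is one simple way to relate the degree of parallelism to training time.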

Some intermediate tasks leading up to the final goals would be the following:

  1. Simulation Environment Setup: Become proficient in Aerial Gym and PyBullet, and test basic aerial navigation tasks. Develop novel drone models that incorporate soft components.

  2. Literature Review & Background Research: Study reinforcement learning (RL), parallelization techniques, and aerial navigation. Survey key RL algorithms like PPO and DQN.

  3. RL Algorithm Implementation: Implement and train RL models (PPO, DQN) for aerial robot navigation tasks in a traditional sequential manner.

  4. Parallel RL Framework: Design and implement a parallelized RL framework using Aerial Gym and PyBullet. Run tests to verify that multi-agent or multi-environment parallelization improves training speed and efficiency.

  5. Performance Evaluation & Optimization: Compare parallel vs. sequential RL performance, focusing on training time, scalability, and accuracy (see the throughput sketch after this list). Optimize the parallel framework to improve generalizability and support a successful transfer to real-world experiments.
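As a starting point for the comparison in task 5, a throughput benchmark along the following lines could quantify the gain from parallel stepping. This is a sketch under placeholder assumptions: "CartPole-v1" again stands in for the aerial task, and the rollout uses random actions rather than a learned policy.

```python
# Sketch of a sequential vs. parallel throughput comparison:
# SyncVectorEnv steps environments one after another in a single process,
# AsyncVectorEnv steps them concurrently in worker processes.
import time
import gymnasium as gym

def steps_per_second(envs, n_envs, horizon=1_000):
    """Random-action rollout; returns environment steps per second."""
    envs.reset(seed=0)
    start = time.perf_counter()
    for _ in range(horizon):
        envs.step(envs.action_space.sample())
    envs.close()
    return n_envs * horizon / (time.perf_counter() - start)

if __name__ == "__main__":
    N = 8  # arbitrary number of environment copies
    env_fns = [lambda: gym.make("CartPole-v1") for _ in range(N)]
    for label, cls in [("sequential", gym.vector.SyncVectorEnv),
                       ("parallel  ", gym.vector.AsyncVectorEnv)]:
        rate = steps_per_second(cls(env_fns), N)
        print(f"{label}: {rate:,.0f} env steps/s")
```

One caveat worth noting in the evaluation: for very cheap environments, inter-process communication can outweigh the parallel speedup, so the benefit of parallelization is most visible with expensive physics steps, as in PyBullet or Aerial Gym.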

Contact:

Akshit Saradagi, akshit.saradagi@ltu.se

Fausto Lagos, vidya.sumathy@ltu.se

George Nikolakopoulos, geonik@ltu.se