When navigation algorithms realize their potential, autonomous driving will usher in a new era of mobility, providing cheaper transportation to the elderly, the disabled and the young. It will reduce car ownership, increase fuel efficiency and ease traffic congestion. For this to happen, the navigation problem must be solved efficiently: it is well understood that slower reaction times while driving can be fatal. For self-driving cars to be safe, navigation algorithms have to be optimized for speed without compromising accuracy. I will talk about optimizing algorithms across the navigation pipeline.
Navigation is often addressed in three parts: mapping, localization and planning. We focus on grid-based mapping algorithms, which divide the space into grid cells. The state-of-the-art (SOTA) grid-based mapping algorithms either make the limiting assumption that each grid cell is independent of its neighboring cells, or use slow sampling-based methods like Gibbs sampling and Metropolis–Hastings. Although sampling-based methods are guaranteed to converge, they are slow in practice because they do not fully exploit the relationships among the random variables. We instead use modern inference methods like Belief Propagation and Dual Decomposition, which not only converge up to two times faster but also produce more accurate maps.
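To make the contrast concrete, here is a minimal sketch of loopy belief propagation on a toy occupancy grid, modeled as a pairwise Markov random field. The grid size, sensor likelihoods and smoothness strength are all made up for illustration; this is not our mapping system, only the inference pattern it builds on.

```python
import numpy as np

# Hypothetical sketch: sum-product loopy belief propagation for a binary
# occupancy grid modeled as a pairwise MRF. Unary potentials come from
# made-up sensor likelihoods; the pairwise potential encourages
# neighboring cells to share the same occupancy state.

H, W = 4, 4
rng = np.random.default_rng(0)
unary = rng.uniform(0.2, 0.8, size=(H, W))      # P(cell occupied | sensor)
phi_u = np.stack([1 - unary, unary], axis=-1)   # unary potentials [free, occ]

smooth = 0.8                                    # agreement strength (assumed)
phi_p = np.array([[smooth, 1 - smooth],
                  [1 - smooth, smooth]])        # pairwise potential

def neighbors(i, j):
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < H and 0 <= nj < W:
            yield ni, nj

# messages[(src, dst)]: message from cell src to its neighbor dst
messages = {((i, j), n): np.ones(2)
            for i in range(H) for j in range(W) for n in neighbors(i, j)}

for _ in range(10):                             # synchronous BP sweeps
    new = {}
    for (src, dst), _m in messages.items():
        i, j = src
        # product of unary potential and incoming messages, except from dst
        prod = phi_u[i, j].copy()
        for n in neighbors(i, j):
            if n != dst:
                prod *= messages[(n, src)]
        msg = phi_p.T @ prod                    # marginalize out src's state
        new[(src, dst)] = msg / msg.sum()       # normalize for stability
    messages = new

belief = np.empty((H, W, 2))
for i in range(H):
    for j in range(W):
        b = phi_u[i, j].copy()
        for n in neighbors(i, j):
            b *= messages[(n, (i, j))]
        belief[i, j] = b / b.sum()

occupancy = belief[..., 1]                      # marginal P(occupied)
print(occupancy.round(2))
```

Unlike a cell-independence model, the beliefs here are smoothed by information flowing between neighboring cells, which is the structure that sampling methods exploit only slowly.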
Recognizing that robots can collaborate with each other to map an environment and accomplish tasks faster, we address the problem of two-robot mutual localization. We propose a polynomial root-finding based mutual localization algorithm that uses observations from both robots and depends on only a few fiducial markers instead of external landmarks. This dependence on fewer fiducial markers makes our algorithm an order of magnitude faster than external-landmark-based methods like Bundler.
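The core trick behind such methods can be illustrated on a one-equation toy problem: pose constraints that mix sines and cosines of an unknown heading can be converted into a polynomial via the tangent half-angle substitution and handed to a root finder. The constraint below and its coefficients are invented for illustration; they stand in for the multi-marker constraint system, not for our actual algorithm.

```python
import numpy as np

# Toy illustration (not the paper's algorithm): solve
#     a*cos(theta) + b*sin(theta) = c
# for the unknown heading theta by the tangent half-angle substitution
#     t = tan(theta/2), cos = (1 - t^2)/(1 + t^2), sin = 2t/(1 + t^2),
# which turns the trigonometric constraint into a quadratic in t.

def heading_candidates(a, b, c):
    # a(1 - t^2) + 2bt = c(1 + t^2)  =>  -(a + c) t^2 + 2b t + (a - c) = 0
    coeffs = [-(a + c), 2 * b, a - c]
    roots = np.roots(coeffs)
    real = roots[np.isreal(roots)].real
    return 2 * np.arctan(real)          # recover theta from t

theta_true = 0.7                        # ground-truth heading (made up)
a, b = 1.3, -0.4
c = a * np.cos(theta_true) + b * np.sin(theta_true)
cands = heading_candidates(a, b, c)
print(cands)                            # one candidate recovers theta_true
```

In the full mutual-localization problem the same substitution yields a higher-degree polynomial whose real roots are the candidate relative poses, which is why root finding over a handful of marker constraints is so much cheaper than landmark-based reconstruction.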
A special class of planning algorithms, called goal-conditioned reinforcement learning (RL), applies when the goal location can change on every trial. The SOTA algorithms in goal-conditioned RL specify the goal location redundantly. Because of this redundant information, algorithms like Hindsight Experience Replay are slowed down by reward re-sampling. We propose a deep extension to Floyd-Warshall Reinforcement Learning that removes this duplicate information and avoids reward re-sampling. The resulting algorithm requires only half as many reward samples as the baselines.
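The tabular intuition can be sketched in a few lines. Here a goal-conditioned value table on a small chain world (sizes and rewards assumed for illustration, not taken from our experiments) is seeded with observed one-step transitions and then composed by a Floyd-Warshall relaxation, so values for never-experienced goals fall out without re-sampling any rewards.

```python
import numpy as np

# Hypothetical tabular sketch of Floyd-Warshall-style goal-conditioned
# values: Q[s, g] holds the (negative) cost of reaching goal g from
# state s. Observed transitions seed one-step values; the relaxation
# composes them through intermediate states, so rewards are never
# re-sampled per goal (unlike hindsight relabeling).

n = 5                                   # states on a small chain (assumed)
NEG = -1e9                              # stand-in for "unknown / unreachable"
Q = np.full((n, n), NEG)
np.fill_diagonal(Q, 0.0)                # reaching your own state costs nothing

# experienced one-step transitions on the chain, each with reward -1
for s in range(n - 1):
    Q[s, s + 1] = -1.0
    Q[s + 1, s] = -1.0

# Floyd-Warshall relaxation over intermediate states w:
# Q[s, g] <- max(Q[s, g], Q[s, w] + Q[w, g])
for w in range(n):
    Q = np.maximum(Q, Q[:, [w]] + Q[[w], :])

print(Q[0, 4])                          # -4.0: four -1 steps, goal never
                                        # directly experienced from state 0
```

Every entry of the table is filled from the same pool of observed transitions, which is the mechanism behind needing roughly half the reward samples of relabeling-based baselines.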