
In many practical problems, from online advertising to healthcare and computational finance, it is extremely important to have guarantees on the performance and other characteristics of the policies generated by our algorithms. Such guarantees reduce the risk of deploying a policy and help convince product (or hospital, or investment) managers that it will not harm their business. In the first part of the talk, we provide an overview of our work on learning safe and risk-sensitive policies in sequential decision-making problems. The notion of safety studied here is “safety w.r.t. a baseline”: a policy is considered safe if it is guaranteed to perform at least as well as a given baseline. We examine this problem from three different angles, related to off-policy evaluation and counterfactual inference; robust control and the simulation-to-real problem; and conservative exploration in online learning.

The second part of the talk is about controlling non-linear dynamical systems from high-dimensional observations (e.g., raw pixel images) in a way that is robust to noise in the system dynamics. Our method is a principled way of combining variational auto-encoders with locally-optimal controllers. It uses a deep generative model from the family of variational auto-encoders that learns the predictive conditional density of the future observation given the current one, while introducing a low-dimensional embedding space for control. We impose specific structure on the generative graphical model so that the dynamics in the embedding space are constrained to be locally linear. We also propose a principled variational approximation of the embedding posterior that is more robust to noise.
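The baseline-safety notion from the first part can be illustrated with a minimal sketch: deploy a candidate policy only if a high-confidence lower bound on its estimated value is at least the baseline's value. All names here are hypothetical, and the Hoeffding-style bound assumes returns in [0, 1]; it stands in for the off-policy evaluation machinery discussed in the talk.

```python
import numpy as np

def is_safe_to_deploy(candidate_returns, baseline_value, delta=0.05):
    """Return True if a high-confidence lower bound on the candidate
    policy's value is at least the baseline's known value.

    candidate_returns: off-policy (e.g. importance-weighted) return
        estimates for the candidate policy -- hypothetical input.
    baseline_value: known performance of the baseline policy.
    delta: allowed failure probability of the guarantee.
    """
    n = len(candidate_returns)
    mean = np.mean(candidate_returns)
    # Hoeffding-style lower confidence bound, assuming returns in [0, 1].
    lower_bound = mean - np.sqrt(np.log(1.0 / delta) / (2.0 * n))
    return lower_bound >= baseline_value
```

The conservative flavor is deliberate: when the data are too few or too noisy, the lower bound drops below the baseline and the safe default is to keep the baseline policy.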
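The locally-linear constraint on the embedding dynamics in the second part can be sketched as a transition of the form z_next = A(z) z + B(z) u + o(z), where A, B, and o are produced by a learned network as functions of the current latent state. The concrete forms of `local_dynamics` below are invented placeholders for that network, not the model from the talk.

```python
import numpy as np

z_dim, u_dim = 3, 2  # hypothetical embedding / control dimensions

def local_dynamics(z):
    """Stand-in for a network mapping a latent state z to the
    parameters (A, B, o) of a locally-linear transition model."""
    A = np.eye(z_dim) + 0.01 * np.outer(z, z)  # hypothetical form
    B = 0.1 * np.ones((z_dim, u_dim))          # hypothetical form
    o = np.zeros(z_dim)                        # hypothetical offset
    return A, B, o

def latent_step(z, u):
    # Locally-linear transition in the embedding space:
    #   z_next = A(z) z + B(z) u + o(z)
    A, B, o = local_dynamics(z)
    return A @ z + B @ u + o
```

Because each transition is linear in (z, u) around the current state, locally-optimal control methods such as iLQR can be applied directly in the embedding space.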
<tjavidi@ucsd.edu>