Two Time Scale Algorithms for Variance-penalized Control and Risk-neutral Semi-Markov Control

by Prof. Abhijit Gosavi (Missouri University of Science and Technology, USA)

Thursday, December 16, 2010 from 14:30 to 15:30 (Asia/Kolkata)
at Colaba Campus (A-212)

Description

The two-time-scale framework has been applied in numerous settings for solving Markov decision processes. It has some remarkable properties that make it possible to develop solution algorithms for problems that are difficult to solve with single-time-scale algorithms, such as classical value iteration or Q-learning. In this talk, we will discuss two applications of this framework. The first is solving a variance-penalized Markov decision process using dynamic programming. The second is developing an actor-critic algorithm that can solve a risk-neutral semi-Markov decision process. For the actor-critic algorithm, we will present some numerical results from a case study in airline revenue management. (This is joint work with Sean Meyn of the University of Illinois and Susan Murray, Ketaki Kulkarni, and Katie Grantham of Missouri S&T.)

Organised by

John Barretto