Title: Partially observable total-cost Markov decision processes with general state and action spaces
Speaker: Pavlo Kasyanov
Date: 30/09/2013 12:30h
Location: Sala de Seminarios, Edificio Torretamarit
Abstract
For Partially Observable Markov Decision Processes (POMDPs) with Borel state, observation, and action sets and with expected total costs, this talk provides sufficient conditions for the existence of optimal policies and for the validity of other optimality properties, including that optimal policies satisfy the optimality equations and that value iterations converge to optimal values. Action sets need not be compact, and one-step cost functions need not be bounded. Since POMDPs can be reduced to Completely Observable Markov Decision Processes (COMDPs), whose states are posterior state distributions, the talk focuses on the validity of these optimality properties for COMDPs. The central question is whether the transition probabilities of a COMDP are weakly continuous. We introduce sufficient conditions for this and show that the transition probabilities of a COMDP are weakly continuous if the observation probabilities of the POMDP are continuous in total variation; this assumption cannot be weakened to setwise continuity. The results are illustrated with examples and counterexamples.
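The reduction from a POMDP to a COMDP rests on replacing the hidden state with its posterior distribution. The sketch below illustrates that update only in the simplest finite case, which is an assumption made here for illustration; the talk itself treats general Borel spaces. The function name `belief_update`, the array layouts `P[a][s][t]` and `Q[a][t][o]`, and the numbers in the example are all hypothetical.

```python
def belief_update(belief, P, Q, action, observation):
    """Posterior over states after taking `action` and seeing `observation`.

    Finite-state illustration (an assumption; not the Borel setting of the talk).
    belief: list of length S, current distribution over hidden states
    P[a][s][t]: probability of moving from state s to state t under action a
    Q[a][t][o]: probability of observing o when the new state is t under action a
    """
    n = len(belief)
    # Predict: distribution of the next state before the observation is seen.
    predicted = [sum(belief[s] * P[action][s][t] for s in range(n)) for t in range(n)]
    # Correct: weight each next state by the likelihood of the observation.
    joint = [predicted[t] * Q[action][t][observation] for t in range(n)]
    total = sum(joint)  # probability of the observation under this belief
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [w / total for w in joint]


# Hypothetical two-state, one-action, two-observation example.
P = [[[0.9, 0.1], [0.2, 0.8]]]
Q = [[[0.8, 0.2], [0.3, 0.7]]]
posterior = belief_update([0.5, 0.5], P, Q, action=0, observation=0)
print(posterior)
```

In the COMDP, the returned posterior plays the role of the next state, so the map from (belief, action) to the distribution of the next belief is the transition probability whose weak continuity the abstract discusses.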
Brief Bio
Pavlo Kasyanov is Head of the Department of Mathematical Systems at the Institute for Applied System Analysis of the Kiev Polytechnic University. He has published five monographs and numerous articles in high-level scientific journals. His research interests focus on nonlinear evolution differential inclusions, the theory of infinite-dimensional dynamical systems, and numerical methods in nonlinear analysis and optimization theory.