BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//132.216.98.100//NONSGML kigkonsult.se iCalcreator 2.20.4//
BEGIN:VEVENT
UID:20260610T190409EDT-0354VVTkBu@132.216.98.100
DTSTAMP:20260610T230409Z
DESCRIPTION:Informal Systems Seminar (ISS)\, Centre for Intelligent Machine
 s (CIM) and Groupe d'Etudes et de Recherche en Analyse des Decisions (GERA
 D)\n\nSpeaker: Amit Sinha\n\n** Note that this is a hybrid event.\n** This
  seminar will be projected at McConnell 437 at McGill University.\n\nZoom
  Link\nMeeting ID: 845 1388 1004\nPasscode: VISS\n\nAbstract: The tr
 aditional approach to POMDPs is to convert them into fully observed MDPs b
 y considering a belief state as an information state. However\, a belief-s
 tate based approach requires perfect knowledge of the system dynamics and 
 is therefore not applicable in the learning setting where the system model
  is unknown. Various approaches to circumvent this limitation have been pr
 oposed in the literature. A unified treatment of these approaches involves
  considering the 'agent state'\, which is a model-free\, recursively updat
 eable function of the observation history. Some examples of an agent state
  include frame stacking and recurrent neural networks. Since the agent sta
 te is model-free\, it is used to adapt standard RL algorithms to POMDPs. H
 owever\, standard RL algorithms like Q-learning learn a deterministic stat
 ionary policy. Since the agent state is not an information state\, we cann
 ot apply the same results for MDPs and thus\, we must first consider what 
 happens with the different policy classes: stationary/non-stationary and d
 eterministic/stochastic. Our main thesis that we illustrate via examples i
 s that because the agent state is not an information state\, non-stationar
 y agent-state based policies can outperform stationary ones. To leverage
  this
  feature\, we propose PASQL (periodic agent-state based Q-learning)\, whic
 h is a variant of agent-state-based Q-learning that learns periodic polici
 es. By combining ideas from periodic Markov chains and stochastic approxim
 ation\, we rigorously establish that PASQL converges to a cyclic limit and
  characterize the approximation error of the converged periodic policy. Fi
 nally\, we present a numerical experiment to highlight the salient feature
 s of PASQL and demonstrate the benefit of learning periodic policies over 
 stationary policies.\n\nAffiliation: Amit Sinha is a PhD candidate in the
  Department of Electrical and Computer Engineering\, McGill University.\n
DTSTART:20240704T140000Z
DTEND:20240704T150000Z
LOCATION:Zames Seminar Room\, MC 437\, McConnell Engineering Building\,
  3480 rue University\, Montreal\, QC\, H3A 0E9\, CA
SUMMARY:Periodic agent-state based Q-learning for POMDPs
URL:https://www.mcgill.ca/cim/channels/event/periodic-agent-state-based-q-l
 earning-pomdps-357836
END:VEVENT
END:VCALENDAR
