BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//132.216.98.100//NONSGML kigkonsult.se iCalcreator 2.20.4//
BEGIN:VEVENT
UID:20260611T061904EDT-8633ezo37m@132.216.98.100
DTSTAMP:20260611T101904Z
DESCRIPTION:Abstract\n\nDeep Reinforcement Learning (DRL) has transformed d
 ecision-making in areas such as game playing\, robotics\, protein structur
 e prediction\, and reasoning in large language models. However\, its pract
 ical use is often hindered by the issue of low sample efficiency. Unlike h
 umans\, DRL agents typically require millions of interactions to learn eff
 ective policies\, making training costly and time-consuming. This thesis t
 ackles the sample efficiency challenge in DRL through three novel approach
 es and demonstrates a practical application in time series forecasting.\n
 \nFirst\, we address the offline RL setting\, where policies are learned f
 rom fixed datasets without further online environment interaction. We show
  that existing model-free methods tend to produce overly conservative poli
 cies and propose a relaxed behavior regularization strategy to overcome th
 is issue.\n\nNext\, we investigate the use of pre-trained Vision-Language 
 Models (VLMs) to guide online RL in reward-sparse environments. While VLMs
  can provide useful task progress signals\, we identify a reward misalignm
 ent problem. To fix this\, we introduce FuRL\, a method that aligns VLM-de
 rived rewards with task goals\, significantly improving learning efficienc
 y.\n\nWe also explore Inverse Reinforcement Learning (IRL) from expert vid
 eo demonstrations. Existing Optimal Transport-based methods often ignore t
 emporal structure. To remedy this\, we propose a method that integrates co
 ntext embeddings and a masking mechanism to capture temporal order\, enabl
 ing policy learning from just two action-free videos.\n\nFinally\, we appl
 y DRL to ensemble learning for time series forecasting under non-stationar
 y conditions. By treating model combination as a reinforcement learning ta
 sk\, we design a system that dynamically adjusts model weights\, achieving
  strong performance even with limited training data.\n
DTSTART:20250829T170000Z
DTEND:20250829T190000Z
LOCATION:Room 603\, McConnell Engineering Building\, CA\, QC\, Montreal\, H
 3A 0E9\, 3480 rue University
SUMMARY:PhD defence of Yuwei Fu – Sample Efficient Reinforcement Learning: 
 Methods and Applications
URL:https://www.mcgill.ca/ece/channels/event/phd-defence-yuwei-fu-sample-ef
 ficient-reinforcement-learning-methods-and-applications-366417
END:VEVENT
END:VCALENDAR
