Predicting Rewards at Every Time Scale

Roshan Shariff, University of Alberta


In reinforcement learning, future rewards are often discounted: we prefer rewards we receive immediately to those we receive far in the future. The discount rate imposes a "time scale" on how we value rewards, and this time scale is built into the learned value functions. In this talk, I discuss how learning value functions with several different discount factors allows us to reason about the detailed temporal structure of future rewards.
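
As a rough illustration of the idea, and not something taken from the talk itself, the sketch below computes the discounted return of two made-up reward sequences under several discount factors. The function name, the sequences, and the choice of discount factors are all assumptions made for the example. A single discount factor summarizes each sequence as one number, while the set of returns across discount factors differs depending on whether the reward arrives early or late.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * rewards[t] over the sequence (t starts at 0)."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

# Two hypothetical reward sequences with the same total reward:
# one pays out after 1 step, the other after 6 steps.
early = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
late = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0]

# Illustrative discount factors, from short to long time scale.
gammas = [0.5, 0.9, 0.99]

early_values = [discounted_return(early, g) for g in gammas]
late_values = [discounted_return(late, g) for g in gammas]

for g, e, l in zip(gammas, early_values, late_values):
    print(f"gamma={g}: early={e:.4f}, late={l:.4f}")
```

With a small discount factor the late reward is almost invisible, while with a discount factor near 1 the two sequences look nearly the same. Comparing the returns across discount factors is what reveals when the reward occurs.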