Reward Function

Kartikay Garg edited this page Mar 4, 2018 · 2 revisions

Let,

  • l(ti) be the amount of long currency,
  • s(ti) be the amount of short currency and
  • p(ti) be the price of the currency at time instant ti.

unrealized PnL

At any timestamp, the reward given to the agent is the current value of its portfolio. It is defined by,
[Equation image: definition of the unrealized PnL reward at time instant ti]

  • non-zero intermediate rewards allow the agent to converge to a trading strategy in fewer iterations
  • however, frequent intermediate rewards are often noisy and tend to destabilize the trading process
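
As a rough illustration of the reward described above, the sketch below marks the agent's net position to market at each time instant. The formula `(long - short) * price` is an assumption made for illustration, since the page's own equation is not reproduced in the text; the function name `unrealized_pnl` and the sample holdings are likewise hypothetical.

```python
def unrealized_pnl(long_amount: float, short_amount: float, price: float) -> float:
    """Value of the net position at the current price.

    Assumption (not taken from the page): the portfolio value is
    (l(ti) - s(ti)) * p(ti).
    """
    return (long_amount - short_amount) * price


# Hypothetical holdings and prices at three consecutive time instants.
longs = [2.0, 2.0, 1.0]
shorts = [0.5, 1.0, 1.0]
prices = [100.0, 110.0, 105.0]

# One reward per time instant, so the agent gets feedback at every step.
rewards = [unrealized_pnl(l, s, p) for l, s, p in zip(longs, shorts, prices)]
print(rewards)  # [150.0, 110.0, 0.0]
```

Because a reward is emitted at every step, each price move feeds straight into the learning signal, which is the source of the noise mentioned in the second bullet.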

exponentially weighted unrealized PnL

[Equation image: exponentially weighted average of the unrealized PnL rewards over k lag terms]
where ri is the unrealized PnL reward at time instant ti, ω is a suitable weighting parameter, and k is the number of lag terms in the exponentially weighted average

  • balances out the two extremes as described above
  • serves dual objectives in guiding policy learning,
    1. it provides the agent intermediate rewards to facilitate fast learning of the trading strategy
    2. weighted average over past rewards tends to reduce the noise in the rather frequent rewards
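
The smoothing idea can be sketched as follows. This is one plausible form, not necessarily the page's exact formula: it assumes weights ω^j for lags j = 0..k (j = 0 being the current reward), normalized by the sum of the weights, and it truncates the window when fewer than k + 1 rewards are available. The function name `exp_weighted_reward` and the sample values are hypothetical.

```python
from typing import Sequence


def exp_weighted_reward(rewards: Sequence[float], omega: float, k: int) -> float:
    """Exponentially weighted average of the current reward and up to k lags.

    Assumed form (not taken from the page):
        sum_{j=0..k} omega**j * r[i-j] / sum_{j=0..k} omega**j
    """
    if not rewards:
        raise ValueError("rewards must contain at least one value")
    if k < 0:
        raise ValueError("k must be non-negative")
    if not 0.0 < omega <= 1.0:
        raise ValueError("omega is assumed to lie in (0, 1]")

    # Most recent reward first, so index j is the lag.
    window = list(rewards[-(k + 1):])[::-1]
    weights = [omega ** j for j in range(len(window))]
    return sum(w * r for w, r in zip(weights, window)) / sum(weights)


# Hypothetical noisy per-step rewards.
raw = [150.0, 110.0, 0.0, 120.0, -30.0]

# Smoothed reward at each step, using only rewards seen so far.
smoothed = [exp_weighted_reward(raw[: i + 1], omega=0.5, k=2) for i in range(len(raw))]
print(smoothed)
```

With k = 0 the function returns the raw reward unchanged; larger k and ω closer to 1 give heavier smoothing at the cost of a slower response to new price moves.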