The return in MDP $M$ under policy $\pi$ is $G_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \gamma^3 r_{t+3} + \dots$
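As a small illustration of the return defined above, the sketch below computes every $G_t$ of a finite reward sequence in one backward pass, using the recursion $G_t = r_t + \gamma G_{t+1}$. The function name `discounted_returns` and the list-of-floats reward format are assumptions made for this example only.

```python
def discounted_returns(rewards, gamma):
    """Return [G_0, G_1, ...] for one finite episode.

    Uses G_t = r_t + gamma * G_{t+1}, with the return after the
    last step taken to be 0 (the episode has ended).
    """
    returns = [0.0] * len(rewards)
    g = 0.0
    # Walk backwards so each G_t reuses the already computed G_{t+1}.
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


print(discounted_returns([1.0, 0.0, 2.0], gamma=0.5))  # [1.5, 1.0, 2.0]
```

The backward pass costs one multiplication and one addition per step, instead of re-summing the tail of the episode for every $t$.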
Define $G_{i,t} = r_{i,t} + \gamma r_{i,t+1} + \gamma^2 r_{i,t+2} + \dots + \gamma^{T_i-1} r_{i,T_i}$ as the return from time step t onwards in the ith episode
For each state s visited in episode i
For first time t that state s is visited in episode i
Increment counter of total first visits: $N(s) = N(s) + 1$
Increment total return: $G(s) = G(s) + G_{i,t}$
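The first-visit steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: an episode is represented as a list of `(state, reward)` pairs, the function name `first_visit_mc` is hypothetical, and the final division $G(s)/N(s)$ is the usual way to turn the accumulated totals into a value estimate (that step is not spelled out in the list above).

```python
from collections import defaultdict


def first_visit_mc(episodes, gamma):
    """Estimate V(s) from the first visit to s in each episode.

    episodes: list of episodes, each a list of (state, reward) pairs
              (an assumed format for this sketch).
    """
    N = defaultdict(int)    # N(s): number of first visits to s
    G = defaultdict(float)  # G(s): total return over those visits

    for episode in episodes:
        # Compute G_{i,t} for every t with one backward pass.
        returns = [0.0] * len(episode)
        g = 0.0
        for t in reversed(range(len(episode))):
            g = episode[t][1] + gamma * g
            returns[t] = g

        seen = set()
        for t, (s, _) in enumerate(episode):
            if s in seen:
                continue  # only the first visit in this episode counts
            seen.add(s)
            N[s] += 1
            G[s] += returns[t]

    # Assumed final step: average the accumulated returns.
    return {s: G[s] / N[s] for s in N}


episode = [("A", 1.0), ("B", 0.0), ("A", 2.0)]
print(first_visit_mc([episode], gamma=0.5))  # {'A': 1.5, 'B': 1.0}
```

In the example, state `A` is visited twice, but only the return from its first visit (1.5) enters the estimate.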
Define $G_{i,t} = r_{i,t} + \gamma r_{i,t+1} + \gamma^2 r_{i,t+2} + \dots + \gamma^{T_i-1} r_{i,T_i}$ as the return from time step t onwards in the ith episode
For each state s visited in episode i
For every time t that state s is visited in episode i
Increment counter of total visits: $N(s) = N(s) + 1$
Increment total return: $G(s) = G(s) + G_{i,t}$
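A minimal sketch of the every-visit steps above, under the same kind of assumptions as any illustrative code: episodes are lists of `(state, reward)` pairs, the name `every_visit_mc` is hypothetical, and the estimate is taken as $G(s)/N(s)$ once all episodes are processed.

```python
from collections import defaultdict


def every_visit_mc(episodes, gamma):
    """Estimate V(s) by averaging the return from every visit to s.

    episodes: list of episodes, each a list of (state, reward) pairs
              (an assumed format for this sketch).
    """
    N = defaultdict(int)    # N(s): total number of visits to s
    G = defaultdict(float)  # G(s): total return over all visits

    for episode in episodes:
        # Compute G_{i,t} for every t with one backward pass.
        returns = [0.0] * len(episode)
        g = 0.0
        for t in reversed(range(len(episode))):
            g = episode[t][1] + gamma * g
            returns[t] = g

        # Unlike first-visit MC, every occurrence of s contributes.
        for t, (s, _) in enumerate(episode):
            N[s] += 1
            G[s] += returns[t]

    # Assumed final step: average the accumulated returns.
    return {s: G[s] / N[s] for s in N}


episode = [("A", 1.0), ("B", 0.0), ("A", 2.0)]
print(every_visit_mc([episode], gamma=0.5))  # {'A': 1.75, 'B': 1.0}
```

Here both visits to `A` count, so its estimate is the mean of the two returns 1.5 and 2.0.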
Define $G_{i,t} = r_{i,t} + \gamma r_{i,t+1} + \gamma^2 r_{i,t+2} + \dots + \gamma^{T_i-1} r_{i,T_i}$ as the return from time step t onwards in the ith episode
For state s visited at time step t in episode i
Increment counter of total visits: $N(s) = N(s) + 1$
Define $G_{i,t} = r_{i,t} + \gamma r_{i,t+1} + \gamma^2 r_{i,t+2} + \dots + \gamma^{T_i-1} r_{i,T_i}$ as the return from time step t onwards in the ith episode
For state s visited at time step t in episode i
Increment counter of total visits: $N(s) = N(s) + 1$
Update estimate: $V^\pi(s) = V^\pi(s) + \alpha (G_{i,t} - V^\pi(s))$
When $\alpha = \frac{1}{N(s)}$, this is identical to the every-visit MC algorithm.
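The incremental update can be sketched as below. The assumptions are the `(state, reward)` episode format, the hypothetical function name `incremental_mc`, and an initial estimate of 0 for unseen states. Passing `alpha=None` selects the step size $\frac{1}{N(s)}$, which keeps $V^\pi(s)$ equal to the running mean of the observed returns. Passing a constant `alpha` instead weights recent returns more heavily.

```python
from collections import defaultdict


def incremental_mc(episodes, gamma, alpha=None):
    """Incremental MC update V(s) += alpha * (G_{i,t} - V(s)).

    episodes: list of episodes, each a list of (state, reward) pairs
              (an assumed format for this sketch).
    alpha:    constant step size, or None to use 1 / N(s).
    """
    N = defaultdict(int)    # N(s): total number of visits to s
    V = defaultdict(float)  # V(s): current estimate, starts at 0

    for episode in episodes:
        # Compute G_{i,t} for every t with one backward pass.
        returns = [0.0] * len(episode)
        g = 0.0
        for t in reversed(range(len(episode))):
            g = episode[t][1] + gamma * g
            returns[t] = g

        for t, (s, _) in enumerate(episode):
            N[s] += 1
            step = 1.0 / N[s] if alpha is None else alpha
            # Move the estimate toward the observed return.
            V[s] += step * (returns[t] - V[s])

    return dict(V)


episode = [("A", 1.0), ("B", 0.0), ("A", 2.0)]
print(incremental_mc([episode], gamma=0.5))             # {'A': 1.75, 'B': 1.0}
print(incremental_mc([episode], gamma=0.5, alpha=0.5))  # {'A': 1.375, 'B': 0.5}
```

With `alpha=None` the result for `A` is 1.75, the plain average of the returns 1.5 and 2.0 from its two visits. With the constant `alpha=0.5` the estimates differ because each update only moves halfway from the initial value 0 toward the observed return.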