Pure random when estimating V(s), but epsilon_greedy when estimating Q(s,a)? #26
I'm confused about why a pure random policy is used when estimating V(s), while epsilon_greedy is used when estimating Q(s,a). Is there any reason NOT to use epsilon_greedy to estimate V(s)? Thanks.
Converting this to a Q&A Discussion on RL to organize for the community's benefit.
Hi @QuantHao, Good to hear from you again!
There's no reason not to use the `epsilon_greedy` policy to estimate V(s). You can estimate the state-value function for the `epsilon_greedy` policy and even compare it with the state-value function of the `random` policy. Similarly for the action-value function.
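To make the comparison concrete, here is a minimal sketch of first-visit Monte Carlo estimation of V(s) under both policies. Everything in it is an assumption for illustration and is not taken from the repository: the five-state chain environment, the `step` function, the hand-filled Q table that the epsilon-greedy policy acts on, and the helper names `mc_state_values` and `make_epsilon_greedy`. The point is only that the same estimator works for any policy; what changes is which policy's value function you end up with.

```python
import random
from collections import defaultdict

# Hypothetical toy chain: states 0..4, state 4 is terminal, every step costs -1.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = left, 1 = right
GAMMA = 1.0


def step(state, action):
    """Move along the chain; moving left from state 0 leaves the agent at 0."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, -1.0, done


def random_policy(state, rng):
    """Pick an action uniformly at random."""
    return rng.choice(ACTIONS)


def make_epsilon_greedy(q, epsilon):
    """Build a policy that explores with probability epsilon, else acts greedily on q."""
    def policy(state, rng):
        if rng.random() < epsilon:
            return rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: q[(state, a)])
    return policy


def mc_state_values(policy, episodes=5000, seed=0, max_steps=1000):
    """First-visit Monte Carlo estimate of V(s) for the given policy."""
    rng = random.Random(seed)
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for _ in range(episodes):
        state, trajectory = 0, []
        for _ in range(max_steps):
            action = policy(state, rng)
            next_state, reward, done = step(state, action)
            trajectory.append((state, reward))
            state = next_state
            if done:
                break
        # Walk backward accumulating the return; the last value written for a
        # state corresponds to its earliest (first) visit in the episode.
        g = 0.0
        first_visit_return = {}
        for s, r in reversed(trajectory):
            g = r + GAMMA * g
            first_visit_return[s] = g
        for s, g_s in first_visit_return.items():
            returns_sum[s] += g_s
            returns_count[s] += 1
    return {s: returns_sum[s] / returns_count[s] for s in sorted(returns_sum)}


# Assumed Q table that simply prefers "right" in every state.
q = defaultdict(float)
for s in range(N_STATES):
    q[(s, 1)] = 1.0

v_random = mc_state_values(random_policy)
v_eps = mc_state_values(make_epsilon_greedy(q, epsilon=0.1))

for s in v_random:
    print(f"state {s}: V_random = {v_random[s]:.2f}, V_eps_greedy = {v_eps[s]:.2f}")
```

In this toy setup the epsilon-greedy estimates should come out noticeably higher (less negative) than the random-policy ones, because the policy being evaluated reaches the terminal state faster. The two sets of numbers are value functions of two different policies, so neither is a "wrong" estimate of V(s).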