Pure random when estimating V(s), but epsilon_greedy when estimating Q(s,a)? #26

QuantHao · 2021-05-24T02:22:49Z


Hi, @praveen-palanisamy

I'm confused about why a pure random policy is used when estimating V(s) at

Tensorflow-2-Reinforcement-Learning-Cookbook/Chapter02/4_monte_carlo_prediction_and_control_rl.py

Line 26 in 101f986

action = env.action_space.sample() # random policy

But epsilon_greedy is used when estimating Q(s,a) at

Tensorflow-2-Reinforcement-Learning-Cookbook/Chapter02/4_monte_carlo_prediction_and_control_rl.py

Lines 77 to 78 in 101f986

probs = epsilon_greedy_policy(action_values)
action = np.random.choice(np.arange(4), p=probs)  # random policy

Is there any reason NOT to use epsilon_greedy to estimate V(s)? Thanks.
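For reference, a typical epsilon-greedy policy over a discrete action space can be sketched as below. This is an assumed implementation, not necessarily what the repo's `epsilon_greedy_policy` does, and the helper name `epsilon_greedy_probs` and its signature are hypothetical. With `epsilon=1.0` it reduces to the uniform random policy that `env.action_space.sample()` draws from for a discrete action space, so the two code paths differ only in how much extra weight goes to the greedy action.

```python
import numpy as np

def epsilon_greedy_probs(action_values, epsilon=0.1):
    """Hypothetical helper: action probabilities for an epsilon-greedy policy.

    Every action gets epsilon / n_actions; the greedy action additionally
    gets the remaining 1 - epsilon.
    """
    action_values = np.asarray(action_values, dtype=float)
    n_actions = action_values.shape[0]
    probs = np.full(n_actions, epsilon / n_actions)
    probs[np.argmax(action_values)] += 1.0 - epsilon
    return probs

rng = np.random.default_rng(0)
q_s = [0.1, 0.5, 0.2, 0.0]  # made-up action values for a single state
probs = epsilon_greedy_probs(q_s, epsilon=0.1)
action = rng.choice(np.arange(4), p=probs)  # sample an action from the policy
print(probs)  # greedy action 1 gets about 0.925, the others 0.025 each
print(epsilon_greedy_probs(q_s, epsilon=1.0))  # epsilon = 1: uniform, 0.25 each
```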

Answered by praveen-palanisamy


praveen-palanisamy · 2021-05-24T02:57:11Z

Maintainer

Hi @QuantHao, Good to hear from you again!
There's no reason not to use the epsilon_greedy policy to estimate $V_{\pi}(s)$.
You can estimate the state-value function for the epsilon_greedy policy and even compare it with the state-value function of the random policy. The same applies to the action-value function.
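To make the suggestion above concrete, here is a minimal first-visit Monte Carlo prediction sketch that estimates $V_{\pi}(s)$ under both a uniform random policy and an epsilon-greedy policy on the same toy problem. Everything in it is an assumption for illustration: the 7-state corridor, the helpers `step`, `random_policy`, `make_epsilon_greedy_policy` and `mc_prediction`, and the fixed `q_table` are hypothetical and are not taken from the book's repository. The point is that MC prediction only needs a fixed policy to generate episodes. The policy you pass in determines whose value function you get.

```python
import numpy as np
from collections import defaultdict

# Hypothetical toy environment: a 1-D corridor with states 0..6.
# States 0 and 6 are terminal; stepping into state 6 gives reward +1, else 0.
N_STATES = 7
START = 3
N_ACTIONS = 2  # 0 = left, 1 = right

def step(state, action):
    next_state = state + (1 if action == 1 else -1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state in (0, N_STATES - 1)
    return next_state, reward, done

def random_policy(state, rng):
    # Uniform random policy: ignores the state.
    return rng.integers(N_ACTIONS)

def make_epsilon_greedy_policy(q_table, epsilon):
    # Epsilon-greedy with respect to a FIXED q_table (no learning here).
    def policy(state, rng):
        if rng.random() < epsilon:
            return rng.integers(N_ACTIONS)
        return int(np.argmax(q_table[state]))
    return policy

def mc_prediction(policy, n_episodes=5000, gamma=1.0, seed=0):
    """First-visit Monte Carlo estimate of V_pi(s) for a fixed policy."""
    rng = np.random.default_rng(seed)
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for _ in range(n_episodes):
        # Generate one episode as (state, reward) pairs.
        episode = []
        state, done = START, False
        while not done:
            action = policy(state, rng)
            next_state, reward, done = step(state, action)
            episode.append((state, reward))
            state = next_state
        # Walk backwards accumulating the return G; later (earlier-in-time)
        # writes overwrite earlier ones, so each state keeps its first-visit G.
        G = 0.0
        first_visit = {}
        for s, r in reversed(episode):
            G = gamma * G + r
            first_visit[s] = G
        for s, G_s in first_visit.items():
            returns_sum[s] += G_s
            returns_count[s] += 1
    return {s: returns_sum[s] / returns_count[s] for s in sorted(returns_count)}

q_table = np.tile([0.0, 1.0], (N_STATES, 1))  # made-up Q values preferring "right"
V_rand = mc_prediction(random_policy)
V_eps = mc_prediction(make_epsilon_greedy_policy(q_table, epsilon=0.1))
for s in V_rand:
    print(s, round(V_rand[s], 3), round(V_eps.get(s, float("nan")), 3))
```

With `gamma=1.0`, the random-policy estimates should land close to s/6 (the exact values for this symmetric random walk), while the epsilon-greedy estimates are close to 1 because that policy almost always moves right. Keying the returns on `(state, action)` pairs instead of states turns the same loop into an estimate of Q(s, a).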


QuantHao May 24, 2021
Author

Thanks so much for answering.

Your book is so far the BEST book for learning to use TF2 for RL from scratch!! I really appreciate your work.

praveen-palanisamy May 24, 2021
Maintainer

That's nice to hear, and thank you for your appreciation!
Please continue to learn, build and create!
Also, report the bugs you find :). It's great to see your attention to the details and the spot-on issues you have created on this repo. 👏

praveen-palanisamy · 2021-05-24T03:02:26Z

Maintainer

Converting this to a Q&A Discussion on RL to organize for the community's benefit.
