About markovian environments #30
Comments
Hi Lior, The environment is reset explicitly by the sampler here: https://github.com/haarnoja/sac/blob/master/sac/misc/sampler.py#L133 I hope this answers your question! Cheers |
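As a rough illustration of what "reset explicitly by the sampler" can look like, here is a minimal sketch. It is not the repository's sampler: `ToyEnv`, `sample`, and `fall_at` are hypothetical names made up for this example, and the environment is assumed to have no time limit of its own.

```python
class ToyEnv:
    """Hypothetical stand-in environment with no built-in time limit."""

    def __init__(self, fall_at=None):
        # Step index at which a true terminal state occurs (None = never).
        self.fall_at = fall_at
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        done = self.fall_at is not None and self.t >= self.fall_at
        return float(self.t), 1.0, done, {}


def sample(env, n_steps, max_path_length):
    """Collect transitions; the sampler, not the env, enforces the time limit."""
    transitions = []
    obs = env.reset()
    path_length = 0
    for _ in range(n_steps):
        next_obs, reward, done, _ = env.step(0)
        path_length += 1
        # Store the env's own `done`; a timeout is never recorded as terminal.
        transitions.append((obs, reward, done, next_obs))
        if done or path_length >= max_path_length:
            # Explicit reset on a true terminal state or on hitting the limit.
            obs = env.reset()
            path_length = 0
        else:
            obs = next_obs
    return transitions
```

In this sketch the episode still ends at `max_path_length`, but the stored `done` flag stays `False` for those transitions, so they can be told apart from genuine terminal states later.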
Thanks for the reply! So if I understand correctly, you do not bootstrap when the path length exceeds the maximum allowed path length. Best, Lior |
Hi, we do bootstrap when the path length exceeds the maximum length, because reaching the time limit does not mean that we enter a terminal state. We don't bootstrap if we reach any of the actual terminal states, for example if the humanoid falls to the ground. Best |
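The distinction above can be sketched with a generic one-step target. This is only an illustration under assumed names (`td_target`, `terminal`, `discount` are hypothetical) and is not the repository's exact soft value update.

```python
def td_target(reward, next_value, terminal, discount=0.99):
    """One-step bootstrapped target.

    `terminal` should be True only for real terminal states (e.g. the
    humanoid falls), not for transitions cut off by the time limit.
    """
    # (1 - terminal) removes the bootstrap term only for true terminal states.
    return reward + discount * (1.0 - float(terminal)) * next_value


# Time-limit cutoff: not terminal, so the next state's value is still used.
timeout_target = td_target(reward=1.0, next_value=10.0, terminal=False)

# True terminal state: no bootstrapping, the target is just the reward.
terminal_target = td_target(reward=1.0, next_value=10.0, terminal=True)
```

Treating a timeout as terminal would set the bootstrap term to zero and bias the value estimate for states that merely happened to be visited late in an episode.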
Oh, that's what I thought. It makes sense, but I couldn't find it in the code.
Thanks for the advice!
Lior
On Tue, Sep 29, 2020 at 11:19, Tuomas Haarnoja <notifications@github.com> wrote:
… Hi, we do bootstrap when the path length exceeds the maximum length,
because reaching the time limit does not mean that we enter a terminal
state. We don't bootstrap if we reach any of the actual terminal states,
for example if the humanoid falls to the ground.
Best
Tuomas
|
Hi,
Thanks for the thorough implementation and for making this code available; it really helps in understanding the internal mechanisms of the SAC algorithm.
I have a question regarding the code in sac/sac/envs/gym_env.py -
In the file's header, you comment: "Rllab implementation with a HACK. See comment in GymEnv.init().", and then in the init() method, you write:
I understand the point here, but I'm not sure I followed the implementation, as it seems to rely on internal Gym code that is not part of the SAC code in this repository.
Can you explain exactly what you are doing with the TimeLimit wrapper?
If you omit the done flag, do you still terminate the episode?
Specifically - in Gym's registration.py file the env class is wrapped with:
Furthermore, in the time_limit.py file -
If you omit these lines of code, how does the environment reset itself when max_episode_steps is reached?
Thanks!
Lior