About markovian environments #30

shanlior · 2020-09-12T09:39:59Z

Hi,
Thanks for the thorough implementation and for making this code available. It really helps in understanding the internal mechanisms of the SAC algorithm.

I have a question regarding the code in sac/sac/envs/gym_env.py.
In the file's header you comment: "Rllab implementation with a HACK. See comment in GymEnv.__init__().", and then in the __init__() method you write:

# HACK: Gets rid of the TimeLimit wrapper that sets 'done = True' when
# the time limit specified for each environment has been passed and
# therefore the environment is not Markovian (terminal condition depends
# on time rather than state).

I understand the point here, but I'm not sure I followed the implementation, since the TimeLimit wrapper is internal Gym code and is not part of the SAC code in this repository.

Can you explain exactly what you are doing with the TimeLimit wrapper?
If you omit the done flag, do you still terminate the episode?

Specifically, in Gym's registration.py file, the env is wrapped with:

if env.spec.max_episode_steps is not None:
    from gym.wrappers.time_limit import TimeLimit
    env = TimeLimit(env, max_episode_steps=env.spec.max_episode_steps)

Furthermore, in the time_limit.py file:

def step(self, action):
    assert self._elapsed_steps is not None, "Cannot call env.step() before calling reset()"
    observation, reward, done, info = self.env.step(action)
    self._elapsed_steps += 1
    if self._elapsed_steps >= self._max_episode_steps:
        info['TimeLimit.truncated'] = not done
        done = True
    return observation, reward, done, info

If you omit these lines of code, how does the environment reset itself once max_episode_steps is reached?

Thanks!

Lior

haarnoja · 2020-09-13T12:05:49Z

Hi Lior,

The environment is reset explicitly by the sampler here: https://github.com/haarnoja/sac/blob/master/sac/misc/sampler.py#L133
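To illustrate the idea of a sampler that resets the environment itself, here is a minimal sketch. It is not the code from sampler.py: `CountingEnv`, `sample`, and the `max_path_length` argument are hypothetical names chosen for this example, and the policy is a placeholder. The point is only that the environment never reports `done` because of time, while the sampling loop counts steps and calls `reset()` on its own.

```python
class CountingEnv:
    """Hypothetical toy environment: the state is a step counter and
    no state is terminal, so `done` is always False."""

    def __init__(self):
        self.state = 0
        self.resets = 0

    def reset(self):
        self.state = 0
        self.resets += 1
        return self.state

    def step(self, action):
        self.state += 1
        return self.state, 1.0, False, {}


def sample(env, n_steps, max_path_length):
    """Collect n_steps transitions. The sampler, not the environment,
    enforces the time limit by resetting after max_path_length steps."""
    transitions = []
    obs = env.reset()
    path_length = 0
    for _ in range(n_steps):
        action = 0  # placeholder for a real policy
        next_obs, reward, done, info = env.step(action)
        path_length += 1
        # `done` is stored exactly as the environment reported it, so a
        # time-limit reset is never recorded as a terminal state.
        transitions.append((obs, action, reward, done, next_obs))
        if done or path_length >= max_path_length:
            obs = env.reset()
            path_length = 0
        else:
            obs = next_obs
    return transitions


env = CountingEnv()
transitions = sample(env, n_steps=10, max_path_length=4)
```

With these toy settings the environment is reset after the 4th and 8th steps, yet every stored `done` flag stays False.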

I hope this answers your question!

Cheers
Tuomas

shanlior · 2020-09-24T11:25:53Z

Thanks for the reply!

So if I understand correctly, you do not bootstrap when the path length exceeds the maximum allowed path length.

Best,

Lior

haarnoja · 2020-09-29T08:19:32Z

Hi, we do bootstrap when the path length exceeds the maximum length, because reaching the time limit does not mean that we enter a terminal state. We don't bootstrap if we reach any of the actual terminal states, for example if the humanoid falls to the ground.
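The distinction can be sketched as a one-step target computation. This is a generic illustration, not the repository's code: `td_target`, `next_value`, and the default `discount` are assumptions made for the example, and `next_value` stands for whatever estimate of the next state's value the algorithm uses.

```python
def td_target(reward, next_value, terminal, discount=0.99):
    """One-step bootstrapped target.

    `terminal` should be True only for genuine terminal states (for
    example, the humanoid falling), not for hitting the time limit.
    """
    return reward + discount * (1.0 - float(terminal)) * next_value


# Time limit reached, but the state is not terminal: keep bootstrapping.
timeout_target = td_target(reward=1.0, next_value=10.0, terminal=False)

# Genuine terminal state: the target is the immediate reward only.
terminal_target = td_target(reward=1.0, next_value=10.0, terminal=True)
```

Treating a time-limit cutoff as terminal would push the value estimate of late-episode states toward the immediate reward alone, even though the task could have continued from them.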

Best
Tuomas

shanlior · 2020-09-29T08:38:17Z

Oh, that's what I thought. It makes sense, but I didn't find it in the code. Thanks for the advice!

Lior
