[Bug Report] Possible resolution for random seed setting and non-deterministic training #904
Comments
Hi,
Thanks @hojae-io for digging deeper into this issue. This is a really great find, and I may have an explanation. I suspect that some random events happen at initialization: for instance, terrain generation (a lot of random sampling there) and the initialization of the PhysX solver and internal buffers (not sure). It would make sense that setting the seed beforehand ensures the randomness from these sources is limited once the seed is fixed. It definitely makes sense to fix this issue. Instead of modifying the play/train scripts, we should set the seed as the first operation when the class is constructed. We can add
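A minimal sketch of the "seed first in the constructor" idea. `ToyEnv` and `set_seed` are hypothetical illustrations, not the actual Isaac Lab API; a real implementation would also seed torch and the simulator:

```python
import random

import numpy as np

def set_seed(seed: int) -> None:
    """Seed every RNG source before anything random can happen."""
    random.seed(seed)
    np.random.seed(seed)
    # A real implementation would also seed torch and the simulator here,
    # e.g. torch.manual_seed(seed); omitted to keep the sketch minimal.

class ToyEnv:
    """Hypothetical env: seeding is the FIRST operation in __init__."""

    def __init__(self, seed: int):
        set_seed(seed)  # seed before any random initialization
        # stands in for random work at construction time (terrain, buffers)
        self.terrain = np.random.uniform(size=4)

env_a = ToyEnv(seed=42)
env_b = ToyEnv(seed=42)
print("identical terrain:", np.allclose(env_a.terrain, env_b.terrain))
# prints: identical terrain: True
```

Because the seed is consumed before any sampling, two constructions with the same seed produce bitwise-identical random state.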
I'm having the same issue and am looking forward to a contribution that fixes it ASAP!
Based on the suggestion here, I have made the fixes in #940. At least in the unit test, where I run a fixed number of env steps, the obtained obs and rewards are the same, though this only checks determinism within the same process. I ran training for ANYmal locomotion and it looks promising: `./isaaclab.sh -p source/standalone/workflows/rsl_rl/train.py --task Isaac-Velocity-Rough-Anymal-C-v0 --headless --run_name seed_fix`
Describe the bug
It has been observed that training results, such as the reward curve, are not reproducible even when a random seed is set manually (for instance, seed=42).
Several similar issues have been submitted:
#489
#275
Steps to reproduce
Try running a script such as IsaacLab/source/standalone/workflows/rsl_rl/train.py, or see the issues listed above to reproduce the problem.
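The failure mode can be reproduced with plain NumPy, independent of Isaac Lab. In this sketch, `make_env` and `train_once` are hypothetical stand-ins for environment construction and a training run; the seed is set only after the env is created, mirroring the problematic ordering:

```python
import numpy as np

def make_env():
    # stands in for random work during env construction
    # (terrain sampling, buffer initialization, ...)
    return np.random.uniform(size=3)

def train_once(seed: int):
    terrain = make_env()   # BUG: RNG state is consumed before seeding
    np.random.seed(seed)
    rewards = np.random.normal(size=5)  # stands in for the training run
    return terrain, rewards

t1, r1 = train_once(42)
t2, r2 = train_once(42)
print("rewards match:", np.allclose(r1, r2))     # True: drawn after the seed
print("env init matches:", np.allclose(t1, t2))  # False: drawn before it
```

Everything sampled after the seed is reproducible, but everything sampled during construction depends on whatever RNG state the process happened to start with, so runs diverge wherever that construction-time randomness feeds back into training.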
Resolution
Here's my resolution; with it, I no longer see any non-deterministic / stochastic behavior, and the reward curves exactly overlap when I train the same code multiple times.
The problem comes from setting the seed after the environment is created.
In train.py, you can see the seed being set at line 118, which is after the env is created at line 90.
If you instead set the seed before the env is created (my patch moves it to around lines 92-102), all behavior becomes deterministic.
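The corrected ordering can be sketched with plain NumPy; `make_env` and `train_once` are hypothetical stand-ins for environment construction and a training run, with the seed now set before the env is created:

```python
import numpy as np

def make_env():
    # stands in for random work during env construction
    return np.random.uniform(size=3)

def train_once(seed: int):
    np.random.seed(seed)   # FIX: seed BEFORE creating the env
    terrain = make_env()
    rewards = np.random.normal(size=5)  # stands in for the training run
    return terrain, rewards

t1, r1 = train_once(42)
t2, r2 = train_once(42)
print("runs identical:", np.allclose(t1, t2) and np.allclose(r1, r2))
# prints: runs identical: True
```

With the seed set first, construction-time sampling draws from a known RNG state, so both the env initialization and the subsequent training stream repeat exactly.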
I don't know why the point at which the seed is set matters, or why it could cause non-deterministic behavior.
It would be nice if you could provide an explanation and incorporate this fix in an upcoming pull request.