[Enhancement] Support copying optuna params dict for all hyperparameters #121
Note that some params that are searched also cannot be copied. |
[related question] Transfer hyperparameters from optuna. For learning purposes I am tuning a number of algorithms for the environment 'MountainCar-v0'. At the moment I am interested in PPO. I intend to share working tuned hyperparameters by putting them in your repo. I am trying to understand, hands-on and in some depth, how a variety of algorithms work; SB3 and the zoo are great tools for that. I execute as indicated: Output:
Then one nice result is:
The environment is considered solved at -110 reward, following the literature. When I pass these hyperparameters to the algorithm it does not work (the reward stays at -200), and I do not understand exactly why.

import torch as th
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

envm = make_vec_env("MountainCar-v0", n_envs=16)
policy_kwargs = dict(activation_fn=th.nn.ReLU, net_arch=[dict(pi=[254, 254], vf=[254, 254])])
model = PPO("MlpPolicy", envm, verbose=1, batch_size=256, n_steps=2048, gamma=0.9999,
            learning_rate=0.00043216809397908225, ent_coef=5.844122887301502e-07,
            clip_range=0.2, n_epochs=10, gae_lambda=0.92, max_grad_norm=2,
            vf_coef=0.035882158772375855, policy_kwargs=policy_kwargs)
model.learn(total_timesteps=1000000)
model.save("ppo_mountaincar")

As I read the docs, I would say it is supposed to work like that; am I wrong? Should I take something else into account? |
You are missing the normalization wrapper. Note that results may also depend on the random seed (cf. doc and issue #151). |
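The normalization wrapper the reply refers to is SB3's `VecNormalize`, which keeps running statistics of the observations and normalizes them on the fly; tuned hyperparameters found with normalization enabled generally do not transfer to an unnormalized environment. A minimal pure-Python sketch of that running-statistics normalization (the class name and constants are illustrative, not the actual SB3 internals):

```python
# Sketch of running observation normalization, in the spirit of SB3's
# VecNormalize wrapper. Illustrative only; not the real implementation.
class RunningNormalizer:
    def __init__(self, eps: float = 1e-8, clip: float = 10.0):
        self.mean = 0.0
        self.var = 1.0
        self.count = eps  # tiny offset avoids division by zero early on
        self.eps = eps
        self.clip = clip

    def update(self, x: float) -> None:
        # Welford-style running mean/variance update for one sample.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.var += (delta * (x - self.mean) - self.var) / self.count

    def normalize(self, x: float) -> float:
        # Standardize, then clip extreme values as VecNormalize does.
        z = (x - self.mean) / (self.var + self.eps) ** 0.5
        return max(-self.clip, min(self.clip, z))

norm = RunningNormalizer()
for obs in [0.1, -0.4, 0.3, 0.2, -0.1]:
    norm.update(obs)
print(norm.mean)
```

The key point is that the statistics are learned during training and must be saved and reloaded along with the model, which is why pasting the raw hyperparameters alone is not enough.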
Thank you! |
Right now, only hyperparameters that are searched by default can have their params dict copied and reused, due to naming issues. This should be extended to hyperparameters that are not searched by default, per the discussion in issue #115.
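One reason a sampled params dict cannot always be copied verbatim is that the names used in the search space do not necessarily match the constructor keyword names, and some sampled values (e.g. a categorical network-size choice) need conversion rather than renaming. A minimal sketch of the kind of remapping step this enhancement implies (the mapping and presets below are hypothetical, not the zoo's actual ones):

```python
# Hypothetical remapping from sampled Optuna parameter names to SB3
# constructor kwargs; the zoo's real mapping may differ.
TRIAL_TO_KWARG = {
    "lr": "learning_rate",
}

# Categorical architecture choices that must be expanded, not renamed.
NET_ARCH_PRESETS = {
    "small": [64, 64],
    "medium": [256, 256],
}

def trial_params_to_kwargs(params: dict) -> dict:
    """Convert a sampled params dict into PPO-style keyword arguments."""
    kwargs = {}
    for name, value in params.items():
        if name == "net_arch":
            # Convert the categorical label into an actual architecture.
            kwargs["policy_kwargs"] = {"net_arch": NET_ARCH_PRESETS[value]}
        else:
            # Rename where needed, pass through otherwise.
            kwargs[TRIAL_TO_KWARG.get(name, name)] = value
    return kwargs

sampled = {"lr": 4.3e-4, "gamma": 0.9999, "net_arch": "medium"}
print(trial_params_to_kwargs(sampled))
```

Extending copy support to non-default hyperparameters would mean keeping such a mapping complete for every parameter a user might add to the search space.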