Training an agent with the train script using normalize_obs=True and normalize_act=True, and then evaluating the trained agent, gives incorrect results.
How to reproduce
The train script used
import re
import grid2op
from grid2op.Reward import LinesCapacityReward  # or any other rewards
from grid2op.Chronics import MultifolderWithCache  # highly recommended
from lightsim2grid import LightSimBackend  # highly recommended for training !
from l2rpn_baselines.PPO_SB3 import train

env_name = "l2rpn_case14_sandbox"
env = grid2op.make(env_name,
                   reward_class=LinesCapacityReward,
                   backend=LightSimBackend(),
                   chronics_class=MultifolderWithCache)

# keep only the chronics whose name ends with "00"
env.chronics_handler.real_data.set_filter(lambda x: re.match(".*00$", x) is not None)
env.chronics_handler.real_data.reset()

try:
    trained_agent = train(
        env,
        iterations=10_000,  # any number of iterations you want
        logs_dir="./logs",  # where the tensorboard logs will be put
        save_path="./saved_model",  # where the NN weights will be saved
        name="test",  # name of the baseline
        net_arch=[100, 100, 100],  # architecture of the NN
        normalize_act=True,
        normalize_obs=True,
    )
finally:
    env.close()
Evaluation script
import grid2op
from grid2op.Reward import LinesCapacityReward  # or any other rewards
from grid2op.Runner import Runner  # needed for the Do Nothing comparison below
from lightsim2grid import LightSimBackend  # highly recommended !
from l2rpn_baselines.PPO_SB3 import evaluate

nb_episode = 7
nb_process = 1
verbose = True

env_name = "l2rpn_case14_sandbox"
env = grid2op.make(env_name,
                   reward_class=LinesCapacityReward,
                   backend=LightSimBackend()
                   )

try:
    evaluate(env,
             nb_episode=nb_episode,
             load_path="./saved_model",  # should be the same as what has been called in the train function !
             name="test",  # should be the same as what has been called in the train function !
             nb_process=nb_process,
             verbose=verbose,
             )

    # Run the default (Do Nothing) agent on the same environment for comparison
    runner_params = env.get_params_for_runner()
    runner = Runner(**runner_params)
    res = runner.run(nb_episode=nb_episode,
                     nb_process=nb_process
                     )

    # Print summary
    if verbose:
        print("Evaluation summary for DN:")
        for _, chron_name, cum_reward, nb_time_step, max_ts in res:
            msg_tmp = "chronics at: {}".format(chron_name)
            msg_tmp += "\ttotal score: {:.6f}".format(cum_reward)
            msg_tmp += "\ttime steps: {:.0f}/{:.0f}".format(nb_time_step, max_ts)
            print(msg_tmp)
finally:
    env.close()
The results are very similar to those of the Do Nothing agent. This does not happen when normalize_obs and normalize_act are set to False during training.
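One way to put a number on "very similar to Do Nothing" is to compare the two runs episode by episode. The sketch below assumes both result lists contain tuples shaped like the ones unpacked in the evaluation script (an ignored first field, the chronic name, the cumulative reward, the time steps survived, and the maximum number of time steps). The helper name `compare_to_do_nothing` and the sample numbers are hypothetical and only illustrate the idea.

```python
from typing import List, Tuple

# (ignored field, chronic name, cumulative reward, time steps survived, max time steps)
EpisodeResult = Tuple[str, str, float, int, int]


def compare_to_do_nothing(agent_res: List[EpisodeResult],
                          dn_res: List[EpisodeResult]) -> List[Tuple[str, float, int]]:
    """Return (chronic name, reward difference, time-step difference) per shared chronic."""
    dn_by_name = {name: (reward, steps) for _, name, reward, steps, _ in dn_res}
    rows = []
    for _, name, reward, steps, _ in agent_res:
        if name not in dn_by_name:
            continue  # chronic was not run with the Do Nothing agent
        dn_reward, dn_steps = dn_by_name[name]
        rows.append((name, reward - dn_reward, steps - dn_steps))
    return rows


# Made-up numbers, for illustration only
agent_res = [("some/path/0000", "0000", 510.0, 1100, 8064)]
dn_res = [("some/path/0000", "0000", 500.0, 1091, 8064)]

for name, d_reward, d_steps in compare_to_do_nothing(agent_res, dn_res):
    print("chronics: {}\treward diff: {:.6f}\ttime step diff: {}".format(name, d_reward, d_steps))
```

Differences close to zero on every chronic would support the observation that the evaluated agent behaves like the Do Nothing agent.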
System information
Grid2op version: 1.8.1
l2rpn-baselines version: 0.6.0.post1
System: osx
Baseline concerned: PPO_SB3
stable-baselines3 version (presumed): 1.7.0
Possible Solution
The issue is caused by load_path being used instead of my_path in the following two lines:
https://github.com/rte-france/l2rpn-baselines/blob/c1e2d3616f38a532f327ee85eaa9c0338552ed72/l2rpn_baselines/PPO_SB3/evaluate.py#L178
https://github.com/rte-france/l2rpn-baselines/blob/c1e2d3616f38a532f327ee85eaa9c0338552ed72/l2rpn_baselines/PPO_SB3/evaluate.py#L186
Making this change resolved the issue in my case.
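To illustrate why the choice of path matters, here is a minimal, self-contained sketch. It does not reproduce the actual evaluate.py code. It assumes only that training stores its normalization data inside the per-baseline folder (save_path joined with name) and that evaluation checks whether that data exists. The marker file name `.normalize_obs` and the helper `has_marker` are hypothetical.

```python
import os
import tempfile


def has_marker(directory: str, marker: str = ".normalize_obs") -> bool:
    """Return True if the (hypothetical) normalization marker exists in `directory`."""
    return os.path.exists(os.path.join(directory, marker))


with tempfile.TemporaryDirectory() as load_path:
    name = "test"
    # Per-baseline folder: the top-level path joined with the baseline name
    my_path = os.path.join(load_path, name)
    os.makedirs(my_path)

    # Assumption: training writes its normalization data next to the weights
    open(os.path.join(my_path, ".normalize_obs"), "w").close()

    found_with_load_path = has_marker(load_path)  # looks one level too high
    found_with_my_path = has_marker(my_path)      # looks in the baseline folder

print(found_with_load_path)  # False
print(found_with_my_path)    # True
```

Under these assumptions, a check against load_path silently skips the normalization, so the agent receives unnormalized observations at evaluation time even though it was trained on normalized ones.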