Evaluation with normalised observation and action space is improper for PPO_SB3 #53

AvisP · 2023-03-01T16:52:25Z

System information

Grid2op version: 1.8.1
l2rpn-baselines version: 0.6.0.post1
System: osx
Baseline concerned: PPO_SB3
stable-baselines3 version: 1.7.0

Bug description

After training with the train script using normalize_obs=True and normalize_act=True, evaluating the trained agent gives incorrect results.

How to reproduce

The train script used

import re
import grid2op
from grid2op.Reward import LinesCapacityReward  # or any other rewards
from grid2op.Chronics import MultifolderWithCache  # highly recommended
from lightsim2grid import LightSimBackend  # highly recommended for training !
from l2rpn_baselines.PPO_SB3 import train

env_name = "l2rpn_case14_sandbox"
env = grid2op.make(env_name,
                   reward_class=LinesCapacityReward,
                   backend=LightSimBackend(),
                   chronics_class=MultifolderWithCache)
env.chronics_handler.real_data.set_filter(lambda x: re.match(".*00$", x) is not None)
env.chronics_handler.real_data.reset()

try:
    trained_agent = train(
          env,
          iterations=10_000,  # any number of iterations you want
          logs_dir="./logs",  # where the tensorboard logs will be put
          save_path="./saved_model",  # where the NN weights will be saved
          name="test",  # name of the baseline
          net_arch=[100, 100, 100],  # architecture of the NN
          normalize_act=True,
          normalize_obs=True,
          )
finally:
    env.close()

Evaluation script

import grid2op
from grid2op.Reward import LinesCapacityReward  # or any other rewards
from lightsim2grid import LightSimBackend  # highly recommended !
from grid2op.Runner import Runner  # used below for the do-nothing comparison
from l2rpn_baselines.PPO_SB3 import evaluate

nb_episode = 7
nb_process = 1
verbose = True
env_name = "l2rpn_case14_sandbox"

env = grid2op.make(env_name,
                   reward_class=LinesCapacityReward,
                   backend=LightSimBackend()
                   )
try:
    evaluate(env,
            nb_episode=nb_episode,
            load_path="./saved_model",  # should be the same as what has been called in the train function !
            name="test",  # should be the same as what has been called in the train function !
            nb_process=1,
            verbose=verbose,
            )
    
    # Run the default (do-nothing) agent on the same environment for comparison
    runner_params = env.get_params_for_runner()
    runner = Runner(**runner_params)
    res = runner.run(nb_episode=nb_episode,
                    nb_process=nb_process
                    )
    # Print summary
    if verbose:
        print("Evaluation summary for DN:")
        for _, chron_name, cum_reward, nb_time_step, max_ts in res:
            msg_tmp = "chronics at: {}".format(chron_name)
            msg_tmp += "\ttotal score: {:.6f}".format(cum_reward)
            msg_tmp += "\ttime steps: {:.0f}/{:.0f}".format(nb_time_step, max_ts)
            print(msg_tmp)
finally:
    env.close()

The results are very similar to those of the do-nothing agent. This does not happen when normalize_obs and normalize_act are set to False during training.

Possible Solution

The issue appears to be caused by the use of load_path instead of my_path in the following two lines:

https://github.com/rte-france/l2rpn-baselines/blob/c1e2d3616f38a532f327ee85eaa9c0338552ed72/l2rpn_baselines/PPO_SB3/evaluate.py#L178

https://github.com/rte-france/l2rpn-baselines/blob/c1e2d3616f38a532f327ee85eaa9c0338552ed72/l2rpn_baselines/PPO_SB3/evaluate.py#L186

Making this change resolved the issue for my case.
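A minimal, self-contained sketch of the suspected mismatch is given below. It does not call l2rpn-baselines. The marker-file names, the helper functions, and the assumption that my_path is os.path.join(load_path, name) are hypothetical and used for illustration only. The sketch only shows that files written into the per-model subdirectory are not found when the lookup is done in load_path itself.

```python
import os
import tempfile

# Hypothetical marker-file names, assumed for illustration only.
MARKERS = (".normalize_obs", ".normalize_act")


def save_markers(save_path, name):
    """Mimic a training run that writes marker files into <save_path>/<name>."""
    my_path = os.path.join(save_path, name)
    os.makedirs(my_path, exist_ok=True)
    for marker in MARKERS:
        # Create an empty file whose presence signals "normalization was used".
        open(os.path.join(my_path, marker), "w").close()
    return my_path


def detect_markers(directory):
    """Return the marker files found directly inside `directory`."""
    return [m for m in MARKERS if os.path.exists(os.path.join(directory, m))]


with tempfile.TemporaryDirectory() as load_path:
    my_path = save_markers(load_path, "test")

    # Lookup in the parent directory (the behaviour reported in this issue).
    found_in_load_path = detect_markers(load_path)
    # Lookup in the per-model directory (the proposed change).
    found_in_my_path = detect_markers(my_path)

    print("found in load_path:", found_in_load_path)
    print("found in my_path:  ", found_in_my_path)
```

If the real code behaves like this sketch, the normalization step would be silently skipped at evaluation time, which would be consistent with the agent receiving unnormalized observations and behaving close to do-nothing.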


AvisP added the "bug" label on Mar 1, 2023
