Evaluation with normalised observation and action space is improper for PPO_SB3 #53

Open · AvisP opened this issue Mar 1, 2023
Labels: bug (Something isn't working)

System information

  • Grid2op version: 1.8.1
  • l2rpn-baselines version: 0.6.0.post1
  • System: macOS
  • Baseline concerned: PPO_SB3
  • stable-baselines3 version: 1.7.0

Bug description

Training an agent with the train script with normalize_obs=True and normalize_act=True, and then using the trained agent for evaluation, leads to incorrect results.

How to reproduce

The training script used:

import re
import grid2op
from grid2op.Reward import LinesCapacityReward  # or any other rewards
from grid2op.Chronics import MultifolderWithCache  # highly recommended
from lightsim2grid import LightSimBackend  # highly recommended for training !
from l2rpn_baselines.PPO_SB3 import train

env_name = "l2rpn_case14_sandbox"
env = grid2op.make(env_name,
                   reward_class=LinesCapacityReward,
                   backend=LightSimBackend(),
                   chronics_class=MultifolderWithCache)
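# keep only the chronics whose names end in "00" (train on a smaller subset)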
env.chronics_handler.real_data.set_filter(lambda x: re.match(".*00$", x) is not None)
env.chronics_handler.real_data.reset()

try:
    trained_agent = train(
          env,
          iterations=10_000,  # any number of iterations you want
          logs_dir="./logs",  # where the tensorboard logs will be put
          save_path="./saved_model",  # where the NN weights will be saved
          name="test",  # name of the baseline
          net_arch=[100, 100, 100],  # architecture of the NN
          normalize_act=True,
          normalize_obs=True,
          )
finally:
    env.close()

The evaluation script:

import grid2op
from grid2op.Reward import LinesCapacityReward  # or any other rewards
from grid2op.Runner import Runner  # needed for the Do Nothing comparison below
from lightsim2grid import LightSimBackend  # highly recommended !
from l2rpn_baselines.PPO_SB3 import evaluate

nb_episode = 7
nb_process = 1
verbose = True
env_name = "l2rpn_case14_sandbox"

env = grid2op.make(env_name,
                   reward_class=LinesCapacityReward,
                   backend=LightSimBackend()
                   )
try:
    evaluate(env,
            nb_episode=nb_episode,
            load_path="./saved_model",  # should be the same as what has been called in the train function !
            name="test",  # should be the same as what has been called in the train function !
            nb_process=nb_process,
            verbose=verbose,
            )
    
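    # No agent is passed to the Runner below, so it runs the default
    # "do nothing" agent; this provides the DN baseline printed afterwards.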
    runner_params = env.get_params_for_runner()
    runner = Runner(**runner_params)
    res = runner.run(nb_episode=nb_episode,
                    nb_process=nb_process
                    )
    # Print summary
    if verbose:
        print("Evaluation summary for DN:")
        for _, chron_name, cum_reward, nb_time_step, max_ts in res:
            msg_tmp = "chronics at: {}".format(chron_name)
            msg_tmp += "\ttotal score: {:.6f}".format(cum_reward)
            msg_tmp += "\ttime steps: {:.0f}/{:.0f}".format(nb_time_step, max_ts)
            print(msg_tmp)
finally:
    env.close()

The trained agent's results are very similar to those of the Do Nothing agent. This does not happen if normalize_obs and normalize_act are set to False during training.

Possible Solution

The issue is caused by the use of load_path instead of my_path in the following two lines:

https://github.com/rte-france/l2rpn-baselines/blob/c1e2d3616f38a532f327ee85eaa9c0338552ed72/l2rpn_baselines/PPO_SB3/evaluate.py#L178

https://github.com/rte-france/l2rpn-baselines/blob/c1e2d3616f38a532f327ee85eaa9c0338552ed72/l2rpn_baselines/PPO_SB3/evaluate.py#L186
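For illustration, here is a sketch of the suspected fix. The surrounding variable names (my_path, act_attr_to_keep, obs_attr_to_keep, env_gym) are approximated from evaluate.py at the linked commit, where my_path is the per-model directory os.path.join(load_path, name):

import os

# The ".normalize_act" / ".normalize_obs" marker files are written by train()
# inside save_path/name, so evaluate() should look for them under my_path,
# not under load_path itself.
if os.path.exists(os.path.join(my_path, ".normalize_act")):   # was: load_path
    for attr_nm in act_attr_to_keep:
        env_gym.action_space.normalize_attr(attr_nm)

if os.path.exists(os.path.join(my_path, ".normalize_obs")):   # was: load_path
    for attr_nm in obs_attr_to_keep:
        env_gym.observation_space.normalize_attr(attr_nm)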

Making this change resolved the issue in my case.
