
Tensorboard files not saving when using SubprocVecEnv #1205

Closed
atapley opened this issue Dec 7, 2022 · 19 comments
Labels: custom gym env (Issue related to Custom Gym Env), question (Further information is requested)

Comments

@atapley commented Dec 7, 2022

🐛 Bug

When I train my model with a normal Monitor-wrapped env, I get the output TensorBoard files as expected, but when I use a SubprocVecEnv with multiple parallel environments, nothing seems to get logged to the TensorBoard file. Is this expected when using SubprocVecEnv, given the multiple environments?

Code example

My custom environment wraps an external modeler, so I can't provide a runnable code sample below. I make a few changes to the DQN algorithm (adding an intrinsic curiosity module), but the TensorBoard files save fine when running in a single environment, so I don't think that is the issue. I've added the reset and step methods below in case they help, but they likely won't make sense without the modeler.

import copy
from typing import List

import gym
import numpy as np

from stable_baselines3 import DQN
from stable_baselines3.common.env_checker import check_env

# Note: `Simulation` is provided by the external modeler and is not importable here.


class CustomEnv(gym.Env):

  def __init__(
        self,
        simulation: Simulation,
        movements: List[str],
        interactions: List[str],
        attributes: List[str],
        normalized_attributes: List[str],
        deterministic: bool = False
    ) -> None:
        self.simulation = simulation
        self.movements = copy.deepcopy(movements)
        self.interactions = copy.deepcopy(interactions)
        self.attributes = attributes
        self.normalized_attributes = normalized_attributes
        self.deterministic = deterministic
        

        # ------------------

        if not set(self.normalized_attributes).issubset(self.attributes):
            raise AssertionError(
                f"All normalized attributes ({str(self.normalized_attributes)}) must be "
                f"in attributes ({str(self.attributes)})!"
            )

        # ------------------
        self.sim_agent_id = len(movements) + len(interactions) + len(interactions) + 2
        sim_attributes = self.simulation.get_attribute_data()
        sim_actions = self.simulation.get_actions()

        if not set(self.interactions).issubset(list(sim_actions.keys())):
            raise AssertionError(
                f"All interactions ({str(self.interactions)}) must be "
                f"in the simulator's actions ({str(list(sim_actions.keys()))})!"
            )
            
        self.interactions.insert(0, "none")
        self.movements.insert(0, "none")

        self._separate_sim_nonsim(sim_attributes)
        self.harness_to_sim, self.sim_to_harness = self._sim_harness_conv(sim_actions)
        self.min_maxes = self._get_min_maxes()

        # ------------------

        channel_lows = np.array(
            [[[self.min_maxes[channel]["min"]]] for channel in self.attributes]
        )
        channel_highs = np.array(
            [[[self.min_maxes[channel]["max"]]] for channel in self.attributes]
        )

        self.low = np.repeat(
            np.repeat(channel_lows, self.simulation.config.area.screen_size, axis=1),
            self.simulation.config.area.screen_size,
            axis=2,
        )

        self.high = np.repeat(
            np.repeat(channel_highs, self.simulation.config.area.screen_size, axis=1),
            self.simulation.config.area.screen_size,
            axis=2,
        )

        self.observation_space = gym.spaces.Box(
            np.float32(self.low),
            np.float32(self.high),
            shape=(
                len(self.attributes),
                self.simulation.config.area.screen_size,
                self.simulation.config.area.screen_size,
            ),
            dtype=np.float64,
        )

        action_shape = len(self.movements) * len(self.interactions)
        self.action_space = gym.spaces.Discrete(action_shape)

  def reset(self):
        self.num_burned = 0
        if not self.deterministic:
            fire_init_seed = self.simulation.get_seeds()["fire_initial_position"]
            elevation_seed = self.simulation.get_seeds()["elevation"]
            seed_dict = {"fire_initial_position": fire_init_seed + 1,
                         "elevation": elevation_seed + 1}
            self.simulation.set_seeds(seed_dict)

        self.simulation.reset()
        sim_observations = self._select_from_dict(
            self.simulation.get_attribute_data(), self.sim_attributes
        )
        nonsim_observations = self._select_from_dict(
            self.get_nonsim_attribute_data(), self.nonsim_attributes
        )

        if len(nonsim_observations) != len(self.nonsim_attributes):
            raise AssertionError(
                f"Data for {len(nonsim_observations)} nonsim attributes were given but "
                f"there are {len(self.nonsim_attributes)} nonsim attributes."
            )

        observations = self._normalize_obs({**sim_observations, **nonsim_observations})

        obs = []
        for attribute in self.attributes:
            obs.append(observations[attribute])

        self.state = np.stack(obs, axis=0).astype(np.float64)

        output = self.state

        point = [self.agent_pos[1], self.agent_pos[0], 0]
        self.simulation.update_agent_positions([point])
        self.num_agent_steps = 0
        return output

  def step(self, action):
        movement = (action % len(self.movements))
        movement_str = self.movements[movement]
        
        interaction = int(action / len(self.movements))
        interaction_str = self.interactions[interaction]
        
        reward = 0.0

        pos_placeholder = self.agent_pos.copy()
        screen_size = self.simulation.config.area.screen_size

        if movement_str == "none":
            pass
        elif movement_str == "up" and not self.agent_pos[0] == 0:
            pos_placeholder[0] -= 1
        elif movement_str == "down" and not self.agent_pos[0] == screen_size - 1:
            pos_placeholder[0] += 1
        elif movement_str == "left" and not self.agent_pos[1] == 0:
            pos_placeholder[1] -= 1
        elif movement_str == "right" and not self.agent_pos[1] == screen_size - 1:
            pos_placeholder[1] += 1
        else:
            pass

        self.agent_pos = pos_placeholder

        fire_map_idx = self.attributes.index("fire_map")
        is_empty = self.state[fire_map_idx][self.agent_pos[0]][self.agent_pos[1]] == 0

        if is_empty and not interaction_str == 'none':
            sim_interaction = self.harness_to_sim[interaction]
            mitigation_update = (self.agent_pos[1], self.agent_pos[0], sim_interaction)
            self.simulation.update_mitigation([mitigation_update])

        point = [self.agent_pos[1], self.agent_pos[0], 0]
        self.simulation.update_agent_positions([point])

        if self.num_agent_steps % self.agent_speed == 0:
            sim_fire_map, sim_active = self.simulation.run(1)
            fire_map = np.copy(sim_fire_map)
            fire_map[self.agent_pos[0]][self.agent_pos[1]] = self.sim_agent_id
            reward += self._calculate_reward(fire_map)
        else:
            sim_active = True
            sim_fire_map = self.simulation.fire_map
            fire_map = np.copy(sim_fire_map)
            fire_map[self.agent_pos[0]][self.agent_pos[1]] = self.sim_agent_id

        self.state[fire_map_idx] = fire_map

        if not sim_active:
            reward += 10

        self.num_agent_steps += 1
        return self.state, reward, not sim_active, {}

env = CustomEnv(...)  # constructor arguments omitted; they come from the external modeler
check_env(env)

model = DQN("CnnPolicy", env, verbose=1).learn(1000)

Relevant log output / Error message

No response

System Info

OS: Linux-5.4.0-80-generic-x86_64-with-glibc2.27 #90~18.04.1-Ubuntu SMP Tue Jul 13 19:40:02 UTC 2021
Python: 3.9.15
Stable-Baselines3: 1.6.2
PyTorch: 1.12.1+cu102
GPU Enabled: False
Numpy: 1.22.4
Gym: 0.21.0


Checklist

  • I have checked that there is no similar issue in the repo
  • I have read the documentation
  • I have provided a minimal working example to reproduce the bug
  • I have checked my env using the env checker
  • I've used the markdown code blocks for both code and stack traces.
@atapley added the custom gym env and question labels on Dec 7, 2022

@qgallouedec (Collaborator)

model = DQN("CnnPolicy", env, verbose=1).learn(1000)

Reading the code you gave, I don't see any instruction for TensorBoard, nor any use of SubprocVecEnv, so I have trouble understanding your problem. Can you rephrase it?

The following works well for me:

# test_1205.py
from stable_baselines3 import DQN
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv

if __name__ == "__main__":
    venv = make_vec_env("CartPole-v1", vec_env_cls=SubprocVecEnv, n_envs=2)
    model = DQN("MlpPolicy", venv, tensorboard_log="./tensorboard")
    model.learn(1000)
$ python test_1205.py
$ ls tensorboard
DQN_1

@atapley (Author) commented Dec 7, 2022

Ah, I forgot to change that part. The SubprocVecEnv is instantiated like this:

env = make_vec_env(<ENV>, env_kwargs=<ENV_KWARGS>, n_envs=2, seed=0, vec_env_cls=SubprocVecEnv)

There is nothing for TensorBoard because I don't directly do anything with it. I believe it is handled by the stable-baselines3 code within DQN_ICM's parent class's (OffPolicyAlgorithm) learn method. The TensorBoard logging is all internal to stable-baselines, which is why I'm confused about why it isn't working with the SubprocVecEnv.

If I use the env below, I do get the correct TensorBoard output.

env = Monitor(gym.make(<ENV>, <ENV_KWARGS>))

As in your case, the TensorBoard file itself does get saved. However, when I open TensorBoard, no metrics appear to have been written to the file.

[screenshot: TensorBoard run with no logged metrics]
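A quick way to check whether an event file actually contains scalar data (a minimal sketch; the ./tensorboard/DQN_1 run directory is an assumption based on the default run name, not taken from the issue) is to read it back with TensorBoard's EventAccumulator:

# Sketch: inspect a TensorBoard event file for logged scalars.
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

acc = EventAccumulator("./tensorboard/DQN_1")  # assumed run directory
acc.Reload()                                   # parse the event file(s) in the directory
print(acc.Tags()["scalars"])                   # an empty list means nothing was logged for this run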

@qgallouedec (Collaborator)

Another thing I can't figure out: does the problem occur only with your custom environment, or also with standard environments?

@atapley (Author) commented Dec 7, 2022

I just ran the sample code you posted and it works as expected: I can see the TensorBoard metrics in the file. So this appears to be an issue with just the CustomEnv, but I have not made any changes to the TensorBoard code, and the issue only appears when using SubprocVecEnv; I don't have the issue in the single-env case.

One thing that might be relevant: after training finishes, my code hangs and does not exit when using the SubprocVecEnv; it looks like a process or thread doesn't close properly. Surprisingly, when I run through the debugger I don't have that issue. Maybe because it hangs it doesn't save to TensorBoard properly? Although it creates the files without issue, so I don't know if that's it.
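If the hang really is caused by the SubprocVecEnv worker processes not shutting down, one thing worth trying (a minimal sketch, assuming the venv and model objects from the snippets above) is closing the vectorized env explicitly after training:

# Sketch: explicitly terminate the SubprocVecEnv workers once training is done.
model.learn(1000)
venv.close()  # sends a close command to each worker process and joins them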

@qgallouedec (Collaborator) commented Dec 7, 2022

So if I sum up simply, your problem is that this:

import gym

from stable_baselines3 import DQN
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv

class CustomEnv(gym.Env): ...

if __name__ == "__main__":
    venv = make_vec_env(CustomEnv, vec_env_cls=SubprocVecEnv, n_envs=2)
    model = DQN("MlpPolicy", venv, tensorboard_log="./tensorboard")
    model.learn(1000)

doesn't output anything on your tensorboard, right?

@atapley (Author) commented Dec 7, 2022

Yes, more or less. The above does not output anything to tensorboard but

import gym

from stable_baselines3 import DQN
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv

class CustomEnv(gym.Env): ...

if __name__ == "__main__":
    model = DQN("MlpPolicy", CustomEnv, tensorboard_log="./tensorboard")
    model.learn(1000)

does.

(Only difference is I use my modified DQN_ICM algorithm in both cases instead of base DQN)

@qgallouedec (Collaborator)

    model = DQN("MlpPolicy", CustomEnv, tensorboard_log="./tensorboard")

Ok, then the problem does not come from SubprocVecEnv since you do not use it. Correct?

@atapley (Author) commented Dec 7, 2022

Incorrect, I'm just saying that it does work when I do

model = DQN("MlpPolicy", CustomEnv, tensorboard_log="./tensorboard")

but does not when I do

venv = make_vec_env(CustomEnv, vec_env_cls=SubprocVecEnv, n_envs=2)
model = DQN("MlpPolicy", venv, tensorboard_log="./tensorboard")

And the SubprocVecEnv is what I am currently trying to work with

@qgallouedec (Collaborator)

Incorrect, I'm just saying that it does work when I do

Sorry I read the opposite.

(Only difference is I use my modified DQN_ICM algorithm in both cases instead of base DQN)

Does using DQN instead of DQN_ICM solve the problem?

@atapley (Author) commented Dec 7, 2022

I just gave it a try, and it looks like normal DQN does not work either.

@qgallouedec (Collaborator)

Looking quickly at your code, I don't see anything that could explain this. In order for us to help you, we need to be able to reproduce the "error", so you need to provide minimal code that does so.
From what we've just discussed, it should look just like the one in #1205 (comment)

@atapley (Author) commented Dec 7, 2022

Okay, let me try to minimize the current code into an MVP and link the open-source modeler used in the environment.

@qgallouedec (Collaborator)

link the open-source modeler used in the environment.

Make sure that the modeler is required for the bug. Most likely it is not.

@atapley (Author) commented Dec 7, 2022

I got an MVP working and, surprisingly, I am able to view the TensorBoard metrics in the MVP but not in the full codebase. I guess that means something in the full codebase is causing the issue; this helps me narrow it down at least! I'll keep adding until it breaks.

@qgallouedec (Collaborator)

I advise you to do the opposite: keep removing until the problem disappears. Otherwise you won't get a minimal example.

@atapley (Author) commented Dec 8, 2022

Was able to get it working! I switched up the logger, added a Monitor to the vec_env, and added some callbacks. Not sure which was the winner, but the files have logging data in them now. It seems like it wasn't an issue with the environment or stable-baselines, just that some things were missing. Thanks for helping out with this! Closing the issue.
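For anyone landing here later, here is a hedged sketch of the kind of setup described above; the exact combination that fixed it is not stated in the thread, and the logger reconfiguration and VecMonitor placement are assumptions:

# Sketch only: logger reconfiguration plus episode monitoring on the vectorized env,
# assuming the CustomEnv class from the issue body (constructor arguments omitted).
from stable_baselines3 import DQN
from stable_baselines3.common.logger import configure
from stable_baselines3.common.vec_env import SubprocVecEnv, VecMonitor

def make_env():
    return CustomEnv(...)  # arguments come from the external modeler, as in the issue body

if __name__ == "__main__":
    venv = VecMonitor(SubprocVecEnv([make_env for _ in range(2)]))
    model = DQN("MlpPolicy", venv, tensorboard_log="./tensorboard")
    model.set_logger(configure("./tensorboard", ["stdout", "tensorboard"]))
    model.learn(1000)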

@atapley closed this as completed on Dec 8, 2022
@qgallouedec (Collaborator) commented Dec 8, 2022

Was able to get it working!

That's good to hear.

Not sure which was the winner,

Unfortunately, no one else will benefit from it. If you ever figure out what was missing, please post it here.

@atapley (Author) commented Dec 8, 2022

It looks like it came down to the EvalCallback: removing the EvalCallback results in a TensorBoard file with no data in it. Nothing seems to get logged to the TensorBoard file unless the callback gets called.
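For completeness, a sketch of attaching an EvalCallback; the evaluation env construction, frequencies, and paths below are assumptions, not taken from the issue:

# Sketch: periodic evaluation via EvalCallback, which also writes eval/* metrics
# to the model's logger (and therefore to TensorBoard).
from stable_baselines3.common.callbacks import EvalCallback
from stable_baselines3.common.monitor import Monitor

eval_env = Monitor(CustomEnv(...))  # constructor arguments omitted, as above
eval_callback = EvalCallback(
    eval_env,
    eval_freq=500,                  # assumed value
    n_eval_episodes=5,              # assumed value
    log_path="./eval_logs",
    best_model_save_path="./best_model",
)
model.learn(1000, callback=eval_callback)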
