
Training ends shortly after entering play #5999

Closed
popcron opened this issue Oct 19, 2023 · 4 comments
Labels
bug Issue describes a potential bug in ml-agents.

Comments

@popcron

popcron commented Oct 19, 2023

Describe the bug
When running mlagents-learn (with and without --force) and entering play mode, training exits almost immediately and prints roughly 33 Debug.Log messages.

To Reproduce
Steps to reproduce the behavior:

  1. Start training with the command and press play
  2. Observe that it closes, with traceback errors in the console output

Console logs / stack traces

PS C:\repos\the big game\Saw and UFO> mlagents-learn --force
[W ..\torch\csrc\utils\tensor_numpy.cpp:77] Warning: Failed to initialize NumPy: module compiled against API version 0x10 but this version of numpy is 0xe (function operator ())

            ┐  ╖
        ╓╖╬│╡  ││╬╖╖
    ╓╖╬│││││┘  ╬│││││╬╖
 ╖╬│││││╬╜        ╙╬│││││╖╖                               ╗╗╗
 ╬╬╬╬╖││╦╖        ╖╬││╗╣╣╣╬      ╟╣╣╬    ╟╣╣╣             ╜╜╜  ╟╣╣
 ╬╬╬╬╬╬╬╬╖│╬╖╖╓╬╪│╓╣╣╣╣╣╣╣╬      ╟╣╣╬    ╟╣╣╣ ╒╣╣╖╗╣╣╣╗   ╣╣╣ ╣╣╣╣╣╣ ╟╣╣╖   ╣╣╣
 ╬╬╬╬┐  ╙╬╬╬╬│╓╣╣╣╝╜  ╫╣╣╣╬      ╟╣╣╬    ╟╣╣╣ ╟╣╣╣╙ ╙╣╣╣  ╣╣╣ ╙╟╣╣╜╙  ╫╣╣  ╟╣╣
 ╬╬╬╬┐     ╙╬╬╣╣      ╫╣╣╣╬      ╟╣╣╬    ╟╣╣╣ ╟╣╣╬   ╣╣╣  ╣╣╣  ╟╣╣     ╣╣╣┌╣╣╜
 ╬╬╬╜       ╬╬╣╣      ╙╝╣╣╬      ╙╣╣╣╗╖╓╗╣╣╣╜ ╟╣╣╬   ╣╣╣  ╣╣╣  ╟╣╣╦╓    ╣╣╣╣╣
 ╙   ╓╦╖    ╬╬╣╣   ╓╗╗╖            ╙╝╣╣╣╣╝╜   ╘╝╝╜   ╝╝╝  ╝╝╝   ╙╣╣╣    ╟╣╣╣
   ╩╬╬╬╬╬╬╦╦╬╬╣╣╗╣╣╣╣╣╣╣╝                                             ╫╣╣╣╣
      ╙╬╬╬╬╬╬╬╣╣╣╣╣╣╝╜
          ╙╬╬╬╣╣╣╜
             ╙

 Version information:
  ml-agents: 0.30.0,
  ml-agents-envs: 0.30.0,
  Communicator API: 1.5.0,
  PyTorch: 1.13.1+cpu
[W ..\torch\csrc\utils\tensor_numpy.cpp:77] Warning: Failed to initialize NumPy: module compiled against API version 0x10 but this version of numpy is 0xe (function operator ())
[INFO] Listening on port 5004. Start training by pressing the Play button in the Unity Editor.
[INFO] Connected to Unity environment with package version 3.0.0-exp.1 and communication version 1.5.0
[INFO] Connected new brain: Cell?team=0
[WARNING] Behavior name Cell does not match any behaviors specified in the trainer configuration file. A default configuration will be used.
[WARNING] Deleting TensorBoard data events.out.tfevents.1697750225.pop.26916.0 that was left over from a previous run.
[INFO] Hyperparameters for behavior name Cell:
        trainer_type:   ppo
        hyperparameters:
          batch_size:   1024
          buffer_size:  10240
          learning_rate:        0.0003
          beta: 0.005
          epsilon:      0.2
          lambd:        0.95
          num_epoch:    3
          shared_critic:        False
          learning_rate_schedule:       linear
          beta_schedule:        linear
          epsilon_schedule:     linear
        network_settings:
          normalize:    False
          hidden_units: 128
          num_layers:   2
          vis_encode_type:      simple
          memory:       None
          goal_conditioning_type:       hyper
          deterministic:        False
        reward_signals:
          extrinsic:
            gamma:      0.99
            strength:   1.0
            network_settings:
              normalize:        False
              hidden_units:     128
              num_layers:       2
              vis_encode_type:  simple
              memory:   None
              goal_conditioning_type:   hyper
              deterministic:    False
        init_path:      None
        keep_checkpoints:       5
        checkpoint_interval:    500000
        max_steps:      500000
        time_horizon:   64
        summary_freq:   50000
        threaded:       False
        self_play:      None
        behavioral_cloning:     None
[INFO] Exported results\ppo\Cell\Cell-0.onnx
[INFO] Copied results\ppo\Cell\Cell-0.onnx to results\ppo\Cell.onnx.
Traceback (most recent call last):
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\scripts\mlagents-learn.exe\__main__.py", line 7, in <module>
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\learn.py", line 264, in main
    run_cli(parse_command_line())
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\learn.py", line 260, in run_cli
    run_training(run_seed, options, num_areas)
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\learn.py", line 136, in run_training
    tc.start_learning(env_manager)
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\trainer_controller.py", line 175, in start_learning
    n_steps = self.advance(env_manager)
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\trainer_controller.py", line 233, in advance
    new_step_infos = env_manager.get_steps()
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\env_manager.py", line 124, in get_steps
    new_step_infos = self._step()
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 408, in _step
    self._queue_steps()
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 302, in _queue_steps
    env_action_info = self._take_step(env_worker.previous_step)
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 543, in _take_step
    all_action_info[brain_name] = self.policies[brain_name].get_action(
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\policy\torch_policy.py", line 130, in get_action
    run_out = self.evaluate(decision_requests, global_agent_ids)
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\policy\torch_policy.py", line 93, in evaluate
    masks = self._extract_masks(decision_requests)
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\policy\torch_policy.py", line 77, in _extract_masks
    mask = torch.as_tensor(
RuntimeError: Could not infer dtype of numpy.int32

Environment (please complete the following information):

  • Unity 2023.3.0a10
  • Windows 11, Torch 1.13.1+cpu, Python 3.10.0, numpy 1.12.1
  • Package sources for mlagents and mlagents.extensions are from the develop branch (as UPM references)
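For context on what the final traceback line means: `_extract_masks` ends up handing torch a numpy `int32` array via `torch.as_tensor`, and a torch wheel built against a mismatched numpy C API cannot infer its dtype. The numpy-only sketch below (variable names are illustrative, not taken from ml-agents source) shows the kind of data involved and why converting to plain Python ints sidesteps dtype inference; the real fix discussed later in this thread is aligning the numpy/torch versions.

```python
import numpy as np

# Sketch only: ml-agents builds action masks as numpy integer arrays
# before calling torch.as_tensor(...). With a torch wheel compiled
# against a different numpy C API version, that call raises
# "RuntimeError: Could not infer dtype of numpy.int32".
action_mask = np.ones(6, dtype=np.int32)

# Plain Python ints avoid numpy dtype inference entirely, since
# torch.as_tensor can always handle a list of ints.
as_python_ints = action_mask.tolist()
print(as_python_ints)  # [1, 1, 1, 1, 1, 1]
```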
popcron added the bug label on Oct 19, 2023
@popcron
Author

popcron commented Oct 19, 2023

This is technically a separate issue, but I also had to pip install protobuf==3.20.1 to get past some of the errors mentioned here. I'm not sure whether that actually stopped training from running, but any mlagents-learn command would have thrown errors otherwise.

The packages I originally tested with weren't from the repo; installing the two packages from the repo in the correct order instead gives the same result.

@PatrickM92

I'm getting this same issue after following the install instructions here.

I just started a new venv and ran only these commands:

I ran the 3DBall test from the intro docs, and it errors out after I hit Play in Unity with "[W ..\torch\csrc\utils\tensor_numpy.cpp:77] Warning: Failed to initialize NumPy: module compiled against API version 0x10 but this version of numpy is 0xe (function operator ())"

torch: 1.13.1+cu117
numpy: 1.21.2
mlagents: 1.0.0 (pulled from release21)
mlagents-envs: 1.0.0 (pulled from release21)
Unity 2022.3.4f1

@xyz2022
Contributor

xyz2022 commented Oct 28, 2023

Solution here: #6002
TL;DR:
numpy 1.21.x and 1.22.x aren't compatible with the install guide you followed; numpy 1.23.1 works fine. You need to edit ./ml-agents-dev/setup.py and ./ml-agents/setup.py accordingly.

OP: I didn't test Python 3.10.0, but I anticipate it may be a problem. In addition to updating numpy, you may need to update your Python version; my test used Python 3.10.13 (from conda).
The final binary release of Python 3.10 is https://www.python.org/downloads/release/python-31011/,
but that might also be a problem (the guide says version 3.10.12+ is required).
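Before relaunching training, the pins mentioned in this thread can be sanity-checked from Python. A small sketch using the standard library's `importlib.metadata`; note the pin values below are just what commenters here reported working, not an official compatibility matrix:

```python
from importlib.metadata import PackageNotFoundError, version

# Pins reported in this thread; treat them as anecdotal, not canonical.
PINS = {"numpy": "1.23.1", "protobuf": "3.20.1"}

def check_pins(pins):
    """Return {package: (installed_version_or_None, matches_pin)}."""
    report = {}
    for pkg, wanted in pins.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            installed = None  # package not installed in this environment
        report[pkg] = (installed, installed == wanted)
    return report

if __name__ == "__main__":
    for pkg, (installed, ok) in check_pins(PINS).items():
        print(f"{pkg}: installed={installed} matches_pin={ok}")
```

Running this inside the venv used for mlagents-learn makes it easy to spot a stray numpy 1.21.x/1.22.x before the Unity side ever connects.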

@miguelalonsojr
Collaborator

Installing directly from the develop branch of the repo should resolve this issue; it was taken care of in this PR: #5997
