V0.3: Upgrade RL Workflow; Add RL Benchmarks; Update Package Version #588

Merged on Mar 30, 2023 · 589 commits (changes from all commits)

Commits
2a3e90d
call policy update only for AbsCorePolicy
Jinyu-W Jun 1, 2021
2e925ab
add limitation of AbsCorePolicy in Actor.collect()
Jinyu-W Jun 2, 2021
e3710e3
modify the supply_chain example to use the new rl toolkit architecture
Jinyu-W Jun 2, 2021
03c26b8
refined actor to return only experiences for policies that received n…
Jun 2, 2021
aff0f44
fix MsgKey issue in rollout_manager
Jinyu-W Jun 2, 2021
fba2ccf
Merge branch 'v0.2_rl_refinement' into v0.2_sc_0506_updated
Jinyu-W Jun 2, 2021
8c05562
fix typo in learner
Jinyu-W Jun 2, 2021
ce155d4
Merge branch 'v0.2_rl_refinement' into v0.2_sc_0506_updated
Jinyu-W Jun 2, 2021
cc0c555
call exit function for parallel rollout manager
Jinyu-W Jun 3, 2021
74ede26
Merge branch 'v0.2_rl_refinement' into v0.2_sc_0506_updated
Jinyu-W Jun 3, 2021
034e5bb
update supply chain example distributed training scripts
Jinyu-W Jun 3, 2021
13d7d9b
1. moved exploration scheduling to rollout manager; 2. fixed bug in l…
Jun 3, 2021
5337aa9
fixed merge conflicts
Jun 3, 2021
6dbfd36
reformat render
Jinyu-W Jun 3, 2021
146eeb6
fix supply chain business engine action type problem
Jinyu-W Jun 3, 2021
a80f6c3
reset supply chain example render figsize from 4 to 3
Jinyu-W Jun 3, 2021
731bc59
Add render to all modes of supply chain example
Jinyu-W Jun 3, 2021
ebe5065
fix OR policy typos
Jinyu-W Jun 3, 2021
0549a14
1. added parallel policy manager prototype; 2. used training ep for e…
Jun 4, 2021
023a9d3
refined parallel policy manager
Jun 9, 2021
5a57e01
updated rl/__init__.py
Jun 9, 2021
1f62f3a
fixed lint issues and CIM local learner bugs
Jun 9, 2021
6208d86
deleted unwanted supply_chain test files
Jun 9, 2021
11ca4be
revised default config for cim-dqn
Jun 9, 2021
36d4178
removed test_store.py as it is no longer needed
Jun 10, 2021
0fab08b
1. changed Actor class to rollout_worker function; 2. renamed algorit…
Jun 11, 2021
3b5faeb
updated figures
Jun 11, 2021
7911162
removed unwanted import
Jun 11, 2021
4f2182f
refactored CIM-DQN example
Jun 15, 2021
2b1541b
added MultiProcessRolloutManager and MultiProcessTrainingManager
Jun 16, 2021
6392fcf
updated doc
Jun 17, 2021
5089f7c
lint issue fix
Jun 18, 2021
41a7b27
lint issue fix
Jun 18, 2021
35cf25a
fixed import formatting
Jun 18, 2021
ceadf4f
[Feature] Prioritized Experience Replay (#355)
ysqyang Jun 18, 2021
248d1e4
rm AbsDecisionGenerator
Jun 18, 2021
721d91b
Merge branch 'v0.2_rl_refinement' of github.com:microsoft/maro into v…
Jun 18, 2021
85e304a
small fixes
Jun 18, 2021
2601970
bug fix
Jun 18, 2021
f72e884
reorganized training folder structure
Jun 20, 2021
4f4d5bb
fixed lint issues
Jun 20, 2021
96b9cce
fixed lint issues
Jun 20, 2021
78c225a
policy manager refined
Jun 21, 2021
9acae80
lint fix
Jun 21, 2021
424cabb
restructured CIM-dqn sync code
Jun 21, 2021
18f73f2
added policy version index and used it as a measure of experience sta…
Jun 22, 2021
49d93c2
lint issue fix
Jun 22, 2021
bc96c5e
lint issue fix
Jun 22, 2021
1bb4b56
switched log_dir and proxy_kwargs order
Jun 22, 2021
20c6385
cim example refinement
Jun 23, 2021
42c24ab
eval schedule sorted only when it's a list
Jun 28, 2021
8db90d5
eval schedule sorted only when it's a list
Jun 28, 2021
81f574a
update sc env wrapper
Jinyu-W Jun 28, 2021
5ad21e4
added docker scripts for cim-dqn
Jun 28, 2021
bb25e71
Merge branch 'master' into v0.2
Jinyu-W Jun 29, 2021
a56d4c2
refactored example folder structure and added workflow templates
Jun 29, 2021
2525327
fixed merge conflicts
Jun 29, 2021
f427b07
fixed lint issues
Jun 30, 2021
b8dc7e4
fixed lint issues
Jun 30, 2021
92a51da
fixed template bugs
Jun 30, 2021
31b68f3
removed unused imports
Jun 30, 2021
bab8128
refactoring sc in progress
Jun 30, 2021
f964924
simplified cim meta
Jun 30, 2021
f9ccf2a
updated sc code
Jun 30, 2021
5ad3e54
fixed build.sh path bug
Jun 30, 2021
916d8ad
refined sc and template code
Jun 30, 2021
06c1cd3
template refinement
Jun 30, 2021
c17557c
fixed merge conflicts
Jun 30, 2021
ff76caa
deleted obsolete svgs
Jul 1, 2021
4842d16
merged with remote
Jul 1, 2021
35e55a7
updated learner logs
Jul 1, 2021
ae1e93f
minor edits
Jul 1, 2021
04c53e6
refactored templates for easy merge with async PR
Jul 1, 2021
1315f04
added component names for rollout manager and policy manager
Jul 1, 2021
de40647
fixed incorrect position to add last episode to eval schedule
Jul 1, 2021
360240f
added max_lag option in templates
Jul 1, 2021
315e85f
formatting edit in docker_compose_yml script
Jul 1, 2021
953c873
moved local learner and early stopper outside sync_tools
Jul 1, 2021
ed9d44a
refactored rl toolkit folder structure
Jul 1, 2021
d2b433e
refactored rl toolkit folder structure
Jul 2, 2021
9f799d4
moved env_wrapper and agent_wrapper inside rl/learner
Jul 2, 2021
f8cccca
refined scripts
Jul 2, 2021
a4491a7
modified sc imports according to changes in rl toolkit folder structure
Jul 2, 2021
0906577
fixed typo in script
Jul 2, 2021
a13322b
changes needed for running sc
Jul 2, 2021
8ec0282
removed unwanted imports
Jul 2, 2021
56af26a
Merge branch 'v0.2_rl_refinement' into v0.2_rl_refinement_sc
Jul 2, 2021
894c376
config change for testing sc scenario
Jul 2, 2021
743e9f3
changes for perf testing
Jul 4, 2021
8e97adc
Asynchronous Training (#364)
ysqyang Jul 5, 2021
fac6006
renamed sync to synchronous and async to asynchronous to avoid confli…
Jul 5, 2021
0004dfe
fixed merge conflicts
Jul 5, 2021
60a7423
added missing policy version increment in LocalPolicyManager
Jul 5, 2021
c004697
Merge remote-tracking branch 'origin/v0.2_rl_refinement' into v0.2_rl…
Jul 5, 2021
a163554
refined rollout manager recv logic
Jul 5, 2021
803faad
removed a debugging print
Jul 5, 2021
0c10f36
moved supply_chain inside examples/rl
Jul 5, 2021
34b47a5
added sleep in distributed launcher to avoid hanging
Jul 6, 2021
c41ca35
updated api doc and rl toolkit doc
Jul 7, 2021
a2244b5
refined dynamic imports using importlib
Jul 7, 2021
edf9df4
Merge branch 'master' into v0.2
Jul 8, 2021
740efa7
1. moved policy update triggers to policy manager; 2. added version c…
Jul 8, 2021
c278693
fixed a few bugs and updated cim RL example
Jul 8, 2021
455751a
fixed a few more bugs
Jul 8, 2021
ef50957
resolved merge conflicts
Jul 8, 2021
9a04a99
Merge remote-tracking branch 'origin/v0.2' into v0.2_rl_refinement
Jul 9, 2021
746f0f9
added agent wrapper instantiation to workflows
Jul 9, 2021
18cd676
added agent wrapper instantiation to workflows
Jul 9, 2021
c5cf9df
removed abs_block and added max_prob option for DiscretePolicyNet and…
Jul 9, 2021
1f3b590
fixed incorrect get_ac_policy signature for CIM
Jul 9, 2021
dd017d3
moved exploration inside core policy
Jul 9, 2021
98d0961
added state to exploration call to support context-dependent exploration
Jul 11, 2021
17f1655
updated sc example according to RL toolkit changes
Jul 11, 2021
bbb6ba7
separated non_rl_policy_index and rl_policy_index in workflows
Jul 11, 2021
c70105d
Merge branch 'v0.2_rl_refinement' into v0.2_rl_refinement_sc
Jul 11, 2021
f004fba
modified sc example code according to workflow changes
Jul 11, 2021
2be9114
modified sc example code according to workflow changes
Jul 11, 2021
9b04ad5
added replay_agent_ids parameter to get_env_func for RL examples
Jul 12, 2021
c004323
Merge branch 'v0.2_rl_refinement' into v0.2_rl_refinement_sc
Jul 12, 2021
700b149
fixed a few bugs
Jul 12, 2021
b9afaef
added maro/simulator/scenarios/supply_chain as bind mount
Jul 12, 2021
87066c9
added post-step, post-collect, post-eval and post-update callbacks
Jul 14, 2021
f0a29ef
fixed lint issues
Jul 14, 2021
56fd2d6
fixed lint issues
Jul 14, 2021
cb533fa
fixed some bugs
Jul 15, 2021
d2d66cd
moved instantiation of policy manager inside simple learner
Jul 15, 2021
513ca40
Merge branch 'v0.2_rl_refinement' into v0.2_rl_refinement_sc
Jul 15, 2021
07fba7a
fixed env_wrapper get_reward signature
Jul 15, 2021
a9e6b11
minor edits
Jul 15, 2021
1d5c242
Merge branch 'v0.2_rl_refinement' into v0.2_rl_refinement_sc
Jul 15, 2021
8a84b11
removed get_eperience kwargs from env_wrapper
Jul 15, 2021
2cc0f7b
Merge branch 'v0.2_rl_refinement' into v0.2_rl_refinement_sc
Jul 15, 2021
ec338fb
1. renamed step_callback to post_step in env_wrapper; 2. added get_ev…
Jul 15, 2021
8f00dc7
Merge branch 'v0.2_rl_refinement' into v0.2_rl_refinement_sc
Jul 15, 2021
1c94b62
added rollout exp distribution option in RL examples
Jul 15, 2021
2b04cc0
fixed merge conflicts
Jul 15, 2021
36092c2
Merge branch 'v0.2_sc_0506_updated' into v0.2_sc
lihuoran Jul 16, 2021
b4f5afa
Merge branch 'v0.2_sc' into v0.2_rl_refinement_sc
lihuoran Jul 16, 2021
4252a0c
removed unwanted files
Jul 16, 2021
c2c1c62
1. made logger internal in learner; 2. removed logger creation in abs …
Jul 16, 2021
be82de3
fixed merge conflicts
Jul 16, 2021
0a05184
checked out supply chain test files from v0.2_sc
Jul 16, 2021
c7bca77
1. added missing model.eval() to choose_action; 2.added entropy featu…
Jul 19, 2021
81245dd
fixed a bug in ac entropy
Jul 19, 2021
e56e9b8
abbreviated coefficient to coeff
Jul 19, 2021
072d9de
removed -dqn from job name in rl example config
Jul 22, 2021
103eb40
added tmp patch to dev.df
Jul 22, 2021
9c5e135
renamed image name for running rl examples
Jul 22, 2021
d96aa44
added get_loss interface for core policies
Jul 28, 2021
1f37369
added policy manager in rl_toolkit.rst
Jul 30, 2021
12ac058
1. env_wrapper bug fix; 2. policy manager update logic refinement
Jul 30, 2021
fc14e66
refactored policy and algorithms
Aug 3, 2021
7702eba
policy interface redesigned
Aug 5, 2021
704c17f
refined policy interfaces
Aug 8, 2021
56a54cb
fixed typo
Aug 8, 2021
0b57d70
fixed bugs in refactored policy interface
Aug 9, 2021
cad2872
fixed some bugs
Aug 9, 2021
3ba96d4
refactoring in progress
Aug 11, 2021
5f6c47c
policy interface and policy manager redesigned
Aug 17, 2021
cb8a355
1. fixed bugs in ac and pg; 2. fixed bugs rl workflow scripts
Aug 18, 2021
f0222a7
fixed bug in distributed policy manager
Aug 18, 2021
c0a8480
fixed lint issues
Aug 18, 2021
3a10544
fixed lint issues
Aug 18, 2021
026bcd3
added scipy in setup
Aug 18, 2021
00df5d8
1. trimmed rollout manager code; 2. added option to docker scripts
Aug 19, 2021
8619408
updated api doc for policy manager
Aug 20, 2021
ca7b0d9
1. simplified rl/learning code structure; 2. fixed bugs in rl example…
Aug 23, 2021
aefd3b5
1. simplified rl example structure; 2. fixed lint issues
Aug 23, 2021
db99ce2
further rl toolkit code simplifications
Aug 25, 2021
b3a244d
more numpy-based optimization in RL toolkit
Aug 26, 2021
505cf4e
moved replay buffer inside policy
Aug 27, 2021
af1eed6
bug fixes
Aug 27, 2021
e924495
numpy optimization and associated refactoring
Aug 29, 2021
7c407a4
extracted shaping logic out of env_sampler
Aug 31, 2021
07a051b
fixed bug in CIM shaping and lint issues
Aug 31, 2021
6a027fa
preliminary implementation of parallel batch inference
Sep 1, 2021
fde7895
fixed bug in ddpg transition recording
Sep 2, 2021
b9010ef
put get_state, get_env_actions, get_reward back in EnvSampler
Sep 2, 2021
aa69409
simplified exploration and core model interfaces
Sep 5, 2021
2dbf3c3
bug fixes and doc update
Sep 6, 2021
f136e3c
added improve() interface for RLPolicy for single-thread support
Sep 6, 2021
92561f6
fixed simple policy manager bug
Sep 7, 2021
013d0fb
updated doc, rst, notebook
Sep 11, 2021
8f652b4
updated notebook
Sep 11, 2021
8dd708f
fixed lint issues
Sep 11, 2021
971fd04
fixed entropy bugs in ac.py
Sep 12, 2021
bf3cadb
reverted to simple policy manager as default
Sep 12, 2021
e89b6db
1. unified single-thread and distributed mode in learning_loop.py; 2.…
Sep 14, 2021
3738bd1
fixed lint issues and updated rl toolkit images
Sep 14, 2021
69c5a56
removed obsolete images
Sep 15, 2021
372d44c
Merge branch 'v0.2_rl_refinement' of github.com:microsoft/maro into v…
Sep 15, 2021
9030200
added back agent2policy for general workflow use
Sep 15, 2021
f2dd5c0
V0.2 rl refinement dist (#377)
buptchan Sep 16, 2021
1a70410
Merge branch 'v0.2_rl_refinement' of github.com:microsoft/maro into v…
Sep 17, 2021
5f70b65
added checkpointing for simple and multi-process policy managers
Sep 17, 2021
7b76dce
1. bug fixes in checkpointing; 2. removed version and max_lag in roll…
Sep 17, 2021
c1d9871
added missing set_state and get_state for CIM policies
Sep 17, 2021
fc59379
removed blank line
Sep 17, 2021
f23fdc6
updated RL workflow README
Sep 22, 2021
78a2cb8
Integrate `data_parallel` arguments into `worker_allocator` (#402)
buptchan Sep 22, 2021
f8f2e6a
1. simplified workflow config; 2. added comments to CIM shaping
Sep 22, 2021
0b5fcd1
lint issue fix
Sep 22, 2021
190802f
1. added algorithm type setting in CIM config; 2. added try-except cl…
Sep 22, 2021
6e941a4
1. moved post_step callback inside env sampler; 2. updated README for…
Sep 24, 2021
1edd4c4
refined README for CIM
Sep 24, 2021
2b4d4eb
VM scheduling with RL (#375)
ysqyang Sep 26, 2021
3a928b9
SC refinement (#397)
lihuoran Sep 26, 2021
ea7fdde
refined workflow scripts
Oct 9, 2021
c1f8faf
fixed bug in ParallelAgentWrapper
Oct 9, 2021
cf1430a
1. fixed lint issues; 2. refined main script in workflows
Oct 10, 2021
485ffd7
lint issue fix
Oct 10, 2021
4e1d37c
restored default config for rl example
Oct 10, 2021
5b21e67
Update rollout.py
ysqyang Oct 10, 2021
868bd53
refined env var processing in policy manager workflow
Oct 11, 2021
12ffd98
added hasattr check in agent wrapper
Oct 12, 2021
c0bae0b
updated docker_compose_yml.py
Oct 12, 2021
a5ddfd5
Minor refinement
lihuoran Oct 13, 2021
0f2f83e
Merge branch 'v0.2_rl_refinement' into v0.3
lihuoran Oct 14, 2021
6a1179c
Minor PR. Prepare to merge latest master branch into v0.3 branch. (#412)
lihuoran Dec 6, 2021
ff0f706
Merge latest master into v0.3 (#426)
lihuoran Dec 8, 2021
8a25f9e
Change `Env.set_seed()` logic (#456)
lihuoran Jan 24, 2022
526627c
Remove all SC related files (#473)
lihuoran Mar 4, 2022
696f5b5
RL Toolkit V3 (#471)
lihuoran Mar 7, 2022
7b3d78a
RL renaming v2 (#476)
lihuoran Mar 9, 2022
00fbcee
Cherry pick latest RL (#498)
lihuoran Mar 31, 2022
0e11ae9
Cherry pick RL changes from `sc_refinement` (latest commit: `2a4869`)…
lihuoran Apr 22, 2022
1219513
RL incremental refactor (#501)
lihuoran Apr 24, 2022
333986f
RL component bundle (#513)
lihuoran May 10, 2022
ae83ac0
Add method to get mapping of available tick to frame index (#415)
chaosddp May 16, 2022
10b9c02
Cherry pick from sc_refinement (#527)
lihuoran May 18, 2022
0d132cc
Refine `terminal` / `next_agent_state` logic (#531)
lihuoran May 25, 2022
a3dade7
Merge master into v0.3 (#536)
lihuoran May 31, 2022
3f74eb2
Merge master into v0.3 (#545)
lihuoran Jun 9, 2022
569d7b1
Merge branch 'master' into v0.3
Jinyu-W Jun 10, 2022
278a881
Merge branch 'master' into v0.3
Jinyu-W Jun 14, 2022
ed951f0
Update requirements. (#552)
lihuoran Jun 23, 2022
3e7a43b
Done (#554)
lihuoran Aug 26, 2022
8022cd9
Update requirements in example and notebook (#553)
lihuoran Aug 26, 2022
9fd91ff
Refine decision event logic (#559)
lihuoran Aug 31, 2022
135d2fc
Refine rl component bundle (#549)
lihuoran Dec 27, 2022
a783b57
merge master into v0.3
Dec 27, 2022
a4e3168
Merge branch 'master' into v0.3
Dec 27, 2022
eb6324c
Remove numpy data type (#571)
lihuoran Jan 11, 2023
214383f
RL benchmark on GYM (#575)
lihuoran Feb 6, 2023
b8a955e
Refine RL workflow & tune RL models under GYM (#577)
lihuoran Feb 17, 2023
f42d5b7
DDPG parameters update (#583)
Jinyu-W Feb 22, 2023
d859a4b
Update RL Benchmarks (#584)
Jinyu-W Mar 20, 2023
c6ed5c9
Merge branch 'master' into v0.3
Jinyu-W Mar 20, 2023
71157f8
Update Input Template of RL Policy to Improve Module Flexibility (#589)
Jinyu-W Mar 29, 2023
a5e1f57
update code version to 0.3.2a1
Mar 29, 2023
Files changed
2 changes: 2 additions & 0 deletions examples/cim/rl/algorithms/ac.py
@@ -11,6 +11,7 @@
actor_net_conf = {
"hidden_dims": [256, 128, 64],
"activation": torch.nn.Tanh,
"output_activation": torch.nn.Tanh,
"softmax": True,
"batch_norm": False,
"head": True,
@@ -19,6 +20,7 @@
"hidden_dims": [256, 128, 64],
"output_dim": 1,
"activation": torch.nn.LeakyReLU,
"output_activation": torch.nn.LeakyReLU,
"softmax": False,
"batch_norm": True,
"head": True,
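Note: the change here adds an `output_activation` entry alongside `activation` in the net config (the same edit recurs in the other algorithm configs in this PR). A minimal sketch of how such a config might be consumed — the builder below is an illustrative stand-in, not the library's real model-building code:

```python
import torch

def build_mlp(input_dim: int, output_dim: int, conf: dict) -> torch.nn.Sequential:
    layers = []
    dims = [input_dim] + conf["hidden_dims"]
    for in_dim, out_dim in zip(dims[:-1], dims[1:]):
        layers.append(torch.nn.Linear(in_dim, out_dim))
        layers.append(conf["activation"]())          # hidden-layer activation
    layers.append(torch.nn.Linear(dims[-1], output_dim))
    if conf.get("output_activation") is not None:
        layers.append(conf["output_activation"]())   # the newly configurable output activation
    if conf.get("softmax"):
        layers.append(torch.nn.Softmax(dim=-1))
    return torch.nn.Sequential(*layers)

net = build_mlp(32, 4, {
    "hidden_dims": [256, 128, 64],
    "activation": torch.nn.Tanh,
    "output_activation": torch.nn.Tanh,
    "softmax": True,
})
```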
1 change: 1 addition & 0 deletions examples/cim/rl/algorithms/dqn.py
@@ -12,6 +12,7 @@
q_net_conf = {
"hidden_dims": [256, 128, 64, 32],
"activation": torch.nn.LeakyReLU,
"output_activation": torch.nn.LeakyReLU,
"softmax": False,
"batch_norm": True,
"skip_connection": False,
2 changes: 2 additions & 0 deletions examples/cim/rl/algorithms/maddpg.py
@@ -14,6 +14,7 @@
actor_net_conf = {
"hidden_dims": [256, 128, 64],
"activation": torch.nn.Tanh,
"output_activation": torch.nn.Tanh,
"softmax": True,
"batch_norm": False,
"head": True,
@@ -22,6 +23,7 @@
"hidden_dims": [256, 128, 64],
"output_dim": 1,
"activation": torch.nn.LeakyReLU,
"output_activation": torch.nn.LeakyReLU,
"softmax": False,
"batch_norm": True,
"head": True,
26 changes: 20 additions & 6 deletions examples/cim/rl/env_sampler.py
@@ -90,11 +90,25 @@ def post_collect(self, info_list: list, ep: int) -> None:
for info in info_list:
print(f"env summary (episode {ep}): {info['env_metric']}")

# print the average env metric
if len(info_list) > 1:
metric_keys, num_envs = info_list[0]["env_metric"].keys(), len(info_list)
avg_metric = {key: sum(info["env_metric"][key] for info in info_list) / num_envs for key in metric_keys}
print(f"average env summary (episode {ep}): {avg_metric}")
# average env metric
metric_keys, num_envs = info_list[0]["env_metric"].keys(), len(info_list)
avg_metric = {key: sum(info["env_metric"][key] for info in info_list) / num_envs for key in metric_keys}
print(f"average env summary (episode {ep}): {avg_metric}")

self.metrics.update(avg_metric)
self.metrics = {k: v for k, v in self.metrics.items() if not k.startswith("val/")}

def post_evaluate(self, info_list: list, ep: int) -> None:
self.post_collect(info_list, ep)
# print the env metric from each rollout worker
for info in info_list:
print(f"env summary (episode {ep}): {info['env_metric']}")

# average env metric
metric_keys, num_envs = info_list[0]["env_metric"].keys(), len(info_list)
avg_metric = {key: sum(info["env_metric"][key] for info in info_list) / num_envs for key in metric_keys}
print(f"average env summary (episode {ep}): {avg_metric}")

self.metrics.update({"val/" + k: v for k, v in avg_metric.items()})

def monitor_metrics(self) -> float:
return -self.metrics["val/container_shortage"]
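The reworked callbacks above always average the per-worker env metrics (the old code only did so when more than one worker reported), cache training metrics in `self.metrics`, and store evaluation averages under a `val/` prefix so `monitor_metrics` can expose a single scalar. A condensed, self-contained illustration of that bookkeeping (the metric names and values below are made up):

```python
info_list = [
    {"env_metric": {"container_shortage": 120.0, "order_requirements": 2000.0}},
    {"env_metric": {"container_shortage": 100.0, "order_requirements": 2000.0}},
]

metric_keys, num_envs = info_list[0]["env_metric"].keys(), len(info_list)
avg_metric = {key: sum(info["env_metric"][key] for info in info_list) / num_envs for key in metric_keys}

metrics: dict = {}
# post_collect: keep only training-time metrics (drop stale "val/" entries).
metrics.update(avg_metric)
metrics = {k: v for k, v in metrics.items() if not k.startswith("val/")}
# post_evaluate: store the same averages under a "val/" prefix.
metrics.update({"val/" + k: v for k, v in avg_metric.items()})

# monitor_metrics: negate the shortage so that higher is better.
print(-metrics["val/container_shortage"])  # -110.0
```

Negating the shortage turns `monitor_metrics` into a "higher is better" signal, the convention an early stopper can maximize.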
2 changes: 1 addition & 1 deletion examples/cim/rl/rl_component_bundle.py
@@ -13,7 +13,7 @@

# Environments
learn_env = Env(**env_conf)
test_env = learn_env
test_env = Env(**env_conf)

# Agent, policy, and trainers
num_agents = len(learn_env.agent_idx_list)
2 changes: 1 addition & 1 deletion examples/rl/README.md
@@ -7,7 +7,7 @@ This folder contains scenarios that employ reinforcement learning. MARO's RL too
The entrance of a RL workflow is a YAML config file. For readers' convenience, we call this config file `config.yml` in the rest part of this doc. `config.yml` specifies the path of all necessary resources, definitions, and configurations to run the job. MARO provides a comprehensive template of the config file with detailed explanations (`maro/maro/rl/workflows/config/template.yml`). Meanwhile, MARO also provides several simple examples of `config.yml` under the current folder.

There are two ways to start the RL job:
- If you only need to have a quick look and try to start an out-of-box workflow, just run `python .\examples\rl\run_rl_example.py PATH_TO_CONFIG_YAML`. For example, `python .\examples\rl\run_rl_example.py .\examples\rl\cim.yml` will run the complete example RL training workflow of CIM scenario. If you only want to run the evaluation workflow, you could start the job with `--evaluate_only`.
- If you only need to have a quick look and try to start an out-of-box workflow, just run `python .\examples\rl\run.py PATH_TO_CONFIG_YAML`. For example, `python .\examples\rl\run.py .\examples\rl\cim.yml` will run the complete example RL training workflow of CIM scenario. If you only want to run the evaluation workflow, you could start the job with `--evaluate_only`.
- (**Require install MARO from source**) You could also start the job through MARO CLI. Use the command `maro local run [-c] path/to/your/config` to run in containerized (with `-c`) or non-containerized (without `-c`) environments. Similar, you could add `--evaluate_only` if you only need to run the evaluation workflow.

## Create Your Own Scenarios
9 changes: 5 additions & 4 deletions examples/rl/cim.yml
@@ -5,16 +5,17 @@
# Please refer to `maro/rl/workflows/config/template.yml` for the complete template and detailed explanations.

# Run this workflow by executing one of the following commands:
# - python .\examples\rl\run_rl_example.py .\examples\rl\cim.yml
# - (Requires installing MARO from source) maro local run .\examples\rl\cim.yml
# - python ./examples/rl/run.py ./examples/rl/cim.yml
# - (Requires installing MARO from source) maro local run ./examples/rl/cim.yml

job: cim_rl_workflow
scenario_path: "examples/cim/rl"
log_path: "log/rl_job/cim.txt"
log_path: "log/cim_rl/"
main:
num_episodes: 30 # Number of episodes to run. Each episode is one cycle of roll-out and training.
num_steps: null
eval_schedule: 5
early_stop_patience: 5
logging:
stdout: INFO
file: DEBUG
@@ -27,7 +28,7 @@ training:
load_path: null
load_episode: null
checkpointing:
path: "checkpoint/rl_job/cim"
path: "log/cim_rl/checkpoints"
interval: 5
logging:
stdout: INFO
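The new `early_stop_patience: 5` entry pairs with the `monitor_metrics` hook above. A minimal sketch of patience-based early stopping under that "higher is better" convention — an illustration of the semantics, not MARO's internal implementation (and note the real workflow only checks on the `eval_schedule`, not every episode):

```python
import random

def train_with_early_stopping(num_episodes: int, patience: int, run_episode, monitor) -> None:
    best, stale = float("-inf"), 0
    for ep in range(num_episodes):
        run_episode(ep)
        value = monitor()  # e.g. the negated container shortage from the env sampler
        if value > best:
            best, stale = value, 0
        else:
            stale += 1
        if stale >= patience:
            print(f"early stop at episode {ep}: no improvement in {patience} evaluations")
            break

# Demo with a dummy episode runner and a noisy monitor value.
train_with_early_stopping(30, 5, run_episode=lambda ep: None, monitor=lambda: random.random())
```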
10 changes: 5 additions & 5 deletions examples/rl/cim_distributed.yml
@@ -1,16 +1,16 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

# Example RL config file for CIM scenario.
# Example RL config file for CIM scenario (distributed version).
# Please refer to `maro/rl/workflows/config/template.yml` for the complete template and detailed explanations.

# Run this workflow by executing one of the following commands:
# - python .\examples\rl\run_rl_example.py .\examples\rl\cim.yml
# - (Requires installing MARO from source) maro local run .\examples\rl\cim.yml
# - python ./examples/rl/run.py ./examples/rl/cim_distributed.yml
# - (Requires installing MARO from source) maro local run ./examples/rl/cim_distributed.yml

job: cim_rl_workflow
scenario_path: "examples/cim/rl"
log_path: "log/rl_job/cim.txt"
log_path: "log/cim_rl/"
main:
num_episodes: 30 # Number of episodes to run. Each episode is one cycle of roll-out and training.
num_steps: null
@@ -35,7 +35,7 @@ training:
load_path: null
load_episode: null
checkpointing:
path: "checkpoint/rl_job/cim"
path: "log/cim_rl/checkpoints"
interval: 5
proxy:
host: "127.0.0.1"
File renamed without changes.
8 changes: 4 additions & 4 deletions examples/rl/vm_scheduling.yml
@@ -5,12 +5,12 @@
# Please refer to `maro/rl/workflows/config/template.yml` for the complete template and detailed explanations.

# Run this workflow by executing one of the following commands:
# - python .\examples\rl\run_rl_example.py .\examples\rl\vm_scheduling.yml
# - (Requires installing MARO from source) maro local run .\examples\rl\vm_scheduling.yml
# - python ./examples/rl/run.py ./examples/rl/vm_scheduling.yml
# - (Requires installing MARO from source) maro local run ./examples/rl/vm_scheduling.yml

job: vm_scheduling_rl_workflow
scenario_path: "examples/vm_scheduling/rl"
log_path: "log/rl_job/vm_scheduling.txt"
log_path: "log/vm_rl/"
main:
num_episodes: 30 # Number of episodes to run. Each episode is one cycle of roll-out and training.
num_steps: null
@@ -27,7 +27,7 @@ training:
load_path: null
load_episode: null
checkpointing:
path: "checkpoint/rl_job/vm_scheduling"
path: "log/vm_rl/checkpoints"
interval: 5
logging:
stdout: INFO
2 changes: 2 additions & 0 deletions examples/vm_scheduling/rl/algorithms/ac.py
@@ -11,6 +11,7 @@
actor_net_conf = {
"hidden_dims": [64, 32, 32],
"activation": torch.nn.LeakyReLU,
"output_activation": torch.nn.LeakyReLU,
"softmax": True,
"batch_norm": False,
"head": True,
@@ -19,6 +20,7 @@
critic_net_conf = {
"hidden_dims": [256, 128, 64],
"activation": torch.nn.LeakyReLU,
"output_activation": torch.nn.LeakyReLU,
"softmax": False,
"batch_norm": False,
"head": True,
1 change: 1 addition & 0 deletions examples/vm_scheduling/rl/algorithms/dqn.py
@@ -14,6 +14,7 @@
q_net_conf = {
"hidden_dims": [64, 128, 256],
"activation": torch.nn.LeakyReLU,
"output_activation": torch.nn.LeakyReLU,
"softmax": False,
"batch_norm": False,
"skip_connection": False,
2 changes: 1 addition & 1 deletion maro/__misc__.py
@@ -2,6 +2,6 @@
# Licensed under the MIT license.


__version__ = "0.3.1a2"
__version__ = "0.3.2a1"

__data_version__ = "0.2"
5 changes: 2 additions & 3 deletions maro/cli/data_pipeline/citi_bike.py
@@ -8,7 +8,6 @@
from enum import Enum

import geopy.distance
import numpy as np
import pandas as pd
from yaml import safe_load

@@ -320,7 +319,7 @@ def _process_distance(self, station_info: pd.DataFrame):
0,
index=station_info["station_index"],
columns=station_info["station_index"],
dtype=np.float,
dtype=float,
)
look_up_df = station_info[["latitude", "longitude"]]
return distance_adj.apply(
@@ -617,7 +616,7 @@ def _gen_distance(self, station_init: pd.DataFrame):
0,
index=station_init["station_index"],
columns=station_init["station_index"],
dtype=np.float,
dtype=float,
)
look_up_df = station_init[["latitude", "longitude"]]
distance_df = distance_adj.apply(
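The `np.float` → `float` substitutions track NumPy's deprecation of `np.float` in 1.20 and its removal in 1.24; since it was merely an alias of the builtin `float`, the replacement is behavior-preserving. For example (the station indices below are an illustrative stand-in for `station_info["station_index"]`):

```python
import pandas as pd

# np.float would raise AttributeError on NumPy >= 1.24; the builtin works identically.
station_index = [0, 1, 2]
distance_adj = pd.DataFrame(0, index=station_index, columns=station_index, dtype=float)
print(distance_adj.dtypes.unique())  # [dtype('float64')]
```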
5 changes: 3 additions & 2 deletions maro/cli/local/commands.py
@@ -61,7 +61,7 @@ def get_redis_conn(port=None):


# Functions executed on CLI commands
def run(conf_path: str, containerize: bool = False, evaluate_only: bool = False, **kwargs):
def run(conf_path: str, containerize: bool = False, seed: int = None, evaluate_only: bool = False, **kwargs):
# Load job configuration file
parser = ConfigParser(conf_path)
if containerize:
@@ -71,13 +71,14 @@ def run(conf_path: str, containerize: bool = False, evaluate_only: bool = False,
LOCAL_MARO_ROOT,
DOCKERFILE_PATH,
DOCKER_IMAGE_NAME,
seed=seed,
evaluate_only=evaluate_only,
)
except KeyboardInterrupt:
stop_rl_job_with_docker_compose(parser.config["job"], LOCAL_MARO_ROOT)
else:
try:
start_rl_job(parser, LOCAL_MARO_ROOT, evaluate_only=evaluate_only)
start_rl_job(parser, LOCAL_MARO_ROOT, seed=seed, evaluate_only=evaluate_only)
except KeyboardInterrupt:
sys.exit(1)

12 changes: 9 additions & 3 deletions maro/cli/local/utils.py
@@ -4,7 +4,7 @@
import os
import subprocess
from copy import deepcopy
from typing import List
from typing import List, Optional

import docker
import yaml
@@ -110,12 +110,15 @@ def exec(cmd: str, env: dict, debug: bool = False) -> subprocess.Popen:
def start_rl_job(
parser: ConfigParser,
maro_root: str,
seed: Optional[int],
evaluate_only: bool,
background: bool = False,
) -> List[subprocess.Popen]:
procs = [
exec(
f"python {script}" + ("" if not evaluate_only else " --evaluate_only"),
f"python {script}"
+ ("" if not evaluate_only else " --evaluate_only")
+ ("" if seed is None else f" --seed {seed}"),
format_env_vars({**env, "PYTHONPATH": maro_root}, mode="proc"),
debug=not background,
)
@@ -169,6 +172,7 @@ def start_rl_job_with_docker_compose(
context: str,
dockerfile_path: str,
image_name: str,
seed: Optional[int],
evaluate_only: bool,
) -> None:
common_spec = {
@@ -185,7 +189,9 @@
**deepcopy(common_spec),
**{
"container_name": component,
"command": f"python3 {script}" + ("" if not evaluate_only else " --evaluate_only"),
"command": f"python3 {script}"
+ ("" if not evaluate_only else " --evaluate_only")
+ ("" if seed is None else f" --seed {seed}"),
"environment": format_env_vars(env, mode="docker-compose"),
},
}
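Both launchers assemble the component command the same way: optional flags are appended only when set, so existing jobs run unchanged when no seed is given. A condensed sketch of that pattern (`script` and the values below are illustrative):

```python
from typing import Optional

def build_command(script: str, evaluate_only: bool, seed: Optional[int]) -> str:
    # Append each optional flag only when it is actually requested.
    return (
        f"python {script}"
        + ("" if not evaluate_only else " --evaluate_only")
        + ("" if seed is None else f" --seed {seed}")
    )

print(build_command("main.py", evaluate_only=False, seed=42))  # python main.py --seed 42
```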
8 changes: 7 additions & 1 deletion maro/rl/model/abs_net.py
@@ -4,7 +4,7 @@
from __future__ import annotations

from abc import ABCMeta
from typing import Any, Dict
from typing import Any, Dict, Optional

import torch.nn
from torch.optim import Optimizer
@@ -18,6 +18,8 @@ class AbsNet(torch.nn.Module, metaclass=ABCMeta):
def __init__(self) -> None:
super(AbsNet, self).__init__()

self._device: Optional[torch.device] = None

@property
def optim(self) -> Optimizer:
optim = getattr(self, "_optim", None)
@@ -119,3 +121,7 @@ def unfreeze_all_parameters(self) -> None:
"""Unfreeze all parameters."""
for p in self.parameters():
p.requires_grad = True

def to_device(self, device: torch.device) -> None:
self._device = device
self.to(device)
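`to_device` records the target device on the instance before delegating to the regular `torch.nn.Module.to()`, so later tensor construction can reuse `self._device`. A hypothetical usage sketch with an illustrative subclass (`MyNet` is not part of MARO):

```python
import torch

class MyNet(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self._device = None  # cached target device, as in AbsNet
        self.fc = torch.nn.Linear(8, 2)

    def to_device(self, device: torch.device) -> None:
        self._device = device
        self.to(device)

net = MyNet()
net.to_device(torch.device("cuda" if torch.cuda.is_available() else "cpu"))
```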
17 changes: 13 additions & 4 deletions maro/rl/model/algorithm_nets/ac_based.py
@@ -43,14 +43,23 @@ class ContinuousACBasedNet(ContinuousPolicyNet, metaclass=ABCMeta):
- set_state(self, net_state: dict) -> None:
"""

def _get_actions_impl(self, states: torch.Tensor, exploring: bool) -> torch.Tensor:
actions, _ = self._get_actions_with_logps_impl(states, exploring)
def _get_actions_impl(self, states: torch.Tensor, exploring: bool, **kwargs) -> torch.Tensor:
actions, _ = self._get_actions_with_logps_impl(states, exploring, **kwargs)
return actions

def _get_actions_with_probs_impl(self, states: torch.Tensor, exploring: bool) -> Tuple[torch.Tensor, torch.Tensor]:
def _get_actions_with_probs_impl(
self,
states: torch.Tensor,
exploring: bool,
**kwargs,
) -> Tuple[torch.Tensor, torch.Tensor]:
# Not used in Actor-Critic or PPO
pass

def _get_states_actions_probs_impl(self, states: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
def _get_states_actions_probs_impl(self, states: torch.Tensor, actions: torch.Tensor, **kwargs) -> torch.Tensor:
# Not used in Actor-Critic or PPO
pass

def _get_random_actions_impl(self, states: torch.Tensor, **kwargs) -> torch.Tensor:
# Not used in Actor-Critic or PPO
pass
22 changes: 18 additions & 4 deletions maro/rl/model/algorithm_nets/ddpg.py
@@ -25,18 +25,32 @@ class ContinuousDDPGNet(ContinuousPolicyNet, metaclass=ABCMeta):
- set_state(self, net_state: dict) -> None:
"""

def _get_actions_with_probs_impl(self, states: torch.Tensor, exploring: bool) -> Tuple[torch.Tensor, torch.Tensor]:
def _get_actions_with_probs_impl(
self,
states: torch.Tensor,
exploring: bool,
**kwargs,
) -> Tuple[torch.Tensor, torch.Tensor]:
# Not used in DDPG
pass

def _get_actions_with_logps_impl(self, states: torch.Tensor, exploring: bool) -> Tuple[torch.Tensor, torch.Tensor]:
def _get_actions_with_logps_impl(
self,
states: torch.Tensor,
exploring: bool,
**kwargs,
) -> Tuple[torch.Tensor, torch.Tensor]:
# Not used in DDPG
pass

def _get_states_actions_probs_impl(self, states: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
def _get_states_actions_probs_impl(self, states: torch.Tensor, actions: torch.Tensor, **kwargs) -> torch.Tensor:
# Not used in DDPG
pass

def _get_states_actions_logps_impl(self, states: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
def _get_states_actions_logps_impl(self, states: torch.Tensor, actions: torch.Tensor, **kwargs) -> torch.Tensor:
# Not used in DDPG
pass

def _get_random_actions_impl(self, states: torch.Tensor, **kwargs) -> torch.Tensor:
# Not used in DDPG
pass
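The common thread in these two files is the added `**kwargs` on the `_get_actions*` family, which lets callers thread policy-specific context through the shared interface without touching every subclass. A toy illustration (not MARO's actual classes), where `noise_scale` is an invented kwarg:

```python
from abc import ABC, abstractmethod

import torch

class PolicyNetBase(ABC):
    def get_actions(self, states: torch.Tensor, exploring: bool, **kwargs) -> torch.Tensor:
        # Extra keyword context flows through untouched.
        return self._get_actions_impl(states, exploring, **kwargs)

    @abstractmethod
    def _get_actions_impl(self, states: torch.Tensor, exploring: bool, **kwargs) -> torch.Tensor:
        raise NotImplementedError

class NoisyNet(PolicyNetBase):
    def _get_actions_impl(self, states: torch.Tensor, exploring: bool, **kwargs) -> torch.Tensor:
        actions = states.mean(dim=1, keepdim=True)  # stand-in for a real forward pass
        if exploring:
            actions = actions + kwargs.get("noise_scale", 0.1) * torch.randn_like(actions)
        return actions

net = NoisyNet()
print(net.get_actions(torch.zeros(4, 8), exploring=True, noise_scale=0.5).shape)  # torch.Size([4, 1])
```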