Add docs for qlib.rl #1322
Merged
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -19,7 +19,7 @@ Base Modules | |
|
||
EnvWrapper | ||
------------ | ||
EnvWrapper is the complete capsulation of the simulated environment. It receives actions from outside (policy / strategy / agent), simulates the changes of the market, and then replies rewards and updated states, thus forming an interaction loop. | ||
EnvWrapper is the complete capsulation of the simulated environment. It receives actions from outside (policy / strategy / agent), simulates the changes in the market, and then replies rewards and updated states, thus forming an interaction loop. | ||
|
||
In QlibRL, EnvWrapper is a subclass of gym.Env, so it implements all necessary interfaces of gym.Env. Any classes or pipelines that accept gym.Env should also accept EnvWrapper. Developers do not need to implement their own EnvWrapper to build their own environment. Instead, they only need to implement 4 components of the EnvWrapper: | ||
|
||
|
@@ -32,7 +32,7 @@ In QlibRL, EnvWrapper is a subclass of gym.Env, so it implements all necessary i | |
- `Reward function` | ||
The reward function returns a numerical reward to the policy after each time the policy takes an action. | ||
|
||
EnvWrapper will organically organize these components. Such decomposition allows for better flexibility in development. For example, if the developers want to train multiple types of policies in one same environment, they only need to design one simulator, and design different state interpreters / action interpreters / reward functions for a different types of policies. | ||
EnvWrapper will organically organize these components. Such decomposition allows for better flexibility in development. For example, if the developers want to train multiple types of policies in the same environment, they only need to design one simulator, and design different state interpreters / action interpreters / reward functions for different types of policies. | ||
|
||
QlibRL has well-defined base classes for all these 4 components. All the developers need to do is define their own components by inheriting the base classes and then implementing all interfaces required by the base classes. | ||
|
||
|
@@ -60,15 +60,15 @@ Order Execution | |
------------ | ||
As a fundamental problem in algorithmic trading, order execution aims at fulfilling a specific trading order, either liquidation or acquirement, for a given instrument. Essentially, the goal of order execution is twofold: it not only requires to fulfill the whole order but also targets a more economical execution with maximizing profit gain (or minimizing capital loss). The order execution with only one order of liquidation or acquirement is called single-asset order execution. | ||
|
||
Considering stock investment always aim to pursue long-term maximized profits, is usually manifests as a sequential process of continuously adjusting the asset portfolios, execution for multiple orders, including order of liquidation and acquirement, brings more constraints and making the sequence of execution for different orders should be considered, e.g. before executing an order to buy some stocks, we have to sell at least one stock. The order execution with multiple assets is called multi-asset order execution. | ||
Because stock investment usually aims at long-term maximized profits, it typically manifests as a sequential process of continuously adjusting the asset portfolio. Executing multiple orders, including both liquidation and acquirement orders, brings more constraints, and the execution sequence of the different orders must be considered, e.g. before executing an order to buy some stocks, we have to sell at least one stock. The order execution with multiple assets is called multi-asset order execution. | ||
|
||
According to the order execution’s trait of sequential decision making, an RL-based solution could be applied to solve the order execution. With an RL-based solution, an agent optimizes execution strategy through interacting with the market environment. | ||
According to the order execution’s trait of sequential decision-making, an RL-based solution could be applied to solve the order execution. With an RL-based solution, an agent optimizes execution strategy by interacting with the market environment. | ||
|
||
With QlibRL, the RL algorithm in the above scenarios can be easily implemented. | ||
|
||
I think we can add an extra section for nested Portfolio Construction & Order Execution
||
Nested Portfolio Construction and Order Executor | ||
------------ | ||
QlibRL make it possible to jointly optimize different levels of strategies/models/agents. Take `Nested Decision Execution Framework <https://github.com/microsoft/qlib/blob/main/examples/nested_decision_execution>`_ an example of, optimization of order execution strategy and portfolio management strategy can interact with each other to maximize returns. | ||
QlibRL makes it possible to jointly optimize different levels of strategies/models/agents. Taking the `Nested Decision Execution Framework <https://github.com/microsoft/qlib/blob/main/examples/nested_decision_execution>`_ as an example, the optimization of the order execution strategy and the portfolio management strategy can interact with each other to maximize returns. | ||
|
||
Base Class & Interface | ||
============ | ||
|
@@ -99,7 +99,7 @@ If developers have already defined their simulator / interpreters / reward funct | |
policy=policy, | ||
reward=PAPenaltyReward(), | ||
vessel_kwargs={ | ||
"episode_per_iter": 100, 6 | ||
"episode_per_iter": 100, | ||
"update_kwargs": { | ||
"batch_size": 64, | ||
"repeat": 5, | ||
|
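The four-component decomposition described in the EnvWrapper hunk (simulator, state interpreter, action interpreter, reward function) can be illustrated with a minimal sketch. Everything below is hypothetical and written in plain Python: `ToySimulator`, `ToyEnvWrapper`, `interpret_state`, `interpret_action`, `reward_fn` and `run_episode` are illustrative names, not QlibRL's actual classes or signatures, and the wrapper does not subclass gym.Env as the real EnvWrapper does.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class ToySimulator:
    """Hypothetical simulator: executes an order of `total` shares within `horizon` steps."""
    total: float
    horizon: int
    remaining: float = 0.0
    t: int = 0

    def reset(self) -> None:
        self.remaining = self.total
        self.t = 0

    def step(self, volume: float) -> float:
        """Trade up to `volume` shares and return the amount actually traded."""
        traded = min(volume, self.remaining)
        self.remaining -= traded
        self.t += 1
        return traded

    def done(self) -> bool:
        return self.t >= self.horizon or self.remaining <= 0


def interpret_state(sim: ToySimulator) -> Tuple[float, float]:
    """State interpreter: simulator state -> observation seen by the policy."""
    return (sim.remaining / sim.total, sim.t / sim.horizon)


def interpret_action(sim: ToySimulator, action: int) -> float:
    """Action interpreter: discrete action in {0, ..., 4} -> volume to trade."""
    return sim.total * action / 4


def reward_fn(sim: ToySimulator, traded: float) -> float:
    """Reward function: fraction of the whole order filled in this step."""
    return traded / sim.total


class ToyEnvWrapper:
    """Hypothetical wrapper that wires the four components into an interaction loop."""

    def __init__(self, simulator, state_interpreter, action_interpreter, reward):
        self.simulator = simulator
        self.state_interpreter = state_interpreter
        self.action_interpreter = action_interpreter
        self.reward = reward

    def reset(self):
        self.simulator.reset()
        return self.state_interpreter(self.simulator)

    def step(self, action: int):
        volume = self.action_interpreter(self.simulator, action)
        traded = self.simulator.step(volume)
        obs = self.state_interpreter(self.simulator)
        rew = self.reward(self.simulator, traded)
        return obs, rew, self.simulator.done()


def run_episode(env: ToyEnvWrapper, policy: Callable) -> float:
    """Run one episode with `policy` and return the accumulated reward."""
    obs, total_reward, done = env.reset(), 0.0, False
    while not done:
        obs, rew, done = env.step(policy(obs))
        total_reward += rew
    return total_reward
```

Under these assumptions, training a different type of policy in the same environment would only require swapping `interpret_state`, `interpret_action` or `reward_fn`, while `ToySimulator` stays unchanged.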
Reference to the RL docs will be better instead of RL API.
It has been fixed.