High-Dimensional Continuous Control Using Generalized Advantage Estimation

Authors: John Schulman, Philipp Moritz, Sergey Levine, Michael I. Jordan, Pieter Abbeel

Year: 2016

Algorithm: GAE

(Note: This summary omits the derivations of the formulas. For full details, please refer to the original paper. A clear explanation of GAE can also be found in the blog post here.)

  • Problems

    • Two main challenges for policy gradient methods:
      • The large number of samples required
      • Difficulty in obtaining stable and steady improvement
    • High bias is more harmful than high variance - it can cause the algorithm to fail to converge, or converge to a poor solution.
  • Proposed solution

    • For the first challenge: Use value functions to reduce the variance of policy gradient estimates with an estimator of the advantage function.
    • For the second challenge: Use a trust region optimization procedure (as in TRPO) for both the policy and the value function.
  • GAE (Generalized Advantage Estimator)

    • What it is: A family of advantage estimators, parameterized by gamma and lambda, used to construct policy gradient estimates

    • Goal: Significantly reduce variance while maintaining a tolerable level of bias

    • (This estimator reduces the variance of the policy gradient at the cost of introducing some bias)

    • A general summary of policy gradient methods

      img
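      Reconstructed from the paper (the formula image is not reproduced here; notation may differ slightly), the general form is

      $$ g = \mathbb{E}\left[ \sum_{t=0}^{\infty} \Psi_t \, \nabla_\theta \log \pi_\theta(a_t \mid s_t) \right] $$

      where Psi_t may be, among other choices, the total trajectory reward, the reward following action a_t (optionally minus a baseline), the state-action value function Q^pi, the advantage function A^pi, or the TD residual; choosing the advantage function yields almost the lowest possible variance.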

    • Definition of a gamma-just estimator

      img
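      Reconstructed from the paper (the formula image is not reproduced here): an advantage estimator \hat{A}_t is gamma-just if substituting it for the true discounted advantage leaves the discounted policy gradient unbiased, i.e.

      $$ \mathbb{E}\left[ \hat{A}_t(s_{0:\infty}, a_{0:\infty}) \, \nabla_\theta \log \pi_\theta(a_t \mid s_t) \right] = \mathbb{E}\left[ A^{\pi,\gamma}(s_t, a_t) \, \nabla_\theta \log \pi_\theta(a_t \mid s_t) \right] $$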

    • Producing an accurate estimator

      img
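      Reconstructed from the paper: with the TD residual

      $$ \delta_t^V = r_t + \gamma V(s_{t+1}) - V(s_t), $$

      the generalized advantage estimator is the exponentially weighted sum of these residuals:

      $$ \hat{A}_t^{\mathrm{GAE}(\gamma,\lambda)} = \sum_{l=0}^{\infty} (\gamma\lambda)^l \, \delta_{t+l}^V $$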

      • 0 < lambda < 1; adjusting the value of lambda thus trades off bias against variance

        img
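        Reconstructed from the paper, the two extreme settings make the tradeoff concrete:

        $$ \mathrm{GAE}(\gamma, 0): \quad \hat{A}_t = \delta_t^V \qquad \text{(low variance, but biased unless } V = V^{\pi,\gamma}\text{)} $$

        $$ \mathrm{GAE}(\gamma, 1): \quad \hat{A}_t = \sum_{l=0}^{\infty} \gamma^l r_{t+l} - V(s_t) \qquad \text{(gamma-just for any } V\text{, but high variance)} $$

        A minimal sketch of computing the estimator for a single finite trajectory via the backward recursion A_t = delta_t + gamma * lambda * A_{t+1}; the function name and arguments are illustrative, not from the paper:

        ```python
        import numpy as np

        def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
            """Compute GAE(gamma, lambda) advantages for one trajectory.

            rewards: shape (T,)   -- r_0, ..., r_{T-1}
            values:  shape (T+1,) -- V(s_0), ..., V(s_T), with V(s_T) used as a bootstrap value
            """
            T = len(rewards)
            advantages = np.zeros(T)
            gae = 0.0
            # Walk backwards through the trajectory: A_t = delta_t + gamma * lambda * A_{t+1}
            for t in reversed(range(T)):
                delta = rewards[t] + gamma * values[t + 1] - values[t]
                gae = delta + gamma * lam * gae
                advantages[t] = gae
            return advantages
        ```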

    • Using the generalized advantage estimator, the discounted policy gradient is thus:

      img
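      Reconstructed from the paper (equality holds when lambda = 1):

      $$ g^\gamma \approx \mathbb{E}\left[ \sum_{t=0}^{\infty} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \, \hat{A}_t^{\mathrm{GAE}(\gamma,\lambda)} \right] $$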

  • Value function estimation

    • Use TRPO to update the policy network, and an analogous trust region procedure to fit the value function

      img

      (For details about TRPO, please refer to the summary here)
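      Reconstructed from the paper, the trust region problem solved for value function fitting (phi are the value network parameters, \hat{V}_n the empirical discounted returns):

      $$ \min_{\phi} \; \sum_{n=1}^{N} \lVert V_\phi(s_n) - \hat{V}_n \rVert^2 \quad \text{subject to} \quad \frac{1}{N} \sum_{n=1}^{N} \frac{\lVert V_\phi(s_n) - V_{\phi_{\mathrm{old}}}(s_n) \rVert^2}{2\sigma^2} \le \epsilon $$

      where sigma^2 is the mean squared error of the previous value function against the empirical returns, computed before the update.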