---
title: On the Hidden Biases of Policy Mirror Ascent in Continuous Action Spaces
booktitle: Proceedings of the 39th International Conference on Machine Learning
abstract: We focus on parameterized policy search for reinforcement learning over continuous action spaces. Typically, one assumes the score function associated with a policy is bounded, which fails to hold even for Gaussian policies. To properly address this issue, one must introduce an exploration tolerance parameter to quantify the region in which it is bounded. Doing so incurs a persistent bias that appears in the attenuation rate of the expected policy gradient norm, which is inversely proportional to the radius of the action space. To mitigate this hidden bias, heavy-tailed policy parameterizations may be used, which exhibit a bounded score function, but doing so can cause instability in algorithmic updates. To address these issues, in this work, we study the convergence of policy gradient algorithms under heavy-tailed parameterizations, which we propose to stabilize with a combination of mirror ascent-type updates and gradient tracking. Our main theoretical contribution is establishing that this scheme converges with constant step and batch sizes, whereas prior works require these parameters to respectively shrink to zero or grow to infinity. Experimentally, this scheme under a heavy-tailed policy parameterization yields improved reward accumulation across a variety of settings as compared with standard benchmarks.
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: bedi22a
month: 0
tex_title: On the Hidden Biases of Policy Mirror Ascent in Continuous Action Spaces
firstpage: 1716
lastpage: 1731
page: 1716-1731
order: 1716
cycles: false
bibtex_author: Bedi, Amrit Singh and Chakraborty, Souradip and Parayil, Anjaly and Sadler, Brian M and Tokekar, Pratap and Koppel, Alec
author:
- given: Amrit Singh
  family: Bedi
- given: Souradip
  family: Chakraborty
- given: Anjaly
  family: Parayil
- given: Brian M
  family: Sadler
- given: Pratap
  family: Tokekar
- given: Alec
  family: Koppel
date: 2022-06-28
address:
container-title: Proceedings of the 39th International Conference on Machine Learning
volume: 162
genre: inproceedings
issued:
  date-parts:
  - 2022
  - 6
  - 28
pdf: https://proceedings.mlr.press/v162/bedi22a/bedi22a.pdf
extras:
- label: Other Files
  link: https://media.icml.cc/Conferences/ICML2022/supplementary/bedi22a-supp.zip
---

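The abstract's central contrast is that a Gaussian policy's score function is unbounded in the action, while a heavy-tailed policy's score function is bounded. Below is a minimal sketch of that contrast, assuming a one-dimensional action and differentiating only with respect to the location parameter $\mu$. The Cauchy distribution is used here as one illustrative heavy-tailed family; that choice is an assumption and not necessarily the parameterization used in the paper, and the helper names `gaussian_score_mu` and `cauchy_score_mu` are hypothetical. For a Gaussian with standard deviation $\sigma$, $\partial_\mu \log \pi(a) = (a-\mu)/\sigma^2$, which grows without bound in $a$; for a Cauchy with scale $\gamma$, $\partial_\mu \log \pi(a) = 2(a-\mu)/(\gamma^2 + (a-\mu)^2)$, whose magnitude never exceeds $1/\gamma$.

```python
def gaussian_score_mu(a: float, mu: float, sigma: float) -> float:
    """Score w.r.t. the mean of N(a; mu, sigma^2): (a - mu) / sigma^2.

    Grows linearly in |a - mu|, so it is unbounded over an unbounded action space.
    """
    return (a - mu) / sigma**2


def cauchy_score_mu(a: float, mu: float, gamma: float) -> float:
    """Score w.r.t. the location of a Cauchy(mu, gamma) density (hypothetical helper).

    log pi(a) = log(gamma) - log(pi) - log(gamma^2 + (a - mu)^2), so
    d/dmu log pi(a) = 2(a - mu) / (gamma^2 + (a - mu)^2).
    Its magnitude peaks at 1/gamma when |a - mu| = gamma and decays to 0 in the tails.
    """
    d = a - mu
    return 2.0 * d / (gamma**2 + d**2)


if __name__ == "__main__":
    mu, scale = 0.0, 1.0  # illustrative location and scale, not values from the paper
    for a in [0.5, 1.0, 10.0, 100.0, 1000.0]:
        print(
            f"a={a:>7}: gaussian={gaussian_score_mu(a, mu, scale):>9.3f}  "
            f"cauchy={cauchy_score_mu(a, mu, scale):.4f}"
        )
```

Running the loop shows the Gaussian column growing linearly with `a`, while the Cauchy column peaks at $1/\gamma$ near $|a-\mu| = \gamma$ and then decays toward zero. This is the bounded-score property that, per the abstract, removes the need for an exploration tolerance parameter, at the cost of the update instability the paper addresses.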