Clarification about prior_action in SAC #209

Open
0xJchen opened this issue Aug 6, 2021 · 0 comments

0xJchen commented Aug 6, 2021

Hi, I noticed that in the SAC implementation, an action_prior is introduced at init:

if self.action_prior == "uniform":
    prior_log_pi = 0.0
elif self.action_prior == "gaussian":
    prior_log_pi = self.action_prior_distribution.log_likelihood(
        action, GaussianDistInfo(mean=torch.zeros_like(action)))
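
For my own understanding, here is a minimal standalone sketch (not rlpyt code, and assuming the prior distribution is a unit-variance Gaussian) of what the "gaussian" branch appears to compute: the log-density of the sampled action under a zero-mean standard Normal.

import torch
from torch.distributions import Normal

# Sketch only (not rlpyt code): the "gaussian" prior term is the log-density of
# the sampled action under N(0, I), summed over action dimensions.
action = torch.randn(8, 2)  # hypothetical batch of sampled actions
standard_normal = Normal(torch.zeros_like(action), torch.ones_like(action))
prior_log_pi = standard_normal.log_prob(action).sum(dim=-1)  # shape: (8,)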

and it is used when calculating the policy loss:

prior_log_pi = self.get_action_prior(new_action.cpu())
pi_losses = self._alpha * log_pi - min_log_target - prior_log_pi
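
To see concretely what the extra term does, here is a toy sketch with made-up numbers (mine, not the library's internals): subtracting prior_log_pi raises the loss for actions the prior considers unlikely, so a Gaussian prior gently penalizes large-magnitude actions, while a uniform prior contributes nothing.

import torch

# Toy numbers only (not rlpyt internals), to show where prior_log_pi enters.
alpha = 0.2
log_pi = torch.tensor([-1.3, -0.7])        # log pi(a|s) of the reparameterized actions
min_log_target = torch.tensor([5.0, 4.2])  # min over the twin Q estimates
prior_log_pi = torch.tensor([-0.9, -3.5])  # lower prior log-density for the larger action

pi_losses = alpha * log_pi - min_log_target - prior_log_pi  # -prior_log_pi adds a penalty
pi_loss = pi_losses.mean()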

May I ask what the purpose of this action_prior is? First, I found almost no SAC literature that mentions it (please point me to it if I missed something). Second, in the official SAC implementation, the policy loss does not involve this term:

with tf.GradientTape() as tape:
    actions, log_pis = self._policy.actions_and_log_probs(observations)
    Qs_log_targets = tuple(
        Q.values(observations, actions) for Q in self._Qs)
    Q_log_targets = tf.reduce_mean(Qs_log_targets, axis=0)
    policy_losses = entropy_scale * log_pis - Q_log_targets
    policy_loss = tf.nn.compute_average_loss(policy_losses)
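
As far as I can tell (just my own sanity check, please correct me if I am wrong), with action_prior == "uniform" the term prior_log_pi is identically zero, so the rlpyt loss reduces to the same entropy_scale * log_pis - Q_log_targets form as above, and the "gaussian" option only adds an extra regularization term:

import torch

# Sketch of my own (not from either repo): with a uniform prior the extra term
# vanishes and the two loss forms coincide.
alpha = 0.2
log_pi = torch.tensor([-1.3, -0.7])
min_log_target = torch.tensor([5.0, 4.2])
prior_log_pi = 0.0  # the "uniform" branch

rlpyt_style = alpha * log_pi - min_log_target - prior_log_pi
official_style = alpha * log_pi - min_log_target
assert torch.allclose(rlpyt_style, official_style)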