Clarification about prior_action in SAC #209

Open
0xJchen opened this issue Aug 6, 2021 · 0 comments

0xJchen commented Aug 6, 2021

Hi, I noticed that in the SAC implementation, an action_prior is introduced at init:

if self.action_prior == "uniform":
    prior_log_pi = 0.0
elif self.action_prior == "gaussian":
    prior_log_pi = self.action_prior_distribution.log_likelihood(
        action, GaussianDistInfo(mean=torch.zeros_like(action)))
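
For my own understanding, here is a minimal standalone sketch (not rlpyt code, and assuming the prior distribution is a unit-variance Gaussian) of what the "gaussian" branch appears to compute: the log-density of the sampled action under a zero-mean standard Normal.

import torch
from torch.distributions import Normal

# Sketch only (not rlpyt code): the "gaussian" prior term is the log-density of
# the sampled action under N(0, I), summed over action dimensions.
action = torch.randn(8, 2)  # hypothetical batch of sampled actions
standard_normal = Normal(torch.zeros_like(action), torch.ones_like(action))
prior_log_pi = standard_normal.log_prob(action).sum(dim=-1)  # shape: (8,)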

and it is used when calculating the policy loss:

prior_log_pi = self.get_action_prior(new_action.cpu())
pi_losses = self._alpha * log_pi - min_log_target - prior_log_pi
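
To see concretely what the extra term does, here is a toy sketch with made-up numbers (mine, not the library's internals): subtracting prior_log_pi raises the loss for actions the prior considers unlikely, so a Gaussian prior gently penalizes large-magnitude actions, while a uniform prior contributes nothing.

import torch

# Toy numbers only (not rlpyt internals), to show where prior_log_pi enters.
alpha = 0.2
log_pi = torch.tensor([-1.3, -0.7])        # log pi(a|s) of the reparameterized actions
min_log_target = torch.tensor([5.0, 4.2])  # min over the twin Q estimates
prior_log_pi = torch.tensor([-0.9, -3.5])  # lower prior log-density for the larger action

pi_losses = alpha * log_pi - min_log_target - prior_log_pi  # -prior_log_pi adds a penalty
pi_loss = pi_losses.mean()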

May I ask what the purpose of this action_prior is? First, I found almost no SAC literature that mentions it (please point me to it if I missed something). Second, in the official SAC implementation, the policy loss does not involve this term:

with tf.GradientTape() as tape:
    actions, log_pis = self._policy.actions_and_log_probs(observations)
    Qs_log_targets = tuple(
        Q.values(observations, actions) for Q in self._Qs)
    Q_log_targets = tf.reduce_mean(Qs_log_targets, axis=0)
    policy_losses = entropy_scale * log_pis - Q_log_targets
    policy_loss = tf.nn.compute_average_loss(policy_losses)
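
As far as I can tell (just my own sanity check, please correct me if I am wrong), with action_prior == "uniform" the term prior_log_pi is identically zero, so the rlpyt loss reduces to the same entropy_scale * log_pis - Q_log_targets form as above, and the "gaussian" option only adds an extra regularization term:

import torch

# Sketch of my own (not from either repo): with a uniform prior the extra term
# vanishes and the two loss forms coincide.
alpha = 0.2
log_pi = torch.tensor([-1.3, -0.7])
min_log_target = torch.tensor([5.0, 4.2])
prior_log_pi = 0.0  # the "uniform" branch

rlpyt_style = alpha * log_pi - min_log_target - prior_log_pi
official_style = alpha * log_pi - min_log_target
assert torch.allclose(rlpyt_style, official_style)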