Hi, I noticed that in the SAC implementation an action_prior is introduced in __init__ and is used when calculating the policy loss.

May I ask the purpose of this action_prior? First, I found almost no literature on SAC that mentions this term (please let me know if I missed something). Second, in the official SAC implementation, the policy loss does not involve it:
with tf.GradientTape() as tape:
    # sample actions and their log-probabilities from the current policy
    actions, log_pis = self._policy.actions_and_log_probs(observations)
    # evaluate each critic in the ensemble at (s, a) and average the estimates
    Qs_log_targets = tuple(
        Q.values(observations, actions) for Q in self._Qs)
    Q_log_targets = tf.reduce_mean(Qs_log_targets, axis=0)
    # per-sample loss: alpha * log pi(a|s) - Q(s, a), averaged over the batch
    policy_losses = entropy_scale * log_pis - Q_log_targets
    policy_loss = tf.nn.compute_average_loss(policy_losses)
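For reference, here is a minimal sketch (not the repository's code) of how an action-prior term could be folded into this loss. It assumes a standard-normal prior over the squashed actions; the function name, argument shapes, and the entropy_scale default are illustrative assumptions only:

import math
import tensorflow as tf

def policy_loss_with_action_prior(log_pis, q_values, actions,
                                  entropy_scale=0.2, action_prior='normal'):
    # Sketch only: shows where an extra action-prior term could enter the
    # SAC policy loss. Names and shapes are assumptions, not the repo's code.
    #   log_pis:  [batch, 1] log pi(a|s) of the sampled actions
    #   q_values: [batch, 1] critic estimate Q(s, a) for those actions
    #   actions:  [batch, act_dim] sampled (squashed) actions
    if action_prior == 'normal':
        # log-density of a standard-normal prior N(0, I) over the actions
        act_dim = tf.cast(tf.shape(actions)[-1], tf.float32)
        prior_log_probs = -0.5 * (
            tf.reduce_sum(tf.square(actions), axis=-1, keepdims=True)
            + act_dim * math.log(2.0 * math.pi))
    else:
        # a uniform prior is a constant, so it contributes nothing to the gradient
        prior_log_probs = 0.0

    # standard SAC objective plus the extra "- log p_prior(a)" term
    policy_losses = entropy_scale * log_pis - q_values - prior_log_probs
    return tf.reduce_mean(policy_losses)

Under a uniform prior the extra term is a constant and the loss reduces to the official one quoted above; a Gaussian prior only adds an L2-style pull of the actions toward zero.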