[RLlib; Offline RL] CQL: Support multi-GPU/CPU setup and different learning rates for actor, critic, and alpha. #47402

Merged

Commits on Aug 22, 2024

  1. Fixed a bug in SAC/CQL due to twin-Q parameters in backward pass. Started to rewrite CQL loss.
    
    Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
    simonsays1980 committed Aug 22, 2024
    8923dc2

Commits on Aug 28, 2024

  1. Raw changes. Just saving.

    Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
    simonsays1980 committed Aug 28, 2024
    75d91ce

Commits on Aug 29, 2024

  1. Securing changes.

    Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
    simonsays1980 committed Aug 29, 2024
    331f0ef
  2. fe79e3d
  3. Moved all forward passes from learner to module and added learning rates for actor, critic, and alpha. Multi-learner setups work.
    
    Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
    simonsays1980 committed Aug 29, 2024
    ecbd588
  4. 2be80fd
  5. Removed all 'deterministic_loss' uses.

    Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
    simonsays1980 committed Aug 29, 2024
    cb9ecbe

Commits on Aug 30, 2024

  1. Modified comment.

    Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
    simonsays1980 committed Aug 30, 2024
    54b5d2d
  2. e6b3769
  3. Removed another forward pass through the Q-network and used the already sampled log-probabilities.
    
    Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
    simonsays1980 committed Aug 30, 2024
    62d6dfb
  4. Multiple smaller modifications to CQL algorithm following the original one published by Kumar et al. (2020).
    
    Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
    simonsays1980 committed Aug 30, 2024
    edbe263
  5. Adapted hyperparameters after tuning the example.

    Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
    simonsays1980 committed Aug 30, 2024
    100aabe