
[Feature Request] torch compile / integrating intel extension for pytorch #1564

Open
george-adams1 opened this issue Jun 20, 2023 · 6 comments
Labels
documentation · enhancement · good first issue · help wanted

Comments

@george-adams1

🚀 Feature

Request to integrate the Intel Extension for PyTorch into SB3. The Intel Extension for PyTorch optimizes the PyTorch library to better utilize the computational capabilities of Intel processors. By integrating this extension, SB3 users on Intel processors could see significant performance improvements.

Motivation

Maximizing the performance of Intel processors can lead to faster training times and more efficient resource utilization, improving the overall experience for SB3 users on Intel hardware.

Pitch

The integration of the Intel extension for PyTorch into SB3 would involve modifying the library to utilize the extension when it detects that it is running on an Intel processor.

Alternatives

An alternative to this would be to provide guidelines on how users can manually integrate the Intel extension into their SB3 setups.

Additional context

No response

Checklist

  • I have checked that there is no similar issue in the repo
@george-adams1 george-adams1 added the enhancement New feature or request label Jun 20, 2023
@araffin
Member

araffin commented Jun 20, 2023

Hello,
could you give some pointers/examples on how to integrate that extension?

Would you be willing to contribute such an extension?

@duburcqa

duburcqa commented Jun 25, 2023

I believe it was a reference to this project: https://github.com/intel/intel-extension-for-pytorch

The latest version is compatible with torch.compile, which means it is now much easier to leverage their optimizations without a complete rework of the libraries relying on PyTorch. Here is the example script. Yet, it is described there as an inference-only backend.
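
For anyone who wants to try it, a minimal, untested sketch (assuming intel-extension-for-pytorch is installed; per Intel's docs, importing the package registers an "ipex" backend for torch.compile):

```python
import torch
# Importing IPEX registers the "ipex" backend for torch.compile
import intel_extension_for_pytorch  # noqa: F401

from stable_baselines3 import PPO

model = PPO("MlpPolicy", "CartPole-v1")
# Inference-only: compile the SB3 policy with the IPEX backend
compiled_policy = torch.compile(model.policy, backend="ipex")
```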

@araffin
Member

araffin commented Jun 25, 2023

Thanks for the pointers @duburcqa =)!

I see. In that case, it is already supported, as SB3 exposes its PyTorch policy via the .policy attribute (this is an nn.Module, see the doc).

Related: #1391 and #1439

You should also know that the bottleneck at train time is the gradient update, so optimizing inference won't help you much (unless you are at test time). We do have an experimental Jax version (SBX) if you need a significant speed-up (see the README).
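
For example, compiling the policy with the default backend is a one-liner (a minimal sketch; the observation handling is just for illustration):

```python
import torch as th

from stable_baselines3 import PPO

model = PPO("MlpPolicy", "CartPole-v1")
# model.policy is a plain nn.Module, so it can be compiled directly
compiled_policy = th.compile(model.policy)

obs = th.as_tensor(model.env.reset(), dtype=th.float32)
with th.no_grad():
    # ActorCriticPolicy.forward() returns (actions, values, log_prob)
    actions, values, log_prob = compiled_policy(obs)
```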

@araffin araffin added documentation Improvements or additions to documentation and removed enhancement New feature or request labels Jun 25, 2023
@araffin
Member

araffin commented Jun 25, 2023

I would actually welcome a PR that shows how to use th.compile in our doc =)

@araffin araffin added the help wanted Help from contributors is welcomed label Jun 25, 2023
@araffin araffin added the good first issue Good for newcomers label Feb 28, 2024
@araffin araffin changed the title [Feature Request] integrating intel extension for pytorch [Feature Request] torch compile / integrating intel extension for pytorch Oct 24, 2024
@araffin
Member

araffin commented Oct 24, 2024

As a follow-up, it would be nice to investigate where to use torch.compile for some speed boost (see https://github.com/pytorch-labs/LeanRL, cc @vmoens).
Given the SBX experience (https://github.com/araffin/sbx), the biggest speed boost should come from compiling the backward pass and combining multiple gradient steps, to avoid Python for loops and minimize transfers between CPU and GPU (to see the speed boost, maybe start with CrossQ from SB3 Contrib).
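
To make the "combining multiple gradient steps" idea concrete, a toy sketch (not SB3 code; the network, loss, and shapes are made up): a fixed-range Python loop inside the compiled function is unrolled at trace time, so the per-step interpreter overhead leaves the hot path:

```python
import torch

q_net = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def multi_gradient_steps(obs_batches, target_batches):
    # Fixed-range loop: unrolled when the function is traced,
    # so all 4 gradient steps land in one compiled region
    for i in range(4):
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(q_net(obs_batches[i]), target_batches[i])
        loss.backward()
        optimizer.step()
    return loss.detach()

multi_gradient_steps = torch.compile(multi_gradient_steps)

obs_batches = torch.randn(4, 256, 8)
target_batches = torch.randn(4, 256, 1)
loss = multi_gradient_steps(obs_batches, target_batches)
```

(Depending on the PyTorch version, .backward() may still introduce graph breaks; the TORCH_LOGS flag mentioned below helps check what was actually captured.)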

@araffin araffin added the enhancement New feature or request label Oct 24, 2024
@vmoens

vmoens commented Oct 24, 2024

Hey
Happy to help with this.
Compiling several steps together isn't a bad idea.
What usually works best is this:

  • put all the updates in a single update function, including zero_grad, forward, backward, grad-norm clipping, the optimizer step, and target-params updates (see the sketch at the end of this comment)
  • if possible, group optimizers together (this can be done using param_groups if they all have the same number of steps)
  • have a look at the graph breaks using TORCH_LOGS="+graph_breaks,recompiles" python code.py
  • avoid CPU-GPU transfers in the compiled region, and use torch whenever possible (not numpy)
  • prefer torch.where to masking, and avoid control flow where it can be avoided (again, torch.where can sometimes help; see the small example after this list)
  • use mode="reduce-overhead" or tensordict's CudaGraphModule like we do in LeanRL
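
To illustrate the torch.where point with a made-up example: masked assignment touches a data-dependent set of elements, while torch.where computes the same result with static shapes, which compiles more cleanly:

```python
import torch

x = torch.randn(1024)

# Masked assignment: the indexing depends on the data
y = x.clone()
y[y < 0] = 0.0

# Same result with torch.where: static shapes, compiler-friendly
y = torch.where(x < 0, torch.zeros_like(x), x)
```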

That's the 101, but I'd be happy to review or help implement a compiled version of any script.
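
As a starting point, a minimal sketch of the first bullet (toy network and data, not an SB3 script; mode="reduce-overhead" relies on CUDA graphs, so it mainly pays off on GPU):

```python
import torch

policy = torch.nn.Sequential(torch.nn.Linear(4, 64), torch.nn.Tanh(), torch.nn.Linear(64, 2))
target_policy = torch.nn.Sequential(torch.nn.Linear(4, 64), torch.nn.Tanh(), torch.nn.Linear(64, 2))
target_policy.load_state_dict(policy.state_dict())
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

def update(obs, targets):
    # Everything in one function: zero_grad, forward, backward,
    # grad clipping, optimizer step and target-params update
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(policy(obs), targets)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(policy.parameters(), max_norm=0.5)
    optimizer.step()
    with torch.no_grad():
        # Polyak averaging of the target params
        for p, p_target in zip(policy.parameters(), target_policy.parameters()):
            p_target.mul_(0.995).add_(p, alpha=0.005)
    return loss.detach()

update = torch.compile(update, mode="reduce-overhead")

obs, targets = torch.randn(256, 4), torch.randn(256, 2)
for _ in range(10):
    loss = update(obs, targets)
```

Running it with the TORCH_LOGS flag above shows what actually got captured.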

@araffin araffin pinned this issue Oct 29, 2024