Add flag to select tfrt backend for CPU. #7042

Merged: 1 commit, Jun 22, 2021

zhangqiaorjc
Collaborator

@zhangqiaorjc zhangqiaorjc commented Jun 21, 2021

The new TFRT CPU backend has been running without issues internally for a while.

This PR adds a flag to select it in open source.

Benchmarks show significant dispatch performance improvements in wall time (old: existing CPU backend, new: TFRT CPU backend):

name                                  old time/op             new time/op             delta
eager_unary_dispatch                  41.0µs ± 1%             32.7µs ± 2%  -20.30%         (p=0.000 n=10+9)
eager_unary                           41.8µs ± 1%             33.5µs ± 2%  -19.94%         (p=0.000 n=10+9)
eager_binary_dispatch                 50.4µs ± 1%             41.3µs ± 2%  -17.88%         (p=0.000 n=10+9)
eager_binary                          51.2µs ± 0%             42.0µs ± 2%  -18.02%          (p=0.000 n=9+9)
jit_trivial_dispatch                  55.1µs ± 2%             54.9µs ± 2%     ~            (p=0.604 n=10+9)
jit_trivial                           55.8µs ± 2%             55.7µs ± 2%     ~            (p=0.905 n=10+9)
jit_simple_dispatch                   8.99µs ± 1%             2.13µs ± 4%  -76.26%        (p=0.000 n=10+10)
jit_simple                            10.0µs ± 4%              2.5µs ± 4%  -74.84%          (p=0.000 n=9+8)
jit_simple_many_args_dispatch_10      16.4µs ±11%              4.4µs ±16%  -73.13%        (p=0.000 n=10+10)
jit_simple_many_args_10               17.0µs ±11%              4.6µs ± 2%  -73.10%         (p=0.000 n=10+8)
jit_simple_pruned_args_dispatch_10    8.85µs ± 1%             2.50µs ± 3%  -71.75%          (p=0.000 n=8+8)
jit_simple_pruned_args_10             10.6µs ±12%              2.9µs ± 4%  -72.34%         (p=0.000 n=10+8)
jit_simple_many_args_dispatch_100     75.3µs ±11%             25.7µs ± 2%  -65.82%         (p=0.000 n=10+8)
jit_simple_many_args_100              78.0µs ± 9%             26.0µs ± 3%  -66.61%         (p=0.000 n=10+8)
jit_simple_pruned_args_dispatch_100   16.7µs ± 9%              8.8µs ± 3%  -47.55%         (p=0.000 n=10+8)
jit_simple_pruned_args_100            17.6µs ± 9%              9.7µs ±16%  -44.83%        (p=0.000 n=10+10)
jit_simple_many_args_dispatch_1000     606µs ± 1%              309µs ± 4%  -49.01%         (p=0.000 n=9+10)
jit_simple_many_args_1000              622µs ± 2%              314µs ± 5%  -49.51%        (p=0.000 n=10+10)
jit_simple_pruned_args_dispatch_1000  83.8µs ± 3%             73.7µs ± 4%  -12.10%         (p=0.000 n=10+9)
jit_simple_pruned_args_1000           84.3µs ± 1%             75.7µs ± 1%  -10.17%          (p=0.000 n=9+7)
jit_simple_many_args_dispatch_2000    1.22ms ± 2%             0.69ms ± 5%  -43.39%          (p=0.000 n=9+9)
jit_simple_many_args_2000             1.23ms ± 3%             0.70ms ± 5%  -43.32%          (p=0.000 n=9+9)
jit_simple_pruned_args_dispatch_2000   179µs ± 3%              167µs ± 4%   -7.08%          (p=0.000 n=9+9)
jit_simple_pruned_args_2000            180µs ± 2%              168µs ± 5%   -6.82%          (p=0.000 n=9+9)
jit_dispatch_without_transfer          797µs ± 5%              683µs ± 6%  -14.32%        (p=0.000 n=10+10)
jit_dispatch_with_transfer             817µs ± 8%              759µs ± 5%   -7.07%        (p=0.001 n=10+10)
sda_index_1                           12.6µs ±11%             11.4µs ± 8%   -9.56%         (p=0.022 n=10+9)
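
For readers checking the delta column, the relative change can be recomputed as `(new - old) / old`. The sketch below is illustrative only: `percent_delta` is a hypothetical helper, and the inputs are the rounded means copied from three rows of the table. Recomputing from rounded means can differ slightly in the last digits from the reported deltas (for example about -20.24% versus the reported -20.30% for `eager_unary_dispatch`), presumably because the benchmarking tool works from unrounded measurements.

```python
def percent_delta(old, new):
    """Relative change from old to new, in percent (negative means faster)."""
    return (new - old) / old * 100.0


# Rounded means copied from the table above, in microseconds.
rows = {
    "eager_unary_dispatch": (41.0, 32.7),
    "jit_simple_dispatch": (8.99, 2.13),
    "jit_simple_many_args_dispatch_1000": (606.0, 309.0),
}

deltas = {name: percent_delta(old, new) for name, (old, new) in rows.items()}

for name, delta in deltas.items():
    print(f"{name:40s} {delta:+.2f}%")
```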

@google-cla google-cla bot added the cla: yes label Jun 21, 2021
@zhangqiaorjc zhangqiaorjc requested a review from hawkinsp June 21, 2021 19:16
@zhangqiaorjc zhangqiaorjc self-assigned this Jun 21, 2021
@zhangqiaorjc zhangqiaorjc added the pull ready Ready for copybara import and testing label Jun 22, 2021
else:
  register_backend_factory('cpu', xla_client.make_cpu_client,
                           priority=0)
register_backend_factory('cpu', xla_client.make_cpu_client,
Collaborator

Should this registration still be here? I think it's subsumed by the ones above.

Collaborator Author

removed
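
The review exchange above concerns a leftover unconditional registration after a flag-guarded one. A minimal, self-contained sketch of the pattern follows. It is not JAX's implementation: the registry, the `register_cpu_backend` helper, the `variant` argument, and the two factory functions are hypothetical stand-ins, and the sketch assumes that registering the same name twice overwrites the earlier entry.

```python
# Hypothetical stand-in registry; assumes a later registration under the
# same name overwrites the earlier one.
_backend_factories = {}


def register_backend_factory(name, factory, *, priority=0):
    """Record a factory and its priority under a backend name."""
    _backend_factories[name] = (factory, priority)


def make_default_cpu_client():
    # Placeholder for the existing CPU client factory.
    return "default_cpu_client"


def make_tfrt_cpu_client():
    # Placeholder for a TFRT-based CPU client factory (hypothetical name).
    return "tfrt_cpu_client"


def register_cpu_backend(variant):
    """Register exactly one CPU factory, chosen by the (hypothetical) variant."""
    if variant == "tfrt":
        register_backend_factory("cpu", make_tfrt_cpu_client, priority=0)
    else:
        register_backend_factory("cpu", make_default_cpu_client, priority=0)


def get_backend(name):
    """Build a client from whichever factory is currently registered."""
    factory, _priority = _backend_factories[name]
    return factory()


register_cpu_backend("tfrt")
selected = get_backend("cpu")

# A trailing unconditional registration, like the one questioned in review,
# would replace the flag-selected factory under the overwrite assumption.
register_backend_factory("cpu", make_default_cpu_client, priority=0)
after_extra_registration = get_backend("cpu")
```

Under these assumptions, the extra call is redundant when the default variant is selected and overrides the choice otherwise, which is one reason to keep all registrations for a name inside the same branch.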

@zhangqiaorjc zhangqiaorjc added pull ready Ready for copybara import and testing and removed pull ready Ready for copybara import and testing labels Jun 22, 2021
@copybara-service copybara-service bot merged commit afe2d90 into jax-ml:main Jun 22, 2021