Utilization derived metric #101
Conversation
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
@petermattson, what do you think of this?
- peak_system_tensor_flops_per_second means the peak tensor operations of the hardware, counting only tensor math throughput and not additional vector or pointwise math datapaths.

I think peak_system_tensor_flops_per_second is well-defined for architectures like NVIDIA GPUs and Google TPUs, but it is not well-defined for CPUs, DSPs, or FPGAs. Furthermore, it is possible that some architectures allow overlapping tensor and vector math operations so as to achieve greater than 100% throughput.

Comparing utilization for different software implementations on fixed hardware seems useful, but utilization comparisons between different hardware seem meaningless. I guess the intention is to promote utilization as a conversion factor between FLOPS and actual performance, but I think we should try to promote comparisons based on performance directly and not try to fix FLOPS.

Given the experience in MLPerf Inference with NIC-to-accelerator bandwidth, I would prefer not to involve numbers in MLPerf that cannot be measured directly. I don't want to adjudicate complaints that someone is not calculating the peak tensor operations of their hardware correctly, and I also don't want to get into the business of measuring FLOPS.

The model_tensor_flops term seems like a pain to compute, with many possible edge cases and room for disagreement. If this proposal is accepted, I would prefer that MLCommons provide an official model_tensor_flops for each model rather than allowing submitters to calculate their own.

The definition of model_tensor_flops implies that there is a single unambiguous number of operations required by a given model, but it's unclear how to count required operations when an implementation may choose a sub-cubic dot implementation. How are model_tensor_flops counted for sparse operations?
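For context, a minimal sketch of how the derived metric appears intended to work, assuming utilization is defined as achieved tensor throughput divided by the claimed hardware peak; the function names, the 2*m*n*k dense-matmul count, and all numbers below are illustrative assumptions, not text from the proposal:

```python
# Hypothetical sketch of the utilization derived metric under discussion.
# The formula (achieved tensor FLOPS / peak tensor FLOPS) is an assumed
# reading of the proposal, not a definition taken from it.

def model_tensor_flops_dense_matmul(m: int, n: int, k: int) -> int:
    """Count tensor FLOPs for one dense (m x k) @ (k x n) matmul.

    Uses the conventional 2*m*n*k count (one multiply plus one add per MAC).
    A sub-cubic algorithm or a sparse kernel would execute fewer operations,
    which is exactly the counting ambiguity raised in the comment above.
    """
    return 2 * m * n * k


def utilization(model_tensor_flops: float,
                samples_per_second: float,
                peak_system_tensor_flops_per_second: float) -> float:
    """Achieved tensor throughput divided by the claimed hardware peak."""
    achieved_flops_per_second = model_tensor_flops * samples_per_second
    return achieved_flops_per_second / peak_system_tensor_flops_per_second


# Made-up example: a model needing 1e12 tensor FLOPs per sample, run at
# 250 samples/s on a system claiming a 1e15 peak, would report 25%
# utilization under this reading of the proposal.
print(utilization(1e12, 250, 1e15))  # 0.25
```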
Should we consider making this a "recommended methodology" or something rather than binding it tightly into results guidelines? Then we can resolve some of these issues through discretion in application, but the methodology is citable for consistency when appropriate.