Motivation
If we've been given a whole model graph, the client only supplies the initial inputs and only wants the final outputs. This means that only the memory layouts for the first and last operations in the graph are constrained. Within the graph, we have the freedom to choose any data layout we like to maximize performance.
What this is
A pass that analyzes a model graph and adds relayout operations to ensure we get good performance.
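As a rough illustration of the idea, here is a minimal sketch of such a pass over a linear chain of operations. It is not an actual implementation: the `Op` class, its `preferred_layout` field, the `insert_relayouts` function, and the layout names are all hypothetical, and the sketch assumes each op's best layout is already known, which is exactly the hard part.

```python
from dataclasses import dataclass


@dataclass
class Op:
    name: str
    preferred_layout: str  # hypothetical: the layout this op is assumed to run fastest in


def insert_relayouts(ops, input_layout, output_layout):
    """Return a new op list with a relayout op inserted wherever the layout changes.

    Only input_layout and output_layout are fixed by the client; everything
    in between follows each op's (assumed known) preferred layout.
    """
    result = []
    current = input_layout
    for op in ops:
        if op.preferred_layout != current:
            # Layout mismatch: convert the data before running this op.
            result.append(
                Op(f"relayout_{current}_to_{op.preferred_layout}", op.preferred_layout)
            )
            current = op.preferred_layout
        result.append(op)
    if current != output_layout:
        # Restore the layout the client expects for the final outputs.
        result.append(Op(f"relayout_{current}_to_{output_layout}", output_layout))
    return result


graph = [Op("conv0", "NHWC"), Op("relu0", "NHWC"), Op("conv1", "NCHW")]
rewritten = insert_relayouts(graph, input_layout="NCHW", output_layout="NCHW")
names = [op.name for op in rewritten]
print(names)
```

A real pass would also have to weigh the cost of each relayout against the speedup it buys, rather than blindly following per-op preferences.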
What this is not
Something we need to upstream.
Why?
Performance!
We're stepping into the whole-graph role.
Why not?
It's very unclear which layouts we should select, and it's not clear how we'd work that out, especially since the best layout may be a function of tuning, which may itself be a function of layout, which ... yeah.