-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DataGrid] Filtering performance #9120
Conversation
Netlify deploy previewNetlify deploy preview: https://deploy-preview-9120--material-ui-x.netlify.app/ Updated pagesNo updates. These are the results for the performance tests:
|
Sorry if you already proposed it. Something like: cellFilterParas: {
value: TValue,
getCellParams: () => GridCellParams
} Or if we want to avoid creating the callback (not sure how negligible that is on large dataset), we just say that now people can access the cellFilterParas: {
value: TValue,
id: GridRowId;
field: string;
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! 🚀
packages/grid/x-data-grid/src/hooks/features/filter/gridFilterUtils.ts
Outdated
Show resolved
Hide resolved
@@ -20,14 +20,10 @@ export function useGridApiMethod< | |||
return; | |||
} | |||
apiMethodsNames.forEach((methodName) => { | |||
if (!privateApiRef.current.hasOwnProperty(methodName)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal here was to have stable references to API methods.
For example, with this change, it's now impossible to spy
on API methods, because the spy
gets overridden with the new function reference (this is why the tests fail).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use useEventCallback
on the methods instead ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason other than testing for which we would like to keep the stable references?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've benchmarked this one, for the raw filtering time, this indirection degrades performance by 11% if we take the final results as the baseline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I know, testing is the only reason to keep the stable references.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, I'll try to make it work without the trampoline then.
if (appliers.length === 1) { | ||
const applier = appliers[0]; | ||
const applierFn = applier.fn; | ||
|
||
const filteredAppliers = shouldApplyFilter | ||
? appliers.filter((applier) => shouldApplyFilter(applier.item.field)) | ||
: appliers; | ||
const applierCall = applier.v7 | ||
? 'applierFn(row)' | ||
: 'applierFn(getRowId ? getRowId(row) : row.id)'; | ||
const fn = eval(` | ||
(row, shouldApplyFilter) => { | ||
// ${applierFn.name} <- Keep a ref, prevent the bundler from optimizing away | ||
|
||
filteredAppliers.forEach((applier) => { | ||
resultPerItemId[applier.item.id!] = applier.fn(rowId); | ||
}); | ||
if (shouldApplyFilter && !shouldApplyFilter(applier.item.field)) { | ||
return { '${applier.item.id!}': false }; | ||
} | ||
return { '${applier.item.id!}': ${applierCall} }; | ||
} | ||
`); | ||
|
||
return fn; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if we consider the final results as the baseline, and we decide to not apply this change, the performance degrades by 20%. I know "eval is evil", but in this case we're getting a real performance improvement, so I feel like we should keep it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exotic indeed, but also risky - try running this locally 😅
import * as React from 'react';
import { DataGrid } from '@mui/x-data-grid';
const columns = [{ field: 'id' }];
export default function InitialFilters() {
const [rows, setRows] = React.useState<any[]>([]);
React.useEffect(() => {
setRows([{ id: 1 }]);
}, []);
return (
<div style={{ height: 400, width: '100%' }}>
<DataGrid
columns={columns}
rows={rows}
filterModel={{
items: [
{
field: 'id',
operator: 'equals',
value: '1',
id: "1': alert('hello') } //",
},
],
}}
/>
</div>
);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point! Could we reliably escape that value with ... { ${JSON.stringify(String(applier.item.id))}: ...
? I feel that if String
is guaranteed to return a string, then JSON.stringify
should be guaranteed to return valid javascript string representation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow.
${JSON.stringify(String(applier.item.id))}
would return the same string, and it will be evaluated as before. Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I forgot to remove ' '
around JSON.stringify(...)
😅
I agree that considering the performance gains, it's okay to use eval in this case 👍
for (let i = 0; i < rows.length; i += 1) { | ||
const row = rows[i]; | ||
|
||
isRowMatchingFilters(row, undefined, result); | ||
|
||
const isRowPassing = passFilterLogicSingle( | ||
result.passingFilterItems, | ||
result.passingQuickFilterValues, | ||
params.filterModel, | ||
apiRef, | ||
); | ||
|
||
if (isRowPassing) filteredRowsLookup.add(row.id); | ||
} | ||
|
||
// XXX: Is props.rows what we want? | ||
// XXX: Handle footer rows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: Add autogenerated rows after the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not using (tree[GRID_ROOT_GROUP_ID]).children;
here as before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See point 9
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. We still use the tree for non-flat filtering though.
Maybe it's worth giving another try to Map
for dataRowIdToModelLookup
? Looking at the related discussion we had in #9120 (comment), it should make it faster, so maybe there was something else causing the slowdown?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This quick benchmark shows that at least property access should be faster when using Map
: https://jsperf.app/petawa
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I benchmarked by replacing our state model by a Map
inside d8. The complexity of JS engines makes it hard to predict the performance solely with a microbenchmark, we should benchmark further the whole process before switching. I'll start with the other points, once I'm done I can look further into using a Map
.
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
packages/grid/x-data-grid-premium/src/DataGridPremium/useDataGridPremiumComponent.tsx
Show resolved
Hide resolved
@@ -154,6 +160,38 @@ export function useGridParamsApi(apiRef: React.MutableRefObject<GridPrivateApiCo | |||
[apiRef, getBaseCellParams], | |||
); | |||
|
|||
const getRowValue = React.useCallback<GridParamsApi['getRowValue']>( | |||
(row, colDef) => { | |||
const id = getRowId ? getRowId(row) : row.id; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to produce the same result as getCellValue
. What is the added value here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is related to point 9: if we're iterating the row objects directly, we need a way to know the cell value without going through dataRowIdToModelLookup
, which .getCellValue
does. Big lookup objects are expensive, in particular if we need to access them in a loop that's already O(n)
to start with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would using Set
for the dataRowIdToModelLookup
help here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*Map
No, I've tried using Maps and it slows down the benchmark. The issue is really the indirection & additional memory accesses. The row
object is basically a direct pointer to the row data. Passing row.id
to an hashmap to get the row data again is wasted cycles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is dataRowIdToModelLookup
slow, because we change its shape many times (adding more and more keys to it) and therefore all its properties are "slow properties" (according to https://v8.dev/blog/fast-properties)?
Is this correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. I have a suggestion for the getCellValue
then - can we support conditional row
and column
arguments and use them if available, ani f not - it will use getRow()
and getColumn
as fallback.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be the signature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getCellValue: <V extends any = any>(
id: GridRowId,
field: string,
row?: GridRowModel,
colDef?: GridColDef,
) => V;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a feeling that we'll always have either both objects or none. How about we keep the API like it's implemented but leave it undocumented, and we can keep iterating & refactoring it before v7? I'll open a PR for the v7 filters shortly, we can continue the discussion there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (!filterItem.value) { | ||
return null; | ||
} | ||
|
||
const valueAsBoolean = filterItem.value === 'true'; | ||
return ({ value }): boolean => { | ||
return (value, _, __, ___): boolean => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not skip the unused arguments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Old habit, V8 used to have an adaptor for functions with mismatched arity that was expensive, but the overhead is mostly gone now. But I still like to write javascript code that is easy & predictable for engines to optimize. In particular for functions like this one that are run in a hot loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the article, I have learned a lot about v8 lately 🎉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Np, I love learning about JS engines internals, helps a lot with performance optimization. I'm less familiar with SpiderMonkey though, and I know very little about JSC. But V8 is the most common engine by far.
If you're interested in reading more, those links are all interesting:
for (let i = 0; i < rows.length; i += 1) { | ||
const row = rows[i]; | ||
|
||
isRowMatchingFilters(row, undefined, result); | ||
|
||
const isRowPassing = passFilterLogicSingle( | ||
result.passingFilterItems, | ||
result.passingQuickFilterValues, | ||
params.filterModel, | ||
apiRef, | ||
); | ||
|
||
if (isRowPassing) filteredRowsLookup.add(row.id); | ||
} | ||
|
||
// XXX: Is props.rows what we want? | ||
// XXX: Handle footer rows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not using (tree[GRID_ROOT_GROUP_ID]).children;
here as before?
From what I can quickly test on PR: https://deploy-preview-9120--material-ui-x.netlify.app/x/react-data-grid/filtering/#header-filters vs. HEAD: https://mui.com/x/react-data-grid/filtering/header-filters/ with: In the logs of mui-x/packages/grid/x-data-grid/src/components/panel/filterPanel/GridFilterInputValue.tsx Line 8 in 8201d0c
(should likely be configurable) mui-x/packages/grid/x-data-grid/src/components/toolbar/GridToolbarQuickFilter.tsx Lines 63 to 67 in 8201d0c
What I would personally explore for client-side filters is to run the filter in idle time (likely overkill for column filter but more relevant for quick search), yield to the main thread when it asks for it, have no debounce or a very small one, and cancel the previous filtering tasks when the input changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent work 🎉
If you plan something for v7, maybe add that as a point or GH issue in this umbrella issue, the goal is to have all the planned changes in a single place.
@@ -13,6 +13,7 @@ import { | |||
import { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from the specific discussions going on, generally speaking, the initiatives taken for the performance enhancement in this PR are really solid, a few of them were actually new to me and they strengthened my basic concepts about how things operate under the hood, so a huge thanks. I think the direction we are going in is outstanding.
I'll really appreciate it if it's possible to extract these changes (as we already discussed) into smaller PRs to:
- Have a stronger spotlight on the optimization each of the changes is bringing in
- To measure the impact of each change separately (and possibly push it further)
- To assess which of the changes could be applicable to the other areas of the application.
1, 2: Points no 1 and 2 are certainly the most impacting ones, although we are going to have a workaround (.v7), but with v7 around the corner and the benefits internally applied, that seems a step forward.
3: Non-cached selector option is a nice improvement too, why need to cache something when it could be accessed with a simple (.) notation.
4: It seems, the indirection (or checking of properties was to stop initializing if already did or have stable references, but I am not sure why it was wrapped inside the register function). It'll be good to simplify it if it doesn't cause a side effect.
5. I have never used eval this way, maybe discuss this one separately and see if there are any loose grounds?
6, 7: These are good, I think we should do them where possible in a non-breakable way and plan for v7 for the remaining part
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, let's continue the ongoing discussions in the threads above, I'll open separate PRs as we reach an agreement for each of those.
TODO: |
Superseded by the split PRs. |
Summary
This PR is a POC to demonstrate how we can improve the performance of our filtering, and other operations. It's not meant to be merged as it is, I'm just exploring ways to evolve our API & architecture to improve performance. Please add comments.
https://deploy-preview-9120--material-ui-x.netlify.app/x/react-data-grid/filtering/#header-filters
Results
This PR improves the speed of one-column string-contains filtering by about 75%.
(edit: the first 2 commits of this PR do, and the later commits improve to 93%)
Changes & observations
Our biggest cost by far is memory allocations. The amount of objects & functions (closures) being allocated to pass data around creates much more CPU work than anything else.
1. Fast filters
The first change is to define some of our filter functions as "fast filters". What this means is that they only use the
cellParams.value
field of their input arguments, so we can avoid calling the expensive API function.getCellParams()
and instead only call.getCellValue()
.In practice, from what I've seen, all of our filters could be fast filters, so we could change the API. But the filter API is public so we can't change it without a major version increase.
2. Memoize filtered items
The second change is to avoid calling
getFilterCallbackFromItem()
duringpassFilterLogic()
. The former is called simply to filter the model items, but it's a very expensive call (it basically recreates the filter function) and it was done for every row, so for N rows we were re-creating the filtering function N times (plus 1, for the filtering function actually used for filtering).Edit: I've added more changes below, see next comment for details. Each commit correspond to one change, and it's probably easier to read each commit independently than the PR change as a whole.
Benchmarks
The measurements were done by filtering 100,000 rows with the string
"am"
. TheElapsed (ms)
results below are generated by wrapping theflatFilteringMethod
withperformance.now()
timings.NOTE: The results above don't use the same set of 100,000 rows (it's the
Employee
dataset, randomly generated), thus why theCount
differs. These results are approximate but give an accurate perspective and were consistently reproducible.