[DataGrid] Filtering performance #9120

romgrk · 2023-05-26T00:14:26Z

Summary

This PR is a proof of concept (POC) to demonstrate how we can improve the performance of our filtering and other operations. It's not meant to be merged as is; I'm exploring ways to evolve our API & architecture to improve performance. Please add comments.

https://deploy-preview-9120--material-ui-x.netlify.app/x/react-data-grid/filtering/#header-filters

Results

This PR improves the speed of one-column string-contains filtering by about 75%.
(edit: the first 2 commits of this PR do, and the later commits improve to 93%)

Changes & observations

Our biggest cost by far is memory allocation. The number of objects & functions (closures) being allocated to pass data around creates much more CPU work than anything else.

1. Fast filters

The first change is to define some of our filter functions as "fast filters". What this means is that they only use the cellParams.value field of their input arguments, so we can avoid calling the expensive API function .getCellParams() and instead only call .getCellValue().

In practice, from what I've seen, all of our filters could be fast filters, so we could change the API. But the filter API is public so we can't change it without a major version increase.
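As a rough illustration of the difference, here is a minimal sketch using simplified, hypothetical types (`CellParams`, `LegacyFilter`, `FastFilter`, `containsLegacy` and `containsFast` are illustrative names, not the grid's real API). The legacy filter forces the caller to build a params object per row, while the fast filter only needs the cell value.

```typescript
// Hypothetical, simplified shapes; not the actual DataGrid types.
interface CellParams {
  id: number;
  field: string;
  value: unknown;
  row: Record<string, unknown>;
}

type LegacyFilter = (params: CellParams) => boolean;
type FastFilter = (value: unknown) => boolean;

// Legacy style: receives the full params object but only ever reads `value`.
const containsLegacy = (needle: string): LegacyFilter => {
  const lowered = needle.toLowerCase();
  return ({ value }) => String(value).toLowerCase().includes(lowered);
};

// Fast style: receives the value directly, so no params object is needed.
const containsFast = (needle: string): FastFilter => {
  const lowered = needle.toLowerCase();
  return (value) => String(value).toLowerCase().includes(lowered);
};

const rows = [
  { id: 1, name: 'Amanda' },
  { id: 2, name: 'Bob' },
  { id: 3, name: 'Sam' },
];

// Legacy path: one params object is allocated for every row.
const legacy = containsLegacy('am');
const legacyIds = rows
  .filter((row) => legacy({ id: row.id, field: 'name', value: row.name, row }))
  .map((row) => row.id);

// Fast path: the value is passed straight through.
const fast = containsFast('am');
const fastIds = rows.filter((row) => fast(row.name)).map((row) => row.id);
```

Both paths select the same rows; the fast path simply skips the per-row params allocation.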

2. Memoize filtered items

The second change is to avoid calling getFilterCallbackFromItem() during passFilterLogic(). The former is called simply to filter the model items, but it's a very expensive call (it basically recreates the filter function) and it was done for every row, so for N rows we were re-creating the filtering function N times (plus 1, for the filtering function actually used for filtering).
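The idea can be sketched as follows, assuming a hypothetical `buildPredicate()` that stands in for an expensive factory such as getFilterCallbackFromItem(). The types and function names are illustrative only.

```typescript
interface FilterItem {
  field: string;
  value: string;
}
type Row = Record<string, unknown>;
type RowPredicate = (row: Row) => boolean;

// Counts how many times the (expensive) factory runs.
let buildCount = 0;

// Hypothetical stand-in for an expensive factory.
function buildPredicate(item: FilterItem): RowPredicate {
  buildCount += 1;
  const needle = item.value.toLowerCase();
  return (row) => String(row[item.field]).toLowerCase().includes(needle);
}

// Slow: the predicate is rebuilt for every row and every item.
function filterRebuilding(rows: Row[], items: FilterItem[]): Row[] {
  return rows.filter((row) => items.every((item) => buildPredicate(item)(row)));
}

// Faster: predicates are built once, then reused for all rows.
function filterPrebuilt(rows: Row[], items: FilterItem[]): Row[] {
  const predicates = items.map((item) => buildPredicate(item));
  return rows.filter((row) => predicates.every((predicate) => predicate(row)));
}

const demoRows: Row[] = [{ name: 'Amanda' }, { name: 'Bob' }, { name: 'Sam' }];
const demoItems: FilterItem[] = [{ field: 'name', value: 'am' }];
```

With N rows and one filter item, the first version calls the factory N times, the second only once.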

Edit: I've added more changes below, see next comment for details. Each commit corresponds to one change, and it's probably easier to read each commit independently than the PR change as a whole.

Benchmarks

The measurements were done by filtering 100,000 rows with the string "am". The Elapsed (ms) results below are generated by wrapping the flatFilteringMethod with performance.now() timings.
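A timing wrapper of this kind could look like the sketch below. `withTiming` and its parameters are hypothetical names, and the wrapped function here is only a stand-in for flatFilteringMethod; the clock is injectable so the wrapper can be exercised deterministically.

```typescript
// Wraps `fn` so that each call reports its elapsed time in milliseconds.
function withTiming<A extends unknown[], R>(
  label: string,
  fn: (...args: A) => R,
  report: (label: string, elapsedMs: number) => void,
  now: () => number = () => performance.now(),
): (...args: A) => R {
  return (...args: A): R => {
    const start = now();
    const result = fn(...args);
    report(label, now() - start);
    return result;
  };
}

const measurements: Array<{ label: string; elapsedMs: number }> = [];

// Stand-in for the filtering method being measured.
const timedFilter = withTiming(
  'filter',
  (rows: string[], needle: string) => rows.filter((row) => row.includes(needle)),
  (label, elapsedMs) => {
    measurements.push({ label, elapsedMs });
  },
);

const matches = timedFilter(['sam', 'bob', 'amy'], 'am');
```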

Before | After (benchmark result images not included)

NOTE: The results above don't use the same set of 100,000 rows (it's the Employee dataset, randomly generated), which is why the Count differs. These results are approximate but give an accurate perspective and were consistently reproducible.

mui-bot · 2023-05-26T00:20:10Z

Netlify deploy preview

Netlify deploy preview: https://deploy-preview-9120--material-ui-x.netlify.app/

Updated pages

No updates.

These are the results for the performance tests:

Test case	Unit	Min	Max	Median	Mean	σ
Filter 100k rows	ms	313.2	502.7	351.9	376.5	70.729
Sort 100k rows	ms	628.4	1,164.4	628.4	961.52	186.242
Select 100k rows	ms	185.7	333.9	277.3	272.5	49.567
Deselect 100k rows	ms	134	305.1	269.4	238.3	66.966

Generated by 🚫 dangerJS against 189acf0

romgrk · 2023-05-26T02:10:30Z

I've added more changes, some of them quite exotic, to see how much I could push the performance. They've improved the filtering by another 10%, so 85% compared to the baseline. They're described here.

Changes

3. Non-memoized selectors

The selectors created with createSelector() have a cost, which becomes apparent when they're called in a loop like filtering does. Most of our selectors don't need the memoization of createSelector(), so I've added createRawSelector() to avoid paying the memoization cost.

I recommend that we apply this change to all our existing selectors that don't need memoization (all those that don't derive their input argument).
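To make the distinction concrete, here is a simplified sketch. The name createRawSelector() comes from this PR, but the implementations below are assumptions written for illustration (the memoized variant caches on state identity only, which is simpler than what createSelector() really does), and `GridState` is a hypothetical, reduced state shape.

```typescript
// Hypothetical, reduced state shape.
interface GridState {
  rows: { ids: number[] };
}

type Selector<R> = (state: GridState) => R;

let computeCount = 0;

// Simplified memoization: recompute only when the state object changes.
function createMemoizedSelector<R>(compute: Selector<R>): Selector<R> {
  let hasCache = false;
  let lastState: GridState | undefined;
  let lastResult: R | undefined;
  return (state) => {
    if (!hasCache || state !== lastState) {
      lastResult = compute(state);
      lastState = state;
      hasCache = true;
    }
    return lastResult as R;
  };
}

// Raw selector: no cache and no bookkeeping, just the function itself.
function createRawSelector<R>(select: Selector<R>): Selector<R> {
  return select;
}

// Plain property read: nothing is derived, so memoization buys nothing.
const selectRowIds = createRawSelector((state) => state.rows.ids);

// Derived value: a new array is built, so memoization keeps the reference stable.
const selectEvenRowIds = createMemoizedSelector((state) => {
  computeCount += 1;
  return state.rows.ids.filter((id) => id % 2 === 0);
});

const state: GridState = { rows: { ids: [1, 2, 3, 4] } };
```

A selector that only reads a property already returns a stable reference, which is why it can skip memoization safely.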

4. Reduce API methods indirection

The useGridApiMethod hook creates indirection by wrapping in a trampoline function. If we remove it, we avoid function calls and allocations (the rest arguments).

privateApiRef.current.register(visibility, {
  [methodName]: (...args: any[]) => {
    const fn = apiMethodsRef.current[methodName] as Function;
    return fn(...args);
  },
});

If there aren't apparent downsides to this, I would suggest that we do it.
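The trade-off can be sketched with a small standalone model. `api`, `latestImpl`, `registerWithTrampoline` and `registerDirect` are hypothetical names and are not the grid's implementation: the trampoline keeps the function reference on the API object stable across re-registrations, while direct registration replaces it each time.

```typescript
type Method = (...args: number[]) => number;

// Hypothetical stand-ins for the API object and the latest implementations.
const api: Record<string, Method> = {};
const latestImpl: Record<string, Method> = {};

// Trampoline: the function stored on `api` is created once and forwards
// every call (through rest arguments) to the latest implementation.
function registerWithTrampoline(name: string, impl: Method): void {
  latestImpl[name] = impl;
  if (!Object.prototype.hasOwnProperty.call(api, name)) {
    api[name] = (...args) => latestImpl[name](...args);
  }
}

// Direct: no extra call or rest-arguments array, but the reference changes.
function registerDirect(name: string, impl: Method): void {
  api[name] = impl;
}

registerWithTrampoline('sum', (a, b) => a + b);
const firstTrampoline = api.sum;
registerWithTrampoline('sum', (a, b) => a + b + 1);
const trampolineIsStable = api.sum === firstTrampoline;
const trampolineResult = api.sum(1, 2); // forwards to the newest implementation

registerDirect('product', (a, b) => a * b);
const firstDirect = api.product;
registerDirect('product', (a, b) => a * b);
const directIsStable = api.product === firstDirect;
```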

5. Avoid dynamic object props

The pattern { [dynamicKey]: value } is expensive for JS engines, so I've used eval(...) to create a function with a static { key: value }. This is exotic/evil and I don't really recommend that we implement it, but it's an interesting optimization.

6. Direct state access

Avoiding selectors and accessing apiRef.current.state directly in our internal functions can improve performance; selector functions (even unmemoized) still have a cost.

7. Use `Set`

We often use simple objects for use cases where we do Set operations. Using the proper data structure improves performance, but these changes are also semver major because they're selectable by the public selectors.
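A small sketch of the two approaches, using made-up ids, shows one observable difference: a plain object coerces every key to a string, while a Set keeps ids as they are and exposes set operations directly.

```typescript
type RowId = number | string;

const passingIds: RowId[] = [1, 2, 'a'];

// Plain object used as a set: every key ends up as a string.
const lookupObject: Record<string, true> = {};
for (const id of passingIds) {
  lookupObject[String(id)] = true;
}

// Set: ids keep their original type.
const lookupSet = new Set<RowId>(passingIds);

const objectHasTwo = lookupObject['2'] === true;
const setHasTwo = lookupSet.has(2);
// 2 and '2' are distinct values in a Set.
const setHasStringTwo = lookupSet.has('2');

// Counting entries requires building a key array for the object,
// while the Set tracks its size directly.
const objectSize = Object.keys(lookupObject).length;
const setSize = lookupSet.size;
```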

Benchmarks

flaviendelangle · 2023-05-26T09:59:05Z

In practice, from what I've seen, all of our filters could be fast filters, so we could change the API. But the filter API is public so we can't change it without a major version increase.

Sorry if you already proposed it.
But for this kind of scenario where we want to allow people to access some rarely used data, it would probably be better to pass a callback so that it's only executed if the user actually wants it.

Something like:

cellFilterParams: {
  value: TValue,
  getCellParams: () => GridCellParams
}

Or if we want to avoid creating the callback (not sure how negligible that is on large dataset), we just say that now people can access the apiRef and call apiRef.current.getCellParams themselves, so we pass:

cellFilterParams: {
  value: TValue,
  id: GridRowId,
  field: string
}

cherniavskii

Great work! 🚀

packages/grid/x-data-grid/src/hooks/features/filter/gridFilterUtils.ts

cherniavskii · 2023-05-26T12:37:55Z

packages/grid/x-data-grid/src/hooks/utils/useGridApiMethod.ts

@@ -20,14 +20,10 @@ export function useGridApiMethod<
      return;
    }
    apiMethodsNames.forEach((methodName) => {
-      if (!privateApiRef.current.hasOwnProperty(methodName)) {


The goal here was to have stable references to API methods.
For example, with this change, it's now impossible to spy on API methods, because the spy gets overridden with the new function reference (this is why the tests fail).

Could we use useEventCallback on the methods instead?

Is there any reason other than testing for which we would like to keep the stable references?

I've benchmarked this one, for the raw filtering time, this indirection degrades performance by 11% if we take the final results as the baseline.

As far as I know, testing is the only reason to keep the stable references.

Got it, I'll try to make it work without the trampoline then.

romgrk · 2023-05-28T18:06:27Z

Did some more profiling, I found two more substantial improvements:

8. Remove API Proxy wrapper

The wrapper around the API object is expensive, in particular if it needs unwrapping. Removing it would improve our performance across all the DataGrid functions.

9. Iterate rows directly

We are currently iterating row ids in flatFilteringMethod, which is slow and inefficient because we then need to do dataRowIdToModelLookup[id] to retrieve the row, and that lookup is costly when the object is huge. If we know we are filtering all the rows, we're much better off iterating the row objects directly. Below is a comparison.
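In code, the two iteration strategies look roughly like this. It is a sketch with a hypothetical `Row` type and made-up helper names (`filterByIds`, `filterRows`), not the actual flatFilteringMethod.

```typescript
interface Row {
  id: number;
  name: string;
}

const rows: Row[] = [
  { id: 1, name: 'Amanda' },
  { id: 2, name: 'Bob' },
  { id: 3, name: 'Sam' },
];

// Id list plus lookup object, mirroring an id-to-row lookup.
const rowIds = rows.map((row) => row.id);
const idToRow: Record<number, Row> = {};
for (const row of rows) {
  idToRow[row.id] = row;
}

// Before: iterate ids, then pay one keyed load on a large object per row.
function filterByIds(
  ids: number[],
  lookup: Record<number, Row>,
  predicate: (row: Row) => boolean,
): Set<number> {
  const passing = new Set<number>();
  for (const id of ids) {
    const row = lookup[id];
    if (row !== undefined && predicate(row)) {
      passing.add(row.id);
    }
  }
  return passing;
}

// After: iterate the row objects directly, with no lookup at all.
function filterRows(allRows: Row[], predicate: (row: Row) => boolean): Set<number> {
  const passing = new Set<number>();
  for (const row of allRows) {
    if (predicate(row)) {
      passing.add(row.id);
    }
  }
  return passing;
}

const containsAm = (row: Row): boolean => row.name.toLowerCase().includes('am');
const viaIds = filterByIds(rowIds, idToRow, containsAm);
const viaRows = filterRows(rows, containsAm);
```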

Before (Builtins_KeyedLoadIC_Megamorphic is the relevant entry)

After

Final results

The improvement compared to the baseline is 93% with these last changes. I haven't added the commits here yet; I'll clean up what I have and evaluate which commits we can merge cleanly without a semver major change.

romgrk · 2023-05-29T15:49:02Z

packages/grid/x-data-grid/src/hooks/features/filter/gridFilterUtils.ts

+  if (appliers.length === 1) {
+    const applier = appliers[0];
+    const applierFn = applier.fn;

-    const filteredAppliers = shouldApplyFilter
-      ? appliers.filter((applier) => shouldApplyFilter(applier.item.field))
-      : appliers;
+    const applierCall = applier.v7
+      ? 'applierFn(row)'
+      : 'applierFn(getRowId ? getRowId(row) : row.id)';
+    const fn = eval(`
+      (row, shouldApplyFilter) => {
+        // ${applierFn.name} <- Keep a ref, prevent the bundler from optimizing away

-    filteredAppliers.forEach((applier) => {
-      resultPerItemId[applier.item.id!] = applier.fn(rowId);
-    });
+        if (shouldApplyFilter && !shouldApplyFilter(applier.item.field)) {
+          return { '${applier.item.id!}': false };
+        }
+        return { '${applier.item.id!}': ${applierCall} };
+      }
+    `);
+
+    return fn;


So if we consider the final results as the baseline, and we decide to not apply this change, the performance degrades by 20%. I know "eval is evil", but in this case we're getting a real performance improvement, so I feel like we should keep it.

Exotic indeed, but also risky - try running this locally 😅

import * as React from 'react';
import { DataGrid } from '@mui/x-data-grid';

const columns = [{ field: 'id' }];

export default function InitialFilters() {
  const [rows, setRows] = React.useState<any[]>([]);

  React.useEffect(() => {
    setRows([{ id: 1 }]);
  }, []);

  return (
    <div style={{ height: 400, width: '100%' }}>
      <DataGrid
        columns={columns}
        rows={rows}
        filterModel={{
          items: [
            {
              field: 'id',
              operator: 'equals',
              value: '1',
              id: "1': alert('hello') } //",
            },
          ],
        }}
      />
    </div>
  );
}

Fair point! Could we reliably escape that value with ... { ${JSON.stringify(String(applier.item.id))}: ...? I feel that if String is guaranteed to return a string, then JSON.stringify should be guaranteed to return a valid JavaScript string representation.

I'm not sure I follow.
${JSON.stringify(String(applier.item.id))} would return the same string, and it will be evaluated as before. Am I missing something?

JSON.stringify is doing the string-quoting, which means it's escaped properly for javascript:
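A minimal sketch of that escaping is shown below. As an assumption for illustration, it uses `new Function` rather than eval, and `compileStaticKeyFactory` is a hypothetical helper name. The id from the example above ends up as an ordinary property key instead of being executed.

```typescript
// Compiles a function that returns an object with a single static key.
// JSON.stringify produces a quoted, escaped string literal for the key.
function compileStaticKeyFactory(
  id: string | number,
): (value: boolean) => Record<string, boolean> {
  const key = JSON.stringify(String(id));
  const body = `return { ${key}: value };`;
  return new Function('value', body) as (value: boolean) => Record<string, boolean>;
}

// The payload from the example above.
const maliciousId = "1': alert('hello') } //";

const factory = compileStaticKeyFactory(maliciousId);
const compiled = factory(true);

// The whole payload is just the name of the only property.
const compiledKeys = Object.keys(compiled);
```

Because the key is emitted as a double-quoted literal, the single quotes inside the payload no longer terminate anything in the generated source.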

Alternatively, using parseInt(applier.item.id) could also work. It would return NaN for invalid payloads, and it's guaranteed to return a number.

Right, I forgot to remove ' ' around JSON.stringify(...) 😅
I agree that considering the performance gains, it's okay to use eval in this case 👍

romgrk · 2023-05-29T15:52:53Z

packages/grid/x-data-grid/src/hooks/features/filter/useGridFilter.tsx

+      for (let i = 0; i < rows.length; i += 1) {
+        const row = rows[i];
+
+        isRowMatchingFilters(row, undefined, result);
+
+        const isRowPassing = passFilterLogicSingle(
+          result.passingFilterItems,
+          result.passingQuickFilterValues,
+          params.filterModel,
+          apiRef,
+        );
+
+        if (isRowPassing) filteredRowsLookup.add(row.id);
+      }
+
+      // XXX: Is props.rows what we want?
+      // XXX: Handle footer rows


TODO: Add autogenerated rows after the loop.

Why not use (tree[GRID_ROOT_GROUP_ID]).children; here as before?

See point 9

Right. We still use the tree for non-flat filtering though.

Maybe it's worth giving another try to Map for dataRowIdToModelLookup? Looking at the related discussion we had in #9120 (comment), it should make it faster, so maybe there was something else causing the slowdown?

This quick benchmark shows that at least property access should be faster when using Map: https://jsperf.app/petawa

I benchmarked by replacing our state model by a Map inside d8. The complexity of JS engines makes it hard to predict the performance solely with a microbenchmark, so we should benchmark the whole process further before switching. I'll start with the other points; once I'm done I can look further into using a Map.

packages/grid/x-data-grid/src/models/gridFilterOperator.ts

packages/grid/x-data-grid/src/utils/createSelector.ts

github-actions · 2023-06-01T09:50:26Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

packages/grid/x-data-grid-premium/src/DataGridPremium/useDataGridPremiumComponent.tsx

cherniavskii · 2023-06-02T16:11:45Z

packages/grid/x-data-grid/src/hooks/features/rows/useGridParamsApi.ts

@@ -154,6 +160,38 @@ export function useGridParamsApi(apiRef: React.MutableRefObject<GridPrivateApiCo
    [apiRef, getBaseCellParams],
  );

+  const getRowValue = React.useCallback<GridParamsApi['getRowValue']>(
+    (row, colDef) => {
+      const id = getRowId ? getRowId(row) : row.id;


It seems to produce the same result as getCellValue. What is the added value here?

This is related to point 9: if we're iterating the row objects directly, we need a way to know the cell value without going through dataRowIdToModelLookup, which .getCellValue does. Big lookup objects are expensive, in particular if we need to access them in a loop that's already O(n) to start with.

Would using Set for the dataRowIdToModelLookup help here?

*Map

No, I've tried using Maps and it slows down the benchmark. The issue is really the indirection & additional memory accesses. The row object is basically a direct pointer to the row data. Passing row.id to a hashmap to get the row data again is wasted cycles.

Is dataRowIdToModelLookup slow, because we change its shape many times (adding more and more keys to it) and therefore all its properties are "slow properties" (according to https://v8.dev/blog/fast-properties)?
Is this correct?

Gotcha. I have a suggestion for getCellValue then: can we support optional row and column arguments and use them if available, and if not, fall back to getRow() and getColumn()?
What do you think?

What would be the signature?

getCellValue: <V extends any = any>(
  id: GridRowId,
  field: string,
  row?: GridRowModel,
  colDef?: GridColDef,
) => V;

I have a feeling that we'll always have either both objects or none. How about we keep the API like it's implemented but leave it undocumented, and we can keep iterating & refactoring it before v7? I'll open a PR for the v7 filters shortly, we can continue the discussion there.

cherniavskii · 2023-06-02T16:14:49Z

packages/grid/x-data-grid/src/colDef/gridBooleanOperators.ts

      if (!filterItem.value) {
        return null;
      }

      const valueAsBoolean = filterItem.value === 'true';
-      return ({ value }): boolean => {
+      return (value, _, __, ___): boolean => {


Why not skip the unused arguments?

Old habit. V8 used to have an adaptor for functions with mismatched arity that was expensive, but the overhead is mostly gone now. I still like to write JavaScript code that is easy & predictable for engines to optimize, in particular for functions like this one that are run in a hot loop.

Thanks for the article, I have learned a lot about v8 lately 🎉

Np, I love learning about JS engines internals, helps a lot with performance optimization. I'm less familiar with SpiderMonkey though, and I know very little about JSC. But V8 is the most common engine by far.

If you're interested in reading more, those links are all interesting:

https://v8.dev/blog

https://github.com/thlorenz/v8-perf

https://mrale.ph/

cherniavskii · 2023-06-02T16:20:07Z

packages/grid/x-data-grid/src/hooks/features/filter/useGridFilter.tsx

+      for (let i = 0; i < rows.length; i += 1) {
+        const row = rows[i];
+
+        isRowMatchingFilters(row, undefined, result);
+
+        const isRowPassing = passFilterLogicSingle(
+          result.passingFilterItems,
+          result.passingQuickFilterValues,
+          params.filterModel,
+          apiRef,
+        );
+
+        if (isRowPassing) filteredRowsLookup.add(row.id);
+      }
+
+      // XXX: Is props.rows what we want?
+      // XXX: Handle footer rows


Why not use (tree[GRID_ROOT_GROUP_ID]).children; here as before?

oliviertassinari · 2023-06-03T22:26:56Z

From what I can quickly test on

PR: https://deploy-preview-9120--material-ui-x.netlify.app/x/react-data-grid/filtering/#header-filters vs. HEAD: https://mui.com/x/react-data-grid/filtering/header-filters/

with:

In the logs of applyStrategyProcessor I see that we move from 500ms to 20ms 🥹. I suspect that we should reduce the debounce, it feels like a lot:

mui-x/packages/grid/x-data-grid/src/components/panel/filterPanel/GridFilterInputValue.tsx

Line 8 in 8201d0c

export const SUBMIT_FILTER_STROKE_TIME = 500;

(should likely be configurable)

mui-x/packages/grid/x-data-grid/src/components/toolbar/GridToolbarQuickFilter.tsx

Lines 63 to 67 in 8201d0c

  /**
   * The debounce time in milliseconds.
   * @default 500
   */
  debounceMs?: number;

What I would personally explore for client-side filters is to run the filter in idle time (likely overkill for column filter but more relevant for quick search), yield to the main thread when it asks for it, have no debounce or a very small one, and cancel the previous filtering tasks when the input changes.
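One way to picture this approach is a chunked, cancellable filter. The sketch below is hypothetical (`filterInChunks`, `Schedule` and `FilterJob` are made-up names): the scheduler is injected, so in a browser it could be backed by an idle callback or a timer, while a manual queue is enough to drive it here.

```typescript
// Defers a unit of work; the caller decides how (idle callback, timer, etc.).
type Schedule = (task: () => void) => void;

interface FilterJob {
  cancel(): void;
}

function filterInChunks<T>(
  rows: readonly T[],
  predicate: (row: T) => boolean,
  onDone: (result: T[]) => void,
  schedule: Schedule,
  chunkSize = 1000,
): FilterJob {
  let cancelled = false;
  let index = 0;
  const result: T[] = [];

  const step = (): void => {
    if (cancelled) {
      return; // A newer input superseded this job.
    }
    const end = Math.min(index + chunkSize, rows.length);
    for (; index < end; index += 1) {
      const row = rows[index];
      if (predicate(row)) {
        result.push(row);
      }
    }
    if (index < rows.length) {
      schedule(step); // Yield, then continue with the next chunk.
    } else {
      onDone(result);
    }
  };

  schedule(step);
  return {
    cancel: () => {
      cancelled = true;
    },
  };
}

// Manual scheduler: tasks are queued and run on demand.
const queue: Array<() => void> = [];
const manualSchedule: Schedule = (task) => {
  queue.push(task);
};
const runAll = (): void => {
  while (queue.length > 0) {
    const task = queue.shift();
    if (task) {
      task();
    }
  }
};
```

When the input changes, the caller would cancel the previous job and start a new one, so stale results are never reported.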

MBilalShafi

Excellent work 🎉

If you plan something for v7, maybe add that as a point or GH issue in this umbrella issue, the goal is to have all the planned changes in a single place.

MBilalShafi · 2023-06-06T15:51:04Z

packages/grid/x-data-grid-premium/src/hooks/features/rowGrouping/gridRowGroupingUtils.ts

@@ -13,6 +13,7 @@ import {
 import {


Apart from the specific discussions going on, generally speaking, the initiatives taken for the performance enhancement in this PR are really solid, a few of them were actually new to me and they strengthened my basic concepts about how things operate under the hood, so a huge thanks. I think the direction we are going in is outstanding.
I'd really appreciate it if these changes could be extracted (as we already discussed) into smaller PRs to:

Have a stronger spotlight on the optimization each change brings

Measure the impact of each change separately (and possibly push it further)

Assess which of the changes could apply to other areas of the application

1, 2: Points 1 and 2 are certainly the most impactful ones. We are going to have a workaround (.v7), but with v7 around the corner and the benefits applied internally, that seems like a step forward.
3: The non-cached selector option is a nice improvement too; why cache something when it can be accessed with simple dot (.) notation?
4: It seems the indirection (or the property check) was there to avoid re-initializing or to keep stable references, but I'm not sure why it was wrapped inside the register function. It would be good to simplify it if that doesn't cause a side effect.
5: I have never used eval this way; maybe discuss this one separately and see if there are any weak points?
6, 7: These are good. I think we should do them where possible in a non-breaking way and plan the remaining part for v7.

Yes, let's continue the ongoing discussions in the threads above, I'll open separate PRs as we reach an agreement for each of those.

romgrk · 2023-06-12T16:10:10Z

TODO: getDefaultFilterModel() allocates a new value each time and is called on every iteration.

romgrk · 2024-01-17T10:53:05Z

Superseded by the split PRs.

perf: dont do expensive allocations

2117914

romgrk added the performance and component: data grid labels May 26, 2023

romgrk added 6 commits May 25, 2023 21:45

perf: fast filters

0b1d56e

perf: non-memoized selectors

5409eb7

perf: reduced indirection in API methods

32620fc

perf: avoid dynamic object prop

19902ee

perf: direct state access

a117b38

perf: use Set for rows lookup

e222e9a

romgrk force-pushed the perf-filtering branch from ad395ee to e222e9a Compare May 26, 2023 01:57

romgrk mentioned this pull request May 26, 2023

[DataGrid] Avoid allocations in hydrateRowsMeta #9121

Merged

cherniavskii reviewed May 26, 2023

packages/grid/x-data-grid/src/hooks/features/filter/gridFilterUtils.ts

packages/grid/x-data-grid/src/hooks/features/filter/gridFilterUtils.ts Outdated

cherniavskii reviewed May 26, 2023

romgrk added 13 commits May 28, 2023 19:11

perf: remove allocations

efb90cf

perf: specialize createRawSelector

bbe66fd

lint

2fb95ff

lint

a674f20

perf: specialize passFilterLogic

e687e0d

lint

02977be

lint

b917232

chore: synchronous demo data

65694fb

perf: new filtering API

6296c81

perf: new filtering API

460b6e3

perf: remove API Proxy wrapper

b665573

lint

4b5164c

fix: minor issues

072c659

romgrk commented May 29, 2023

packages/grid/x-data-grid/src/models/gridFilterOperator.ts Outdated

romgrk commented May 29, 2023

packages/grid/x-data-grid/src/utils/createSelector.ts

lint

aa90c79

This was referenced May 31, 2023

[DataGrid] Quick filter performance #9167

Open

[DataGrid] Indicate internal loading state #8197

Open

github-actions bot added the PR: out-of-date The pull request has merge conflicts and can't be merged label Jun 1, 2023

cherniavskii reviewed Jun 2, 2023

packages/grid/x-data-grid-premium/src/DataGridPremium/useDataGridPremiumComponent.tsx

cherniavskii reviewed Jun 2, 2023

lint

189acf0

romgrk mentioned this pull request Jun 3, 2023

[DataGrid] WASM-powered quick filter #9206

Open

romgrk mentioned this pull request Jun 4, 2023

[DataGrid] Improve grouping performance for large datasets #9200

Merged

MBilalShafi reviewed Jun 6, 2023

This was referenced Jun 7, 2023

[DataGrid] Filtering performance: V7 API #9254

Merged

[DataGrid] Filtering performance: cache values #9284

Merged

[DataGrid] Filtering performance: use unmemoized selectors by default #9287

Merged

romgrk mentioned this pull request Jun 14, 2023

[DataGrid] Filtering performance: remove indirection #9334

Merged

romgrk mentioned this pull request Jul 10, 2023

[DataGrid] Filtering performance: compile filter applier with eval #9635

Merged

This was referenced Jul 10, 2023

[data grid] Input lag in GridToolbarQuickFilter on slowly typing #6783

Closed

[data grid] Improve quick filter performance (INP) #9657

Open

This was referenced Aug 16, 2023

[DataGrid] Fix eval blocked by CSP #9863

Merged

[DataGrid] Fix row id bug #10051

Merged

[DataGrid] Filtering performance: use Set() when applicable #10068

Closed

romgrk closed this Jan 17, 2024

romgrk deleted the perf-filtering branch January 17, 2024 10:57
