
fix: issue with max call stack when search #437

Merged
merged 4 commits into main from fix/max-call-stack-length on Sep 6, 2023

Conversation

@H4ad (Contributor) commented Jul 5, 2023

Closes #301

The root cause is a simple issue with .push.

I added a helper function instead of just an if statement, because other parts of Orama probably have the same issue.
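For context, a minimal sketch of the failure mode (my own repro, not code from this PR; the exact threshold depends on the JS engine):

// Spreading a large array into push() passes every element as a call argument.
const ids: number[] = new Array(200_000).fill(0)
const results: number[] = []

// Throws "RangeError: Maximum call stack size exceeded" on V8 for arrays this large.
results.push(...ids)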

PS: to be able to test/run the script from #301 without it taking several minutes or hours, merge this branch with #434.


@allevo (Collaborator) commented Jul 6, 2023

Hi!
Thanks for this contribution! Will this affect the search performance when the result count is less than the limit?
How was the value 10_000 chosen?

@H4ad (Contributor, Author) commented Jul 6, 2023

@allevo I describe the reasoning in more detail here: https://github.com/oramasearch/orama/pull/437/files#diff-eeb84b4af585397ddd81a97513980b6a568f6b3e82a0a597effba24233873a40R13-R18

But essentially, it was just the max value divided by 10.

About the search time: from what I can see, the performance for more than 10K items could improve, but not by much; both operations are very fast compared to the time it takes to search that many items.

Also, push is more performant than concat: https://www.measurethat.net/Benchmarks/Show/4223/0/array-concat-vs-spread-operator-vs-push
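For a rough local comparison, a micro-benchmark sketch (my own code, not the linked benchmark; absolute numbers vary by engine and array size):

// Compare push(...spread) with concat for merging two arrays.
const a = Array.from({ length: 5_000 }, (_, i) => i)
const b = Array.from({ length: 5_000 }, (_, i) => i)

console.time("push")
for (let run = 0; run < 1_000; run++) {
  const out: number[] = []
  out.push(...a, ...b)
}
console.timeEnd("push")

console.time("concat")
for (let run = 0; run < 1_000; run++) {
  const out = ([] as number[]).concat(a, b)
}
console.timeEnd("concat")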

@allevo (Collaborator) commented Jul 11, 2023

I ran some benchmarks for this PR, and apparently it introduces a slowdown.

I used this code:

import { create as createDev, insert as insertDev } from "orama-dev"
import { create as createStable, insert as insertStable } from "@orama/orama"
import { Benchmark } from "kelonio"
import { faker } from '@faker-js/faker'

async function createSearch() {
  const n = 30000
  const data1 = Array.from({ length: n }).map(() => faker.string.sample())
  const data2 = [...data1]

  const benchmark = new Benchmark()
  const db1 = await createStable({ schema: { name: "string" } })
  for (const name of data1) {
    await insertStable(db1, { name })
  }  
  
  const db2 = await createDev({ schema: { name: "string" } })
  for (const name of data2) {
    await insertDev(db2, { name })
  }  

  await benchmark.record("search - stable", async () => await insertStable(db1, { name: data1.pop() }), {
    iterations: n,
  })
  await benchmark.record("search - dev", async () => await insertDev(db2, { name: data2.pop() }), {
    iterations: n,
  })
  printResult(benchmark)
}


function printResult(benchmark: Benchmark) {
  const data = benchmark.data
  const names = Object.keys(data)
  const results = names.map(name => {
    const result = data[name]
    const mean = result.durations.reduce((a, b) => a + b, 0) / result.durations.length

    return { name, mean, totalDuration: result.totalDuration }
  })

  results.sort((a, b) => a.mean - b.mean)

  console.log(
    results.map(result => `${result.name}: ${result.mean} ms`).join("\n")
  )
  console.log('\n\n\n')
}

Prints

search - stable: 1.754566665599999 ms
search - dev: 2.1923525156000085 ms

@H4ad (Contributor, Author) commented Jul 11, 2023

Are you not testing just the insert? This PR will only affect the search.

@allevo (Collaborator) commented Jul 11, 2023

Sorry, my bad. I posted the wrong test. The corrected one benchmarks search (assuming search is imported from each package as searchStable/searchDev):

import { search as searchStable } from "@orama/orama"
import { search as searchDev } from "orama-dev"

async function createSearch() {
  const n = 30000
  const data1 = Array.from({ length: n }).map(() => faker.string.sample())
  const data2 = [...data1]

  const benchmark = new Benchmark()
  const db1 = await createStable({ schema: { name: "string" } })
  for (const name of data1) {
    await insertStable(db1, { name })
  }

  const db2 = await createDev({ schema: { name: "string" } })
  for (const name of data2) {
    await insertDev(db2, { name })
  }

  await benchmark.record("search - stable", async () => await searchStable(db1, { term: data1.pop() }), {
    iterations: n,
  })
  await benchmark.record("search - dev", async () => await searchDev(db2, { term: data2.pop() }), {
    iterations: n,
  })
  printResult(benchmark)
}

@H4ad (Contributor, Author) commented Jul 18, 2023

Hey @allevo, do you have any suggestions on this change?

@allevo (Collaborator) commented Jul 18, 2023

Something like this:

const MAX_ARGUMENT_FOR_STACK = 10_000 // batch size, kept well below the engine's argument limit

export function safeAddNewItems<T>(arr: T[], newArr: T[]) {
  if (newArr.length < MAX_ARGUMENT_FOR_STACK) {
    // Small enough to spread in a single call.
    arr.push(...newArr)
  } else {
    // Push in fixed-size batches so no single call exceeds the argument limit.
    for (let i = 0; i < newArr.length; i += MAX_ARGUMENT_FOR_STACK) {
      arr.push(...newArr.slice(i, i + MAX_ARGUMENT_FOR_STACK))
    }
  }
}
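Usage would look something like this (my example, not from the comment above):

const all: number[] = []
const huge: number[] = new Array(200_000).fill(1)

safeAddNewItems(all, huge) // no RangeError: pushed in MAX_ARGUMENT_FOR_STACK-sized batches
console.log(all.length)    // 200000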

I didn't test the performance differences, though.

@rawpixel-vincent (Contributor) commented Jul 19, 2023

Hi,
I've hit this issue and tested this PR. It fixes my problem, but I had to use safeAddNewItems in two more places inside searchByWhereClause for my search query to work.

Attached is the patch I use against 1.0.10 (it also includes a fix in removeDocument, which crashes when the node doesn't exist, as part of updateMultiple):
@orama+orama+1.0.10.patch

Search query I use:

await SearchDB.search('program', {
    sortBy: { property: 'date', order: 'DESC' },
    exact: false,
    where: {
      business: businessIds?.length ? ['none', ...businessIds] : 'none',
      published: true,
    },
    properties: ['title', 'author', 'tags'],
    term: matchKeys,
    limit: pagesize,
    offset: offset || 0,
    threshold: 0.6,
  });

Schema:

const SCHEMA = {
  id: 'string',
  title: 'string',
  author: 'string',
  tags: 'string',
  published: 'boolean',
  date: 'number',
  business: 'string',
};

The dpack size is around 400MB (900k docs).

PS: Performance is really good; I love the idea of this project ❤️

@H4ad (Contributor, Author) commented Jul 20, 2023

@allevo I changed the code to use your suggestion; it's way better than cloning the array every time.

And @rawpixel-vincent, thanks for your suggestions. I ended up applying the helper to every .push call in the code, just to be sure. I also added you as co-author of the fix in avl.ts, thanks.

@allevo (Collaborator) commented Jul 20, 2023

@H4ad Could you align this branch with main? I'm curious to see the performance difference.

@micheleriva (Member) left a comment

LGTM

@allevo (Collaborator) commented Jul 20, 2023

This PR introduces an overhead:

search string - stable: 2.092553788 ms
search string - dev: 2.301789926000001 ms

I wonder whether we want to fix this at all: a typical search term is very short, and the problem only arises when an index returns more than 100K elements; in that case, an error is thrown.

@micheleriva WDYT?

@micheleriva (Member) commented

By "an index returns more than 100K elements", do you mean when limit is set to 100_000? If so, we shouldn't return 100k results in the first place.

@H4ad (Contributor, Author) commented Jul 20, 2023

Orama gathers the IDs of all matching documents in order to filter and sort them, and then limits that list to the number of items the user wants, usually 10.

Now, whenever a list would push more than 10K elements, we push in batches instead of pushing all the items at once, to avoid the max call stack issue.
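To make that concrete, here is a hypothetical sketch of the flow (the function name and shape are mine, not Orama internals; it reuses the safeAddNewItems helper from above):

// Merge per-index ID lists in batches, then apply offset/limit.
function collectResults(idLists: string[][], limit = 10, offset = 0): string[] {
  const all: string[] = []
  for (const ids of idLists) {
    safeAddNewItems(all, ids) // batched push instead of one huge spread call
  }
  return all.slice(offset, offset + limit)
}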

@micheleriva (Member) commented

I think that looks good. @H4ad, could you resolve the conflicts before merging? Thanks!

@H4ad (Contributor, Author) commented Jul 24, 2023

@micheleriva Done!

@micheleriva (Member) left a comment

LGTM

@micheleriva (Member) commented

> This PR introduces an overhead:
>
> search string - stable: 2.092553788 ms
> search string - dev: 2.301789926000001 ms
>
> I wonder whether we want to fix this at all: a typical search term is very short, and the problem only arises when an index returns more than 100K elements; in that case, an error is thrown.
>
> @micheleriva WDYT?

Hi @H4ad, I'm a bit concerned about this performance degradation. I'm not sure it's worth adding such an overhead for this edge case.

@H4ad (Contributor, Author) commented Aug 17, 2023

@micheleriva The overhead is about 0.3ms for 30k items; in most cases it won't even be noticed.

I think the change is worth it, since being a little slower is better than crashing.

Also, we can experiment with increasing the number of items per slice, as sketched below: the engine limit is above 100k and we are using an arbitrary value of 10k, so we could try 20k, 40k, and so on.
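For example, a quick timing sketch over a few candidate batch sizes (my code; the sizes are the values discussed in this thread):

// Time the batched push for several batch sizes.
const source: number[] = new Array(90_000).fill(0)

for (const batch of [10_000, 20_000, 40_000, 65_000]) {
  const out: number[] = []
  const start = performance.now()
  for (let i = 0; i < source.length; i += batch) {
    out.push(...source.slice(i, i + batch))
  }
  console.log(`batch ${batch}: ${(performance.now() - start).toFixed(3)} ms`)
}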

@allevo (Collaborator) commented Aug 18, 2023

The crash only happens when the engine's limit is reached. What happens if we set MAX_ARGUMENT_FOR_STACK to 99_000?
Are there performance gains even when the limit is that high?

@H4ad (Contributor, Author) commented Aug 27, 2023

@allevo I think we can increase it. @micheleriva, can you run more tests with different values of MAX_ARGUMENT_FOR_STACK? It just needs to be less than 100K.

@H4ad (Contributor, Author) commented Aug 27, 2023

According to this link, WebKit has a lower limit: 65k items.
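If useful, here is a small probe to find the engine's actual limit empirically (my sketch; the threshold also depends on how deep the current call stack already is):

// Doubling probe: find roughly how many elements can be spread into push().
function maxSpreadArgs(): number {
  let n = 1
  for (;;) {
    try {
      const target: number[] = []
      target.push(...new Array(n * 2).fill(0))
      n *= 2 // n is always the last size that worked
    } catch {
      return n // the real limit lies between n and 2n
    }
  }
}

console.log(maxSpreadArgs())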

@allevo (Collaborator) commented Sep 4, 2023

It's OK to set the magic number to 65K: this avoids throwing while limiting the performance impact to searches with large result sets.

@micheleriva (Member) commented

Let's use 65k as the limit, then. I'll merge this as soon as the conflicts are fixed.

@micheleriva (Member) left a comment

LGTM

@micheleriva micheleriva merged commit cdb2d5d into oramasearch:main Sep 6, 2023
2 checks passed
@H4ad H4ad deleted the fix/max-call-stack-length branch September 6, 2023 00:59
@BanderasPRO (Contributor) commented

This is still present: see #469 (comment).

Should we reopen this, or open a new issue?

Successfully merging this pull request may close these issues.

200k documents search leads to maximum call stack exceeded