fix: issue with max call stack when search #437
Conversation
Hi!
@allevo I describe the reason in more detail here: https://github.com/oramasearch/orama/pull/437/files#diff-eeb84b4af585397ddd81a97513980b6a568f6b3e82a0a597effba24233873a40R13-R18 But essentially, it was just the max value divided by 10. About the search time: from what I see, the performance for more than 10K items could be improved, but not by much; both operations are very performant compared to the time it takes to search that amount of items. Also, push is more performant than concat: https://www.measurethat.net/Benchmarks/Show/4223/0/array-concat-vs-spread-operator-vs-push
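For context on that push-vs-concat point, a minimal sketch of the semantic difference (illustrative values only): `concat` allocates and returns a new array, while `push(...)` grows the existing one in place, which is the allocation `concat` pays for.

```typescript
// concat returns a new array; the original is untouched.
const a = [1, 2, 3]
const b = [4, 5, 6]

const viaConcat = a.concat(b)    // new array allocated
console.log(a.length)            // 3 (a is unchanged)

// push(...) mutates `a` in place, no new array allocated.
a.push(...b)
console.log(a.length)            // 6
console.log(viaConcat.join(",")) // "1,2,3,4,5,6"
```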
I made some benchmarks for this PR, which apparently show a performance gain. I used this code:

```ts
import { create as createDev, insert as insertDev } from "orama-dev"
import { create as createStable, insert as insertStable } from "@orama/orama"
import { Benchmark } from "kelonio"
import { faker } from '@faker-js/faker'

async function createSearch() {
  const n = 30000
  const data1 = Array.from({ length: n }).map(() => faker.string.sample())
  const data2 = [...data1]
  const benchmark = new Benchmark()

  const db1 = await createStable({ schema: { name: "string" } })
  for (const name of data1) {
    await insertStable(db1, { name })
  }

  const db2 = await createDev({ schema: { name: "string" } })
  for (const name of data2) {
    await insertDev(db2, { name })
  }

  await benchmark.record("search - stable", async () => await insertStable(db1, { name: data1.pop() }), {
    iterations: n,
  })
  await benchmark.record("search - dev", async () => await insertDev(db2, { name: data2.pop() }), {
    iterations: n,
  })

  printResult(benchmark)
}

function printResult(benchmark: Benchmark) {
  const data = benchmark.data
  const names = Object.keys(data)
  const results = names.map(name => {
    const result = data[name]
    const mean = result.durations.reduce((a, b) => a + b, 0) / result.durations.length
    return { name, mean, totalDuration: result.totalDuration }
  })

  results.sort((a, b) => a.mean - b.mean)

  console.log(
    results.map(result => `${result.name}: ${result.mean} ms`).join("\n")
  )
  console.log('\n\n\n')
}
```

Prints:
Are you not testing just the insert? This PR will only affect the search.
Sorry, my bad. I posted the wrong test:

```ts
// Corrected benchmark: the recorded operations now actually call `search`
// (the previous version benchmarked `insert` under the "search" labels).
import { create as createDev, insert as insertDev, search as searchDev } from "orama-dev"
import { create as createStable, insert as insertStable, search as searchStable } from "@orama/orama"

async function createSearch() {
  const n = 30000
  const data1 = Array.from({ length: n }).map(() => faker.string.sample())
  const data2 = [...data1]
  const benchmark = new Benchmark()

  const db1 = await createStable({ schema: { name: "string" } })
  for (const name of data1) {
    await insertStable(db1, { name })
  }

  const db2 = await createDev({ schema: { name: "string" } })
  for (const name of data2) {
    await insertDev(db2, { name })
  }

  await benchmark.record("search - stable", async () => await searchStable(db1, { term: data1.pop() }), {
    iterations: n,
  })
  await benchmark.record("search - dev", async () => await searchDev(db2, { term: data2.pop() }), {
    iterations: n,
  })

  printResult(benchmark)
}
```
Hey @allevo, do you have any suggestions on this change?
Something like:

```ts
export function safeAddNewItems<T>(arr: T[], newArr: T[]) {
  if (newArr.length < MAX_ARGUMENT_FOR_STACK) {
    arr.push(...newArr)
  } else {
    for (let i = 0; i < newArr.length; i += MAX_ARGUMENT_FOR_STACK) {
      arr.push(...newArr.slice(i, i + MAX_ARGUMENT_FOR_STACK))
    }
  }
}
```

I didn't test the performance differences.
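The helper above can be made self-contained for experimenting. The constant's value is an assumption here: engine argument limits vary (WebKit is reported later in this thread at around 65k), and the PR discussion uses 10k.

```typescript
// Hypothetical value; the real limit depends on the JavaScript engine.
const MAX_ARGUMENT_FOR_STACK = 10_000

function safeAddNewItems<T>(arr: T[], newArr: T[]): void {
  if (newArr.length < MAX_ARGUMENT_FOR_STACK) {
    // Small enough to spread in a single call.
    arr.push(...newArr)
    return
  }
  // Spread fixed-size slices so no single call receives more arguments
  // than the engine's stack can handle (avoids "Maximum call stack size
  // exceeded" that a plain arr.push(...newArr) can throw on huge arrays).
  for (let i = 0; i < newArr.length; i += MAX_ARGUMENT_FOR_STACK) {
    arr.push(...newArr.slice(i, i + MAX_ARGUMENT_FOR_STACK))
  }
}

const target: number[] = []
safeAddNewItems(target, Array.from({ length: 95_000 }, (_, i) => i))
console.log(target.length)  // 95000
console.log(target[94_999]) // 94999
```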
Hi, attached is the patch I use against 1.0.10 (I also have a fix in removeDocument, which crashes when the node doesn't exist, as part of updateMultiple).

Search query I use:

Schema:

The dpack size is around 400MB (900k docs).

PS: Performance is really good, I love the idea of this project ❤️
@allevo I changed it to use your suggestion, way better than cloning the array every time. And @rawpixel-vincent, thanks for your suggestions, I ended up adding the method for every
@H4ad Could you align this branch with main? I'm curious to see the difference in performance.
LGTM
This PR introduces an overhead:
I wonder if we even want to fix this issue: it arises only when an index returns more than 100K elements, i.e. when the search term is very short. In that case, an error is thrown. @micheleriva WDYT?
Orama gets the IDs of all documents. Now, for every list that pushes more than 10K elements, we batch the push instead of pushing all those items at once, to avoid issues with the max call stack.
I think that looks good. @H4ad, could you solve the conflicts before merging? Thanks!
@micheleriva Done!
LGTM
Hi @H4ad, I'm a bit concerned about this performance degradation. I'm not sure it's worth adding such an overhead for this edge case.
@micheleriva The overhead is about 0.3ms for 30k items; in most cases it won't even be noticed. I think the change is worth it, since it's better to be a little slower than to crash. Also, we can experiment with increasing the number of items per slice: the limit is more than 100k and we are using an arbitrary number of 10k, so we could try 20k, 40k, etc.
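The overhead/chunk-size trade-off discussed here can be measured with a rough sketch like the following (names and chunk sizes are illustrative; timings are machine-dependent):

```typescript
// Measure chunked push at different chunk sizes, e.g. the 10k the PR uses
// versus the 65k WebKit-safe value proposed below.
function chunkedPush<T>(arr: T[], newArr: T[], chunk: number): void {
  for (let i = 0; i < newArr.length; i += chunk) {
    arr.push(...newArr.slice(i, i + chunk))
  }
}

const items = Array.from({ length: 30_000 }, (_, i) => i)

for (const chunk of [10_000, 65_000]) {
  const target: number[] = []
  const start = performance.now()
  chunkedPush(target, items, chunk)
  const elapsed = performance.now() - start
  console.log(`chunk=${chunk}: ${elapsed.toFixed(3)} ms, length=${target.length}`)
}
```

A larger chunk means fewer `push` calls and slices (less overhead) but gets closer to the engine's argument limit, which is the tension the rest of the thread settles at 65k.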
The crash happens only when a limit is reached. What happens if we set
@allevo I think we can increase it. @micheleriva, can you do more tests with different values of
According to this link, WebKit has a lower limit: 65k items.
It's OK to set the magic number to 65K: this avoids throwing and limits the performance cost to searches with large result sets.
Let's use 65k as the limit then. I'll merge this as soon as the conflicts are fixed.
Co-authored-by: Vincent Baronnet <vincent@rawpixel.com>
Force-pushed 855b7fa to 78460dd

Co-authored-by: Alexandr Pestryakov <mertico@yandex.ru>
Force-pushed 78460dd to b21577c
LGTM
Still present: should we reopen this, or open a new issue?
Closes #301

I fixed a simple issue with `.push`. I added a function instead of just an if statement because `orama` probably has other parts with the same issue.

PS: To be able to test/run the script from #301 without it taking several minutes/hours, merge this branch with #434.