-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: uplift Lucene to 4.10.4 #74
Conversation
It passes the COUCH_USER=adm COUCH_PASS=pass .venv/bin/python3 -m nose2 ... ........s.ss.s.s.............s.ss.s.s.......F.FF.s.ss.s.s........................................................................................................................................FF...................s...................................................FF...............................................................................F.....FF.......
======================================================================
FAIL: test_elem_match (03-operator-test.OperatorTextTests.test_elem_match)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/gaborpali/projects/github/couchdb/src/mango/test/03-operator-test.py", line 48, in test_elem_match
self.assertEqual(len(docs), 1)
AssertionError: 0 != 1
======================================================================
FAIL: test_eq_null_does_not_include_missing (03-operator-test.OperatorTextTests.test_eq_null_does_not_include_missing)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/gaborpali/projects/github/couchdb/src/mango/test/03-operator-test.py", line 113, in test_eq_null_does_not_include_missing
self.assertUserIds(user_ids, docs)
File "/Users/gaborpali/projects/github/couchdb/src/mango/test/03-operator-test.py", line 22, in assertUserIds
self.assertEqual(user_ids, user_ids_returned)
AssertionError: Lists differ: [9] != []
First list contains 1 additional elements.
First extra element 0:
9
- [9]
? -
+ []
======================================================================
FAIL: test_exists_false (03-operator-test.OperatorTextTests.test_exists_false)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/gaborpali/projects/github/couchdb/src/mango/test/03-operator-test.py", line 106, in test_exists_false
self.assertUserIds(user_ids, docs)
File "/Users/gaborpali/projects/github/couchdb/src/mango/test/03-operator-test.py", line 22, in assertUserIds
self.assertEqual(user_ids, user_ids_returned)
AssertionError: Lists differ: [2, 3, 5, 6, 7, 8, 10, 11, 12, 14] != []
First list contains 10 additional elements.
First extra element 0:
2
- [2, 3, 5, 6, 7, 8, 10, 11, 12, 14]
+ []
======================================================================
FAIL: test_gt (06-basic-text-test.BasicTextTests.test_gt)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/gaborpali/projects/github/couchdb/src/mango/test/06-basic-text-test.py", line 152, in test_gt
assert len(docs) == 2
^^^^^^^^^^^^^^
AssertionError
======================================================================
FAIL: test_gte (06-basic-text-test.BasicTextTests.test_gte)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/gaborpali/projects/github/couchdb/src/mango/test/06-basic-text-test.py", line 175, in test_gte
assert len(docs) == 2
^^^^^^^^^^^^^^
AssertionError
======================================================================
FAIL: test_limit_bookmark (08-text-limit-test.LimitTests.test_limit_bookmark)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/gaborpali/projects/github/couchdb/src/mango/test/08-text-limit-test.py", line 97, in test_limit_bookmark
self.run_bookmark_check(i)
File "/Users/gaborpali/projects/github/couchdb/src/mango/test/08-text-limit-test.py", line 115, in run_bookmark_check
assert len(seen_docs) == len(limit_docs.DOCS)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
======================================================================
FAIL: test_limit_field (08-text-limit-test.LimitTests.test_limit_field)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/gaborpali/projects/github/couchdb/src/mango/test/08-text-limit-test.py", line 23, in test_limit_field
assert len(docs) == 8
^^^^^^^^^^^^^^
AssertionError
======================================================================
FAIL: test_guess_dup_type_sort (09-text-sort-test.SortTests.test_guess_dup_type_sort)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/gaborpali/projects/github/couchdb/src/mango/test/09-text-sort-test.py", line 79, in test_guess_dup_type_sort
self.assertEqual(len(docs), 15)
AssertionError: 0 != 15
======================================================================
FAIL: test_number_sort (09-text-sort-test.SortTests.test_number_sort)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/gaborpali/projects/github/couchdb/src/mango/test/09-text-sort-test.py", line 22, in test_number_sort
self.assertEqual(len(docs), 15)
AssertionError: 0 != 15
======================================================================
FAIL: test_number_sort_desc (09-text-sort-test.SortTests.test_number_sort_desc)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/gaborpali/projects/github/couchdb/src/mango/test/09-text-sort-test.py", line 28, in test_number_sort_desc
self.assertEqual(len(docs), 15)
AssertionError: 0 != 15
----------------------------------------------------------------------
Ran 362 tests in 26.674s
FAILED (failures=10, skipped=16) The common reason for the failures is that Clouseau does not return any value. Based on my investigations so far, it works as expected up to 4.8.1, and it starts to fail for 4.9.0 and later versions. I have checked the change log, but none of the API changes directly indicate anything related. |
Awesome work! I tested it, and see the same test failures:
Also noting that my machine appears to be somewhat slower than yours (my ~40s vs your ~27s), which I wonder if accounts for why the write lock timeout error seems more reliable on your hardware? FWIW, I also ran these 2 tests 1000 times in a row, and didn't see any errors:
|
👍 But, as indicated, I will have to nag @rnewson for hints on why it does not fully work.
Thanks for the confirmation!
Yes, that is possible. However, I thought that you have an 2021 M1 Max too.
You will not, these tests are faulty. And I think I have found the cause for that -- I will soon submit a PR about that for |
Yeah, I've been debugging them a bit today. Seems like the user docs get bulk loaded into the db, but then db info only shows the ddocs, but also no deleted docs. I'm looking forward to the 🤦 |
Here are the bugs of |
One of the queries that return 0 hits starting from 4.9.0:
{
"_id": {"$gt": null},
"bang": {"$elemMatch": {"foo": {"$gte": 1}, "bam": true}},
} which translates to:
Note that the following queries work at the same time:
and
|
After taking a closer look at the rest of the failing tests cases, I noticed that it is the ranges on numbers that cause the problems. More specifically, ranges with no lower or upper limits, i.e. the use of the The previous query works if the
If CouchDB is modified according to this observation, diff --git a/src/mango/src/mango_selector_text.erl b/src/mango/src/mango_selector_text.erl
index 1f8609ac2..4b46031eb 100644
--- a/src/mango/src/mango_selector_text.erl
+++ b/src/mango/src/mango_selector_text.erl
@@ -304,11 +304,11 @@ range(gt, Arg) ->
[<<"{">>, value_str(Arg), <<" TO ", Max/binary, "]">>].
get_range(min, Arg) when is_number(Arg) ->
- <<"-Infinity">>;
+ <<"-infinity">>;
get_range(min, _Arg) ->
<<"\"\"">>;
get_range(max, Arg) when is_number(Arg) ->
- <<"Infinity">>;
+ <<"infinity">>;
get_range(max, _Arg) ->
<<"\u0x10FFFF">>. |
Unfortunately, this latest fix does not solve the problem, only masks it. That is because using I was playing further with the Lucene queries. For the
This can be lowered until
When
At the same time, the inclusive ranges do not include the value of the lower bound. That is the following query will not include
|
For the reference, here are the commands that I used for the investigation: # Run in the CouchDB git clone:
MANGO_TESTS_KEEP_DBS=1 make mango-test MANGO_TEST_OPTS="03-operator-test.OperatorTextTests.test_elem_match"
# Start CouchDB manually via dev/run (-a adm:pass -n 1), then:
MANGO_DB=$(curl -sS http://adm:pass@localhost:15984/_all_dbs | jq -r '.[]|select(startswith("mango_test_"))')
# Set the query:
QUERY="age_3anumber:[2.1 TO Infinity]"
# Run the query:
curl -s -X POST -H 'Content-Type: application/json' "http://adm:pass@localhost:15984/$MANGO_DB/_design/cedc01a027213706d7260b5e5b73c70b9233743a/_search/cedc01a027213706d7260b5e5b73c70b9233743a" -d '{"q":"'$QUERY'","include_docs":true}' | jq -r '.' |
I think I have the source of the problem. LUCENE-5609 applies an optimization which technically implies a change in the run-time behavior because |
@@ -62,7 +62,7 @@ class ClouseauQueryParser(version: Version, | |||
startInclusive: Boolean, | |||
endInclusive: Boolean): Query = { | |||
if (isNumber(lower) && isNumber(upper)) { | |||
NumericRangeQuery.newDoubleRange(field, 8, lower.toDouble, | |||
NumericRangeQuery.newDoubleRange(field, 16, lower.toDouble, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This triggered some investigation on my part, coupled with the other comments about precision step default change.
Seems we currently index with precision step of 4 (default for 4.6.1) but query with precision step 8. That's probably a bug, albeit one we've had for a long time.
I don't think we can change this when we upgrade as it might cause search results to change.
Instead, we need to keep the above as 8
and change how we index doubles to continue to do so with precision step of 4 not the 'new' default of 16.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See 742670b for my attempt at doing so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have had a feeling that this is going to be something that is problematic. The optimization leads to smaller indexes which means they need to be rebuilt, right? The integration tests work with fresh indexes, but for existing deployments this could be a problem.
Thank you very much for the proposal, I will have take a look at it and integrate soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3519e68 rather
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, it means that search results could change. some double values will be quantised to different values before and after this change. Existing indexes will have a mixture of the old and new precision step. Simply put, we can change precision step (and other things) but it requires an index rebuild to get the right answers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also quoting from 5609;
To be simple, i think a change we make has to be a multiple of 4: the current default.
This way we have easy backwards compat with existing indexes for users that just go with the default (since NumericRangeQuery has ctors that use the default precisionStep, and multiples of the index-time work at query-time).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I do not see how does all this translates to the current situation. Should we change the Double
field type wrapper to set the numeric precision step to 8
instead of 4
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, there are several issues here.
Clouseau has apparently always been wrong on this. We are indexing numeric values with precision step of 4 but only ever querying with precision step of 8. Half of those terms are never queried, there's no possible path to them, but we took the time to index them and store them.
Later version of Lucene have changed the default (from 4 to 16), which triggered this conversation.
I don't think we should change to 16 as it would change the results of current queries (I think, but this should be challenged, perhaps I misunderstand the implications).
So, the least we must do is prevent the default from changing to 16, and that requires the FieldType changes I show in my commit above.
The only question remaining is whether we keep exactly what we have today (4 for indexing, 8 for querying), or whether we address that mistake and switch to 8 for indexing and querying.
If I'm wrong about the implications of 16 then we could of course do nothing, but someone other than me will need to do that investigation, or it's just an echo chamber.
Specifically, I worry that a document might no longer match a given query if we go from 4 to 16 because the numeric term that would match in the former case simply doesn't exist in the latter case.
(as a primer, the way old versions of Lucene index numbers is to convert them into a series of specially formatted strings which are then indexed as terms. Newer versions of Lucene use a better system called points
and the notion of precision step no longer applies).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noting there's a good chance that precision step is "just" about performance. the actual numeric value is indexed in all cases, but not all of the speed-up / lower precision terms are. This is all ancient history and poorly remembered (by me).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the explanation.
This uplift would require a validation to eliminate the generic worries around the potential index format changes anyway. Once we had the related end-to-end tests implemented, we can also experiment with the various configurations on precision step.
} | ||
drilldownQuery.add(categoryPaths: _*) | ||
val dim = category.head | ||
for (path <- category.tail) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the same effect? It appears to loop over each item in the tail and add each of them to the drilldownQuery. whereas the original code only calls drilldownQuery.add once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Facet API has changed, it is not possible to call DrillDownQuery/add
with the full list any more, c.f. 4.6.1 vs. 4.10.4. Note that there is a loop inside the 4.6.1 version. It uses the following definition to determine the dim
:
String dim = paths[0].components[0];
and that is what I considered the head
here, also in accordance with the removed else
branch (the one with the note about the removal once leaving Lucene 4.6 behind).
29df361
to
37c6329
Compare
Update Clouseau to use the latest available Lucene version from the 4.x series. This commit brings many bug fixes while it maintains the backward compatibility for the existing indexes. 4.10.4 has 45 API changes and 116 bug fixes since 4.6.1, for the details see the change log [1]. [1] https://lucene.apache.org/core/4_10_4/changes/Changes.html
I am closing now in favor of #80. I will leave the changes around and we can revisit the uplift later. |
Update Clouseau to use the latest available Lucene version from the 4.x series. This change brings many bug fixes while it maintains the backward compatibility for the existing indexes. 4.10.4 has 45 API changes and 116 bug fixes since 4.6.1, for the details see the change log.