Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #767: Second part of "no-index" changes: allow large non-indexed String values, implicit allow for "$vector" #792

Merged
merged 23 commits into from
Jan 18, 2024

Conversation

tatu-at-datastax
Copy link
Contributor

@tatu-at-datastax tatu-at-datastax commented Jan 12, 2024

What this PR does:

Completes changes needed by #767 by:

  1. Adding more testing for Shredder
  2. Fix the issue wrt dropping indexing of "$vector" (needs handling similar to _id for query projections, i.e. always index)
  3. Changing validation to ignore length of non-Indexed Strings (only limited by max doc size)

These are resolved by refactoring validation logic into smaller parts, run at different times during shredding process.

Which issue(s) this PR fixes:
Fixes #767

Checklist

  • Changes manually tested
  • Automated Tests added/updated
  • Documentation added/updated
  • CLA Signed: DataStax CLA

@tatu-at-datastax tatu-at-datastax self-assigned this Jan 12, 2024
Copy link
Collaborator

@vkarpov15 vkarpov15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far 👍

@tatu-at-datastax
Copy link
Contributor Author

tatu-at-datastax commented Jan 16, 2024

Remaining issue: when returning "after" Document from findOneAndXxx() need to retain it without pruning for "no-index", may need to create a deep copy.

@tatu-at-datastax tatu-at-datastax marked this pull request as ready for review January 16, 2024 22:28
@tatu-at-datastax tatu-at-datastax requested a review from a team as a code owner January 16, 2024 22:28
Copy link
Contributor

@maheshrajamani maheshrajamani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good except for the clarification on $vector field

@tatu-at-datastax
Copy link
Contributor Author

@maheshrajamani On $vector: I was thinking about first automatically adding it to allow list (sort of similar to _id) like you suggested, but since vector handling is different in other parts this seemed like slightly better way. I tried removing it automatically, but then noticed the problem with "after" document returning where it usually needs to be retained. But I think either way there is need for some special casing -- and if we need to support multiple vectors, this probably needs to be rewritten anyway.

But I wonder... if $vector is NOT automatically added, would it prevent use of $vector in filter clause?

I'll have to think about this bit more, thanks!

@tatu-at-datastax
Copy link
Contributor Author

Sounds like implicit addition of $vector for non-empty "allow" -- similar to _id for filter Projection -- is what we'll be going with. Users can still add it to "deny" list if they really want to exclude it from being indexed; but that seems like an odd use case (why add $vector in Collection at all then?).

@tatu-at-datastax tatu-at-datastax marked this pull request as draft January 17, 2024 19:00
@tatu-at-datastax
Copy link
Contributor Author

Need to make change for auto-inclusion of $vector, moving back to Draft.

@tatu-at-datastax tatu-at-datastax marked this pull request as ready for review January 17, 2024 22:41
@tatu-at-datastax
Copy link
Contributor Author

Ready for re-review.

@tatu-at-datastax tatu-at-datastax changed the title Fixes #767: Second part of "no-index" changes: allow large non-indexed String values Fixes #767: Second part of "no-index" changes: allow large non-indexed String values, implicit allow for "$vector" Jan 18, 2024
@tatu-at-datastax
Copy link
Contributor Author

@maheshrajamani "$vector" handling changed as discussed: any other concerns? (I know you have approved but wanted to double-check before merging)

@maheshrajamani
Copy link
Contributor

@maheshrajamani "$vector" handling changed as discussed: any other concerns? (I know you have approved but wanted to double-check before merging)

@tatu-at-datastax Changes LGTM

@tatu-at-datastax tatu-at-datastax merged commit 3805300 into main Jan 18, 2024
2 checks passed
@tatu-at-datastax tatu-at-datastax deleted the tatu/767-no-index-part2 branch January 18, 2024 19:26
Copy link
Collaborator

@vkarpov15 vkarpov15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Indexing options] Shredder changes to index fields
3 participants