Add no-index support for Shredder (part 1) #786

Merged: tatu-at-datastax merged 14 commits into main from tatu/767-shredder-no-index on Jan 11, 2024

Conversation

Contributor

tatu-at-datastax commented Jan 10, 2024

What this PR does:

Adds index allow/deny handling to Shredder, based on the IndexingConfig added via CreateCollection.

Which issue(s) this PR fixes:
Fixes #767

Checklist

  • Changes manually tested
  • Automated Tests added/updated
  • Documentation added/updated
  • CLA Signed: DataStax CLA

tatu-at-datastax self-assigned this on Jan 10, 2024

this(
    allowed,
    denied,
    Suppliers.memoize(() -> DocumentProjector.createForIndexing(allowed, denied)));
Contributor Author

tatu-at-datastax, Jan 10, 2024

This is done so that the DocumentProjector is created only the first time it is needed (if at all) and then reused on later calls (as CollectionSettings instances are effectively cached).
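As a rough illustration of the lazy-creation behaviour described in this comment, here is a minimal plain-Java sketch. It hand-rolls a memoizing supplier as a stand-in for Guava's Suppliers.memoize (it is not Guava's implementation), and the names MemoizingSupplier, SketchProjector, and SketchSettings are hypothetical, not classes from this PR.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hand-rolled stand-in for a memoizing supplier: computes once, then returns the cached value. */
final class MemoizingSupplier<T> implements Supplier<T> {
  private final Supplier<T> delegate;
  private volatile boolean initialized;
  private T value;

  MemoizingSupplier(Supplier<T> delegate) {
    this.delegate = delegate;
  }

  @Override
  public T get() {
    if (!initialized) {
      synchronized (this) {
        if (!initialized) {
          value = delegate.get(); // runs at most once
          initialized = true;
        }
      }
    }
    return value;
  }
}

/** Hypothetical placeholder for the projector; it only records its inputs. */
final class SketchProjector {
  final List<String> allowed;
  final List<String> denied;

  SketchProjector(List<String> allowed, List<String> denied) {
    this.allowed = allowed;
    this.denied = denied;
  }
}

/** Hypothetical settings holder that builds its projector lazily. */
final class SketchSettings {
  /** Counts how many times the projector was actually built. */
  final AtomicInteger buildCount = new AtomicInteger();
  private final Supplier<SketchProjector> indexingProjector;

  SketchSettings(List<String> allowed, List<String> denied) {
    this.indexingProjector =
        new MemoizingSupplier<>(
            () -> {
              buildCount.incrementAndGet();
              return new SketchProjector(allowed, denied);
            });
  }

  SketchProjector indexingProjector() {
    return indexingProjector.get();
  }
}

final class MemoizeDemo {
  public static void main(String[] args) {
    SketchSettings settings = new SketchSettings(List.of(), List.of("a"));
    SketchProjector first = settings.indexingProjector();
    SketchProjector second = settings.indexingProjector();
    System.out.println(first == second); // prints "true"
    System.out.println(settings.buildCount.get()); // prints "1"
  }
}
```

Nothing is built until the first call to indexingProjector(), and every later call on the same settings object returns the same instance.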

tatu-at-datastax changed the title from "(WIP) Add no-index support for Shredder" to "Add no-index support for Shredder (part 1)" on Jan 11, 2024
tatu-at-datastax marked this pull request as ready for review on January 11, 2024 16:43
tatu-at-datastax requested a review from a team as a code owner on January 11, 2024 16:43
Contributor

maheshrajamani left a comment

Overall looks good. Can you clarify the points raised in the comments?

}

public DocumentProjector indexingProjector() {
  return indexedProject.get();
Contributor

This builds the Projector on every call. Is the idea to get it once and then pass it around everywhere in the code?

Contributor Author

No, it uses Guava's memoize() to avoid just that.

Contributor Author

I'll try to come up with a test to verify that we only get one instance; maybe CreateCollectionCommandResolverTest?

Contributor Author

OK, this was easier to verify with a stand-alone unit test: see CollectionSettingsTest, which verifies that the instance is created lazily and that the same instance is then returned on further calls.

// Note that (5) is effectively same as (3) and included for sake of uniformity
if (allowed != null && !allowed.isEmpty()) {
  // (special) Case 5:
  if (allowed.size() == 1 && allowed.contains("*")) {
Contributor

allowed can't contain "*"; it's only supported for denied.

Contributor Author

That is something I wanted to discuss (and should have brought up earlier). For consistency it would make sense to allow this, but for a minimal approach it would not. I will ask on the Stargate channel.

My main concern is that users might pass this assuming it works, while it actually means something totally different.

""";

// Try with overlapping paths
assertDenyProjection(Arrays.asList("a", "a.b"), INPUT_DOC, EXP_OUTPUT);
Contributor

What will the output be if the denied list contains only "a.b"? Will we index the field "a.b.z"?

Contributor Author

We take the most generic path first, so this is the same as using only "a".

So a, a.b, and a.b.c would all be excluded (not indexed).
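The "most generic path wins" rule described here can be sketched in plain Java as follows. This is an illustrative sketch only, not the DocumentProjector code from this PR: the class DenyPathsSketch and its methods are hypothetical, and it assumes dotted paths are compared segment by segment.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper illustrating how overlapping deny paths collapse. */
final class DenyPathsSketch {
  /** True if path equals prefix or is nested under it, compared by whole segments. */
  static boolean isSameOrUnder(String path, String prefix) {
    return path.equals(prefix) || path.startsWith(prefix + ".");
  }

  /** Keeps only the most generic paths, e.g. "a" subsumes "a.b". */
  static List<String> mostGeneric(List<String> paths) {
    List<String> sorted = new ArrayList<>(paths);
    // A covering path is always shorter than the paths it covers, so visit shorter ones first.
    sorted.sort(Comparator.comparingInt(String::length));
    List<String> kept = new ArrayList<>();
    for (String candidate : sorted) {
      boolean covered = kept.stream().anyMatch(k -> isSameOrUnder(candidate, k));
      if (!covered) {
        kept.add(candidate);
      }
    }
    return kept;
  }

  /** True if the path falls under any of the (normalized) denied paths. */
  static boolean isDenied(String path, List<String> denied) {
    return mostGeneric(denied).stream().anyMatch(d -> isSameOrUnder(path, d));
  }

  public static void main(String[] args) {
    List<String> denied = List.of("a.b", "a");
    System.out.println(mostGeneric(denied)); // prints "[a]"
    System.out.println(isDenied("a.b.c", denied)); // prints "true"
    System.out.println(isDenied("x", denied)); // prints "false"
  }
}
```

Comparing whole segments (by appending "." to the prefix) keeps a field such as "ab" from being treated as nested under "a".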

// First with empty Sets:
assertAllowProjection(Arrays.asList(), DOC, DOC);
// And then "*" notation
assertAllowProjection(Arrays.asList("*"), DOC, DOC);
Contributor

Not a valid case.

Contributor Author

This relates to the earlier comment: correct, it depends on our definition.

tatu-at-datastax merged commit c62f1e0 into main on Jan 11, 2024
2 checks passed
tatu-at-datastax deleted the tatu/767-shredder-no-index branch on January 11, 2024 19:46

Successfully merging this pull request may close these issues.

[Indexing options] Shredder changes to index fields
2 participants