Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-17504: CoreContainer calls UpdateHandler.commit. #2786

Merged
merged 5 commits into from
Nov 19, 2024

Conversation

bruno-roustant
Copy link
Contributor

https://issues.apache.org/jira/browse/SOLR-17504

Description

This proposal is about unifying calls to UpdateHandler.commit.

CoreContainer.reload may call directly IndexWriter.commit if the core to reload is readonly (property of the Collection). This is an issue because it bypasses some logic in UpdateHandler.commit. In addition, the current code commits without taking the commit lock.

DirectUpdateHandler2.closeWriter may commit if the update log is not empty (not committed yet). But it does not call DirectUpdateHandler2.commit(), it copies commit() code instead for some reasons (comments in the code mention test failures).

Solution

Make CoreContainer.reload call UpdateHandler.commit.

Add the same call to DirectUpdateHandler2.shouldCommit() as in the commit() method, otherwise commit metadata are lost. In addition shouldCommit() can be extended correctly, for example in the solr-sandbox encryption module which needs to intercept calls to shouldCommit().

Tests

A new DirectUpdateHandlerWithUpdateLogTest is added to test specifically that the UpdateHandler.shouldCommit hook is called. It's a new test because it requires the update log to test the UpdateHandler.closeWriter logic, but DirectUpdateHandlerTest does not support the update log.

@@ -28,6 +28,10 @@ public class TimeOut {
private final long timeoutAt, startTime;
private final TimeSource timeSource;

public TimeOut(long intervalSeconds) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just taking a long (a primitive) isn't particularly obvious; I recommend not adding this.

Comment on lines 1125 to 1127
* searcher will be opened with a standard {@link CommitUpdateCommand}. This internal {@link
* ClosingCommitUpdateCommand} is a way to not add another public flag to {@link
* CommitUpdateCommand}.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And why would that be bad? My first reaction is, why is Bruno subclassing when this seems like another boolean on a commit. You could add a factory method for this scenario -- closeOnCommit so we know exactly what this looks like.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose there should be an enum, searcher behavior:

  • noSearcher
  • realtimeSearcher
  • standardSearcher
  • standardSearcherNoWait

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you agree, such a change would be a nice refactoring done first before this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this enum idea very much.
If we replace two public boolean fields by an enum field, would it be a public API change, so limited to main branch?
Ideally, I would like to have this change in 9.x to get it in solr-sandbox.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely merely an internal change in terms of just the CommitUpdateCommand internals (I'm aware its fields are all public :-(. No backwards-compatibility concern.

Of course users could pass those booleans at an HTTP params level and it still needs to work, ultimately being converted to the appropriate enum. Not sure if you want to add a new http param for the enum as an alternative... I suppose you might.

Comment on lines 972 to 976
// todo: refactor this shared code (or figure out why a real CommitUpdateCommand can't
// be used)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you didn't write that comment but it's unclear to me what is meant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, also unclear to me.

@@ -786,7 +786,7 @@ public void commit(CommitUpdateCommand cmd) throws IOException {
if (ulog != null) ulog.preSoftCommit(cmd);
if (cmd.openSearcher) {
core.getSearcher(true, false, waitSearcher);
} else {
} else if (!(cmd instanceof ClosingCommitUpdateCommand)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really smart observation here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is beautiful

try {
if (iw != null) {
iw.commit();
SolrQueryRequest req = new LocalSolrQueryRequest(core, new ModifiableSolrParams());
CommitUpdateCommand cmd =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "var" would be clear and short

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, I can change but I don't like var.

Comment on lines -2064 to -2065
// switch old core to readOnly
core.readOnly = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose doing this before was to protect changes happening after the commit sneaking it. Why did you move it?

Copy link
Contributor Author

@bruno-roustant bruno-roustant Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum, good point. I moved it because I didn't realize your point and UpdateHanlder.commit() checks the core is not readonly, otherwise stops with an exception.
Actually, with your nice idea of an enum in CommitUpdateCommand, the intent would be clear and commit() would not reject if the core is closed after the commit. And we could restore the core.readonly = true at the right location here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as an aside, I don't like that CoreContainer is doing SolrCore internal manipulations... like this should be a method on SolrCore like core.commit()

@bruno-roustant
Copy link
Contributor Author

I tried to refactor to have an enum in CommitUpdateCommand instead of 3 booleans, but the change in callings classes was becoming large and odd with dependent classes. I finally reverted to keep the 2 booleans and add a private 3rd one. There is no refactoring, but it keeps clarity. Tell me what you think.

@@ -21,9 +21,21 @@

/** A commit index command encapsulated in an object. */
public class CommitUpdateCommand extends UpdateCommand {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome docs!

@bruno-roustant bruno-roustant merged commit 5cc733b into apache:main Nov 19, 2024
3 checks passed
@bruno-roustant bruno-roustant deleted the direct_update_handler branch November 19, 2024 16:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants