REST high-level client: add reindex API #32679
);
}
{
    TimeUnit.SECONDS.sleep(1);
@nik9000 I know this is incorrect. But I couldn't make the test pass without it. Will firing a refresh request help?
It worked. Sorry for the noise.
@elasticmachine, white list this please
whitelist this please
add to whitelist LOL
Pinging @elastic/es-core-infra
I left a few questions, the biggest being, maybe a mutable builder object that knows how to build a status or the error might make this less terrible to parse. Maybe. Could you check?
request.setScript(
    new Script(
        ScriptType.INLINE, "painless",
        "if (ctx._source.foo == 'bar') {ctx._version++; ctx._source.remove('foo')}",
I think messing with the `_version` is a fairly rare thing. I think it'd be more normal to, say, split a field on a regex and stick it into two fields. Or something like that.
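To make the suggestion concrete, here is a minimal sketch of "split a field on a regex and stick it into two fields". It is plain Java operating on a `Map` that stands in for `ctx._source`, not a Painless script and not the example that ended up in the docs. The separator pattern, the `SplitFieldSketch` class, and the `_first`/`_second` field names are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

final class SplitFieldSketch {
    // Hypothetical separator: a dash with optional whitespace around it.
    private static final Pattern SEPARATOR = Pattern.compile("\\s*-\\s*");

    /**
     * Returns a copy of the source map in which the given string field has been
     * split once on SEPARATOR and replaced by two new fields. If the field is
     * missing, is not a string, or does not contain the separator, the copy is
     * returned unchanged.
     */
    static Map<String, Object> splitField(Map<String, Object> source, String field) {
        Map<String, Object> result = new HashMap<>(source);
        Object value = result.get(field);
        if (value instanceof String) {
            // Limit of 2: split at the first separator only.
            String[] parts = SEPARATOR.split((String) value, 2);
            if (parts.length == 2) {
                result.remove(field);
                result.put(field + "_first", parts[0]);
                result.put(field + "_second", parts[1]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("foo", "bar - baz");
        System.out.println(splitField(doc, "foo"));
    }
}
```

A reindex script would make the same kind of change to each document's source as it is copied, without touching `_version`.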
Ah, I copied this from the other doc. It is to show (I believe) that you can mess with it during `reindex` but not during `update_by_query`. I can change it.
I think it is worth changing. Probably in both places to be honest, but just in this place in this PR.
--------------------------------------------------
<1> Set the versionType to `EXTERNAL`

Settings `opType` to `create` will cause `_reindex` to only create missing documents in the target index. All existing
s/Settings/Setting/
<1> `setScript` to bump the version of the source document

`ReindexRequest` supports reindexing from a remote Elasticsearch cluster. When using a remote cluster the query should be
specified inside the `RemoteInfo` object and not using `setSourceQuery`.
I'd explicitly say that the query set by `setSourceQuery` is ignored. It might also be nice to explain why: the remote Elasticsearch may not understand queries built by the modern query builders. This works all the way back to Elasticsearch 0.90, and the query language has drifted a bit since then. When you reach back to old versions, it is better to write the query by hand in JSON.
List<Failure> bulkFailures = new ArrayList<>();
List<SearchFailure> searchFailures = new ArrayList<>();
for (Object object: failures) {
    if (object instanceof Failure) {
I regret building this in this way. It is indeed messy.
case SearchFailure.SHARD_FIELD:
    shardId = parser.intValue();
    break;
default:
Wow!
Float requestsPerSecond = (Float) a[startIndex + 10];
requestsPerSecond = requestsPerSecond == -1 ? Float.POSITIVE_INFINITY : requestsPerSecond;
String reasonCancelled = (String) a[startIndex + 11];
TimeValue throttledUntil =
I'd probably extract them all from the array first, casting them, and then manipulating them.
Actually, I wonder if a builder object would be better than this object array stuff. It might be cleaner. Not nice, but the way I laid out the json makes nice kind of impossible.
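A minimal sketch of the "extract and cast everything first, then manipulate" ordering, in plain Java. The array offsets and the `-1` to `Float.POSITIVE_INFINITY` mapping mirror the diff above. The `ThrottleInfo` holder and the `fromParsedArgs` name are hypothetical, and the real code reads many more fields than these two.

```java
final class ExtractThenNormalizeSketch {
    /** Hypothetical holder for the two values shown in the diff. */
    static final class ThrottleInfo {
        final float requestsPerSecond;
        final String reasonCancelled;

        ThrottleInfo(float requestsPerSecond, String reasonCancelled) {
            this.requestsPerSecond = requestsPerSecond;
            this.reasonCancelled = reasonCancelled;
        }
    }

    /** Assumes a[startIndex + 10] is a non-null Float, as the diff does. */
    static ThrottleInfo fromParsedArgs(Object[] a, int startIndex) {
        // Step 1: pull every value out of the array and cast it.
        Float rawRequestsPerSecond = (Float) a[startIndex + 10];
        String reasonCancelled = (String) a[startIndex + 11];

        // Step 2: only now apply the special cases (-1 is mapped to infinity).
        float requestsPerSecond =
            rawRequestsPerSecond == -1 ? Float.POSITIVE_INFINITY : rawRequestsPerSecond;

        return new ThrottleInfo(requestsPerSecond, reasonCancelled);
    }

    public static void main(String[] args) {
        Object[] parsed = new Object[12];
        parsed[10] = -1f;
        parsed[11] = null;
        ThrottleInfo info = fromParsedArgs(parsed, 0);
        System.out.println(info.requestsPerSecond);
    }
}
```

Keeping the casts together makes the array layout visible in one place, so the adjustments that follow read as ordinary code on named variables.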
I see what you mean. But it might not be super helpful. There are two constructors. The way it works right now is you call one or the other. So you calculate stuff from `sliceStatuses`, or you specify everything and `sliceStatuses` is an empty list. Not sure how that would translate here. Also, personally I like immutable objects. Validation is consolidated in the constructor, for example for fields that must be set. Plus it wouldn't change much in terms of lines of code, I think.
But if you think it's important I can change it.
By mutable builder I mean making a mutable object that you just use as the target for the parsing logic and then throw away afterwards. It just has setters and then a method to build status or error depending on which fields are set. The mutable object never escapes parsing. It might not be less code but I think it'd be easier to understand.
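A rough sketch of that idea in plain Java, to show the shape rather than the code that was merged. `Status`, `StatusOrException`, and `Builder` here are simplified stand-ins with hypothetical fields (`total`, `requestsPerSecond`, `errorReason`); the real classes carry many more fields and are filled in by the XContent parsing code.

```java
final class StatusOrExceptionSketch {
    /** Immutable result; validation stays in the constructor. */
    static final class Status {
        final long total;
        final float requestsPerSecond;

        Status(long total, float requestsPerSecond) {
            if (total < 0) {
                throw new IllegalArgumentException("total must be >= 0");
            }
            this.total = total;
            this.requestsPerSecond = requestsPerSecond;
        }
    }

    /** Holds exactly one of a status or an exception. */
    static final class StatusOrException {
        final Status status;
        final Exception exception;

        StatusOrException(Status status, Exception exception) {
            this.status = status;
            this.exception = exception;
        }
    }

    /** Mutable parse target: setters only, discarded after build(). */
    static final class Builder {
        private Long total;
        private Float requestsPerSecond;
        private String errorReason;

        void setTotal(long total) { this.total = total; }
        void setRequestsPerSecond(float requestsPerSecond) { this.requestsPerSecond = requestsPerSecond; }
        void setErrorReason(String errorReason) { this.errorReason = errorReason; }

        /** Builds an error if an error field was seen, otherwise a status. */
        StatusOrException build() {
            if (errorReason != null) {
                return new StatusOrException(null, new RuntimeException(errorReason));
            }
            if (total == null || requestsPerSecond == null) {
                throw new IllegalStateException("missing required status fields");
            }
            float rps = requestsPerSecond == -1 ? Float.POSITIVE_INFINITY : requestsPerSecond;
            return new StatusOrException(new Status(total, rps), null);
        }
    }

    public static void main(String[] args) {
        Builder builder = new Builder();
        builder.setTotal(10);
        builder.setRequestsPerSecond(-1f);
        StatusOrException result = builder.build();
        System.out.println(result.status.requestsPerSecond);
    }
}
```

The result objects stay immutable and keep their constructor validation; only the short-lived builder is mutable, and it decides between status and error at the end, once all fields have been seen.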
Ah, I see. Okay, thanks for explaining! I can make that change.
}

public static Status innerFromXContent(XContentParser parser) throws IOException {
    Token token = parser.currentToken();
I wonder if you can avoid all of this with a builder object that knows how to build either status or error. Then you can use regular ObjectParser with it.
I believe I cannot parse the error myself (as we already discussed)? I need to peek at a field name to realise what I need to parse, which doesn't work well with an `ObjectParser`. I could declare all fields in the `ObjectParser` and then try to build an object using that, but for that I would need to write the parser for the error as well. Which I really do not want to do... Or are you suggesting something else?
Could you add a comment here explaining why you need to parse this manually? I'm 100% sure when I read this again in 6 months I'll waste an hour figuring out why you did it this way. A comment will save future me some time.
This looks good to me! I left about 7 requests for more comments, mostly because I know that I'll read this again in six months and won't know why you made the choices that you did. Comments explaining those choices would be super nice. And some Javadoc on the new public methods would be wonderful!
@@ -189,6 +202,115 @@ public boolean shouldCancelChildrenOnCancellation() {
    return true;
}

public static class StatusBuilder {
Could you add javadoc for this?
import java.util.List;
import java.util.concurrent.TimeUnit;

public class BulkByScrollResponseBuilder extends StatusBuilder {
Could you add javadoc for this too?
Also, should it be public? What about package scope?
@@ -610,6 +961,41 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws
    return builder;
}

public static StatusOrException fromXContent(XContentParser parser) throws IOException {
Can you leave a comment about why you do it this way? The reasoning is fairly involved and a comment will totally help future-me when I reread this six months from now.
@@ -39,7 +44,8 @@
  * of reasons, not least of which that scripts are allowed to change the destination request in drastic ways, including changing the index
  * to which documents are written.
  */
-public class ReindexRequest extends AbstractBulkIndexByScrollRequest<ReindexRequest> implements CompositeIndicesRequest {
+public class ReindexRequest extends AbstractBulkIndexByScrollRequest<ReindexRequest>
+    implements CompositeIndicesRequest, ToXContentObject {
Could you indent this one more level? I think this kind of indenting makes it look like `implements` is part of the class body.
    return this;
}

public ReindexRequest setDestIndex(String destIndex) {
Could you add javadoc for these? They'd be nice because these are part of the public API.
I know we weren't good about this in the past but we're trying to be better lately.
I merged master and pushed a tiny cleanup. I'm going to let CI chew on it.
* master:
  Painless: Add Bindings (#33042)
  Update version after client credentials backport
  Fix forbidden apis on FIPS (#33202)
  Remote 6.x transport BWC Layer for `_shrink` (#33236)
  Test fix - Graph HLRC tests needed another field adding to randomisation exception list
  HLRC: Add ML Get Records API (#33085)
  [ML] Fix character set finder bug with unencodable charsets (#33234)
  TESTS: Fix overly long lines (#33240)
  Test fix - Graph HLRC test was missing field name to be excluded from randomisation logic
  Remove unsupported group_shard_failures parameter (#33208)
  Update BucketUtils#suggestShardSideQueueSize signature (#33210)
  Parse PEM Key files leniantly (#33173)
  INGEST: Add Pipeline Processor (#32473)
  Core: Add java time xcontent serializers (#33120)
  Consider multi release jars when running third party audit (#33206)
  Update MSI documentation (#31950)
  HLRC: create base timed request class (#33216)
  [DOCS] Fixes command page titles
  HLRC: Move ML protocol classes into client ml package (#33203)
  Scroll queries asking for rescore are considered invalid (#32918)
  Painless: Fix Semicolon Regression (#33212)
  ingest: minor - update test to include dissect (#33211)
  Switch remaining LLREST usage to new style Requests (#33171)
  HLREST: add reindex API (#32679)
Adds the reindex API to the high level REST client.
Relates to #27205