CCR Rest API design #30102

elasticmachine · 2017-09-21T12:52:21Z

Original comment by @bleskes:

Top Level Overview

This is a meta issue to capture all the initial thoughts and design of the REST API for the LINK REDACTED. We currently see the need for the following APIs, each described in more detail below:

Create a new index that follows a remote index
Convert an existing index to a following index
Disconnect a following index
Monitoring, most specifically exposing the current lag
Register an auto follow pattern to automatically create following indices for newly created indices on the remote cluster (phase 2)

The API design assumes we will use the remote cluster configuration of Cross Cluster Search (which will require minor tweaks not described here).

Create a following index

This API is used to create a new index on the local cluster that immediately starts following an index on a remote cluster. The newly created index will have the same metadata as the remote index. The default name will be identical but can optionally be changed.

Notes:

We could shorten this to a url only scheme (/_xpack/xdcr/_follow/{{remote:index}}), but I think a body is good for future proofing
Should not apply local index templates

PUT /_xpack/xdcr/_create_and_follow/
PUT {index}/_xpack/xdcr/_create_and_follow/ (create with explicit name)
{
    "leader": { // we may want shorten this based on remote_cluster:index notation.
        "index": "name",  
        "remote_cluster": "name" // refers to a remote cluster in cross cluster search config
    }
}
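A minimal sketch of how a client might assemble this call, assuming the draft paths and body shown above are kept as-is. The helper name `build_create_and_follow` is hypothetical and only illustrates the two URL forms (default name vs. explicit name).

```python
import json
from typing import Optional, Tuple


def build_create_and_follow(leader_index: str, remote_cluster: str,
                            follower_index: Optional[str] = None) -> Tuple[str, str, str]:
    """Return (method, path, body) for the proposed create-and-follow call."""
    # Path shapes are copied from the draft above; they are a proposal, not a shipped API.
    if follower_index is None:
        path = "/_xpack/xdcr/_create_and_follow/"
    else:
        path = f"/{follower_index}/_xpack/xdcr/_create_and_follow/"
    body = {"leader": {"index": leader_index, "remote_cluster": remote_cluster}}
    return "PUT", path, json.dumps(body, sort_keys=True)
```

Calling `build_create_and_follow("logs", "dc1", "logs-copy")` yields the explicit-name variant; omitting the last argument yields the default-name variant.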

Make an existing index become a follower

This API takes an existing index and adds the needed metadata to make it a follower. The API validates that the index is closed but neither closes nor opens it. Closing and opening need to be done by explicit calls to the dedicated APIs.

This API also needs to validate that the remote index being followed is compatible with the local one. This includes the mapping and metadata but also some kind of sanity check using history uuids. Caveats here include people restoring this index from a snapshot, which will destroy its history uuid. Sadly this is a likely scenario as we plan to use snapshot and restore as a way to bootstrap indices initially.

POST {index}/_xpack/xdcr/_follow 
{
    "leader": { // we may shorten this based on remote_cluster:index notation.
        "index": "name",  
        "remote_cluster": "name"
    }
}
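The validation described above could be sketched roughly as follows. Everything here is an assumption for illustration: the `IndexInfo` type, its fields, and the idea of comparing one history uuid per index (a simplification) are not part of the proposal.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IndexInfo:
    # Hypothetical summary of an index; field names are illustrative only.
    name: str
    closed: bool
    mapping: dict
    history_uuid: str


def follow_precondition_errors(follower: IndexInfo, leader: IndexInfo) -> List[str]:
    """Collect the reasons why `follower` may not start following `leader`."""
    errors = []
    if not follower.closed:
        # The API only validates the state; it never closes or opens the index itself.
        errors.append(f"index [{follower.name}] must be closed")
    if follower.mapping != leader.mapping:
        errors.append("mapping differs from the leader index")
    if follower.history_uuid != leader.history_uuid:
        # This is the check that a snapshot restore would trip, per the caveat above.
        errors.append("history uuid does not match the leader index")
    return errors
```

An empty list means the follow request may proceed; otherwise the call would be rejected with the collected reasons.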

Disconnect a following index

Takes a following index and makes it a "normal" one. It should verify that the index is closed before doing so.

POST {index}/_xpack/xdcr/_unfollow

Monitor/Stats

The goal of the API is to give easy access to statistics that are relevant for CCR. This information is exposed by index stats and job status, but you'll need quite expert knowledge to figure out how to tie things together. To that end we offer an API that does the heavy lifting.

POST /_xpack/xdcr/_stats

returns a per index, per shard map of lag information

{
    "index": {
        "_metadata": {
            ... anything relevant to syncing of mappings etc.
        },
        "0": {
            "completed_upto": 1024,
            "leader_max_seq_no": 1050, // the task fetches this information and stores it locally
            "lag": 26, // convenience number: leader_max_seq_no minus completed_upto
            ... inline any interesting information from the task,
            "error": ... (if the last job fetch had an error, report here)
            "last_fetch": "20170131..."
        }, ...
    }
}
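For reference, the lag figure in the response above is plain arithmetic over the two sequence numbers. A small sketch, where the function names are hypothetical and clamping at zero is my own assumption to cover a stale locally stored `leader_max_seq_no`:

```python
from typing import Dict


def shard_lag(completed_upto: int, leader_max_seq_no: int) -> int:
    """Number of operations the follower shard still has to replay."""
    # Clamp at zero: an assumption, in case the stored leader value is stale.
    return max(leader_max_seq_no - completed_upto, 0)


def index_lag(shards: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Map each shard id to its lag, given the per-shard stats entries."""
    return {
        shard_id: shard_lag(stats["completed_upto"], stats["leader_max_seq_no"])
        for shard_id, stats in shards.items()
    }
```

With the numbers from the example response, `shard_lag(1024, 1050)` gives 26.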

Register an auto follow pattern (phase 2)

Register an auto follow pattern to automatically create following indices for newly created indices on the remote cluster (phase 2)

This is a rough sketch. We don't plan to implement this in the first phase of the project. That said, it's good to start a discussion on what this may look like and how it may or may not affect the components being built. This API will be important for the time-based data use case.

POST /_xpack/xdcr/_autofollow/{{remote_cluster}}
{
    "index_pattern": "logstash-*,.kibana",
    "include_deletes": true // do we really need auto delete? People can go and delete themselves, and automatic deletes are obviously dangerous and complicate things.
}
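To make the pattern semantics concrete, here is a rough sketch of how a comma-separated `index_pattern` might be matched against a newly created remote index name. It uses Python's `fnmatch`, which also treats `?` and `[...]` as wildcards, so it is only an approximation of whatever wildcard rules the real implementation would apply; `matches_auto_follow` is a hypothetical name.

```python
from fnmatch import fnmatchcase


def matches_auto_follow(index_name: str, index_pattern: str) -> bool:
    """True if `index_name` matches any of the comma-separated patterns."""
    patterns = [p.strip() for p in index_pattern.split(",") if p.strip()]
    # fnmatchcase keeps the comparison case-sensitive on every platform.
    return any(fnmatchcase(index_name, p) for p in patterns)
```

With the pattern from the request above, `logstash-2017.09.21` and `.kibana` would be auto-followed, while `metrics-1` would not.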

elasticmachine · 2017-09-22T15:21:03Z

Original comment by @Mpdreamz:

cc @elastic/es-clients

elasticmachine · 2017-09-25T14:47:30Z

Original comment by @jasontedor:

Relates LINK REDACTED

elasticmachine · 2017-10-06T14:09:23Z

Original comment by @clintongormley:

I don't think we need two separate end points. Eventually, when we no longer need snapshot/restore for bootstrapping, then the logic would look something like this:

Fetch the remote index defn
If the index exists, check that it is compatible, otherwise create a new index
Perform file recovery if needed
Follow transactions from translog

While we still need S&R, we'd do the following:

Fetch the remote index defn
If the index exists, check that it is compatible, otherwise create a new index
If file recovery needed, then throw an exception (this check could come before creating a new index)
Follow transactions from translog

So this can all be handled by the single _follow end point. As with the reindex API and everywhere else, we create an index on demand.
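The two step lists above collapse into one decision flow. A sketch under stated assumptions: the function name, the boolean inputs, and the step labels are all hypothetical, and `file_recovery_supported=False` stands for the period in which snapshot/restore is still needed.

```python
from typing import List


def plan_follow(follower_exists: bool, compatible: bool,
                needs_file_recovery: bool, file_recovery_supported: bool) -> List[str]:
    """Return the ordered steps a single _follow call would take, or raise."""
    steps = ["fetch remote index definition"]
    if needs_file_recovery and not file_recovery_supported:
        # While snapshot/restore is the bootstrap path, fail before creating anything.
        raise ValueError("file recovery needed but not supported yet")
    if follower_exists:
        if not compatible:
            raise ValueError("existing index is not compatible with the leader")
        steps.append("check compatibility")
    else:
        steps.append("create index from remote definition")
    if needs_file_recovery:
        steps.append("perform file recovery")
    steps.append("follow operations from translog")
    return steps
```

The file-recovery check is placed before index creation, following the note that it "could come before creating a new index".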

The method should be POST not PUT, because we're not storing the body with the URL _xpack/xdcr/_follow.

I talked to @jasontedor about possibly doing something like this, without need for a body

POST foo/_xpack/xdcr/_follow/cluster:foo

or

POST cluster:foo/_xpack/xdcr/_follow/foo

or

POST cluster:*/_xpack/xdcr/_follow

but he wasn't keen on having to remember the order of parameters.

Instead, we thought about making it consistent with the reindex API, e.g.:

POST _xpack/xdcr/_follow
{
  "source": {
    "index": "cluster:foo"
  },
  "dest": {
    "index": "foo"  // optional
  }
}

This seems pretty easy to remember and understand.
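As a rough illustration of how such a body could be interpreted, the sketch below splits the `cluster:index` notation and applies the optional `dest.index` default. The function name and the error handling are assumptions, not part of the proposal.

```python
from typing import Tuple


def parse_follow_body(body: dict) -> Tuple[str, str, str]:
    """Return (remote_cluster, leader_index, follower_index) from a reindex-style body."""
    source = body["source"]["index"]
    remote_cluster, sep, leader_index = source.partition(":")
    if not sep or not remote_cluster or not leader_index:
        raise ValueError(f"expected 'cluster:index' notation, got [{source}]")
    # "dest" is optional; the follower defaults to the leader's index name.
    follower_index = body.get("dest", {}).get("index", leader_index)
    return remote_cluster, leader_index, follower_index
```

For the body shown above this returns `("cluster", "foo", "foo")`, whether or not `dest` is supplied.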

elasticmachine · 2017-10-08T19:32:43Z

Original comment by @bleskes:

Thanks @clintongormley

I don't think we need two separate end points.
If the index exists, check that it is compatible, otherwise create a new index

I personally don't like the automatic fallback in this case. I think that in this kind of admin-level API this can only hide mistakes - for example, I expect the index to be there but it wasn't, and now we automatically create it. This is a common issue in our API - sometimes it's very useful, like when we automatically create an index in the time series data use case, but in general I think it should be avoided.

I talked to @jasontedor about possibly doing something like this, without need for a body

See comment in the description - we discussed this and the group decided to not go with url only. The main argument was to make it easier to add parameters and options.

Instead, we thought about making it consistent with the reindex API, e.g.:

POST _xpack/xdcr/_follow
{
  "source": {
    "index": "cluster:foo"
  },
  "dest": {
    "index": "foo"  // optional
  }
}

I personally find this less "tight" - although it's clear I'm biased. The words "source"/"dest" add confusion IMO in the context where we use "follower" and "leader". Can you elaborate on the reason for the suggestion?

elasticmachine · 2017-10-09T12:10:25Z

Original comment by @clintongormley:

I personally don't like the automatic fallback in this case. I think that in this kind of admin-level API this can only hide mistakes - for example, I expect the index to be there but it wasn't, and now we automatically create it. This is a common issue in our API - sometimes it's very useful, like when we automatically create an index in the time series data use case, but in general I think it should be avoided.

What is the purpose of creating an empty follower index? Either:

the index already exists, in which case we check it for (a) compatibility and (b) the existence of data in the remote translog to allow us to catch up, or
the index doesn't exist, in which case we create it using the remote index definition and then we go to step 1

This looks like a single process to me. When relying on snapshot restore for bootstrapping, we'll fail if we can't read data from the translog, and when we no longer rely on snapshot restore, then we'll do a segment copy.

I don't understand why we need the two APIs. I also value consistency. Why should this API be different from all the others?

I personally find this less "tight" - although it's clear I'm biased. The words "source"/"dest" add confusion IMO in the context where we use "follower" and "leader". Can you elaborate on the reason for the suggestion?

Consistency again. It's one less thing for users to remember. I think source and dest leave little room for confusion. We already have them. Why introduce new terms?

elasticmachine · 2017-11-01T20:16:17Z

Original comment by @zuketo:

I started walking through using this API design for various use cases. Overall, looks good, a few questions came up (I'll group them by use case):

Data locality/replicating the same index to 5 different datacenters (to be close to the application server/user):

Can a leading index have multiple followers (e.g. by rerunning _follow with different destinations)
Can a following index follow another following index? (e.g. daisy chained replication)

Logging/security events:

We'll have an index created per day (in a lot of scenarios), without autofollow, what are some workarounds for phase 1? Will "POST cluster:*/_xpack/xdcr/_follow" be supported, or the other option is using a process external to ES to do the daily follow work?
What happens when the following index name already exists? Thinking through future Logstash scenarios, will every index name need to include the cluster/DC name, to avoid conflicts? E.g. "logstash-%{+YYYY.MM.dd}" may exist in every cluster, so we would probably want to recommend something like "logstash-datacenter1-%{+YYYY.MM.dd}".

elasticmachine · 2017-11-01T22:45:24Z

Original comment by @bleskes:

@zuketo answers inline:

Can a leading index have multiple followers (e.g. by rerunning _follow with different destinations)

Yes. Following is a property of the target indices. The source index doesn't care how many followers read from it.

Can a following index follow another following index? (e.g. daisy chained replication)

As far as we know now, there is no reason for this not to work. It is obviously a more complex setup, so we might need to drop the feature in the first iteration. Out of curiosity, what was your use case?

Logging/security events

We'll have an index created per day (in a lot of scenarios), without autofollow, what are some workarounds for phase 1? Will "POST cluster:*/_xpack/xdcr/_follow" be supported, or the other option is using a process external to ES to do the daily follow work?

At this point we don't know yet. POST cluster:*/_xpack/xdcr/_follow is likely not it, as it requires us to be lenient about indices that already have a follower locally. We might need an external process for the daily work. It might be part of index life cycle management (though this doesn't align well with the "everything is driven by the target cluster" mode). It might be something else.

What happens when the following index name already exists? Thinking through future Logstash scenarios, will every index name need to include the cluster/DC name, to avoid conflicts? E.g. "logstash-%{+YYYY.MM.dd}" may exist in every cluster, so we would probably want to recommend something like "logstash-datacenter1-%{+YYYY.MM.dd}".

That's a fair point. We may need to extend the api I suggested at the end. I lean towards the same rename options we have in the restore API.
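A rough sketch of what restore-style renaming could look like for follower names, assuming a pair of options modelled on the restore API's `rename_pattern` and `rename_replacement`. The helper name is hypothetical, and Python's `re.sub` uses `\1` for group references where the restore API uses `$1`.

```python
import re


def rename_follower(leader_index: str, rename_pattern: str, rename_replacement: str) -> str:
    """Derive the local follower name from the leader name via regex rename."""
    # Names that do not match the pattern are returned unchanged.
    return re.sub(rename_pattern, rename_replacement, leader_index)
```

For example, `rename_follower("logstash-2017.11.01", r"logstash-(.+)", r"logstash-datacenter1-\1")` produces `logstash-datacenter1-2017.11.01`, which avoids the name clash described in the question.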

elasticmachine · 2017-11-01T22:45:45Z

Original comment by @bleskes:

@jasontedor do you mind updating the ticket about our discussion in Berlin?

elasticmachine · 2017-11-01T23:10:56Z

Original comment by @zuketo:

Thanks!

For chained replication use cases, I haven't come across any users specifically asking for this (I definitely wouldn't list it as a priority). I was mainly interested in solving the data locality use case if the source index couldn't have multiple followers (which isn't the case).

For chained replication, thinking through some future scenarios, users may want to reduce load on the leading index (e.g. if replicating to 20 clusters, or maybe more, this can be fanned out with chaining). The other scenario is two or more clusters per DC/region, and replicating once over the WAN, then again locally, to reduce network traffic costs.

elasticmachine · 2017-11-01T23:26:20Z

Original comment by @jasontedor:

@clintongormley Can you update this issue with the outcome of our discussion in Berlin?

elasticmachine · 2017-11-02T08:13:09Z

Original comment by @clintongormley:

As discussed in Berlin, we're going to go with a single API for setting up index following, and we're going to use leader/follower instead of source/dest in the body.

The TODOs in the rest actions were incorrect. The problem was that these rest actions used `follow_index` as the first named variable in the path under which the rest actions were registered. Other candidate rest actions that also have a named variable as the first element in the path (but with a different name) get resolved as rest parameters too and passed down to the rest action that actually ends up getting executed. In the case of the follow index api, an `index` parameter got passed down to `RestFollowExistingAction`, but that param was never used. This caused the follow index api call to fail because of unused http parameters. This change doesn't fix that problem, but works around it by using `index` as the named variable for the follow index (instead of `follow_index`). Relates to elastic#30102

Tests shard follow task in the context of a leader and follower ReplicationGroup, in order to test how the shard follow logic reacts to certain shard related failure scenarios. More tests will need to be added, but this indicates what changes need to be made to have these tests. Relates to elastic#30102

dnhatn · 2019-03-08T15:22:04Z

We have implemented all these endpoints. Closing
/cc @martijnvg

elasticmachine added the :Distributed Indexing/CCR and Meta labels Apr 25, 2018

martijnvg mentioned this issue May 8, 2018

[CCR] Fixed follow and unfollow api url path according to design. #30459

Merged

martijnvg mentioned this issue May 15, 2018

[CCR] Add create and follow api #30602

Merged

martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Jul 11, 2018

[CCR] Move api parameters from url to request body.

2b7d35c

Relates to elastic#30102

martijnvg mentioned this issue Jul 11, 2018

[CCR] Move api parameters from url to request body. #31949

Merged

martijnvg added a commit that referenced this issue Jul 11, 2018

[CCR] Move api parameters from url to request body. (#31949)

815faf3

Relates to #30102

martijnvg added a commit that referenced this issue Jul 11, 2018

[CCR] Move api parameters from url to request body. (#31949)

04b5681

Relates to #30102

dnhatn closed this as completed Mar 8, 2019
