CCR Rest API design #30102

Closed
elasticmachine opened this issue Sep 21, 2017 · 12 comments
Labels
:Distributed Indexing/CCR Issues around the Cross Cluster State Replication features Meta

Comments

@elasticmachine
Collaborator

Original comment by @bleskes:

Top Level Overview

This is a meta issue to capture all the initial thoughts and design of the REST API for the LINK REDACTED. We currently see the need for the following APIs, each described in more detail below:

  • Create a new index to follow a remote index
  • Convert an existing index to a following index
  • Disconnect a following index
  • Monitoring, most specifically exposing the current lag
  • Register an auto-follow pattern to automatically create following indices for newly created indices on the remote cluster (phase 2)

The API design assumes we will use the remote cluster configuration of Cross Cluster Search (which will require minor tweaks not described here).

Create a following index

This API is used to create a new index on the local cluster that immediately starts following an index on a remote cluster. The newly created index will have the same metadata as the remote index. By default its name will be identical to the remote index's name, but it can optionally be changed.

Notes:

  • We could shorten this to a URL-only scheme (/_xpack/xdcr/_follow/{{remote:index}}), but I think a body is good for future-proofing
  • Should not apply local index templates
PUT /_xpack/xdcr/_create_and_follow/
PUT {index}/_xpack/xdcr/_create_and_follow/ (create with explicit name)
{
    "leader": { // we may want to shorten this based on the remote_cluster:index notation.
        "index": "name",  
        "remote_cluster": "name" // refers to a remote cluster in cross cluster search config
    }
}
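A minimal Python sketch of how a client might assemble the two proposed create-and-follow calls. It only builds the method, path and JSON body shown above; the helper name `build_create_and_follow` is hypothetical, and the paths are the ones proposed in this issue, not necessarily what was finally shipped.

```python
import json
from typing import Optional, Tuple


def build_create_and_follow(
    leader_index: str,
    remote_cluster: str,
    follower_index: Optional[str] = None,
) -> Tuple[str, str, str]:
    """Return (method, path, body) for the proposed create-and-follow call.

    Without follower_index, the default-name form is used and the new local
    index is expected to get the same name as the leader index.
    """
    if follower_index is None:
        path = "/_xpack/xdcr/_create_and_follow/"
    else:
        path = f"/{follower_index}/_xpack/xdcr/_create_and_follow/"
    # `remote_cluster` refers to a remote cluster in the cross cluster search config.
    body = {"leader": {"index": leader_index, "remote_cluster": remote_cluster}}
    return "PUT", path, json.dumps(body)


method, url, payload = build_create_and_follow("logs", "us-east", "logs-copy")
print(method, url, payload)
```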

Make an existing index become a follower

This API takes an existing index and adds the needed metadata to make it a follower. The API validates that the index is closed but neither closes nor opens it; that needs to be done by explicit calls to the dedicated APIs.

This API also needs to validate that the remote index being followed is compatible with the local one. This includes the mapping and metadata, but also some kind of sanity check using history UUIDs. One caveat is people restoring this index from a snapshot, which will destroy its history UUID. Sadly, this is a likely scenario, as we plan to use snapshot and restore as a way to bootstrap indices initially.

POST {index}/_xpack/xdcr/_follow 
{
    "leader": { // we may shorten this based on remote_cluster:index notation.
        "index": "name",  
        "remote_cluster": "name"
    }
}
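A sketch of the pre-flight checks this API would run, in Python. The `IndexInfo` type, its fields and the `"closed"` state value are hypothetical simplifications; real compatibility checks on mappings and metadata would be more involved than a dictionary comparison.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IndexInfo:
    # Hypothetical, simplified view of the state the checks need.
    name: str
    state: str  # hypothetical values: "open" or "closed"
    mappings: Dict[str, str] = field(default_factory=dict)
    history_uuid: str = ""


def follow_existing_problems(local: IndexInfo, leader: IndexInfo) -> List[str]:
    """Collect the reasons why `local` cannot become a follower of `leader`."""
    problems: List[str] = []
    # Only validate the state; this API never closes or opens the index itself.
    if local.state != "closed":
        problems.append(f"index [{local.name}] must be closed first")
    # Stand-in for the real mapping/metadata compatibility check.
    if local.mappings != leader.mappings:
        problems.append("mappings differ from the leader index")
    # Sanity check on shared history. A snapshot restore destroys the
    # history UUID, so this can reject an index bootstrapped from a snapshot.
    if local.history_uuid != leader.history_uuid:
        problems.append("history UUIDs do not match")
    return problems
```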

Disconnect a following index

Takes a following index and makes it a "normal" one. It should verify that the index is closed before doing so.

POST {index}/_xpack/xdcr/_unfollow
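A sketch of the unfollow step, assuming follower state is stored as one entry in the index's custom metadata; `LocalIndex` and the `xdcr_follow` key are hypothetical. Rejecting an index that is not currently a follower is an extra assumption not stated above.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical metadata key holding the follower configuration.
FOLLOW_KEY = "xdcr_follow"


@dataclass
class LocalIndex:
    # Hypothetical model of a local index and its custom metadata.
    name: str
    closed: bool
    custom_metadata: Dict[str, dict] = field(default_factory=dict)


def unfollow(index: LocalIndex) -> LocalIndex:
    """Return a copy of `index` turned back into a "normal" index."""
    if not index.closed:
        raise ValueError(f"index [{index.name}] must be closed before unfollowing")
    if FOLLOW_KEY not in index.custom_metadata:
        # Assumption: unfollowing a non-follower is treated as an error.
        raise ValueError(f"index [{index.name}] is not a following index")
    remaining = {k: v for k, v in index.custom_metadata.items() if k != FOLLOW_KEY}
    return LocalIndex(index.name, index.closed, remaining)
```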

Monitor/Stats

The goal of this API is to give easy access to statistics that are relevant for CCR. This information is exposed by index stats and job status, but you'd need quite a lot of expert knowledge to figure out how to tie things together. To that end, we offer an API that does the heavy lifting.

POST /_xpack/xdcr/_stats

Returns a per-index, per-shard map of lag information:

{
    "index": {
        "_metadata": {
            ... anything relevant to syncing of mappings etc.
        },
        "0": {
            "completed_upto": 1024,
            "leader_max_seq_no": 1050, // the task fetches this information and stores it locally
            "lag": 26, // convenience number: leader_max_seq_no minus completed_upto
            ... inline any interesting information from the task,
            "error": ... (if the last job fetch had an error, report here)
            "last_fetch": "20170131..."
        }, ...
    }
}
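The `lag` field is just the difference between the two sequence numbers; the sketch below fills it in for every shard of a stats response shaped like the example above. The function names are hypothetical, and treating keys that start with `_` as non-shard entries is an assumption based on the `_metadata` key.

```python
from typing import Any, Dict


def shard_lag(shard: Dict[str, Any]) -> int:
    """Operations the follower still has to replicate for one shard."""
    return shard["leader_max_seq_no"] - shard["completed_upto"]


def add_lag(stats: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Fill in the convenience `lag` field for every shard of every index."""
    for index_stats in stats.values():
        for shard_id, shard in index_stats.items():
            if shard_id.startswith("_"):  # e.g. "_metadata" is not a shard
                continue
            shard["lag"] = shard_lag(shard)
    return stats


stats = {"index": {"_metadata": {}, "0": {"completed_upto": 1024, "leader_max_seq_no": 1050}}}
print(add_lag(stats)["index"]["0"]["lag"])  # prints 26
```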

Register an auto-follow pattern (phase 2)

Register an auto-follow pattern to automatically create following indices for newly created indices on the remote cluster (phase 2)

This is a rough sketch. We don't plan to implement this in the first phase of the project. That said, it's good to get the discussion going on what this may look like and how it may or may not affect the components being built. This API will be important for the time-based data use case.

POST /_xpack/xdcr/_autofollow/{{remote_cluster}}
{
    "index_pattern": "logstash-*,.kibana",
    "include_deletes": true // do we really need auto delete? People can go and delete indices themselves, and automatic deletes are obviously dangerous and complicate things.
}
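A sketch of how the comma-separated `index_pattern` might be matched against remote index names, in Python. It uses `fnmatch.fnmatchcase`, which also understands `?` and `[...]`, so it is only an approximation of Elasticsearch wildcard matching; `indices_to_follow` and the `already_followed` set are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Set


def indices_to_follow(
    index_pattern: str, remote_indices: Iterable[str], already_followed: Set[str]
) -> List[str]:
    """Remote indices matched by the pattern list that are not followed yet."""
    # "logstash-*,.kibana" -> ["logstash-*", ".kibana"]
    patterns = [p.strip() for p in index_pattern.split(",") if p.strip()]
    return [
        name
        for name in remote_indices
        if name not in already_followed and any(fnmatchcase(name, p) for p in patterns)
    ]


print(indices_to_follow("logstash-*,.kibana",
                        ["logstash-2017.09.21", ".kibana", "metrics-1"],
                        already_followed={".kibana"}))
# prints ['logstash-2017.09.21']
```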
@elasticmachine
Collaborator Author

Original comment by @Mpdreamz:

cc @elastic/es-clients

@elasticmachine
Collaborator Author

Original comment by @jasontedor:

Relates LINK REDACTED

@elasticmachine
Collaborator Author

Original comment by @clintongormley:

I don't think we need two separate end points. Eventually, when we no longer need snapshot/restore for bootstrapping, then the logic would look something like this:

  • Fetch the remote index definition
  • If the index exists, check that it is compatible, otherwise create a new index
  • Perform file recovery if needed
  • Follow transactions from translog

While we still need S&R, we'd do the following:

  • Fetch the remote index definition
  • If the index exists, check that it is compatible, otherwise create a new index
  • If file recovery needed, then throw an exception (this check could come before creating a new index)
  • Follow transactions from translog

So this can all be handled by the single _follow end point. This is what we do with the reindex API and everywhere else: we create an index on demand.
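The single-endpoint flow from the snapshot/restore-era list above, sketched in Python with every cluster interaction passed in as a hypothetical callable. Comparing index definitions with `!=` stands in for a real compatibility check, and the file-recovery check is placed before index creation, as the list suggests it could be.

```python
from typing import Callable, Optional


def follow(
    leader_index: str,
    *,
    fetch_remote_definition: Callable[[str], dict],
    get_local_definition: Callable[[], Optional[dict]],
    create_local_index: Callable[[dict], None],
    needs_file_recovery: Callable[[], bool],
    start_following: Callable[[], None],
) -> None:
    """Single _follow entry point, as it might look while S&R is still needed."""
    # Fetch the remote index definition.
    remote_def = fetch_remote_definition(leader_index)
    # An existing local index must be compatible with the leader.
    local_def = get_local_definition()
    if local_def is not None and local_def != remote_def:
        raise ValueError("local index is not compatible with the leader index")
    # File recovery is not supported yet, so fail before creating anything.
    if needs_file_recovery():
        raise RuntimeError("file recovery needed; bootstrap with snapshot/restore first")
    # No local index yet: create one from the remote definition.
    if local_def is None:
        create_local_index(remote_def)
    # Follow operations from the leader's translog.
    start_following()
```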

The method should be POST not PUT, because we're not storing the body with the URL _xpack/xdcr/_follow.

I talked to @jasontedor about possibly doing something like this, without need for a body

POST foo/_xpack/xdcr/_follow/cluster:foo

or

POST cluster:foo/_xpack/xdcr/_follow/foo

or

POST cluster:*/_xpack/xdcr/_follow

but he wasn't keen on having to remember the order of parameters.

Instead, we thought about making it consistent with the reindex API, e.g.:

POST _xpack/xdcr/_follow
{
  "source": {
    "index": "cluster:foo"
  },
  "dest": {
    "index": "foo"  // optional
  }
}

This seems pretty easy to remember and understand.
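A sketch of how a client could validate the `cluster:index` notation and build this reindex-style body. The helper names are hypothetical, and the assumption that an omitted `dest` defaults to the leader's index name follows the "optional" comment rather than any confirmed behavior.

```python
from typing import Dict, Optional, Tuple


def split_remote(expr: str) -> Tuple[str, str]:
    """Split "cluster:index" into (cluster, index)."""
    cluster, sep, index = expr.partition(":")
    if not sep or not cluster or not index:
        raise ValueError(f"expected <cluster>:<index>, got [{expr}]")
    return cluster, index


def follow_body(source: str, dest: Optional[str] = None) -> Dict[str, Dict[str, str]]:
    """Build the reindex-style body; "dest" is only sent when given."""
    split_remote(source)  # validate the notation early
    body = {"source": {"index": source}}
    if dest is not None:
        body["dest"] = {"index": dest}
    return body


def effective_dest(source: str, dest: Optional[str] = None) -> str:
    """Assumed default: the follower reuses the leader's index name."""
    return dest if dest is not None else split_remote(source)[1]


print(follow_body("cluster:foo"), effective_dest("cluster:foo"))
```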

@elasticmachine
Collaborator Author

Original comment by @bleskes:

Thanks @clintongormley

I don't think we need two separate end points.
If the index exists, check that it is compatible, otherwise create a new index

I personally don't like the automatic fallback, in this case. I think that in this kind of admin level API this can only hide mistakes - for example, I expect the index to be there but it wasn't and now we automatically create it. This is a common issue in our API - sometimes it's very useful, like when we automatically create an index in the time series data use case, but in general I think it should be avoided.

I talked to @jasontedor about possibly doing something like this, without need for a body

See the comment in the description - we discussed this and the group decided not to go with a URL-only scheme. The main argument was to make it easier to add parameters and options.

Instead, we thought about making it consistent with the reindex API, e.g.:

POST _xpack/xdcr/_follow
{
  "source": {
    "index": "cluster:foo"
  },
  "dest": {
    "index": "foo"  // optional
  }
}

I personally find this less "tight" - although it's clear I'm biased. The words "source"/"dest" add confusion IMO in the context where we use "follower" and "leader". Can you elaborate on the reason for the suggestion?

@elasticmachine
Collaborator Author

Original comment by @clintongormley:

I personally don't like the automatic fallback, in this case. I think that in this kind of admin level API this can only hide mistakes - for example, I expect the index to be there but it wasn't and now we automatically create it. This is a common issue in our API - sometimes it's very useful, like when we automatically create an index in the time series data use case, but in general I think it should be avoided.

What is the purpose of creating an empty follower index? Either:

  1. the index already exists, in which case we check it for (a) compatibility and (b) the existence of data in the remote translog to allow us to catch up, or
  2. the index doesn't exist, in which case we create it using the remote index definition and then we go to step 1

This looks like a single process to me. When relying on snapshot restore for bootstrapping, we'll fail if we can't read data from the translog, and when we no longer rely on snapshot restore, then we'll do a segment copy.

I don't understand why we need the two APIs. I also value consistency. Why should this API be different from all the others?

I personally find this less "tight" - although it's clear I'm biased. The words "source"/"dest" add confusion IMO in the context where we use "follower" and "leader". Can you elaborate on the reason for the suggestion?

Consistency again. It's one less thing for users to remember. I think source and dest leave little room for confusion. We already have them. Why introduce new terms?

@elasticmachine
Collaborator Author

Original comment by @zuketo:

I started walking through using this API design for various use cases. Overall, looks good, a few questions came up (I'll group them by use case):

Data locality/replicating the same index to 5 different datacenters (to be close to the application server/user):

  • Can a leading index have multiple followers (e.g. by rerunning _follow with different destinations)?
  • Can a following index follow another following index? (e.g. daisy chained replication)

Logging/security events:

  • We'll have an index created per day (in a lot of scenarios). Without autofollow, what are some workarounds for phase 1? Will "POST cluster:*/_xpack/xdcr/_follow" be supported, or is the other option to use a process external to ES to do the daily follow work?
  • What happens when the following index name already exists? Thinking through future Logstash scenarios, will every index name need to include the cluster/DC name, to avoid conflicts? E.g. "logstash-%{+YYYY.MM.dd}" may exist in every cluster, so we would probably want to recommend something like "logstash-datacenter1-%{+YYYY.MM.dd}".

@elasticmachine
Collaborator Author

Original comment by @bleskes:

@zuketo answers inline:

Can a leading index have multiple followers (e.g. by rerunning _follow with different destinations)?

Yes. Following is a property of the target indices. The source index doesn't care how many followers read from it.

Can a following index follow another following index? (e.g. daisy chained replication)

As far as we know now, there is no reason for this not to work. It is obviously a more complex setup, so we might need to drop the feature in the first iteration. Out of curiosity, what was your use case?

Logging/security events

We'll have an index created per day (in a lot of scenarios). Without autofollow, what are some workarounds for phase 1? Will "POST cluster:*/_xpack/xdcr/_follow" be supported, or is the other option to use a process external to ES to do the daily follow work?

At this point we don't know yet. POST cluster:*/_xpack/xdcr/_follow is likely not it, as it requires us to be lenient about indices that already have a follower locally. We might need an external process for the daily work. It might be part of index lifecycle management (though this doesn't align well with the model where everything is driven by the target cluster). It might be something else.

What happens when the following index name already exists? Thinking through future Logstash scenarios, will every index name need to include the cluster/DC name, to avoid conflicts? E.g. "logstash-%{+YYYY.MM.dd}" may exist in every cluster, so we would probably want to recommend something like "logstash-datacenter1-%{+YYYY.MM.dd}".

That's a fair point. We may need to extend the API I suggested at the end. I lean towards the same rename options we have in the restore API.
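A sketch of what restore-style renaming could look like for follower names, borrowing the idea of the restore API's `rename_pattern`/`rename_replacement` pair. This uses Python's `re.sub`, so group references are written as `\1`; the restore API itself uses Java regex syntax, where the same reference is `$1`. The function name is hypothetical.

```python
import re


def follower_name(leader_index: str, rename_pattern: str, rename_replacement: str) -> str:
    """Derive a follower index name the way restore's rename options would.

    Names that do not match the pattern are returned unchanged.
    """
    return re.sub(rename_pattern, rename_replacement, leader_index)


print(follower_name("logstash-2017.09.21", r"logstash-(.+)", r"logstash-datacenter1-\1"))
# prints logstash-datacenter1-2017.09.21
```

This keeps the leader's naming scheme untouched and moves the per-datacenter disambiguation to the following side.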

@elasticmachine
Collaborator Author

Original comment by @bleskes:

@jasontedor do you mind updating the ticket about our discussion in Berlin?

@elasticmachine
Collaborator Author

Original comment by @zuketo:

Thanks!

For chained replication use cases, I haven't come across any users specifically asking for this (I definitely wouldn't list it as a priority). I was mainly interested in solving the data locality use case if the source index couldn't have multiple followers (which isn't the case).

For chained replication, thinking through some future scenarios, users may want to reduce load on the leading index (e.g. if replicating to 20 clusters, or maybe more, this can be fanned out with chaining). The other scenario is two or more clusters per DC/region, and replicating once over the WAN, then again locally, to reduce network traffic costs.
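Some quick arithmetic on the fan-out idea: assuming every follower can itself be followed and each index serves at most `max_fanout` direct followers, the sketch below computes how many levels of chaining are needed to reach a given number of clusters. `chain_depth` is a hypothetical helper for illustration only.

```python
def chain_depth(followers: int, max_fanout: int) -> int:
    """Levels of chaining needed to reach `followers` clusters when every
    index (leader or follower) serves at most `max_fanout` direct followers.
    The leader itself is level 0."""
    if followers < 1 or max_fanout < 1:
        raise ValueError("followers and max_fanout must be positive")
    depth = 0
    reachable = 0   # follower clusters covered by levels 1..depth
    level_size = 1  # clusters on the current level (starts with the leader)
    while reachable < followers:
        level_size *= max_fanout
        reachable += level_size
        depth += 1
    return depth


print(chain_depth(20, 4))   # 2: 4 clusters on level 1, 16 on level 2
print(chain_depth(20, 20))  # 1: the leader serves all 20 directly
```

The trade-off is that each extra level adds one more replication hop of latency for the clusters at the bottom of the tree.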

@elasticmachine
Collaborator Author

Original comment by @jasontedor:

@clintongormley Can you update this issue with the outcome of our discussion in Berlin?

@elasticmachine
Collaborator Author

Original comment by @clintongormley:

As discussed in Berlin, we're going to go with a single API for setting up index following, and we're going to use leader/follower instead of source/dest in the body.

@elasticmachine elasticmachine added :Distributed Indexing/CCR Issues around the Cross Cluster State Replication features Meta labels Apr 25, 2018
martijnvg added a commit to martijnvg/elasticsearch that referenced this issue May 16, 2018
The TODOs in the rest actions were incorrect. The problem was that
these rest actions used `follow_index` as first named variable in the path
under which the rest actions were registered. Other candidate rest actions that
also have a named variable as first element in the path (but with a different
name) get resolved as rest parameters too and passed down to the rest
action that actually ends up getting executed.

In the case of the follow index api, an `index` parameter got passed down
to `RestFollowExistingAction`, but that param was never used. This caused the
follow index api call to fail, because of unused http parameters.

This change doesn't fix that problem, but works around it by using
`index` as named variable for the follow index (instead of `follow_index`).

Relates to elastic#30102
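A toy Python model of the failure described in this commit, not Elasticsearch's actual REST dispatcher: every candidate template whose first segment is a `{variable}` binds that variable, so the handler sees a parameter it never reads and rejects the request. The templates and function names are hypothetical. Renaming the variable so that all candidates bind the same name, as the workaround does, removes the stray parameter.

```python
from typing import Dict, Iterable, Set


def bind_first_variable(candidates: Iterable[str], path: str) -> Dict[str, str]:
    """Toy model: every candidate route template whose first segment is a
    {variable} binds that variable to the request's first path segment."""
    first = path.strip("/").split("/")[0]
    params: Dict[str, str] = {}
    for template in candidates:
        head = template.strip("/").split("/")[0]
        if head.startswith("{") and head.endswith("}"):
            params[head[1:-1]] = first
    return params


def check_consumed(params: Dict[str, str], consumed: Set[str]) -> None:
    """Fail if the executing handler did not read every bound parameter."""
    unused = sorted(set(params) - consumed)
    if unused:
        raise ValueError(f"unused request parameters: {unused}")


# Hypothetical templates: the two routes disagree on the variable name,
# so the follow handler receives a stray `index` parameter.
before = ["{index}/_other_action", "{follow_index}/_follow"]
try:
    check_consumed(bind_first_variable(before, "/logs/_follow"), {"follow_index"})
except ValueError as err:
    print(err)

# After renaming `follow_index` to `index`, only one parameter is bound.
after = ["{index}/_other_action", "{index}/_follow"]
check_consumed(bind_first_variable(after, "/logs/_follow"), {"index"})
```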
martijnvg added a commit that referenced this issue May 16, 2018
martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Jul 11, 2018
martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Jul 13, 2018
Tests shard follow task in the context of a leader and follower ReplicationGroup,
in order to test how the shard follow logic reacts to certain shard related
failure scenarios.

More tests will need to be added, but this indicates what changes need to be made
to have these tests.

Relates to elastic#30102
martijnvg added a commit that referenced this issue Jul 17, 2018
martijnvg added a commit that referenced this issue Jul 17, 2018
@dnhatn
Member

dnhatn commented Mar 8, 2019

We have implemented all these endpoints. Closing.
/cc @martijnvg

@dnhatn dnhatn closed this as completed Mar 8, 2019