For Azure, support multiple storage accounts and secondary endpoints (which are readonly) #13228

Closed
craigwi
Contributor

@craigwi craigwi commented Aug 31, 2015

…to ES master.

Enhancements as discussed on https://github.com/craigwi/elasticsearch-cloud-azure/releases/tag/v2.7.1-craigwi:

1. Supports multiple storage accounts: cloud.azure.storage.account and cloud.azure.storage.key may now each be an array of accounts / keys. The arrays must be the same length, and the account at a given index must match the key at the same index.

2. An Azure repository specification ("type" : "azure") allows two new settings. "account" specifies the name of the account to be used and must be one of the items in cloud.azure.storage.account. If "account" is not specified, the first item in the list of accounts is used. The other new setting, "location_mode", may be used to specify the endpoint. It defaults to "primary_only" and may also be "primary_then_secondary", "secondary_only" or "secondary_then_primary".

3. When a repository is registered using "secondary_only" or "secondary_then_primary" as the "location_mode", verification of the repository is limited to checking that the specified container exists; in particular, the tests-* files are not created because the secondary endpoint is read-only.

NOTE: for a given storage account, only one location_mode can be active at a time.
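
The pairing and fallback rules in the list above could be checked roughly as follows. This is a minimal Python sketch, not the plugin's Java code: `pair_accounts` and `select_account` are hypothetical helper names, and the key strings used with them are placeholders.

```python
def pair_accounts(accounts, keys):
    """Zip the parallel account/key lists into one ordered mapping."""
    if len(accounts) != len(keys):
        raise ValueError("account and key lists must be the same length")
    # dict preserves insertion order, so the first account stays first.
    return dict(zip(accounts, keys))


def select_account(credentials, requested=None):
    """Return (account, key), falling back to the first configured account."""
    if not credentials:
        raise ValueError("no storage accounts configured")
    if requested is None:
        requested = next(iter(credentials))
    if requested not in credentials:
        raise ValueError(f"unknown storage account: {requested}")
    return requested, credentials[requested]
```

With accounts `["azstorageaccount1", "mystorage2"]`, a repository that omits "account" would resolve to `azstorageaccount1`, and one that names an account outside the list would be rejected.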

An example showing settings in elasticsearch.yml:

cloud.azure.storage.account: [ "azstorageaccount1", "mystorage2" ]
cloud.azure.storage.key: [ "", "" ]

A sample repository specification using the secondary endpoint:

{ "type": "azure", "settings": { "account" : "mystorage2", "container": "snapshots-20150701",
"location_mode": "secondary_only"}}

@dadoonet
Member

dadoonet commented Sep 1, 2015

Hi @craigwi.

Thank you for bringing that PR here.

As I wrote in elastic/elasticsearch-cloud-azure#93 (comment), I think we should do it a bit differently unless I'm missing something.

Pasting the discussion here:


Note that you also raised a valid point: we need to support multiple credentials in elasticsearch.yml.
We could imagine that as a generic feature, whatever repository type you want to use.

Let's say that we can now create something like:

cloud:
    azure:
        storage:
            azure1:
              account: your_azure_storage_account1
              key: your_azure_storage_key1
              default: true
            azure2:
              account: your_azure_storage_account2
              key: your_azure_storage_key2
            azure3:
              account: your_azure_storage_account3
              key: your_azure_storage_key3

Then when we create the repo, we can specify which credentials we want to use:

# use credentials 2
PUT _snapshot/my_backup2
{
  "type": "azure",
  "settings": {
      "credentials": "azure2",
      "container": "backup_container",
      "base_path": "backups"
  }
}

# This one will use the one marked as "default"
PUT _snapshot/my_backup3
{
  "type": "azure"
}
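
One way the lookup of named credentials with a default could work is sketched below. It is illustrative Python only: `STORAGE` mirrors the YAML above as a plain dict, `resolve_credentials` is a hypothetical helper, and requiring exactly one entry marked `default` is an assumption rather than something stated in this thread.

```python
# Mirrors the cloud.azure.storage block above (values are placeholders).
STORAGE = {
    "azure1": {"account": "your_azure_storage_account1",
               "key": "your_azure_storage_key1", "default": True},
    "azure2": {"account": "your_azure_storage_account2",
               "key": "your_azure_storage_key2"},
    "azure3": {"account": "your_azure_storage_account3",
               "key": "your_azure_storage_key3"},
}


def resolve_credentials(storage, name=None):
    """Pick the named entry, or the one marked default when no name is given."""
    if name is not None:
        if name not in storage:
            raise KeyError(f"unknown credentials: {name}")
        return storage[name]
    defaults = [entry for entry in storage.values() if entry.get("default")]
    if len(defaults) != 1:
        # Assumption: an ambiguous or missing default is treated as an error.
        raise ValueError("exactly one entry must be marked default")
    return defaults[0]
```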

I know that we need to make one of those repos readonly. The infrastructure for that is now ready; we "just" have to expose it in the azure plugin.

I think that with this you'll be able to define something very similar to what you are trying to achieve here.

elasticsearch.yml

Instead of:

cloud.azure.storage.account: [ "azstorageaccount1", "mystorage2" ]
cloud.azure.storage.key: [ "", "" ]

define:

cloud.azure.storage.azure1.account: "azstorageaccount1"
cloud.azure.storage.azure1.key: ""
cloud.azure.storage.azure2.account: "mystorage2"
cloud.azure.storage.azure2.key: ""

Usage

Instead of:

PUT _snapshot/myrepo
{ 
  "type": "azure", 
  "settings": { 
     "account" : "mystorage2", 
     "container": "snapshots-20150701",
     "location_mode": "secondary_only"
   }
}

Define:

PUT _snapshot/myrepo
{ 
  "type": "azure", 
  "settings": { 
     "credentials" : "azure2", 
     "container": "snapshots-20150701",
     "readonly": true,
     "location_mode": "secondary_only"
   }
}

That said, I think we can easily auto-detect that we are using the secondary endpoint here and automatically set readonly to true, unless the user explicitly sets it to false.
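
The auto-detection suggested here could look roughly like this. It is a hedged Python sketch of the proposal as worded in this comment, not the behavior that was eventually merged; `effective_readonly` is a hypothetical name and `settings` stands for the repository "settings" object.

```python
def effective_readonly(settings):
    """Default readonly to True on the secondary endpoint; an explicit value wins."""
    uses_secondary = settings.get("location_mode") == "secondary_only"
    # If the user supplied "readonly", keep it; otherwise derive it.
    return settings.get("readonly", uses_secondary)
```

Under this reading, `{"location_mode": "secondary_only"}` yields a readonly repository, while adding `"readonly": false` would override the derived value.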

I'm also wondering whether we should prefer a simpler setting like use_secondary: true instead of location_mode: secondary_only and location_mode: primary_only, unless you have a future use in mind?

While reading the Azure Storage Replication documentation, I was also wondering if we really need this flag?

When you enable read-only access to your data in the secondary region, your data is available on a secondary endpoint, in addition to the primary endpoint for your storage account. The secondary endpoint is similar to the primary endpoint, but appends the suffix -secondary to the account name. For example, if your primary endpoint for the Blob service is myaccount.blob.core.windows.net, then your secondary endpoint is myaccount-secondary.blob.core.windows.net. The access keys for your storage account are the same for both the primary and secondary endpoints.

If I understand it correctly, it means to me that the azure client basically adds -secondary in the endpoint when location_mode is secondary_only.

In that case, would something like the following work?

cloud.azure.storage.azure1.account: "myaccount"
cloud.azure.storage.azure1.key: ""
cloud.azure.storage.azure2.account: "myaccount-secondary"
cloud.azure.storage.azure2.key: ""

PUT _snapshot/myrepo
{ 
  "type": "azure", 
  "settings": { 
     "credentials" : "azure2", 
     "container": "snapshots-20150701",
     "readonly": true
   }
}

I did not check. Maybe myaccount-secondary is a forbidden account name?

What do you think?

@imotov @skearns64 @ppf2 Feel free to add your thoughts here too!

@dadoonet
Member

dadoonet commented Sep 1, 2015

The access keys for your storage account are the same for both the primary and secondary endpoints.

I think this answers my last question. The end point is different but the account must be kept as is, right?
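
The naming rule quoted from the Azure documentation can be illustrated with a small Python sketch: only the host name changes, while the account name and access key stay the same. `secondary_blob_host` is a hypothetical helper for illustration and is not part of any Azure client library.

```python
def secondary_blob_host(primary_host):
    """Append -secondary to the account label of a primary endpoint host."""
    account, dot, rest = primary_host.partition(".")
    if not dot or not account:
        raise ValueError(f"not a fully qualified host name: {primary_host!r}")
    return f"{account}-secondary.{rest}"
```

For the host in the quoted passage, `secondary_blob_host("myaccount.blob.core.windows.net")` gives `myaccount-secondary.blob.core.windows.net`.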

@craigwi
Contributor Author

craigwi commented Sep 1, 2015

Hi David,
I did see your comments. I ported my previous proposal because it is a simple, tested and full-featured solution, and I wanted to start this discussion in the context of a pull request against the new location of the sources for the cloud-azure plugin.
Taking the points in reverse order, the Azure client APIs provide a flexible means of accessing primary and secondary storage endpoints. There are four cases and my proposal supports them all. The four cases:

  1. Primary only
  2. Primary then secondary
  3. Secondary only
  4. Secondary then primary

To support those combinations, which I see no reason not to support, one passes the account AND the location mode. The client library uses the secondary endpoint as required. That is, the concept in Azure is ONE storage account and key with the endpoints derived indirectly from the use case (and potentially other settings).

My conclusion on this: location_mode, as I have implemented, is the correct way in Azure to use primary and secondary endpoints. The Java client library supports this and my proposed solution supports this.
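
To make the four cases concrete, here is a minimal Python sketch of the order in which endpoints would be tried for each mode. It does not call the Azure client library; `LocationMode`, `endpoint_order` and `implicitly_readonly` are illustrative names, and the readonly rule follows the treatment of cases #2, #3 and #4 proposed in this comment.

```python
from enum import Enum


class LocationMode(Enum):
    PRIMARY_ONLY = "primary_only"
    PRIMARY_THEN_SECONDARY = "primary_then_secondary"
    SECONDARY_ONLY = "secondary_only"
    SECONDARY_THEN_PRIMARY = "secondary_then_primary"


# Endpoints to try, in order, for each mode.
_ORDER = {
    LocationMode.PRIMARY_ONLY: ["primary"],
    LocationMode.PRIMARY_THEN_SECONDARY: ["primary", "secondary"],
    LocationMode.SECONDARY_ONLY: ["secondary"],
    LocationMode.SECONDARY_THEN_PRIMARY: ["secondary", "primary"],
}


def endpoint_order(mode):
    """Map a location_mode string to the ordered list of endpoints to try."""
    return _ORDER[LocationMode(mode)]  # raises ValueError for unknown modes


def implicitly_readonly(mode):
    """Any mode that may touch the secondary endpoint is treated as readonly."""
    return LocationMode(mode) is not LocationMode.PRIMARY_ONLY
```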

Regarding the “readonly” support for repositories, I like the feature! While secondary endpoints in Azure are NECESSARILY readonly, we should also allow a primary endpoint to be accessed readonly. That is, location_mode and readonly-ness are mostly orthogonal concepts. As noted elsewhere, location mode cases #2, #3 and #4 above are implicitly readonly and should be treated as if “readonly”: true was set.

My conclusion on this: the new “readonly” setting does not eliminate the need for “location_mode”.

As for the configuration in yml, independent of the above points, I am fine with either the azure1, azure2 approach or the arrays approach. I actually started with the approach you suggested, but found the array approach more like the rest of the settings in the yml file and super simple to implement. It is clear that in general there might be lots of settings per storage account; cf. https://azure.microsoft.com/en-us/documentation/articles/storage-configure-connection-string. However, it is extremely rare that one would set them differently for different storage accounts in one deployment of ES. Thus it would be reasonable to set, for example, the blob endpoint once for use with all storage accounts.

My conclusion on this: either approach is fine.

Let me know what you think.

Craig.

@skearns64
Contributor

@craigwi - I agree with just about all of your points.

When it comes to the configuration in YML, I prefer the azure1, azure2 approach outlined by @dadoonet

@ppf2
Member

ppf2 commented Sep 15, 2015

Hi Craig, I spoke with @dadoonet and @skearns64; we will take the PR from here and modify it as needed. Thanks for the contribution!

@craigwi
Contributor Author

craigwi commented Sep 15, 2015

Sounds good. Thanks for letting me know.

@dadoonet
Member

I'm closing this one in favor of #13779

@dadoonet dadoonet closed this Oct 27, 2015
dadoonet pushed a commit to dadoonet/elasticsearch that referenced this pull request Nov 18, 2015
Follow up for elastic#13228.

This commit adds support for a secondary storage account:

```yml
cloud:
    azure:
        storage:
            my_account1:
                account: your_azure_storage_account1
                key: your_azure_storage_key1
                default: true
            my_account2:
                account: your_azure_storage_account2
                key: your_azure_storage_key2
```

When creating a repository, you can choose which azure account you want to use for it:

```sh
curl -XPUT localhost:9200/_snapshot/my_backup1?pretty -d '{
  "type": "azure"
}'

curl -XPUT localhost:9200/_snapshot/my_backup2?pretty -d '{
  "type": "azure",
  "settings": {
    "account" : "my_account2",
    "location_mode": "secondary_only"
  }
}'
```

`location_mode` supports `primary_only` or `secondary_only`. Defaults to `primary_only`. Note that if you set it
to `secondary_only`, it will force `read_only` to true.
dadoonet added a commit that referenced this pull request Nov 18, 2015
Follow up for #13228.

(cherry picked from commit 79a4d9c)

# Conflicts:
#	docs/plugins/repository-azure.asciidoc
#	plugins/cloud-azure/src/main/java/org/elasticsearch/plugin/cloud/azure/CloudAzurePlugin.java
#	plugins/cloud-azure/src/test/java/org/elasticsearch/repositories/azure/AzureSnapshotRestoreTests.java
#	plugins/repository-azure/src/main/java/org/elasticsearch/cloud/azure/AzureRepositoryModule.java
@clintongormley clintongormley added :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs and removed :Plugin Cloud Azure labels Feb 14, 2018