For Azure, support multiple storage accounts and secondary endpoints (which are readonly) #13228

craigwi · 2015-08-31T19:52:14Z

…to ES master.

Enhancements as discussed on https://github.com/craigwi/elasticsearch-cloud-azure/releases/tag/v2.7.1-craigwi:

1. Supports multiple storage accounts: cloud.azure.storage.account and cloud.azure.storage.key may now each be an array of accounts / keys. The arrays must be the same length, and the account at an index must match the key at the same index.

2. An Azure repository specification ("type" : "azure") allows for two new settings. "account" specifies the name of the account to be used and must be one of the items in cloud.azure.storage.account. If "account" is not specified, the first item in the list of accounts is used. The other new setting, "location_mode", may be used to specify the endpoint. It defaults to "primary_only" and may also be "primary_then_secondary", "secondary_only" or "secondary_then_primary".

3. When a repository is registered using "secondary_only" or "secondary_then_primary" as the "location_mode", verification of the repository is limited to checking that the specified container exists; in particular, the tests-* files are not created because the secondary endpoint is read-only.

NOTE: for a given storage account, only one location_mode can be active at a time.

An example showing settings in elasticsearch.yml:

cloud.azure.storage.account: [ "azstorageaccount1", "mystorage2" ]
cloud.azure.storage.key: [ "", "" ]

A sample repository specification using the secondary endpoint:

{ "type": "azure", "settings": { "account" : "mystorage2", "container": "snapshots-20150701",
"location_mode": "secondary_only"}}
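A minimal sketch of how the parallel arrays and the "account" setting described above could be resolved. This is illustrative Python, not the plugin's Java code; the function names `pair_accounts` and `select_account` and the placeholder key strings are hypothetical.

```python
def pair_accounts(accounts, keys):
    """Pair each account name with the key at the same index."""
    if len(accounts) != len(keys):
        raise ValueError(
            "cloud.azure.storage.account and cloud.azure.storage.key "
            "must be the same length"
        )
    if not accounts:
        raise ValueError("at least one storage account is required")
    return list(zip(accounts, keys))


def select_account(pairs, repo_settings):
    """Return the (account, key) pair a repository should use."""
    name = repo_settings.get("account")
    if name is None:
        # No "account" setting: fall back to the first configured account.
        return pairs[0]
    for account, key in pairs:
        if account == name:
            return account, key
    raise ValueError(f"unknown account {name!r}")


# Placeholder keys; real keys would come from elasticsearch.yml.
pairs = pair_accounts(
    ["azstorageaccount1", "mystorage2"],
    ["key1-placeholder", "key2-placeholder"],
)
chosen = select_account(
    pairs, {"account": "mystorage2", "container": "snapshots-20150701"}
)
default = select_account(pairs, {"container": "snapshots-20150701"})
```

Here `chosen` resolves to the second account, and `default` falls back to the first account in the list.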


dadoonet · 2015-09-01T10:18:20Z

Hi @craigwi.

Thank you for bringing that PR here.

As I wrote in elastic/elasticsearch-cloud-azure#93 (comment), I think we should do it a bit differently unless I'm missing something.

Pasting the discussion here:

Note that you also raised a valid point which is that we need to support in elasticsearch.yml multiple credentials.
We could imagine that as a generic feature whatever repository type you want to use.

Let's say that we can now create something like:

cloud:
    azure:
        storage:
            azure1:
              account: your_azure_storage_account1
              key: your_azure_storage_key1
              default: true
            azure2:
              account: your_azure_storage_account2
              key: your_azure_storage_key2
            azure3:
              account: your_azure_storage_account3
              key: your_azure_storage_key3

Then when we create the repo, we can specify which credentials we want to use:

# use credentials 2
PUT _snapshot/my_backup2
{
  "type": "azure",
  "settings": {
      "credentials": "azure2",
      "container": "backup_container",
      "base_path": "backups"
  }
}

# This one will use the one marked as "default"
PUT _snapshot/my_backup3
{
  "type": "azure"
}

I know that we need to make one of those repos readonly. The infrastructure is now ready; we "just" have to expose it in the Azure plugin.

I think that with this you'll be able to define something really similar to what you are trying to achieve here.
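To make the named-credentials idea concrete, here is a rough sketch of the lookup in Python. It is not plugin code: `resolve_credentials` is a hypothetical name, and requiring exactly one entry marked `default` is an assumption the discussion does not spell out.

```python
def resolve_credentials(storage, repo_settings):
    """Pick the credentials block a repository should use.

    `storage` mirrors the cloud.azure.storage section of elasticsearch.yml:
    a mapping from a credentials name to {"account", "key", "default"?}.
    """
    name = repo_settings.get("credentials")
    if name is not None:
        if name not in storage:
            raise KeyError(f"unknown credentials {name!r}")
        return storage[name]
    # No explicit name: use the entry marked as default.
    # Assumption: exactly one entry carries "default: true".
    defaults = [c for c in storage.values() if c.get("default")]
    if len(defaults) != 1:
        raise ValueError("exactly one credentials entry must be marked default")
    return defaults[0]


storage = {
    "azure1": {
        "account": "your_azure_storage_account1",
        "key": "your_azure_storage_key1",
        "default": True,
    },
    "azure2": {
        "account": "your_azure_storage_account2",
        "key": "your_azure_storage_key2",
    },
}

explicit = resolve_credentials(storage, {"credentials": "azure2"})
fallback = resolve_credentials(storage, {})
```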

elasticsearch.yml

Instead of:

cloud.azure.storage.account: [ "azstorageaccount1", "mystorage2" ]
cloud.azure.storage.key: [ "", "" ]

define:

cloud.azure.storage.azure1.account: "azstorageaccount1"
cloud.azure.storage.azure1.key: ""
cloud.azure.storage.azure2.account: "mystorage2"
cloud.azure.storage.azure2.key: ""

Usage

Instead of:

PUT _snapshot/myrepo
{ 
  "type": "azure", 
  "settings": { 
     "account" : "mystorage2", 
     "container": "snapshots-20150701",
     "location_mode": "secondary_only"
   }
}

Define:

PUT _snapshot/myrepo
{ 
  "type": "azure", 
  "settings": { 
     "credentials" : "azure2", 
     "container": "snapshots-20150701",
     "readonly": true,
     "location_mode": "secondary_only"
   }
}

That said, I think we can easily auto-detect that we are using the secondary endpoint here, so we can automatically set readonly to true unless the user explicitly sets it to false.
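The auto-detection suggested here could look roughly like the following Python sketch. The helper name `effective_readonly` is hypothetical, and treating both `secondary_only` and `secondary_then_primary` as "using the secondary endpoint" is an assumption.

```python
# Assumption: these are the modes that count as "using the secondary endpoint".
SECONDARY_MODES = {"secondary_only", "secondary_then_primary"}


def effective_readonly(settings):
    """Return the readonly flag a repository would end up with."""
    if "readonly" in settings:
        # An explicit user choice always wins, even if it is False.
        return bool(settings["readonly"])
    mode = settings.get("location_mode", "primary_only")
    return mode in SECONDARY_MODES


auto = effective_readonly({"location_mode": "secondary_only"})
overridden = effective_readonly(
    {"location_mode": "secondary_only", "readonly": False}
)
primary = effective_readonly({})
```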

I'm also wondering if we should not prefer an easier setting like use_secondary: true instead of location_mode: secondary_only and location_mode: primary_only. Unless you are thinking of a future usage?

While reading the Azure Storage Replication documentation, I was also wondering if we really need this flag?

When you enable read-only access to your data in the secondary region, your data is available on a secondary endpoint, in addition to the primary endpoint for your storage account. The secondary endpoint is similar to the primary endpoint, but appends the suffix -secondary to the account name. For example, if your primary endpoint for the Blob service is myaccount.blob.core.windows.net, then your secondary endpoint is myaccount-secondary.blob.core.windows.net. The access keys for your storage account are the same for both the primary and secondary endpoints.

If I understand correctly, this means that the Azure client basically appends -secondary to the account name in the endpoint when location_mode is secondary_only.

In that case, would something like the following work?

cloud.azure.storage.azure1.account: "myaccount"
cloud.azure.storage.azure1.key: ""
cloud.azure.storage.azure2.account: "myaccount-secondary"
cloud.azure.storage.azure2.key: ""

PUT _snapshot/myrepo
{ 
  "type": "azure", 
  "settings": { 
     "credentials" : "azure2", 
     "container": "snapshots-20150701",
     "readonly": true
   }
}

I did not check. Maybe myaccount-secondary is a forbidden account name?

What do you think?

@imotov @skearns64 @ppf2 Feel free to add your thoughts here as well!

dadoonet · 2015-09-01T10:24:57Z

The access keys for your storage account are the same for both the primary and secondary endpoints.

I think this answers my last question. The endpoint is different but the account must be kept as is, right?
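The relationship quoted from the Azure documentation can be written down as a small Python sketch: one account name and one key, with two Blob endpoints derived from the account name. The helper name `blob_endpoints` is hypothetical; the host names follow the pattern given in the quoted passage.

```python
def blob_endpoints(account):
    """Derive the primary and secondary Blob endpoints for one account.

    The account name (and its access keys) stay the same; only the
    host name gains a "-secondary" suffix.
    """
    primary = f"{account}.blob.core.windows.net"
    secondary = f"{account}-secondary.blob.core.windows.net"
    return primary, secondary


primary, secondary = blob_endpoints("myaccount")
```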

craigwi · 2015-09-01T17:24:13Z

Hi David,
I did see your comments. I ported my previous proposal because it is a simple, tested and full-featured solution, and I wanted to start this discussion in the context of a pull request against the new location of the sources for the cloud-azure plugin.
Taking the points in reverse order, the Azure client APIs provide a flexible means of accessing primary and secondary storage endpoints. There are four cases and my proposal supports them all. The four cases:

Primary only
Primary then secondary
Secondary only
Secondary then primary

To support those combinations, which I see no reason not to support, one passes the account AND the location mode. The client library uses the secondary endpoint as required. That is, the concept in Azure is ONE storage account and key with the endpoints derived indirectly from the use case (and potentially other settings).

My conclusion on this: location_mode, as I have implemented, is the correct way in Azure to use primary and secondary endpoints. The Java client library supports this and my proposed solution supports this.

Regarding the “readonly” support for repositories, I like the feature! While secondary endpoints in Azure are NECESSARILY readonly, we should also allow a primary endpoint to be marked as readonly. That is, the concepts of location_mode and readonly-ness are mostly orthogonal. As noted elsewhere, location mode cases #2, #3 and #4 above are implicitly readonly and should be treated as if “readonly”: true was set.

My conclusion on this: the new “readonly” setting does not eliminate the need for “location_mode”.
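The four cases and the implicit readonly rule stated above can be summarized in a short Python sketch. This is not the Azure client library or the plugin code: the `LocationMode` enum here, the `ENDPOINT_ORDER` table and both helper functions are illustrative names.

```python
from enum import Enum


class LocationMode(Enum):
    PRIMARY_ONLY = "primary_only"
    PRIMARY_THEN_SECONDARY = "primary_then_secondary"
    SECONDARY_ONLY = "secondary_only"
    SECONDARY_THEN_PRIMARY = "secondary_then_primary"


# Which endpoints a request may be sent to, in the order they are tried.
ENDPOINT_ORDER = {
    LocationMode.PRIMARY_ONLY: ("primary",),
    LocationMode.PRIMARY_THEN_SECONDARY: ("primary", "secondary"),
    LocationMode.SECONDARY_ONLY: ("secondary",),
    LocationMode.SECONDARY_THEN_PRIMARY: ("secondary", "primary"),
}


def parse_location_mode(settings):
    """Read "location_mode" from repository settings, defaulting to primary_only."""
    # Enum lookup by value raises ValueError for an unsupported mode.
    return LocationMode(settings.get("location_mode", "primary_only"))


def implicitly_readonly(mode):
    """Cases #2, #3 and #4 are treated as readonly, per the comment above."""
    return mode is not LocationMode.PRIMARY_ONLY


mode = parse_location_mode({"location_mode": "secondary_then_primary"})
order = ENDPOINT_ORDER[mode]
```

An explicit "readonly" setting on a primary_only repository stays a separate decision, which is the sense in which the two settings are mostly orthogonal.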

As for the configuration in yml, independent of the above points, I am fine with either the azure1, azure2 approach or the arrays approach. I actually started with the approach you suggested, but found the array approach more like the rest of the settings in the yml file and super simple to implement. It is clear that in general there might be lots of settings per storage account; cf. https://azure.microsoft.com/en-us/documentation/articles/storage-configure-connection-string. However, it is extremely rare that one would set them differently for different storage accounts in one deployment of ES. Thus it would be reasonable to set, for example, the blob endpoint once for use with all storage accounts.

My conclusion on this: either approach is fine.

Let me know what you think.

Craig.

skearns64 · 2015-09-01T20:51:20Z

@craigwi - I agree with just about all of your points.

When it comes to the configuration in YML, I prefer the azure1, azure2 approach outlined by @dadoonet

ppf2 · 2015-09-15T16:42:00Z

Hi Craig, spoke with @dadoonet and @skearns64 , we will take the PR from here and modify as needed. Thx for the contribution!

craigwi · 2015-09-15T16:59:15Z

Sounds good. Thanks for letting me know.

dadoonet · 2015-10-27T14:45:28Z

I'm closing this one in favor of #13779

Follow up for elastic#13228.

This commit adds support for a secondary storage account:

```yml
cloud:
    azure:
        storage:
            my_account1:
                account: your_azure_storage_account1
                key: your_azure_storage_key1
                default: true
            my_account2:
                account: your_azure_storage_account2
                key: your_azure_storage_key2
```

When creating a repository, you can choose which azure account you want to use for it:

```sh
curl -XPUT localhost:9200/_snapshot/my_backup1?pretty -d '{
    "type": "azure"
}'

curl -XPUT localhost:9200/_snapshot/my_backup2?pretty -d '{
    "type": "azure",
    "settings": {
        "account" : "my_account2",
        "location_mode": "secondary_only"
    }
}'
```

`location_mode` supports `primary_only` or `secondary_only`. Defaults to `primary_only`. Note that if you set it to `secondary_only`, it will force `read_only` to true.

Follow up for #13228.

This commit adds support for a secondary storage account:

```yml
cloud:
    azure:
        storage:
            my_account1:
                account: your_azure_storage_account1
                key: your_azure_storage_key1
                default: true
            my_account2:
                account: your_azure_storage_account2
                key: your_azure_storage_key2
```

When creating a repository, you can choose which azure account you want to use for it:

```sh
curl -XPUT localhost:9200/_snapshot/my_backup1?pretty -d '{
    "type": "azure"
}'

curl -XPUT localhost:9200/_snapshot/my_backup2?pretty -d '{
    "type": "azure",
    "settings": {
        "account" : "my_account2",
        "location_mode": "secondary_only"
    }
}'
```

`location_mode` supports `primary_only` or `secondary_only`. Defaults to `primary_only`. Note that if you set it to `secondary_only`, it will force `read_only` to true.

(cherry picked from commit 79a4d9c)

# Conflicts:
#	docs/plugins/repository-azure.asciidoc
#	plugins/cloud-azure/src/main/java/org/elasticsearch/plugin/cloud/azure/CloudAzurePlugin.java
#	plugins/cloud-azure/src/test/java/org/elasticsearch/repositories/azure/AzureSnapshotRestoreTests.java
#	plugins/repository-azure/src/main/java/org/elasticsearch/cloud/azure/AzureRepositoryModule.java

craigwi mentioned this pull request Aug 31, 2015

[cloud-azure] Allow multiple repositories with different settings #12759

Closed

dadoonet added discuss :Plugin Cloud Azure labels Sep 1, 2015

dadoonet added the v2.1.0 label Sep 23, 2015

dadoonet self-assigned this Sep 24, 2015

dadoonet mentioned this pull request Sep 24, 2015

Add support for secondary azure storage account #13779

Merged

dadoonet removed discuss v2.1.0 labels Oct 27, 2015

dadoonet closed this Oct 27, 2015

clintongormley added :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs and removed :Plugin Cloud Azure labels Feb 14, 2018
