Batch changed the SKU of load balancers provisioned in the user subscription. How to switch back? #66

Closed
dtrefilov-piterbyte opened this issue Jun 20, 2019 · 8 comments
Labels: question


@dtrefilov-piterbyte

Problem Description

We noticed that around May 22, Azure Batch started provisioning Standard SKU load balancers in the user subscription when the pool is provisioned in a virtual network. Previously it provisioned Basic SKU load balancers. The problem is that, for our kind of workload, the associated data-processing charge of $0.005 per GB exceeds the cost of running the virtual machines. Azure Support seems unable to track down the source of this change, even within an organization as large as Microsoft. I don't see any way to switch back to the Basic SKU, and there was no prior announcement or notice about this change. What options do we have to bring our Azure Batch expenses back to the level we had a month ago?
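To put the complaint in numbers, here is a minimal sketch of the data-processing charge and of the break-even volume at which it matches a given monthly VM spend. It assumes the $0.005/GB rate quoted above (June 2019); the 50,000 GB workload and the $100 VM spend are hypothetical figures, so check the current Load Balancer pricing page before using this for planning.

```python
# ASSUMPTION: rate quoted in this issue (June 2019); verify current pricing.
SLB_RATE_USD_PER_GB = 0.005


def slb_processing_charge(gb_processed: float, rate: float = SLB_RATE_USD_PER_GB) -> float:
    """Return the Standard Load Balancer data-processing charge in USD."""
    if gb_processed < 0:
        raise ValueError("gb_processed must be non-negative")
    return round(gb_processed * rate, 2)


def break_even_gb(monthly_vm_cost_usd: float, rate: float = SLB_RATE_USD_PER_GB) -> float:
    """GB per month at which the processing charge equals the VM spend."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return round(monthly_vm_cost_usd / rate, 2)


if __name__ == "__main__":
    # Hypothetical workload: 50,000 GB processed per month.
    print(f"${slb_processing_charge(50_000):.2f} per month")
    # Hypothetical VM spend of $100 per month.
    print(f"break-even at {break_even_gb(100):,.0f} GB per month")
```

At this rate every 200 GB processed costs about as much as one dollar of VM time, which is why data-heavy, compute-light pools are the ones hit hardest.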

Steps to Reproduce

Provision a pool in a user-specified virtual network.

Expected Results

Basic SKU load balancers are provisioned in the resource group that contains the VNET:
[screenshot: Basic SKU load balancer]

Actual Results

Standard SKU load balancers are provisioned:
[screenshot: Standard SKU load balancer]

@alfpark (Contributor) commented Jun 20, 2019

We provided notification of this change more than 30 days before it occurred. You would have received an email notification, similar to the following, for the Batch account(s) linked to your subscription(s):

[screenshot of the notification email]

This is a permanent change. As an alternative, consider transferring your data through services that support ARM Azure virtual network service endpoints, and join your compute nodes to a subnet configured with the appropriate service endpoint(s) of the virtual network. Under current policies, data egressed to these endpoints does not incur a transfer charge.
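A minimal sketch of what "a subnet configured with the appropriate service endpoint(s)" looks like, expressed as an ARM-style subnet definition built in Python. The `addressPrefix` and `serviceEndpoints[].service` property names follow the `Microsoft.Network/virtualNetworks/subnets` schema; the subnet name, address range, and the choice of `Microsoft.Storage` are placeholders for illustration.

```python
import json


def subnet_with_service_endpoints(name: str, address_prefix: str, services: list) -> dict:
    """Build an ARM-style subnet definition with service endpoints enabled.

    `services` are service endpoint names such as "Microsoft.Storage";
    `name` and `address_prefix` are placeholders chosen by the caller.
    """
    return {
        "name": name,
        "properties": {
            "addressPrefix": address_prefix,
            # Each entry enables one service endpoint on this subnet.
            "serviceEndpoints": [{"service": s} for s in services],
        },
    }


if __name__ == "__main__":
    subnet = subnet_with_service_endpoints("batch-nodes", "10.0.0.0/24", ["Microsoft.Storage"])
    print(json.dumps(subnet, indent=2))
```

The compute nodes then have to be placed in this subnet through the pool's network configuration for the endpoint to apply to their traffic.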

Please consult the Load Balancer pricing page and the Static IP pricing page.

@dtrefilov-piterbyte (Author)

Thanks for the quick response. Do these charges apply if we don't specify a custom VNET in the pool configuration (i.e., when these load balancers aren't provisioned in the user subscription)?
The only reason we use custom VNETs is to block inbound RDP traffic, which is open to all by default. But it turns out Batch has a native way of blocking this traffic without an NSG: https://docs.microsoft.com/en-us/azure/batch/pool-endpoint-configuration#example-deny-all-rdp-traffic
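For reference, a minimal sketch of the pool endpoint configuration that denies all inbound RDP, built as a Python dict in the shape of the Batch REST `networkConfiguration.endpointConfiguration` object. The property names are those of the Batch pool API; the port range 7500 to 8000 and priority 179 are illustrative values, and the 150 to 4096 priority bound is an assumption to verify against the Batch REST reference.

```python
import json

# ASSUMPTION: Batch accepts NSG rule priorities from 150 to 4096; verify
# against the Batch REST reference for NetworkSecurityGroupRule.
MIN_PRIORITY, MAX_PRIORITY = 150, 4096


def deny_all_rdp_endpoint_config(priority: int = 179,
                                 frontend_start: int = 7500,
                                 frontend_end: int = 8000) -> dict:
    """Return a pool networkConfiguration fragment that denies all inbound RDP."""
    if not MIN_PRIORITY <= priority <= MAX_PRIORITY:
        raise ValueError(f"priority {priority} outside {MIN_PRIORITY}..{MAX_PRIORITY}")
    if frontend_start >= frontend_end:
        raise ValueError("frontend port range is empty")
    return {
        "networkConfiguration": {
            "endpointConfiguration": {
                "inboundNATPools": [{
                    "name": "RDP",
                    "protocol": "tcp",
                    "backendPort": 3389,  # RDP on the compute nodes
                    "frontendPortRangeStart": frontend_start,
                    "frontendPortRangeEnd": frontend_end,
                    "networkSecurityGroupRules": [{
                        "priority": priority,
                        "access": "deny",
                        "sourceAddressPrefix": "*",  # deny from every source
                    }],
                }]
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(deny_all_rdp_endpoint_config(), indent=2))
```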

@alfpark (Contributor) commented Jun 20, 2019

Load balancers are always provisioned (with or without virtual networks bound to the pool) and regardless of the account pool allocation mode (Batch service vs. User Subscription).

To avoid a transfer charge under current policies, you will need a virtual network configured on the pool, and your compute nodes must egress to the corresponding Azure service(s) through the service endpoint(s) enabled on the subnet where they reside.
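Binding the pool to such a subnet is done through the pool's `networkConfiguration.subnetId`, which takes the full ARM resource ID of the subnet. A minimal sketch follows; the subscription ID, resource group, VNET, and subnet names are placeholders.

```python
import json


def subnet_resource_id(subscription_id: str, resource_group: str,
                       vnet: str, subnet: str) -> str:
    """Return the ARM resource ID of a subnet (all arguments are placeholders)."""
    return (f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
            f"/providers/Microsoft.Network/virtualNetworks/{vnet}/subnets/{subnet}")


def pool_network_configuration(subnet_id: str) -> dict:
    """Pool networkConfiguration fragment placing the nodes in `subnet_id`."""
    return {"networkConfiguration": {"subnetId": subnet_id}}


if __name__ == "__main__":
    sid = subnet_resource_id("00000000-0000-0000-0000-000000000000",
                             "batch-rg", "batch-vnet", "batch-nodes")
    print(json.dumps(pool_network_configuration(sid), indent=2))
```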

@dtrefilov-piterbyte (Author)

I tried it with a clean subscription and a new Batch account. I created a pool without a virtual network, and no additional resources were created (of course Batch provisioned a VMSS, a load balancer, and a static IP somewhere, but the point is that they are not created in the user subscription). As I understand it, this effectively eliminates the additional charges and subscription quota limitations associated with these resources. Please correct me if I'm wrong.
The only mention of load balancers I've found in the documentation relates to virtual networks:
https://docs.microsoft.com/en-us/azure/batch/batch-virtual-network#pools-in-the-virtual-machine-configuration

Regarding your advice about properly configuring service endpoints: in our case it won't help, since most of the traffic is Internet traffic, which always flows through the load balancer frontend IPs. Of course, we could force-route this traffic through a virtual appliance, but that would kill scalability.

@alfpark (Contributor) commented Jun 21, 2019

> From my understanding, this will effectively eliminate additional charges and subscription quota limitations associated with these resources.

That is incorrect; you will still be charged. For Batch accounts in Batch service pool allocation mode, these resources (including the SLB and IP addresses) are contained within the Batch service, but they are still charged to the subscription associated with the Batch account as part of your allocation.

If your scenario requires you to directly egress data from the compute nodes to the Internet, then you will be charged, as per Standard Load Balancer pricing.
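A minimal sketch of the billing model described in this exchange: egress that reaches an Azure service through a service endpoint on the nodes' subnet is treated as not charged, and direct Internet egress is charged at the Standard Load Balancer rate. The destination labels and the $0.005/GB rate (quoted earlier in this issue) are assumptions for illustration; this is a simplification of the policy as stated here, not a pricing calculator.

```python
# ASSUMPTION: rate quoted in this issue (June 2019); verify current pricing.
SLB_RATE_USD_PER_GB = 0.005


def estimated_egress_charge(gb_by_destination: dict,
                            rate: float = SLB_RATE_USD_PER_GB) -> float:
    """Estimate the charge in USD for egressed data, keyed by destination kind.

    Hypothetical keys:
      "service_endpoint": Azure service reached via a subnet service endpoint
                          (modelled as free, per this thread)
      "internet":         direct Internet egress (charged at `rate`)
    """
    charged_gb = 0.0
    for kind, gb in gb_by_destination.items():
        if gb < 0:
            raise ValueError(f"negative volume for {kind!r}")
        if kind == "service_endpoint":
            continue
        if kind == "internet":
            charged_gb += gb
        else:
            raise ValueError(f"unknown destination kind: {kind!r}")
    return round(charged_gb * rate, 2)


if __name__ == "__main__":
    # Hypothetical month: 1,000 GB to the Internet, 4,000 GB to Storage.
    print(estimated_egress_charge({"internet": 1000, "service_endpoint": 4000}))
```

For an Internet-heavy workload like the one described above, moving traffic behind service endpoints barely changes the estimate, which matches the conclusion reached in the thread.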

@alfpark alfpark added the question Question label Jun 21, 2019
@alfpark alfpark closed this as completed Jul 1, 2019
@dtrefilov-piterbyte (Author)

Why was the issue closed? The Batch service has been generally available for years, and now it's simply spoiled. Would it be possible to make this configurable with backward compatibility, as was done in Kubernetes (kubernetes/kubernetes#61884)?

@jis260 commented Nov 28, 2020

@dtrefilo did you ever get this working correctly?

@alfpark I tried what you suggested, but I am struggling to get my Azure Batch nodes to start in a pool that is configured to use a virtual network. The virtual network has been configured with a service endpoint policy that has a "Microsoft.Storage" policy definition pointing at a single storage account. Without the service endpoints defined on the virtual network, the Azure Batch pool works as expected, but with them the following error occurs and the node never starts.

I have tried creating the Batch account in both pool allocation modes. This did not seem to make a difference: the pool resizes successfully, and then the nodes are stuck in the "Starting" state. In "User Subscription" mode I found the start-up error, because I can see the VM instance in my subscription:

VM has reported a failure when processing extension 'batchNodeExtension'. Error message: "Enable failed: processing file downloads failed: failed to download file[0]: failed to download file: unexpected status code: actual=403 expected=200" More information on troubleshooting is available at https://aka.ms/VMExtensionCSELinuxTroubleshoot

From what I can determine, this is an Azure VM extension that runs to configure the VM for Azure Batch. My base image is Canonical, ubuntuserver, 18.04-lts (batch.node.ubuntu 18.04). I can see that the extension is attempting to download from:

https://a52a7f3c745c443e8c2cac69.blob.core.windows.net/nodeagentpackage-version9-22-0-2/Ubuntu-18.04/batch_init-ubuntu-18.04-1.8.7.tar.gz (note I removed the SAS token from this URL for posting here)

There are 8 further files that are downloaded, and it looks like they configure the Batch agent on the node.
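When sharing URLs like the one above, the SAS token lives entirely in the query string, so it can be stripped mechanically; the storage account name is the first label of the host. A minimal sketch using only the standard library follows (the `sv`/`sig` query values in the usage line are fake). Note that the account name alone does not reveal which subscription the account belongs to, which is the information a service endpoint policy needs.

```python
from urllib.parse import urlsplit, urlunsplit


def describe_blob_url(url: str) -> dict:
    """Split an Azure Blob URL into account, container, and blob, dropping the SAS query."""
    parts = urlsplit(url)
    account = parts.hostname.split(".", 1)[0]  # <account>.blob.core.windows.net
    container, _, blob = parts.path.lstrip("/").partition("/")
    # Rebuild the URL with an empty query and fragment, removing the SAS token.
    redacted = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return {"account": account, "container": container, "blob": blob, "url": redacted}


if __name__ == "__main__":
    info = describe_blob_url(
        "https://a52a7f3c745c443e8c2cac69.blob.core.windows.net/"
        "nodeagentpackage-version9-22-0-2/Ubuntu-18.04/"
        "batch_init-ubuntu-18.04-1.8.7.tar.gz?sv=fake&sig=fake"
    )
    print(info["account"], info["container"], info["url"], sep="\n")
```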

The 403 error indicates that the node cannot connect to this storage account, which makes sense given the service endpoint policy: the policy does not include this storage account, and the account is external to my Azure subscription. I thought I might be able to add it to the service endpoint policy, but I have no way of determining which Azure subscription it belongs to. If I knew that, I thought I could add it as follows:

> Endpoint policy allows you to add specific Azure Storage accounts to allow list, using the resourceID format. You can restrict access to all storage accounts in a subscription
> E.g. /subscriptions/subscriptionId

(from https://docs.microsoft.com/en-us/azure/virtual-network/virtual-network-service-endpoint-policies-overview)
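To make the scoping rule concrete, here is a minimal sketch of a `Microsoft.Storage` policy definition in the ARM `serviceEndpointPolicyDefinitions` shape, together with a simplified model of how an allow-list entry covers a storage account: an account is allowed when its resource ID equals, or sits underneath, one of the listed scopes (subscription, resource group, or account). The matching function is an illustrative model of the documented behaviour, not Azure's implementation, and all IDs are placeholders. It shows why the Batch node-agent account is blocked: its subscription is unknown, so no scope in the list can cover it.

```python
def policy_definition(name: str, service_resources: list) -> dict:
    """ARM-style serviceEndpointPolicyDefinitions entry for Azure Storage."""
    return {
        "name": name,
        "properties": {
            "service": "Microsoft.Storage",
            "serviceResources": list(service_resources),
        },
    }


def is_allowed(storage_account_id: str, service_resources: list) -> bool:
    """Simplified model: allowed if the account ID equals or is under a listed scope."""
    target = storage_account_id.rstrip("/").lower()
    for scope in service_resources:
        s = scope.rstrip("/").lower()
        # Match on whole path segments so "/subscriptions/11" does not cover "/subscriptions/111".
        if target == s or target.startswith(s + "/"):
            return True
    return False


if __name__ == "__main__":
    scopes = ["/subscriptions/1111"]  # placeholder subscription scope
    mine = ("/subscriptions/1111/resourceGroups/rg/providers/"
            "Microsoft.Storage/storageAccounts/myaccount")
    print(policy_definition("allow-my-storage", scopes))
    print(is_allowed(mine, scopes))
```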

I tried adding network security group rules using service tags for Azure Storage, but this did not help. The node still cannot connect, which makes sense given the description of service endpoint policies.

I would really appreciate any pointers because I am running out of ideas to try!

@Gaploid commented Dec 18, 2020

@dtrefilo @alfpark If I set up my own NAT on a VM and route traffic from the nodes in the Batch pool through it, will that eliminate the egress traffic cost? Also, it is possible to set up Batch pools without public IP addresses (https://docs.microsoft.com/en-us/azure/batch/batch-pool-no-public-ip-address). Will a load balancer be provisioned in that case as well?
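For concreteness, a minimal sketch of the two configurations the question refers to, as Python dicts: an ARM-style route table whose default route (`0.0.0.0/0`) sends all outbound traffic to a self-managed NAT VM (`nextHopType: VirtualAppliance`), and a Batch pool `networkConfiguration` requesting nodes without public IPs (`publicIPAddressConfiguration.provision: NoPublicIPAddresses`). The NAT VM's private IP and the subnet ID are placeholders, and the NAT VM's NIC also needs IP forwarding enabled. Whether either setup avoids the Standard Load Balancer charge is exactly the open question here and is not confirmed in this thread.

```python
import json


def nat_route_table(nva_private_ip: str, name: str = "default-via-nat") -> dict:
    """ARM-style route table sending all outbound traffic to a NAT VM.

    `nva_private_ip` is a placeholder for the private IP of your own
    NAT/virtual appliance VM.
    """
    return {
        "properties": {
            "routes": [{
                "name": name,
                "properties": {
                    "addressPrefix": "0.0.0.0/0",  # default route
                    "nextHopType": "VirtualAppliance",
                    "nextHopIpAddress": nva_private_ip,
                },
            }]
        }
    }


def no_public_ip_pool_network_config(subnet_id: str) -> dict:
    """Pool networkConfiguration fragment requesting nodes without public IPs."""
    return {
        "networkConfiguration": {
            "subnetId": subnet_id,
            "publicIPAddressConfiguration": {"provision": "NoPublicIPAddresses"},
        }
    }


if __name__ == "__main__":
    print(json.dumps(nat_route_table("10.0.1.4"), indent=2))
    print(json.dumps(no_public_ip_pool_network_config("<subnet-resource-id>"), indent=2))
```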
