
Authentication with environment variables not working #482

Closed
dbaltor opened this issue Nov 5, 2019 · 13 comments


dbaltor commented Nov 5, 2019

The Telegraf agent fails to authenticate to Azure Monitor (outputs.azure_monitor) using the environment variables AZURE_TENANT_ID, AZURE_CLIENT_ID and AZURE_CLIENT_SECRET, with the following error message:
2019-11-05T15:57:50Z E! [agent] Error writing to outputs.azure_monitor: unable to fetch authentication credentials: azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://uksouth.monitoring.azure.com/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Compute/virtualMachines/azuredemo-ops-manager-vm/metrics: StatusCode=400 -- Original Error: adal: Refresh request failed. Status Code = '400'. Response body: {"error":"invalid_request","error_description":"Identity not found"}

OS:
#64~16.04.1-Ubuntu SMP Wed Aug 7 14:10:35 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Telegraf agent:

# systemctl status telegraf.service
● telegraf.service - The plugin-driven server agent for reporting metrics into InfluxDB
   Loaded: loaded (/lib/systemd/system/telegraf.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2019-11-05 13:10:03 UTC; 3s ago
     Docs: https://github.com/influxdata/telegraf
 Main PID: 1347 (telegraf)
   CGroup: /system.slice/telegraf.service
           └─1347 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
Nov 05 13:10:03 xxx systemd[1]: Started The plugin-driven server agent for reporting metrics into InfluxDB.
Nov 05 13:10:03 xx telegraf[1347]: 2019-11-05T13:10:03Z I! Starting Telegraf 1.12.4

I have the following environment variables configured via /etc/environment:

$ env | grep AZURE
AZURE_CLIENT_ID=xxxxxxx
AZURE_TENANT_ID=xxxxxxxx
AZURE_CLIENT_SECRET=xxxxxxxx

jhendrixMSFT commented

I believe this means that the client ID specified in AZURE_CLIENT_ID doesn't exist in the tenant specified in AZURE_TENANT_ID. If you have the Azure CLI installed, please run the following command to confirm:

az ad sp show --id $AZURE_CLIENT_ID

Then check the value of the appOwnerTenantId property; it should match $AZURE_TENANT_ID.
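
If this check needs to be scripted, the comparison can be sketched in Go with only the standard library. This is a hypothetical helper (`tenantMatches` is not part of any SDK), and it assumes the JSON printed by `az ad sp show` contains the `appOwnerTenantId` property mentioned above, which may differ between CLI versions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// servicePrincipal holds the one field we need from `az ad sp show` output.
// Assumption: the CLI output includes appOwnerTenantId; other fields are ignored.
type servicePrincipal struct {
	AppOwnerTenantID string `json:"appOwnerTenantId"`
}

// tenantMatches reports whether appOwnerTenantId in spJSON equals tenantID.
func tenantMatches(spJSON []byte, tenantID string) (bool, error) {
	var sp servicePrincipal
	if err := json.Unmarshal(spJSON, &sp); err != nil {
		return false, fmt.Errorf("parsing az output: %w", err)
	}
	if sp.AppOwnerTenantID == "" {
		return false, fmt.Errorf("appOwnerTenantId not present in az output")
	}
	return sp.AppOwnerTenantID == tenantID, nil
}

func main() {
	// Reads the JSON piped in from the Azure CLI.
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	ok, err := tenantMatches(data, os.Getenv("AZURE_TENANT_ID"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("appOwnerTenantId matches AZURE_TENANT_ID:", ok)
}
```

Example: `az ad sp show --id $AZURE_CLIENT_ID | go run check_tenant.go` (the file name is illustrative).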

dbaltor commented Nov 7, 2019

Hey Joel, many thanks for acting on this. This doesn't seem to be the case: the appOwnerTenantId property of the AZURE_CLIENT_ID service principal holds the same value as the AZURE_TENANT_ID environment variable. Here is the command I used, following your guidance:
$ az ad sp show --id xxxxxx | jq .appOwnerTenantId

jhendrixMSFT commented Nov 7, 2019

It was just a guess, as the error message returned from the OAuth endpoint is vague. Before I follow up on that: is this running on an Azure VM, and are you sure the env vars are set correctly? Unfortunately, we don't have logging wired into the auth package. Since you're using auth.NewAuthorizerFromEnvironmentWithResource(), can you add some logging to EnvironmentSettings.GetAuthorizer() to verify it's picking up the client secret?
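
A stdlib-only sketch of that kind of diagnostic follows. It is not go-autorest code: `describeAzureEnv` is a hypothetical helper that reports whether each variable is visible to the current process, printing only the length of each value so the secret never ends up in a log. The result only says something about Telegraf if the program runs with the same environment as the Telegraf process, which is not necessarily the environment of an interactive shell.

```go
package main

import (
	"fmt"
	"os"
)

// azureVars are the environment variables discussed in this issue.
var azureVars = []string{"AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET"}

// describeAzureEnv returns one line per variable saying whether it is set.
// Values are never printed; only their length is shown.
func describeAzureEnv(lookup func(string) (string, bool)) []string {
	lines := make([]string, 0, len(azureVars))
	for _, name := range azureVars {
		v, ok := lookup(name)
		switch {
		case !ok:
			lines = append(lines, name+": not set")
		case v == "":
			lines = append(lines, name+": set but empty")
		default:
			lines = append(lines, fmt.Sprintf("%s: set (%d chars)", name, len(v)))
		}
	}
	return lines
}

func main() {
	// os.LookupEnv distinguishes "unset" from "set to the empty string".
	for _, line := range describeAzureEnv(os.LookupEnv) {
		fmt.Println(line)
	}
}
```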

dbaltor commented Nov 8, 2019

I can confirm the issue is happening on an Azure VM and the variables are set globally using /etc/environment. We added debugging messages to auth.go and auth_test.go, and changed the test code so that it no longer sets the env vars itself, so we could check whether they are read correctly. Tests on my laptop confirmed the code is able to read them.

Meanwhile, I'd like to share this link to a very similar issue with the Azure CLI, which seems to have the same root cause.

jhendrixMSFT commented Nov 8, 2019

Thanks for the info. Do you also see the creds printed when you run on the VM (not just in a local test)? I ask because the returned error message is similar to ones you'd get for a failed MSI authentication.

Assuming you do see the correct creds printed when running on a VM, do you know if the initial token acquisition works and this fails later on when trying to refresh?
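
For context on the MSI comparison: with MSI, the token is requested from the VM's Instance Metadata Service (IMDS) rather than from Azure AD directly. The sketch below only builds such a request so its shape is visible; the endpoint, the `api-version` value and the Azure Monitor resource URI are assumptions to verify against current Azure documentation, and `newMSITokenRequest` is a hypothetical name. As far as I know, an `invalid_request` / `Identity not found` body like the one in the original report is what IMDS returns when the VM has no managed identity assigned.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// Assumptions: the well-known IMDS token endpoint and api-version, and the
// resource URI Telegraf requests tokens for. Verify against current docs.
const (
	imdsTokenEndpoint = "http://169.254.169.254/metadata/identity/oauth2/token"
	imdsAPIVersion    = "2018-02-01"
	monitorResource   = "https://monitoring.azure.com/"
)

// newMSITokenRequest builds (but does not send) an IMDS token request.
func newMSITokenRequest(resource string) (*http.Request, error) {
	q := url.Values{}
	q.Set("api-version", imdsAPIVersion)
	q.Set("resource", resource)
	req, err := http.NewRequest(http.MethodGet, imdsTokenEndpoint+"?"+q.Encode(), nil)
	if err != nil {
		return nil, err
	}
	// IMDS requires this header on every request.
	req.Header.Set("Metadata", "true")
	return req, nil
}

func main() {
	req, err := newMSITokenRequest(monitorResource)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```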

dbaltor commented Nov 8, 2019

> I ask because the returned error message is similar to ones you'd get for a failed MSI authentication.

Can you please elaborate on that? This use case is about running the Telegraf agent, which uses go-autorest, and the agent documentation states that environment-based auth is meant to replace MSI auth. This rationale seems to be confirmed by the EnvironmentSettings.GetAuthorizer() code, doesn't it?
I can confirm, though, that the VM has no MSI assigned.

jhendrixMSFT commented

You are correct that client/secret credentials will be preferred if available. I brought this up because the error message from the authentication endpoint is very vague, and I have seen the same message returned from failed MSI authentication attempts. If the VM doesn't have an MSI assigned, that further suggests the environment credentials aren't being found and the code is falling back to MSI.
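
The fallback described above can be modelled as a simple decision. The sketch below is a simplified, hypothetical model of the order in which `EnvironmentSettings.GetAuthorizer()` tries credential types (client secret, then certificate, then username/password, then MSI), assumed from the go-autorest source; it only checks whether variables are non-empty and may not match every release, including 10.12.0. What it illustrates: if `AZURE_CLIENT_SECRET` is not visible to the process, the MSI path is taken even when `AZURE_CLIENT_ID` and `AZURE_TENANT_ID` are set.

```go
package main

import (
	"fmt"
	"os"
)

// chooseAuthMethod is a simplified, hypothetical model of the order in which
// go-autorest's EnvironmentSettings.GetAuthorizer() tries credential types.
// It only checks that variables are non-empty; the real code also builds
// and validates the credential configuration.
func chooseAuthMethod(getenv func(string) string) string {
	switch {
	case getenv("AZURE_CLIENT_SECRET") != "":
		return "client_credentials"
	case getenv("AZURE_CERTIFICATE_PATH") != "":
		return "client_certificate"
	case getenv("AZURE_USERNAME") != "" && getenv("AZURE_PASSWORD") != "":
		return "username_password"
	default:
		// Nothing above matched, so the environment-based path falls back to MSI.
		return "msi"
	}
}

func main() {
	fmt.Println("auth method this environment would select:", chooseAuthMethod(os.Getenv))
}
```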

Are you able to run this code on the VM with the extra diagnostics you added to ensure the environment vars are being picked up?

dbaltor commented Nov 8, 2019

Got it! That makes sense. Unfortunately, I wasn't able to do it today: I couldn't rebuild the Telegraf agent against the modified go-autorest package. I suspect this snippet in authorization.go is being executed and causing the error.

jhendrixMSFT commented

Yep, that's the place. This ultimately calls ServicePrincipalToken.refreshInternal(), which sends the token request.
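
For a client secret, the request sent on refresh is an OAuth2 client-credentials grant against the tenant's Azure AD token endpoint. The sketch below builds (and does not send) such a request with the standard library. It is not `refreshInternal()` itself: the v1 endpoint format (`/{tenant}/oauth2/token` with a `resource` parameter, as used by ADAL-based libraries), the public-cloud host and the Azure Monitor resource URI are assumptions, and `newClientCredentialsRequest` is a hypothetical name.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// Assumption: public-cloud Azure AD host with the v1 token endpoint format.
const aadHost = "https://login.microsoftonline.com"

// newClientCredentialsRequest builds (but does not send) an OAuth2
// client-credentials token request for a service principal secret.
func newClientCredentialsRequest(tenantID, clientID, clientSecret, resource string) (*http.Request, error) {
	form := url.Values{}
	form.Set("grant_type", "client_credentials")
	form.Set("client_id", clientID)
	form.Set("client_secret", clientSecret)
	form.Set("resource", resource)
	endpoint := aadHost + "/" + url.PathEscape(tenantID) + "/oauth2/token"
	req, err := http.NewRequest(http.MethodPost, endpoint, strings.NewReader(form.Encode()))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	return req, nil
}

func main() {
	req, err := newClientCredentialsRequest("my-tenant", "my-client", "my-secret", "https://monitoring.azure.com/")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```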

dbaltor commented Nov 13, 2019

Hey Joel, we managed to recompile the Telegraf agent using our forked version with the debug messages, and it showed that the env vars are being read. In the process, we were pleasantly surprised to find that the new binary is able to authenticate against AAD using the env vars. Digging further, we found that the Telegraf agent currently uses go-autorest release 10.12.0. We believe this issue was fixed by this commit, so we have just created PR #6656 on the Telegraf repo.

jhendrixMSFT commented

That commit fixed multi-tenant authorization, which doesn't appear to apply here. At any rate, if moving to the latest version resolves the issue, then perhaps that's good enough. I'm going to close this for now; please ping if you have further questions.

kunalnanda commented

I found that the MS documentation is very specific to Linux. I had the following environment variables set on Windows:
ARM_SUBSCRIPTION_ID
ARM_CLIENT_ID
ARM_CLIENT_SECRET
ARM_TENANT_ID
ARM_ENVIRONMENT

Replacing ARM with AZURE fixed it for me.

jhendrixMSFT commented

@kunalnanda I don't believe that all SDKs support these environment variables (unfortunately, there were several implementations floating around until we standardized). For the env vars supported by the Go SDK, have a look at the README.

3 participants