Expired API secret - Causes failure on api/workspaces #4110

Closed
JaimieWi opened this issue Oct 22, 2024 · 13 comments · Fixed by #4112
@JaimieWi
Contributor

Describe the bug
All our TRE Users are unable to access the TRE home page, where the API endpoint /api/workspaces is called. They are faced with a 500 error.
Image

The user is able to access their workspace, but cannot access the TRE home page.

Initially, this presented as an expired password in application insights. Log portion below:

Traceback (most recent call last):
  File "/api/services/aad_authentication.py", line 107, in __call__
    raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail=f'{strings.ACCESS_USER_DOES_NOT_HAVE_REQUIRED_ROLE}: {self.require_one_of_roles}', headers={"WWW-Authenticate": "Bearer"})
fastapi.exceptions.HTTPException: 403: The user is missing a required role: ['TREAdmin']

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/api/api/routes/workspaces.py", line 62, in retrieve_users_active_workspaces
    user = await get_current_admin_user(request)
  File "/api/services/aad_authentication.py", line 110, in __call__
    raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail=f'{strings.ACCESS_USER_DOES_NOT_HAVE_REQUIRED_ROLE}: {self.require_one_of_roles}', headers={"WWW-Authenticate": "Bearer"})
fastapi.exceptions.HTTPException: 403: The user is missing a required role: ['TREAdmin']

During handling of the above exception, another exception occurred:
.....
Exception: API app registration access token cannot be retrieved. invalid_client: AADSTS7000222: The provided client secret keys for app '<API app registration ID>' are expired.

In Entra, the "sp-aztre-cicd" service principal secret had expired (not the main API app reg). This was updated and added to AZURE_CREDENTIALS. Rerunning the pipeline got past the initial step that was failing. We then saw an error related to TEST_ACCOUNT_CLIENT_ID when registering bundles. Why would this secret expire?

Login Succeeded
Using TEST_ACCOUNT_CLIENT_ID to sign in to tre CLI
ClientSecretCredential.get_token failed: Azure Active Directory error '(invalid_client) AADSTS7000222: The provided client secret keys for app '***' are expired. Visit the Azure portal to create new keys for your app: https://aka.ms/NewClientSecret, or consider using certificate credentials for added security: https://aka.ms/certCreds. Trace ID: 7674e1b0-c710-4be6-aad3-50d40e5d0100 Correlation ID: dbbeeb4a-3cb3-46fe-bcc6-39d4a945498e Timestamp: 2024-10-21 15:49:56Z'

Now troubleshooting in our DEV environment

Troubleshooting steps tried:

  1. Update the "sp-aztre-cicd" service principal secret and update AZURE_CREDENTIALS - rerun pipeline. Result: Runs past initial steps, fails at registering bundles.
  2. Update the Automation admin secret by running az ad sp credential reset --id "<ID>" --query 'password' --output tsv --only-show-errors and updating test_account_client_secret in GitHub secrets. (Done in DEV, where the error had not occurred; no new errors were raised when updating the secret.)
  3. The API app registration secret expired error is still present. Use a command similar to the one above to update the password and add it to KeyVault and GitHub secrets. Rerun the pipeline - DEV pipeline successful. The error now says the secret is not recognised:
Exception: API app registration access token cannot be retrieved. invalid_client: AADSTS7000215: Invalid client secret provided. Ensure the secret being sent in the request is the client secret value, not the client secret ID, for a secret added to app '<API app ID>'
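
The two AADSTS codes quoted in this issue point at different problems, so it can help to pull the code out of a log line before deciding what to rotate. The following is a minimal sketch, not part of Azure TRE: the function names are hypothetical, and the explanations are paraphrased from the two error messages above only.

```python
import re
from typing import Optional

# Paraphrased from the two error messages quoted in this issue.
KNOWN_CODES = {
    "AADSTS7000222": "client secret expired: create a new secret for the app registration",
    "AADSTS7000215": "invalid client secret: the value sent does not match a secret on the app registration",
}


def extract_aadsts_code(message: str) -> Optional[str]:
    """Return the first AADSTS error code found in a log line, or None."""
    match = re.search(r"AADSTS\d+", message)
    return match.group(0) if match else None


def explain(message: str) -> str:
    """Map a log line to a short hint (hypothetical helper for illustration)."""
    code = extract_aadsts_code(message)
    if code is None:
        return "no AADSTS code found"
    return f"{code}: {KNOWN_CODES.get(code, 'not covered by this sketch')}"
```

In this thread, AADSTS7000222 meant the secret itself had expired, while AADSTS7000215 appeared after rotation, when the API was still being handed an old secret value.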

This TRE was deployed on 18th October 2023 and the first failure happened on 18th October 2024. The secrets do not have an expiry in KeyVault; was a limit set on creation?

Steps to reproduce

  1. Deploy a TRE using CICD 1 year ago
  2. Have many users with TRE User and Workspace Researcher roles only
  3. Try to access the main UI.

Azure TRE release version (e.g. v0.14.0 or main):
v0.19.1

Deployed Azure TRE components - click the (i) in the UI:

UI Version: 0.5.28
API Version: 0.18.11

@JaimieWi added the bug (Something isn't working) label Oct 22, 2024
@marrobi
Member

marrobi commented Oct 22, 2024

@JaimieWi can you read #3998 (comment) and below, and see if that helps?

I've seen an issue before where the core infra needs updating twice, i.e. running the pipeline that includes make deploy-core twice.

@marrobi
Member

marrobi commented Oct 22, 2024

@JaimieWi to answer your question, looks like default expiry is 1 year https://learn.microsoft.com/en-us/cli/azure/ad/sp/credential?view=azure-cli-latest#az-ad-sp-credential-reset

@JaimieWi
Contributor Author

Hi @marrobi, thank you for your quick response! I then reran the DEV pipeline and it worked straight away.

I have followed the same steps for PROD and still see an error like the one below:

Exception: API app registration access token cannot be retrieved. invalid_client: AADSTS7000215: Invalid client secret provided. 

This was still done by manually running the credential reset command, so I will look into trying make auth instead (which we planned to do anyway, but wanted to try the quick fix). I'm not sure why it would make a difference in terms of recognising the secret?

I can see now that the maximum for a secret is also 2 years. We will put something in place to make sure make auth is run on a yearly basis. Thanks!
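
A yearly reminder could be backed by a small check that flags secrets nearing expiry. This is a sketch under stated assumptions: it expects JSON shaped like the output of az ad app credential list --id <app ID>, with keyId and endDateTime fields holding UTC timestamps. Verify the field names against your CLI version. The function name and sample values are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone


def expiring_credentials(credentials_json: str, now: datetime, within_days: int = 30):
    """Return (keyId, end) pairs whose expiry falls on or before now + within_days.

    Secrets that have already expired are included as well.
    """
    cutoff = now + timedelta(days=within_days)
    flagged = []
    for cred in json.loads(credentials_json):
        # Assumption: endDateTime is UTC; keep only "YYYY-MM-DDTHH:MM:SS"
        # so fractional seconds and the trailing "Z" do not matter.
        end = datetime.strptime(cred["endDateTime"][:19], "%Y-%m-%dT%H:%M:%S")
        end = end.replace(tzinfo=timezone.utc)
        if end <= cutoff:
            flagged.append((cred["keyId"], end))
    return flagged


if __name__ == "__main__":
    # Hypothetical sample data mirroring the dates in this issue.
    sample = json.dumps([
        {"keyId": "old", "endDateTime": "2024-10-18T00:00:00Z"},
        {"keyId": "new", "endDateTime": "2025-10-24T00:00:00Z"},
    ])
    today = datetime(2024, 10, 1, tzinfo=timezone.utc)
    for key_id, end in expiring_credentials(sample, today):
        print(f"secret {key_id} expires {end:%Y-%m-%d}")
```

Running something like this on a schedule, ahead of the one-year default, gives time to run make auth before users hit the 500 error.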

@marrobi
Member

marrobi commented Oct 24, 2024

@JaimieWi have you run the dev pipeline twice?

The secret it is complaining about is the api_client_secret. Which secret did you update?

app = ConfidentialClientApplication(client_id=config.API_CLIENT_ID, client_credential=config.API_CLIENT_SECRET, authority=f"{config.AAD_AUTHORITY_URL}/{config.AAD_TENANT_ID}")

@JaimieWi
Contributor Author

@marrobi I did run the DEV pipeline twice, that one is now successful.

I changed the secrets in PROD and have rerun it twice, no change. The error is still present for our TRE Users.

  • I have changed the TEST_ACCOUNT_CLIENT_SECRET (to fix the error in registering bundles).
  • I have also changed the API CLIENT SECRET, this is the one that is not being picked up. (I have done it twice to make sure I wasn't missing something).

@marrobi
Member

marrobi commented Oct 24, 2024

@JaimieWi hmm, seems like a recurrence of #2463

Can you try updating it yourself in KeyVault? I wonder if Terraform isn't updating the secret.

@marrobi
Member

marrobi commented Oct 24, 2024

Also check the value in the Web App environment variables matches the correct version of the secret in KeyVault

Image

@JaimieWi
Contributor Author

Hi @marrobi

Steps completed:

  1. Run make auth
  2. Update GitHub secrets
  3. Manually update Key vault secrets
  4. Run pipeline twice

Still see the error. The web app is not using the correct value. It shows the ID of the original secret and has not updated to use a later one.

How do I force the web app to recognise the new secret?

@marrobi
Member

marrobi commented Oct 24, 2024

In KeyVault, copy the secret identifier:

Image

and update it in the App Service as per the image pasted above (note it's just the SecretUri)

Let us know how you get on.

@marrobi
Member

marrobi commented Oct 24, 2024

It seems to be related to this - hashicorp/terraform-provider-azurerm#3129 and hashicorp/terraform-provider-azurerm#8745 - the issue has been closed, but not sure it is properly resolved.

@marrobi
Member

marrobi commented Oct 24, 2024

Hmm, we are using:

    "AAD_TENANT_ID"     = "@Microsoft.KeyVault(SecretUri=${azurerm_key_vault_secret.auth_tenant_id.id})"
    "API_CLIENT_ID"     = "@Microsoft.KeyVault(SecretUri=${azurerm_key_vault_secret.api_client_id.id})"
    "API_CLIENT_SECRET" = "@Microsoft.KeyVault(SecretUri=${azurerm_key_vault_secret.api_client_secret.id})"

We should try changing to this format:

resource "azurerm_function_app" "function_app" {
  ...
  app_settings = {
    ...
    # this works
    "MY_SECRET" = "@Microsoft.KeyVault(SecretUri=${var.key_vault_vault_uri}secrets/MY-SECRET/${azurerm_key_vault_secret.my_secret.version})",
  }
}
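
To check whether an app setting still pins an old secret version, the reference string can be built and parsed outside Terraform. The sketch below follows the @Microsoft.KeyVault(SecretUri=...) shape from the snippet above. The function names, vault URI, and version strings are hypothetical and exist only for illustration.

```python
import re
from typing import Optional


def build_reference(vault_uri: str, secret_name: str, version: str) -> str:
    """Build an app setting value that pins one specific secret version."""
    base = vault_uri.rstrip("/")
    return f"@Microsoft.KeyVault(SecretUri={base}/secrets/{secret_name}/{version})"


def referenced_version(setting_value: str) -> Optional[str]:
    """Return the pinned version, or None if unversioned or not a reference."""
    match = re.fullmatch(r"@Microsoft\.KeyVault\(SecretUri=(.+?)/?\)", setting_value.strip())
    if not match:
        return None
    parts = match.group(1).split("/secrets/", 1)
    if len(parts) != 2:
        return None
    name_and_version = parts[1].split("/")
    if len(name_and_version) > 1 and name_and_version[1]:
        return name_and_version[1]
    return None


def is_stale(setting_value: str, current_version: str) -> bool:
    """True when a version is pinned and it differs from the current one."""
    pinned = referenced_version(setting_value)
    return pinned is not None and pinned != current_version
```

For example, comparing the API_CLIENT_SECRET app setting against the latest version shown in KeyVault would have flagged the mismatch described in this thread, where the web app still showed the ID of the original secret.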

@JaimieWi
Contributor Author

That has worked! Thank you @marrobi !

Just to reiterate, steps to update Entra secrets:

  1. Run make auth
  2. Update GitHub secrets
  3. Manually update Key vault secrets
  4. Run pipeline twice (This may not be required if secrets are not updated anyway?)
  5. Update API_CLIENT_SECRET value in web app environment variables to use latest version
  6. Restart web app
  7. Hard refresh TRE page (temporarily see 504 error that resolves itself within 5 minutes)

This can now be closed, but would be great if this could be an automated process (as you mentioned above).

@marrobi
Member

marrobi commented Oct 24, 2024

Great. I've added this to the engineering board, so I will leave this issue open, along with #2463, until we change how the secrets are referenced.

@marrobi self-assigned this Oct 24, 2024
@github-project-automation bot moved this from In Progress to Done in Azure TRE - Engineering Nov 6, 2024