Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

azurerm_api_management_api does not properly follow up when Azure responds with 202 #24379

Closed
1 task done
Sefriol opened this issue Jan 3, 2024 · 11 comments · Fixed by #27758
Closed
1 task done

azurerm_api_management_api does not properly follow up when Azure responds with 202 #24379

Sefriol opened this issue Jan 3, 2024 · 11 comments · Fixed by #27758
Labels
service/api-management upstream/microsoft Indicates that there's an upstream issue blocking this issue/PR v/3.x

Comments

@Sefriol
Copy link

Sefriol commented Jan 3, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment and review the contribution guide to help.

Terraform Version

1.5.2

AzureRM Provider Version

3.85

Affected Resource(s)/Data Source(s)

azurerm_api_management_api

Terraform Configuration Files

resource "azurerm_api_management_api" "test" {
  provider            = azurerm.apim
  name                = "test-api"
  resource_group_name = data.azurerm_api_management.test.resource_group_name
  api_management_name = data.azurerm_api_management.test.name
  revision            = "1"
  display_name        = "Test API"
  path                = "test"
  protocols           = ["https"]
  service_url         = "https://${azurerm_linux_function_app.test.default_hostname}/api"

  import {
    content_format = "openapi+json-link"
    content_value  = "https://${azurerm_linux_function_app.test.default_hostname}/api/swagger.json?dummy_change_hash=${null_resource.deploy_files.triggers.dir_sha1}"
  }
}

Debug Output/Panic Output

HTTP/2.0 202 Accepted
Content-Length: 0
Cache-Control: no-cache
Date: Wed, 03 Jan 2024 13:50:31 GMT
Expires: -1
Location: https://management.azure.com/subscription/test-id/resourceGroups/rg-apim-test/providers/Microsoft.ApiManagement/service/apim-test/apis/test-api;rev=1?api-version=2022-08-01&asyncId=id&asyncCode=200

Expected Behaviour

202 Accepted   Request to create or update API was accepted. Location header contains the URL where the status of the long running operation can be checked.

Currently terraform does not use the location URL to check whether resource update was successful. There is a flaw in APIM resources where operation can be claimed to be successful, but actual error shows up in the location url and location URL responds with 400.

Actual Behaviour

Currently terraform reports operation to be succesful, even though nothing is actually updated.

Steps to Reproduce

  1. Create an API with terraform
  2. Try to update it with incorrect OpenAPI spec
  3. Terraform says that update was successful while it was not

Important Factoids

No response

References

APIM Create or Update API

@sinbai
Copy link
Contributor

sinbai commented Jan 4, 2024

Hi @Sefriol thanks for opening this issue. In order to find the root cause, a TF configuration file (containing dependent resources and variable values) and detailed steps would be of great help in quickly reproducing/troubleshooting the issue. Could you please provide the complete TF configuration and detailed repro steps to help reproduce this issue? Thanks in advanced.

In addition, could you please clarify that why do you think "terraform does not use the location URL to check whether resource update was successful"?

@Sefriol
Copy link
Author

Sefriol commented Jan 4, 2024

When terraform updates an API it uses https://management.azure.com/subscription/test-id/resourceGroups/rg-apim-test/providers/Microsoft.ApiManagement/service/apim-test/apis/test-api URL with PUT to update it.

However, when OpenAPI spec is provided from external URL (i.e. content_format = "openapi+json-link", not sure if it will also happen if it is directly provided. I have not tested), Azure API will respond with 202. This means Azure is processing the request. In the same response, it also provides a Location header which gives you an URL where the operation status is provided.

If the OpenAPI spec is invalid, this can fail. However this failure does not seem to be reported anywhere else in Azure (Activity Log etc tells that operation succeeded). If processing this OpenAPI spec fails, it this URL in Location header will respond with status 400 and an error message.

I will try to figure out if there is a simple TF config that could be used to replicate this. I had an issue where there were 2 APIs with same display name and it failed to update because of that. Terraform and Azure did not otherwise report this as failure, but I noticed that the OpenAPI spec was not updated in APIM. I am not sure which kind of errors will cause operation failure specifically.

@sinbai
Copy link
Contributor

sinbai commented Jan 5, 2024

Thanks for the explanation. First of all, I would like to explain that if the Azure-AsyncOperation (high priority) or Location returned in the API response is the request URL itself, Terraform uses the request url instead of the Azure-AsyncOperation/Location URL, this is by design. Per the information provided, I'm assuming that "https://management.azure.com/subscription/test-id/resourceGroups/rg-apim-test/providers/Microsoft.ApiManagement/service/apim-test/apis/test-api;rev=1?api-version=2022-08-01&asyncId=id&asyncCode=200" and "https://management.azure.com/subscription/test-id/resourceGroups/rg-apim-test/providers/Microsoft.ApiManagement/service/apim-test/apis/test-api;rev=1?api-version=2022-08-01" are essentially the same, right? If you notice a difference, please let us know(the difference means that the poll URL of Location can get the error message but the poll request URL cannot).

In addition, Terraform does not perform validity verification on the content of content_value. So the specific results depend on the API response. If it is determined that the API does not return error information, it is recommended to open an issue in this repo to find a solution.

@Sefriol
Copy link
Author

Sefriol commented Jan 5, 2024

Thanks for the explanation. First of all, I would like to explain that if the Azure-AsyncOperation (high priority) or Location returned in the API response is the request URL itself, Terraform uses the request url instead of the Azure-AsyncOperation/Location URL, this is by design. Per the information provided, I'm assuming that "https://management.azure.com/subscription/test-id/resourceGroups/rg-apim-test/providers/Microsoft.ApiManagement/service/apim-test/apis/test-api;rev=1?api-version=2022-08-01&asyncId=id&asyncCode=200" and "https://management.azure.com/subscription/test-id/resourceGroups/rg-apim-test/providers/Microsoft.ApiManagement/service/apim-test/apis/test-api" are essentially the same, right?

From my understanding, they are not the same.

In addition, Terraform does not perform validity verification on the content of content_value. So the specific results depend on the API response. If it is determined that the API does not return error information, it is recommended to open an issue in this repo to find a solution.

This is fine. This should be exactly what the Location URL does.

EDIT:

Example response from Location URL:

STATUS: 400

{
    "error": {
        "code": "ValidationError",
        "message": "One or more fields contain incorrect values:",
        "details": [
            {
                "code": "ValidationError",
                "target": "ApiVersion",
                "message": "Can't set VersionName when ApiVersionSetId is not set."
            }
        ]
    }
}

EDIT 2: We have also made a ticket about this for Microsoft since Activity Log should report this failure.

@Sefriol
Copy link
Author

Sefriol commented Jan 9, 2024

In the support call, we noticed, that these OpenAPI spec errors are same which would appear in the UI if you tried to import it manually. However, it seems that it will process it locally while through API it will process in cloud. UI leaves no mark in the activity log while API leaves "Accepted" and sometimes "Succeeded", when update process fails.

Microsoft will follow up us on whether this is intended that it won't show up in Activity Log or not.

@rcskosir rcskosir added the upstream/microsoft Indicates that there's an upstream issue blocking this issue/PR label Jan 9, 2024
@Sefriol
Copy link
Author

Sefriol commented Jan 18, 2024

Microsoft kinda stated that this is working as intended since error does not actually happen in the portal, but in a subprocess that validates the OpenAPI spec. We didn't really agree with this, but we'll see if something comes up.

However the logic behind the terraform process should be adjusted to fix this issue regardless.

@clarkminor
Copy link

Yeah, this improper validation is causing us all sorts of problems. Here's a very simple repro of the issue. I created a basic test-api then attempted to update it with invalid yaml. The first apply goes as expected, but second attempt succeeds.

First apply fails:

▶ terraform apply -var-file="test.tfvars"
azurerm_resource_group.rg: Refreshing state... [id=/subscriptions/redacted/resourceGroups/openapi-test-rg]
azurerm_api_management.apim: Refreshing state... [id=/subscriptions/redacted/resourceGroups/openapi-test-rg/providers/Microsoft.ApiManagement/service/apim-openapi-test]
azurerm_api_management_api.api: Refreshing state... [id=/subscriptions/redacted/resourceGroups/openapi-test-rg/providers/Microsoft.ApiManagement/service/apim-openapi-test/apis/test-api]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # azurerm_api_management_api.api will be updated in-place
  ~ resource "azurerm_api_management_api" "api" {
        id                    = "/subscriptions/redacted/resourceGroups/openapi-test-rg/providers/Microsoft.ApiManagement/service/apim-openapi-test/apis/test-api"
        name                  = "test-api"
        # (11 unchanged attributes hidden)

      ~ import {
          ~ content_value  = <<-EOT
                openapi: 3.0.1
                info:
              -   title: 'Test API with valid configuration!'
              +   title: 'Test API with 'invalid' configuration!'
                  version: '1.0.0'
                servers:
                  - url: 'http://my-backend-url/v2'
                paths:
                  /test:
                    get:
                      summary: Test
                      description: >-
                        Basic GET on test resource.
                      responses:
                        '200':
                          description: Successful!
                components:
                  securitySchemes:
                    apiKeyHeader:
                      type: apiKey
                      name: Ocp-Apim-Subscription-Key
                      in: header
                    apiKeyQuery:
                      type: apiKey
                      name: subscription-key
                      in: query
                security:
                  - apiKeyHeader: []
                  - apiKeyQuery: []
            EOT
            # (1 unchanged attribute hidden)
        }

        # (1 unchanged block hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

azurerm_api_management_api.api: Modifying... [id=/subscriptions/redacted/resourceGroups/openapi-test-rg/providers/Microsoft.ApiManagement/service/apim-openapi-test/apis/test-api]
╷
│ Error: creating/updating Api (Subscription: "redacted"
│ Resource Group Name: "openapi-test-rg"
│ Service Name: "apim-openapi-test"
│ Api: "test-api"): performing CreateOrUpdate: unexpected status 400 with error: ValidationError: One or more fields contain incorrect values:
│
│   with azurerm_api_management_api.api,
│   on main.tf line 29, in resource "azurerm_api_management_api" "api":
│   29: resource "azurerm_api_management_api" "api" {
│
╵

Second apply succeeds and terraform reports no changes:

▶ terraform apply -var-file="test.tfvars"
azurerm_resource_group.rg: Refreshing state... [id=/subscriptions/redacted/resourceGroups/openapi-test-rg]
azurerm_api_management.apim: Refreshing state... [id=/subscriptions/redacted/resourceGroups/openapi-test-rg/providers/Microsoft.ApiManagement/service/apim-openapi-test]
azurerm_api_management_api.api: Refreshing state... [id=/subscriptions/redacted/resourceGroups/openapi-test-rg/providers/Microsoft.ApiManagement/service/apim-openapi-test/apis/test-api]

No changes. Your infrastructure matches the configuration.

Terraform has compared your real infrastructure against your configuration and found no differences, so no changes are needed.

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

If you then go into the UI to view the OpenAPI spec it shows this:

openapi: 3.0.1
info:
  title: test-api
  description: ''
  version: '1.0'
servers:
  - url: 'https://apim-openapi-test.azure-api.net/test-api'
paths:
  /test:
    get:
      summary: Test
      description: Basic GET on test resource.
      operationId: get-test
      responses:
        '200':
          description: Successful!
components:
  securitySchemes:
    apiKeyHeader:
      type: apiKey
      name: Ocp-Apim-Subscription-Key
      in: header
    apiKeyQuery:
      type: apiKey
      name: subscription-key
      in: query
security:
  - apiKeyHeader: []
  - apiKeyQuery: []

I set TF_LOG_PROVIDER=TRACE before running, so happy to share those logs if helpful.

In the meantime, I think we're going to start validating the OpenAPI specs with an openapi-spec-validator pre-commit hook as a workaround: https://github.com/python-openapi/openapi-spec-validator?tab=readme-ov-file#pre-commit-hook

@Sefriol
Copy link
Author

Sefriol commented Jan 18, 2024

Nice that you found a simple way to reproduce this issue! Also to note that validating OpenAPI spec is not enough since update process can fail on APIM specific issues (like trying to create two APIs with the same spec).

@Sefriol
Copy link
Author

Sefriol commented Feb 2, 2024

Final response from Microsoft on the issue:

After discussions from different Subject matter experts of APIM, got a conclusion on the ask. In general, the whole idea about using the REST APIs to call the management API endpoint is to bypass the ARM limitations, 2 min timeout and throttling etc. So, in conclusion only the ARM calls are listed in the subscription Activity logs as status failed as they are residing in ARM plane, the management API calls are not since they bypass ARM.
Hope the above explanation helps. I understand that the solution is not completely favorable as the service and product is designed in such a way and while we understand your expectation from the product, at the same time we hope you will understand the challenges at the time of designing the product, having said so, we always respect our customer business and association with Microsoft.

@Sefriol
Copy link
Author

Sefriol commented Nov 22, 2024

I mean, documentation change does not really fix the problem? Provider should use the proper follow-up url to check whether update succeeds or fails.

Copy link

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 25, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
service/api-management upstream/microsoft Indicates that there's an upstream issue blocking this issue/PR v/3.x
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants