
add core logic to support access token in postgres scaler #5589

Merged

Conversation

@Ferdinanddb (Contributor) commented Mar 11, 2024

Provide a description of what has been changed
The purpose of this PR is to add the code needed for the Postgres scaler to support Azure Access Token authentication.

Checklist

Fixes #5823

Relates to #

@Ferdinanddb (Contributor, Author):

Dear reviewers,

I have some doubts about the following topics and would appreciate some guidance, please:

(1) Should I change the logic of how an Azure Access Token is retrieved to be able to mock it and write some specific PodIdentityProviderAzureWorkload tests? If yes, I am thinking about the following tests, based on what I wrote:

  • Check that the config is successfully parsed when using the podIdentity kedav1alpha1.PodIdentityProviderAzureWorkload.
  • Check that the Access Token (i.e. the password) is refreshed when it has expired or is about to expire. This might be difficult because the refresh happens while performing the query, i.e. at "runtime", and it seems that the tests do not cover "runtime" behavior, right?

(2) I used regexp pattern matching and replacement to find and replace the password in the connection string and then recreate the DB connection. Is that robust?

  • I could also split the connection string into an array, replace the password entry, and then reconstruct the string, but I felt like regexp could do the same job as well or better (a minimal sketch of the regexp approach follows after these questions).

(3) To be honest, I took inspiration from both the Azure Blob and Azure Pipelines scalers. The latter also uses an Access Token but with a different scope, so I am wondering whether it would be a good idea to deduplicate and generalize the logic for generating an Azure Access Token so it lives in one place.

  • If it makes sense, I would say that this should be done in another PR, so that this one remains focused on the Postgres scaler.
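
A minimal sketch of the regexp approach mentioned in question (2), assuming a simple key/value connection string; the helper name and values are invented for illustration and are not the PR's actual code:

```go
package main

import (
	"fmt"
	"regexp"
)

// passwordRe matches the password entry of a key/value Postgres connection
// string (simple case: an unquoted value without spaces, e.g. an access token).
var passwordRe = regexp.MustCompile(`password=\S+`)

// replacePassword is a hypothetical helper: it swaps the current password for
// a new one (e.g. a freshly acquired access token).
func replacePassword(connStr, newPassword string) string {
	// ReplaceAllLiteralString avoids interpreting characters in the token as
	// regexp replacement directives such as $1.
	return passwordRe.ReplaceAllLiteralString(connStr, "password="+newPassword)
}

func main() {
	conn := "host=example.postgres.database.azure.com port=5432 user=keda dbname=airflow password=OLD_TOKEN sslmode=require"
	fmt.Println(replacePassword(conn, "NEW_TOKEN"))
}
```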

@JorTurFer (Member) left a comment:

(1) Should I change the logic of how an Azure Access Token is retrieved to be able to mock it and write some specific PodIdentityProviderAzureWorkload tests? If yes, I am thinking about the following tests, based on what I wrote:

I don't think so, because this is something quite difficult to test with unit tests, but maybe we could introduce an e2e test for this scenario, WDYT? We can spin up a PostgreSQL database in Azure with a managed identity user (we have another repo, testing-infrastructure, where we manage the infra from).

Check that the Access Token (i.e. the password) is refreshed when it has expired or is about to expire. This might be difficult because the refresh happens while performing the query, i.e. at "runtime", and it seems that the tests do not cover "runtime" behavior, right?

I don't think that this is a "real problem". Of course, handling it is always better, but in the worst case the scaler will fail, which will trigger the scaler regeneration (during the same loop, without printing any error), and that will regenerate the token (although, as I said, managing it well is better).

(2) I used regexp pattern matching and replacement to find and replace the password in the connection string and then recreate the DB connection. Is that robust?

I could also split the connection string into an array, replace the password entry, and then reconstruct the string, but I felt like regexp could do the same job or even better.

I don't have any preference tbh. Maybe, to be 100% sure that it will always work regardless of the token, we could use a placeholder for the regex and, instead of updating s.metadata.connection, use a variable scoped to the function. Using this approach, we can ensure that the regex will always match (or even better, maybe not set any password at all in the case of pod identity).
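
A minimal sketch of this placeholder idea (invented names, not the code that ended up in the PR): the stored metadata keeps a constant placeholder, and the concrete connection string only ever lives in a function-scoped variable, so the substitution always matches regardless of what the token looks like.

```go
package main

import (
	"fmt"
	"strings"
)

// The stored connection "template" keeps this constant placeholder instead of
// a real password, so the replacement can never fail to match.
const passwordPlaceholder = "{{password}}"

// connStringFor builds the real connection string in a local variable; the
// template held in the scaler metadata is never mutated.
func connStringFor(template, accessToken string) string {
	return strings.ReplaceAll(template, passwordPlaceholder, accessToken)
}

func main() {
	template := "host=example.postgres.database.azure.com port=5432 user=keda dbname=airflow password=" + passwordPlaceholder + " sslmode=require"
	fmt.Println(connStringFor(template, "FRESH_TOKEN"))
}
```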

(3) To be honest, I took inspiration from both the Azure Blob and Azure Pipelines scalers. The latter also uses an Access Token but with a different scope, so I am wondering whether it would be a good idea to deduplicate and generalize the logic for generating an Azure Access Token so it lives in one place.

If it makes sense, I would say that this should be done in another PR, so that this one remains focused on the Postgres scaler.

It's a good idea, but I don't think it makes sense right now because we are migrating the Azure scalers from their current implementations to the new azure-sdk-for-go, and that SDK uses a unified authentication approach (this PR is working on that).

Review thread on pkg/scalers/postgresql_scaler.go (outdated, resolved).
@Ferdinanddb (Contributor, Author):

@JorTurFer Thank you for your review!

Regarding your answers to my questions:

(1)

I don't think so, because this is something quite difficult to test with unit tests, but maybe we could introduce an e2e test for this scenario, WDYT? We can spin up a PostgreSQL database in Azure with a managed identity user (we have another repo, testing-infrastructure, where we manage the infra from).

I don't think that this is a "real problem". Of course, handling it is always better, but in the worst case the scaler will fail, which will trigger the scaler regeneration (during the same loop, without printing any error), and that will regenerate the token (although, as I said, managing it well is better).

  • I was planning to create an Azure account and use free credits to test this (i.e. spin up an Azure Postgres Flexible Server, an AKS cluster where I install the KEDA Helm chart using a container image built from this branch, a UAMI, and all the other resources needed…) on my end, but I am happy to try using the "testing-infrastructure" repo you mentioned!

  • The thing I find difficult to test is that the generated access token can be set to be valid for anywhere from 5 minutes to 1 hour, which means the e2e test would have to run for at least 5 minutes. Would that make sense?


(2)

I don't have any preference tbh. Maybe, to be 100% sure that it will always work regardless of the token, we could use a placeholder for the regex and, instead of updating s.metadata.connection, use a variable scoped to the function. Using this approach, we can ensure that the regex will always match (or even better, maybe not set any password at all in the case of pod identity).

  • From my understanding, and based on my current implementation, we need to update s.metadata.connection (string) to re-create s.connection (sql.DB) with the new token (password); this is why I did it that way, and why I use the getConnection function to replace/update the sql.DB connection when the access token has expired or is about to expire.

  • I can't quite picture your idea: "..., we could use a placeholder for the regex and, instead of updating s.metadata.connection, use a variable scoped to the function."

    • May I ask you to write a little snippet to showcase this, please?
  • Maybe I don't understand your last point in parentheses, but to be sure: are you proposing to not set the password='.....' part inside the s.metadata.connection string object but instead use the environment variable PGPASSWORD to store the access token (password)?

    • That would be neat because, if I am not wrong, it means that the s.connection sql.DB object does not need to be replaced. But it means that the code would update an environment variable (doable using os.Setenv("PGPASSWORD", accessToken)), WDYT?
      And we would also need another environment variable to store the access token's expiration date so that we know when to replace it.
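
A rough sketch of that environment-variable variant (purely hypothetical and not the approach ultimately taken in this PR; it assumes the Postgres driver honors the libpq PGPASSWORD variable, as github.com/lib/pq does):

```go
package main

import (
	"database/sql"
	"log"
	"os"

	_ "github.com/lib/pq" // lib/pq reads libpq environment variables such as PGPASSWORD
)

func main() {
	// Hypothetical: store the freshly acquired access token in PGPASSWORD
	// instead of embedding it in the connection string.
	accessToken := "FRESH_TOKEN" // would come from the Azure identity provider
	if err := os.Setenv("PGPASSWORD", accessToken); err != nil {
		log.Fatal(err)
	}

	// No password entry in the connection string; the driver picks it up
	// from the environment when a connection is actually opened.
	db, err := sql.Open("postgres", "host=example.postgres.database.azure.com port=5432 user=keda dbname=airflow sslmode=require")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```

One caveat with this direction: os.Setenv is process-wide, so several Postgres scalers running in the same KEDA operator process, each with a different password or token, would likely overwrite each other's value; keeping the credential scoped to the connection string avoids that.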

@Ferdinanddb (Contributor, Author):

Hey @JorTurFer,

I think I understand what you meant with your regexp placeholder idea (which is really nice, btw) and just pushed a change to take that into account.

I feel like some of the code I am adding / updating could still be written in a cleaner way, and I am still missing some unit tests for the changes, but I would like your opinion on whether what I changed is better than before, please :).

Thanks!

@JorTurFer (Member):

I think that the changes look nice! ❤️

stale bot commented May 15, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

The stale bot added the "stale" label (all issues marked as stale due to inactivity) on May 15, 2024.
@JorTurFer (Member):

@Ferdinanddb any update?

The stale bot removed the "stale" label on May 18, 2024.
@Ferdinanddb Ferdinanddb marked this pull request as ready for review May 21, 2024 10:20
@Ferdinanddb Ferdinanddb requested a review from a team as a code owner May 21, 2024 10:20
@Ferdinanddb (Contributor, Author) commented May 21, 2024

Hi @JorTurFer, sorry, I was busy with other things, but I found the time to change some things in the PR and test it.

I tested my change within my own Azure subscription over the weekend, and it works, so I would say that the PR is ready to be reviewed :).

I deployed the KEDA resources using the Helm chart plus custom container images built from this branch's code, and let the resources run for more than 24 hours because the Access Token in my case would expire after 24 hours; it works.
I tested the change by deploying an Airflow cluster on an AKS cluster and creating an Azure Postgres Flexible Server. I had to adapt the Airflow Helm chart a bit (specifically the part that uses KEDA to scale the Airflow workers), but no rocket science.

One observation: during my test I tried to use the PGBouncer feature offered by the Azure Postgres Flexible Server resource, and it does not work; I think it is somehow related to this issue.
If I avoid the PGBouncer port ("6432") and use the server's normal port ("5432") instead, then it works fine.

Another observation is about how Azure issues Access Tokens: if there is already an active, not-yet-expired Access Token, Azure will keep returning that same token until it expires. So I modified the part where we renew the Postgres connection to:

  • Check whether the Access Token has expired by comparing its ExpiresOn value to time.Now(). If the Access Token has expired:
    • the Postgres connection object is closed,
    • a new Access Token is retrieved,
    • a new Postgres connection object is created and replaces the previous connection reference.

What do you think about this?
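
To make that flow concrete, here is a simplified, hypothetical sketch of the refresh logic (the type and helper names are invented; it assumes the azure-sdk-for-go azidentity/azcore packages, the documented Azure Database for PostgreSQL scope, and uses lib/pq only as an example driver):

```go
package pgtokenrefresh

import (
	"context"
	"database/sql"
	"time"

	"github.com/Azure/azure-sdk-for-go/sdk/azcore"
	"github.com/Azure/azure-sdk-for-go/sdk/azcore/policy"
	"github.com/Azure/azure-sdk-for-go/sdk/azidentity"
	_ "github.com/lib/pq" // example driver; KEDA's actual Postgres driver may differ
)

// Documented AAD scope for Azure Database for PostgreSQL.
const azurePostgresScope = "https://ossrdbms-aad.database.windows.net/.default"

// tokenConn is a hypothetical holder for the connection and the cached token.
type tokenConn struct {
	db      *sql.DB
	token   azcore.AccessToken
	cred    azcore.TokenCredential
	connFor func(password string) string // builds the connection string from a template
}

func newTokenConn(connFor func(string) string) (*tokenConn, error) {
	// Workload identity, matching the pod identity provider used in this PR.
	cred, err := azidentity.NewWorkloadIdentityCredential(nil)
	if err != nil {
		return nil, err
	}
	return &tokenConn{cred: cred, connFor: connFor}, nil
}

// ensureFresh mirrors the steps listed above: only when the cached token has
// expired (or is about to) does it close the old connection, fetch a new
// token, and open a new connection with it.
func (c *tokenConn) ensureFresh(ctx context.Context) error {
	if c.db != nil && time.Now().Before(c.token.ExpiresOn.Add(-time.Minute)) {
		return nil // token still valid: keep the existing connection
	}
	if c.db != nil {
		_ = c.db.Close() // 1. close the previous Postgres connection
	}
	tok, err := c.cred.GetToken(ctx, policy.TokenRequestOptions{Scopes: []string{azurePostgresScope}}) // 2. new token
	if err != nil {
		return err
	}
	c.token = tok
	db, err := sql.Open("postgres", c.connFor(tok.Token)) // 3. recreate the connection
	if err != nil {
		return err
	}
	c.db = db
	return nil
}
```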

TODO:

  • Reference this change in the CHANGELOG.md file,
  • Modify the doc to document this new way of connecting to an Azure Postgres Flex server,
  • Maybe add more tests?

Thank you!

@JorTurFer (Member) left a comment:

The implementation looks nice!
Could you add an e2e test for this new scenario? I think that it'll be super useful in the future.

The new infra (the PostgreSQL server) can be created via a PR to this repo using Terraform: https://github.com/kedacore/testing-infrastructure and the test itself can be a copy of any of these (adding the change in the TriggerAuthentication): https://github.com/kedacore/keda/tree/main/tests/scalers/postgresql

@JorTurFer (Member):

I guess that something is wrong with the string generation, because there are a lot of errors like this during the e2e tests: https://github.com/kedacore/keda/actions/runs/9645629374/job/26600554311#step:9:278

helper.go:***87: Waiting for successful execution of command on Pod; Output: psql: warning: extra command-line argument "54***" ignored
        psql: could not translate host name "-p" to address: Name or service not known
        , Error: 

@Ferdinanddb (Contributor, Author):

@JorTurFer Thank you for your comments, I just pushed a commit to take what you wrote into account.

However, I still don't understand why the e2e test fails here but succeeds locally on my laptop. Do you have any idea, by any chance?

I know that one thing that might differ is that the Azure Postgres Flexible Server used by the e2e test is not hosted in the same Azure region as the AKS cluster (because of some quota issues, if I remember correctly). When I ran the e2e tests locally, everything was in the same region.

I feel like the problem is more related to some environment variables not being taken into account, and to be honest I am quite lost on this one!

@JorTurFer (Member):

Let me trigger the test again and check the Env vars used

@zroubalik (Member) commented Jun 25, 2024

/run-e2e azure_postgresql
Update: You can check the progress here

@Ferdinanddb (Contributor, Author):

@JorTurFer @zroubalik I think I have a little clue regarding why the test is failing. At this stage of the GitHub Actions, during the oNaiPs/secret-to-env-action job, I see that the following secrets are not listed:

But what I don't understand is that the other secrets related to the Azure Postgres (TF_AZURE_POSTGRES_ADMIN_USERNAME, TF_AZURE_POSTGRES_ADMIN_PASSWORD and TF_AZURE_POSTGRES_DB_NAME) are listed.

Do you maybe know why this could be the case?

Maybe it could be worth rerunning the KEDA testing-infrastructure's GitHub Actions pipeline, even though I just verified the terraform apply stage from the last run, and we can see that:

@Ferdinanddb (Contributor, Author):

Hi @JorTurFer and @zroubalik, may I ask whether you have had time to look at my previous comment, please?

@JorTurFer (Member) commented Jul 24, 2024

/run-e2e azure_postgresql
Update: You can check the progress here

@JorTurFer (Member) commented Jul 24, 2024

Sorry @Ferdinanddb,
I've been really busy these last months with a project and I am now catching up on the delayed things. I'm reviewing this atm; I'll share any insight I find.

The secrets are there (screenshot), but the dates don't match the dates I'd expect :/

@JorTurFer (Member):

Okay, the values are there now (screenshot).

Ignore the ***; it's the anonymization process that GH does (apparently "1" is a sensitive value).

@JorTurFer (Member) commented Jul 24, 2024

/run-e2e azure_postgresql
Update: You can check the progress here

@JorTurFer
Copy link
Member

I've solved some infra issues and it seems to work, but currently the scale-in process isn't working because accessing the Azure database takes more time than a local DB. Let me update the PR with a fix.

JorTurFer and others added 3 commits July 24, 2024 17:51
@JorTurFer (Member) commented Jul 24, 2024

/run-e2e azure_postgresql
Update: You can check the progress here

@JorTurFer (Member):

Could you update the documentation to include this feature as part of v2.15, @Ferdinanddb?

@Ferdinanddb (Contributor, Author):

Hi @JorTurFer,

No worries for the delay, and thank you very much for your support and involvement!

Yes, I will update the docs by the end of tomorrow; I will open a PR in the keda-docs repo.

Just FYI, I faced a weird case while developing this feature: I could not make it work with PGBouncer, and I did not really understand why, but I will mention it in the documentation.

  • To be specific, the PGBouncer feature in Azure PG Flex server makes it possible to interact with the server using port 6432 in addition to the default port 5432.
    • When I ran my test using port 6432 it did not work, but it worked using port 5432.
    • So KEDA "needs" to use the default port 5432, while the other applications interacting with the PG Flex server are free to use the PGBouncer port 6432. I will try to state that clearly in the doc.

@JorTurFer (Member) commented Jul 24, 2024

/run-e2e azure_postgresql
Update: You can check the progress here

@JorTurFer (Member):

Thanks a lot! I'll be afk until Sunday night, but once I come back I'll check both PRs and merge them if they are ready. We plan to cut a release next week, so this feature will be available soon :)

@JorTurFer (Member) commented Jul 24, 2024

/run-e2e azure_postgresql
Update: You can check the progress here

@JorTurFer (Member) commented Jul 24, 2024

/run-e2e azure_postgresql
Update: You can check the progress here

@Ferdinanddb (Contributor, Author):

I just opened the following PR in the keda-docs repo.

@JorTurFer (Member) commented Jul 29, 2024

/run-e2e azure_postgresql
Update: You can check the progress here

@JorTurFer (Member):

Thanks a lot for the improvement!

@JorTurFer JorTurFer merged commit c740bf0 into kedacore:main Jul 29, 2024
18 checks passed
Successfully merging this pull request may close these issues.

Add support for access token authentication to an Azure Postgres Flexible Server - Postgres scaler