Status of testing Providers that were prepared on June 22, 2024 #40382

Closed
57 of 96 tasks
eladkal opened this issue Jun 22, 2024 · 38 comments
Labels
kind:meta High-level information important to the community testing status Status of testing releases

Comments

@eladkal
Contributor

eladkal commented Jun 22, 2024

Body

I have a kind request for all the contributors to the latest provider packages release.
Could you please help us to test the RC versions of the providers?

The guidelines on how to test providers can be found in "Verify providers by contributors".

Let us know in a comment whether the issue is addressed.

These are the providers that require testing, as some substantial changes were introduced:

Provider amazon: 8.25.0rc1

Provider apache.drill: 2.7.2rc1

Provider apache.kafka: 1.5.0rc1

Provider cncf.kubernetes: 8.3.2rc1

Provider common.compat: 1.0.0rc1

Provider common.sql: 1.14.1rc1

Provider databricks: 6.6.0rc1

Provider dbt.cloud: 3.9.0rc1

Provider docker: 3.12.1rc1

Provider fab: 1.2.0rc1

Provider ftp: 3.10.0rc1

Provider google: 10.20.0rc1

Provider http: 4.12.0rc1

Provider microsoft.azure: 10.1.2rc1

Provider microsoft.mssql: 3.7.2rc1

Provider openai: 1.2.2rc1

Provider openlineage: 1.9.0rc1

Provider opensearch: 1.3.0rc1

Provider sftp: 4.10.2rc1

Provider snowflake: 5.5.2rc1

Provider telegram: 4.5.2rc1

Provider teradata: 2.3.0rc1

Provider ydb: 1.0.0rc1
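
For anyone who wants to install a few of the release candidates listed above, here is a minimal sketch that turns a provider id and RC version into a pip requirement string. It assumes the RCs are published on PyPI under the usual `apache-airflow-providers-<id>` names, with dots in the provider id replaced by hyphens. The helper name `rc_requirement` is hypothetical and not part of any Airflow tooling.

```python
# Hypothetical helper, not part of Airflow's release tooling.
PACKAGE_PREFIX = "apache-airflow-providers-"


def rc_requirement(provider_id: str, version: str) -> str:
    """Build a pip requirement string for one provider release candidate."""
    # Provider ids use dots (e.g. "apache.kafka"); package names use hyphens.
    package = PACKAGE_PREFIX + provider_id.replace(".", "-")
    return f"{package}=={version}"


# A few entries copied from the list above.
release_candidates = {
    "apache.kafka": "1.5.0rc1",
    "cncf.kubernetes": "8.3.2rc1",
    "openlineage": "1.9.0rc1",
}

for provider_id, version in release_candidates.items():
    print("pip install " + rc_requirement(provider_id, version))
```

Pinning the exact `rc1` version with `==` lets pip pick up the pre-release without the `--pre` flag.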

All users involved in the PRs:
@rahul-madaan @eladkal @Taragolis @nyoungstudios @uzhastik @jalengg @mobuchowski @e-galan @VladaZakharova @riccardoforzan @josh-fell @pankajastro @kacpermuda @ephraimbuddy @satish-chinthanippu @boraberk

Committer

  • I acknowledge that I am a maintainer/committer of the Apache Airflow project.
@eladkal eladkal added kind:meta High-level information important to the community testing status Status of testing releases labels Jun 22, 2024
@uzhastik
Contributor

Hello. Could you please exclude the ydb provider from this release? It’s too young for that ;) It has a known bug related to auth, and I’d like to add a new feature there later. Thank you in advance.

@eladkal
Contributor Author

eladkal commented Jun 22, 2024

Hello. Could you please exclude the ydb provider from this release? It’s too young for that ;) It has a known bug related to auth, and I’d like to add a new feature there later. Thank you in advance.

In your PR you marked it as ready to be released

Missing features are not a blocker for release. We can always add new features later on.

@uzhastik
Contributor

Hello. Could you please exclude the ydb provider from this release? It’s too young for that ;) It has a known bug related to auth, and I’d like to add a new feature there later. Thank you in advance.

In your PR you marked it as ready to be released

Missing features are not a blocker for release. We can always add new features later on.

To be honest, the YAML was just copied, and the bug was found two days ago ;) Auth info from a file does not work, but there is a workaround: provide the auth info in place.

@potiuk
Member

potiuk commented Jun 22, 2024

Checked that all changes are there. The common.compat does not have any code yet, so we could skip releasing it, but there is no harm in doing so - having a 1.0.0 version released in PyPI is generally a good idea.

@gopidesupavan
Member

Tested #40287, working fine. Thanks

@rahul-madaan
Contributor

tested #40290, working fine 👍🏻

@mehdigati
Contributor

tested #39955, working fine, thanks !

@shahar1
Contributor

shahar1 commented Jun 22, 2024

#40142, #40237, #39831 are ok

airflow-oss-bot added a commit to astronomer/astronomer-providers that referenced this issue Jun 23, 2024
@jasonspeck
Contributor

Confirmed #40206 works as expected

@kylase
Contributor

kylase commented Jun 23, 2024

I have tested #39991 and it works as expected.

[2024-06-23, 04:47:48 UTC] {beam.py:119} WARNING - {"message":"org.apache.beam.runners.dataflow.DataflowRunner - Dataflow SDK version: 2.56.0","severity":"INFO"}
[2024-06-23, 04:47:49 UTC] {beam.py:119} WARNING - {"message":"org.apache.beam.runners.dataflow.DataflowRunner - To access the Dataflow monitoring console, please navigate to https://console.cloud.google.com/dataflow/jobs/<redacted>","severity":"INFO"}
[2024-06-23, 04:47:49 UTC] {beam.py:119} WARNING - {"message":"org.apache.beam.runners.dataflow.DataflowRunner - Submitted job: <redacted>","severity":"INFO"}
[2024-06-23, 04:47:49 UTC] {beam.py:119} WARNING - {"message":"org.apache.beam.runners.dataflow.DataflowRunner - To cancel the job using the \u0027gcloud\u0027 tool, run:\n\u003e gcloud dataflow jobs --project\u003d<redacted> cancel --region\u003<redacted> <redacted>","severity":"INFO"}
[2024-06-23, 04:47:49 UTC] {beam.py:172} INFO - Process exited with return code: 0
[2024-06-23, 04:47:49 UTC] {dataflow.py:461} INFO - Start waiting for done.
[2024-06-23, 04:47:49 UTC] {dataflow.py:403} INFO - Google Cloud DataFlow job <redacted> is state: JOB_STATE_PENDING
[2024-06-23, 04:47:49 UTC] {dataflow.py:464} INFO - Waiting for done. Sleep 10 s
[2024-06-23, 04:47:59 UTC] {dataflow.py:403} INFO - Google Cloud DataFlow job <redacted> is state: JOB_STATE_PENDING
[2024-06-23, 04:47:59 UTC] {dataflow.py:464} INFO - Waiting for done. Sleep 10 s
[2024-06-23, 04:48:09 UTC] {dataflow.py:403} INFO - Google Cloud DataFlow job <redacted> is state: JOB_STATE_RUNNING
[2024-06-23, 04:48:09 UTC] {taskinstance.py:1401} INFO - Marking task as SUCCESS. dag_id=<redacted>, task_id=start_streaming, map_index=0, execution_date=20240623T044637, start_date=20240623T044717, end_date=20240623T044809
[2024-06-23, 04:48:09 UTC] {local_task_job_runner.py:228} INFO - Task exited with return code 0
[2024-06-23, 04:48:09 UTC] {taskinstance.py:2781} INFO - 0 downstream tasks scheduled from follow-on schedule check

@dirrao
Contributor

dirrao commented Jun 23, 2024

The changes in #40253 work as expected.

@amirmor1
Contributor

Tested #40041 and it works as expected.

@TJaniF
Contributor

TJaniF commented Jun 23, 2024

Tested #38497, works as expected.

@kacpermuda
Contributor

#39520 stopped working correctly (something changed between now and when I wrote it) and I'm trying to debug what exactly is causing it.

@sc250072
Contributor

Can you please remove the Teradata PR #40378 from this 2.3.0 release? We wanted to release the compute cluster functionality along with it. We are in the process of creating a new PR for the compute cluster, and we would like to release 2.3.0 with these two features, as per our roadmap.

@potiuk
Member

potiuk commented Jun 24, 2024

Can you please remove the Teradata PR #40378 from this 2.3.0 release? We wanted to release the compute cluster functionality along with it. We are in the process of creating a new PR for the compute cluster, and we would like to release 2.3.0 with these two features, as per our roadmap.

It's of course @eladkal's (the Release Manager's) decision, but no @satish-chinthanippu, this is not how our providers' release process works. We release every changed provider from main, and we only exclude a provider (not individual PRs) from a release if a serious bug has been found. Manually modifying things during the release is not easy, takes time and effort, might break various parts of the process (like documentation generation, package preparation, publishing, signing and verification), and introduces serious overhead for the release process that we don't want.

We have more than 90 providers and we cannot afford individual treatment and such a "custom" approach.

If there is a bug / regression in the existing functionality that is a blocker - we might remove the whole provider from the set of providers being voted on - but that's about all the flexibility there is, and unless there is a bug in the Teradata provider, it will be released as-is.

We do not look at others' roadmaps - this is a bit of a price to pay for making the provider a "community managed" one, and we were very clear about the process when we accepted Teradata - one of the reasons why Teradata could have chosen to release their own provider was that this could help them manage their own release roadmap and schedule. Once this is a community provider, we do expect Teradata to keep it updated, along with the dashboard, and fix bugs (in their own interest), but this also means that anyone can contribute changes (and Airflow committers decide what goes in and out), and that we release it together with the other providers at the same cadence.

I hope that explains how it works :). This is not a complaint or being nasty (we appreciate all the work done, the system test dashboards, and all the new features you guys add). It's just that the way we manage 90+ providers in a release has to have some limitations and structure, and the governance requirements of the ASF are very clear: once code is submitted to our repo, it has to follow the rules of the ASF, where only Airflow committers can decide on code modifications. So I wanted to make it clear.

@dolfinus
Contributor

#39889 is working as expected

@uzhastik
Contributor

uzhastik commented Jun 24, 2024

Hello, @potiuk. Thank you for the clarification. I'd like to ask how we can improve the quality of this release, or is that not our goal? Usually a release means something good and stable to use. But you say that improving it is a big overhead. That means that in some situations a release is something dirty that is simply easy to ship. I suggest making this clear to users who install such a “release”. It could be just a comment that particular providers have known problems, or authors could be given a release branch to merge fixes into. Or even exclude the provider from the release.

@potiuk
Member

potiuk commented Jun 24, 2024

Or even exclude the provider from the release.

Yes. This is how it works. Unit tests are the first line of defense, and we assume that when we have passing unit tests in main, the provider is ready to have a release candidate out. This conversation is then meant to show whether there are any bugs that should block certain providers and remove them from the release. But it is NOT meant to block certain features from being released; it is only to check that there are no blocker bugs. What gets merged into main is assumed to be "ready for the next release candidate". If you do not want a PR to be merged yet, you should keep rebasing it and mark it as Draft until you feel it is ready to be released in the next release candidate (whenever that happens).

If there is a blocker bug / regression, we exclude the provider from the release. But it's a 0/1 decision based on the release manager's assessment of whether it's ok to release a particular provider or not, based on the descriptions from those who test the RC here. If a bug is found during the RC, those who find it should describe its scope and impact, and the release manager assesses and decides what to do. This is at the sole discretion of the release manager (see https://www.apache.org/legal/release-policy.html#approving-a-release and related documentation on the release process requirements by the Apache Software Foundation).

In our case we are ok to release new features even with minor non-blocking bugs; the "strong" reason for excluding a provider is a major regression in already released features. Sometimes we even decide to release providers when new features are not complete, if some partial implementation "works" but more work is planned (it will then be released in the next wave).

We do not "hold" releases, we release everything that has been merged to main. Full stop. It has been working like that for ~4 years for our 90+ providers.
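
The 0/1 rule described above can be illustrated with a small sketch. This is only a toy model with hypothetical names (`BugReport`, `providers_to_release`), not the actual release tooling. The `is_blocker` flag stands in for the release manager's own assessment of a reported bug, and the second report is an invented placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BugReport:
    provider: str
    summary: str
    # The release manager's judgment; nothing in this sketch computes it.
    is_blocker: bool


def providers_to_release(candidates: list[str], reports: list[BugReport]) -> list[str]:
    """Drop every provider that has at least one blocker report; keep the rest whole."""
    blocked = {report.provider for report in reports if report.is_blocker}
    return [provider for provider in candidates if provider not in blocked]


candidates = ["ydb", "teradata", "hypothetical.provider"]
reports = [
    # Minor known issue with a workaround: does not block the release.
    BugReport("ydb", "auth info from file does not work; workaround exists", is_blocker=False),
    # Invented example of a regression in an already released feature.
    BugReport("hypothetical.provider", "regression in an already released feature", is_blocker=True),
]

print(providers_to_release(candidates, reports))  # ['ydb', 'teradata']
```

Note that the function filters whole providers and never individual PRs, which mirrors the point that scope is decided at merge time.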

@potiuk
Member

potiuk commented Jun 24, 2024

BTW, just to clarify: as per the ASF's definition, the release manager's job is purely mechanical (plus single-handedly making the decision to exclude certain providers based on an assessment of the bug descriptions provided by those who test them).

See also here: https://infra.apache.org/release-publishing.html#releasemanager

The release manager releases whatever the community decided to merge as "ready to be released". In case of providers - "main" is the "ready for release" sign, so when you are marking your PR as "ready to be merged" and it passes all tests as an author you are saying "it's ready to be released".

That's also why, in those "Status of the providers" issues, we mark the authors, so that they can verify just before the release that there are no blocking bugs. See #40382 (comment) where @kacpermuda is still evaluating the impact for the openlineage provider. But again - this is for bugs only. What goes into the next release is decided at merge time. See also https://github.com/apache/airflow/blob/main/PROVIDERS.rst#community-providers-release-process- where the release process and various aspects of it are explained.

@Lee-W
Member

Lee-W commented Jun 24, 2024

Verified #40080

@potiuk
Member

potiuk commented Jun 24, 2024

So just to summarize it in short - the release manager is NOT responsible for the quality of the merged changes, nor for the set of changes that are being prepared as release candidates. In both cases the authors are responsible - both for what goes in and for the quality of what goes in. The release manager is a purely mechanical role to make the release happen, but the authors (with the approval of the committers who merge the changes) drive both the scope and the quality of the release. No one else. And the authors have a chance to verify their changes once the RC is out and to say "hey, there is a blocker bug, I will fix it for a future release, but for now let's remove the provider from the release".

I think this is a very, very clear split of responsibilities, and I am explaining it here so that it's crystal clear, as different people might have different assumptions about the release manager's and the author's roles in the process, and about when tests are done.

@e-galan
Contributor

e-galan commented Jun 24, 2024

#40023 works as expected

@uzhastik
Contributor

Hello. Could you please exclude the ydb provider from this release? It’s too young for that ;) It has a known bug related to auth, and I’d like to add a new feature there later. Thank you in advance.

Let’s release the ydb provider. The known issues are minor.

@pankajastro
Member

#39348 looks good. thank you!

@kacpermuda
Contributor

@eladkal please exclude the openlineage provider from this wave, and if possible, let's go for rc2.

#40353 is causing a scheduler crash. I prepared a revert commit in #40402 that should be included in rc2. I also described the bug and provided some logs in #40403, but I guess for now just reverting this will allow us to move forward with the release.

@sc250072
Contributor

Can you please remove the Teradata PR #40378 from this 2.3.0 release? We wanted to release the compute cluster functionality along with it. We are in the process of creating a new PR for the compute cluster, and we would like to release 2.3.0 with these two features, as per our roadmap.

It's of course @eladkal's (the Release Manager's) decision, but no @satish-chinthanippu, this is not how our providers' release process works. We release every changed provider from main, and we only exclude a provider (not individual PRs) from a release if a serious bug has been found. Manually modifying things during the release is not easy, takes time and effort, might break various parts of the process (like documentation generation, package preparation, publishing, signing and verification), and introduces serious overhead for the release process that we don't want.

We have more than 90 providers and we cannot afford individual treatment and such a "custom" approach.

If there is a bug / regression in the existing functionality that is a blocker - we might remove the whole provider from the set of providers being voted on - but that's about all the flexibility there is, and unless there is a bug in the Teradata provider, it will be released as-is.

We do not look at others' roadmaps - this is a bit of a price to pay for making the provider a "community managed" one, and we were very clear about the process when we accepted Teradata - one of the reasons why Teradata could have chosen to release their own provider was that this could help them manage their own release roadmap and schedule. Once this is a community provider, we do expect Teradata to keep it updated, along with the dashboard, and fix bugs (in their own interest), but this also means that anyone can contribute changes (and Airflow committers decide what goes in and out), and that we release it together with the other providers at the same cadence.

I hope that explains how it works :). This is not a complaint or being nasty (we appreciate all the work done, the system test dashboards, and all the new features you guys add). It's just that the way we manage 90+ providers in a release has to have some limitations and structure, and the governance requirements of the ASF are very clear: once code is submitted to our repo, it has to follow the rules of the ASF, where only Airflow committers can decide on code modifications. So I wanted to make it clear.

Thank you @potiuk for your detailed information. Understood the process and considerations regarding the release. Actually, @eladkal suggested to raise individual PRs for each related functionality implemented in Airflow Teradata Provider to make the review process simpler. So, in line with this suggestion, we thought of raising individual PRs for the two features we have planned for this release.
We're committed to aligning with the community standards and appreciate the governance framework outlined by ASF.
Given this understanding, we'll proceed with the new PR for the compute cluster functionality alongside Teradata's provider updates as per the standard release cadence. We'll ensure that our contributions meet the necessary criteria and are compatible with the overall release process.
Please let us know if there are any specific guidelines or additional steps we should follow as we prepare these updates with multiple PRs for a single release.

@sc250072
Contributor

#40378 working as expected.

@pankajkoti
Member

Tested #40013, #39771 and #39941. All works fine. Thank you for the release efforts!

@ahidalgob
Contributor

#40062 working as expected

@e-galan
Contributor

e-galan commented Jun 24, 2024

#40278 works as expected

@boraberke
Contributor

Tested all of my work (#38868, #39154, #40048, #40086, #40162) and they all work as expected!

Thanks! 🥂

@eladkal
Contributor Author

eladkal commented Jun 24, 2024

Actually, @eladkal suggested to raise individual PRs for each related functionality implemented in Airflow Teradata Provider to make the review process simpler.

Which we stand for.
I do not understand the concern you raised. Merged PR = ready to release.
What is the problem with releasing it as is?

@potiuk
Member

potiuk commented Jun 24, 2024

Which we stand for.
I do not understand the concern you raised. Merged PR = ready to release.
What is the problem with releasing it as is?

Yep. This is all good and I stand for it too. Raising a PR <> merging a PR. If you want a PR to wait because it needs to be released together with another, related PR, it can be kept in draft or with an unresolved conversation explaining that it should not be merged yet. The other PR can be added on top (based on the first PR), and both can be rebased until both are ready to be released. I believe there was a lack of understanding that "merged" = "ready to release" for providers, so I hope it's now clear.

@sc250072
Contributor

sc250072 commented Jun 24, 2024

@potiuk and @eladkal, understood. These steps clarify the process to follow to release related features with multiple PRs in a single release. Thank you for providing detailed information.

@sc250072
Contributor

Actually, @eladkal suggested to raise individual PRs for each related functionality implemented in Airflow Teradata Provider to make the review process simpler.

Which we stand for. I do not understand the concern you raised. Merged PR = ready to release. What is the problem with releasing it as is?

Please release it. #40378 Tested and working as expected.

@dabla
Contributor

dabla commented Jun 24, 2024

#40301, #40300, and #40297 are all working as expected

@eladkal
Contributor Author

eladkal commented Jun 27, 2024

Thank you, everyone. The providers are released.
The openlineage provider is excluded from this wave.

I invite everyone to help improve providers for the next release, a list of open issues can be found here.

@eladkal eladkal closed this as completed Jun 27, 2024