Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve Amazon Hook's region_name and config in wrapper #25336

Merged
merged 3 commits into from
Aug 5, 2022

Conversation

Taragolis
Copy link
Contributor

@Taragolis Taragolis commented Jul 27, 2022

Follow up #25256

Right now region_name and config (botocore.config.Config) could be set in different places with this precedence rules (high to low):

  • Hooks methods arguments
  • Hooks attribute
  • In connection

With this PR I tried to do in one place (AwsConnectionWrapper) by:

  1. Allow specify init values for botocore_config and region_name and keep them in wrapper
  2. Allow create AwsConnectionWrapper from another AwsConnectionWrapper

Also this PR might introduce some breaking changes, previously if config specified in Connection than it always overrides config which explicit set in Hook arguments

if connection_object.botocore_config:
# For historical reason botocore.config.Config from connection overwrites
# config which explicitly set in Hook.
self.config = connection_object.botocore_config

# https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html
if "config_kwargs" in extra_config:
self.log.debug(
"Retrieving config_kwargs from Connection.extra_config['config_kwargs']: %s",
extra_config["config_kwargs"],
)
self.config = Config(**extra_config["config_kwargs"])

cc: @ferruzzi @o-nikolas @vincbeck

@boring-cyborg boring-cyborg bot added area:providers provider:amazon-aws AWS/Amazon - related issues labels Jul 27, 2022
@Taragolis Taragolis force-pushed the botocore-region-merge-hook-connection branch from 118a39c to 1c2c34d Compare July 27, 2022 12:52
@potiuk
Copy link
Member

potiuk commented Jul 27, 2022

errors :)

@Taragolis
Copy link
Contributor Author

A loot of errors. Ah... just forget that AwsHook fallback to default strategy in case if Airflow Connection not exists.

@potiuk Interesting why test on pg/Sqlite failed and on MySQL/MSSQL pass

Copy link
Contributor

@vincbeck vincbeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Taragolis Taragolis force-pushed the botocore-region-merge-hook-connection branch 2 times, most recently from 23485ba to 2d4c9cd Compare July 27, 2022 16:33
@Taragolis
Copy link
Contributor Author

Add explicit check for aws_conn_id and fallback to None in case if it missing - almost the same as it worked before.

@Taragolis
Copy link
Contributor Author

New possible backward incompatibility: Previously Hook switch to default boto3 strategy if AirflowException raised, after recent changes only if aws_conn_id not Found (AirflowNotFoundException).

Places where it could happen

Assume role by unsupported federation

def _get_web_identity_credential_fetcher(
self,
) -> botocore.credentials.AssumeRoleWithWebIdentityCredentialFetcher:
base_session = self.basic_session._session or botocore.session.get_session()
client_creator = base_session.create_client
federation = self.extra_config.get('assume_role_with_web_identity_federation')
if federation == 'google':
web_identity_token_loader = self._get_google_identity_token_loader()
else:
raise AirflowException(
f'Unsupported federation: {federation}. Currently "google" only are supported.'
)
return botocore.credentials.AssumeRoleWithWebIdentityCredentialFetcher(
client_creator=client_creator,
web_identity_token_loader=web_identity_token_loader,
role_arn=self.role_arn,
extra_args=self.conn.assume_role_kwargs,
)

Legacy config file parser

config = configparser.ConfigParser()
if config.read(config_file_name): # pragma: no cover
sections = config.sections()
else:
raise AirflowException(f"Couldn't read {config_file_name}")

@potiuk
Copy link
Member

potiuk commented Jul 29, 2022

@potiuk Interesting why test on pg/Sqlite failed and on MySQL/MSSQL pass

We do have some intermittent errors more often we would like to have and continuously working on improving - but as in all large and complex systems, this is a constant, almost daily struggle and uphill battle :) - but we have longer and longer periods of mostly-green PRs and I hope it will get only better over time as we add some optimisations and improvements :).

@potiuk
Copy link
Member

potiuk commented Jul 29, 2022

@vincbeck @Taragolis - I have not look at very detail of that - the code looks good - but the question is is the "backwards-compatibility" problem mentioned above enough to be a "breaking change" or merely an inconvenience? WDYT? That will determine if the next AWS provider release will get a major bump (and if so, we might also want to remove some deprecations).

@eladkal
Copy link
Contributor

eladkal commented Jul 29, 2022

if the next AWS provider release will get a major bump (and if so, we might also want to remove some deprecations).

I can take care of it if we decide to release major version

@Taragolis
Copy link
Contributor Author

@vincbeck @Taragolis - I have not look at very detail of that - the code looks good - but the question is is the "backwards-compatibility" problem mentioned above enough to be a "breaking change" or merely an inconvenience? WDYT?

As always it is depends on. Personally for me it might be "bug fixes" and "stop catch to broad exceptions" however someone code might not work anymore. In most cases user should get an error, so it not happen implicit... except changing behaviour with config.

@potiuk
Copy link
Member

potiuk commented Jul 29, 2022

@vincbeck @Taragolis - I have not look at very detail of that - the code looks good - but the question is is the "backwards-compatibility" problem mentioned above enough to be a "breaking change" or merely an inconvenience? WDYT?

As always it is depends on. Personally for me it might be "bug fixes" and "stop catch to broad exceptions" however someone code might not work anymore. In most cases user should get an error, so it not happen implicit... except changing behaviour with config.

Yep. Agree "breaking change" and "bug fix" has a blurry border :) . For me the best criteria I have to decide is will it make our users fix a problem they did not realise they had when the change is implemented. If we fix an actually wrong user behaviour and assumptions that worked "accidentally" before and were not really intentional - then this is not a breaking change.

Which I hear from your description is the case right ?

@potiuk
Copy link
Member

potiuk commented Jul 29, 2022

We just have to make sure that we describe it in the changelog (possibly with some instructions what to do) if we decide to classify it as bug fix - explaining why we think it is a bug-fix, not a breaking change.

@Taragolis
Copy link
Contributor Author

I would suggest might be create create major release. Even if I could prof that it just bug fixes I also have at least 4-5 different changes in session factory creation or aws base hook which could also unlock use connection properties in aws secrets backends (#25326).

Many of them introduce "small bug fix" or "small changing behaviour" which can in general classified as 'breaking changes"

@potiuk
Copy link
Member

potiuk commented Jul 29, 2022

I would suggest might be create create major release. Even if I could prof that it just bug fixes I also have at least 4-5 different changes in session factory creation or aws base hook which could also unlock use connection properties in aws secrets backends (#25326).

Many of them introduce "small bug fix" or "small changing behaviour" which can in general classified as 'breaking changes"

This is fine and if we do so, we will remove most "easy" deprecations (following our more aggressive deprecation removal policy).

I think this might also give stronger signal to @ferruzzi @vincbeck @o-nikolas that they might want to (similarly as Google team) want to think about cherry-picking some bugfixes to some past version of the provider following https://github.com/apache/airflow/blob/main/README.md#release-process-for-providers - happy to assist with that.

@vincbeck
Copy link
Contributor

I am fine with that

@ferruzzi
Copy link
Contributor

Sorry, just catching up on this convo. I'm inclined to agree, I think it feels like a bug fix.

@Taragolis
Copy link
Contributor Author

Taragolis commented Jul 29, 2022

List of bug fixes and improvements which could affect users.


config - botocore.config.Config after this PR will follow this precedence rules:

  1. config argument in hooks methods get_client_type and get_resource_type - Operators/Sensors in providers packages do not use this methods.
  2. config hook attribute - Operators/Sensors in providers packages do not use this argument in hook initialisation right now.
  3. Construct from config_kwargs in Connection if this key exists and not empty

Since 1.0.0 version of amazon-provider use this rules

  1. config argument which set in methods get_client_type and get_resource_type
  2. If config_kwargs key exists in Connection then construct Config and overwrite config Hook attribute
  3. If config_kwargs key not exists in Connection then use argument hook config argument

In additional I personally think that config too broad name and better rename in next PR to botocore_config and deprecate config. Some Operators use their own config:


Fallback to boto3 default Credential strategy only if provided aws_conn_id not exists and inform the user. Personally I think that in some point of the future if provided aws_conn_id not exists, than better just raise an error, rather than fallback to default boto3 strategy. If user want do not want use connection Hook already support aws_conn_id=None and it is documented feature.

Right now Hook catch too broad AirflowException, it could be a reason why some methods in Session Factory raises something like RuntimeError. AirflowException raises in:

  1. If user set not supported auth web identity method. Now only google supported. So i think it is legit to stop execution on this point.
  2. In legacy configurations files, this function migrated from S3Hook to AwsHook in Airflow 1.9 and never been documented.

@Taragolis
Copy link
Contributor Author

We do have some intermittent errors more often we would like to have and continuously working on improving - but as in all large and complex systems, this is a constant, almost daily struggle and uphill battle

I noticed your reply just now. Failed tests showed me that my initial changes was wrong so it wasn't false positive. However tests only failed on this checks:

  • Postgres10,Py3.7: Always Providers[amazon,apache.hive,google,mysql,postgres]
  • Sqlite Py3.7: Always Providers[amazon,apache.hive,google,mysql,postgres]

But not on this checks:

  • MSSQL2017-latest, Py3.7: Always Providers[amazon,apache.hive,google,mysql,postgres]
  • MySQL5.7, Py3.7: Always Providers[amazon,apache.hive,google,mysql,postgres]

Seems like providers tests only run for PostgreSQL and SQLite backends. In this case, the behavior is expected.

@potiuk
Copy link
Member

potiuk commented Aug 1, 2022

Seems like providers tests only run for PostgreSQL and SQLite backends. In this case, the behavior is expected.

Correct. If you unfold "Determine how tests are run" entry - it explains it.
MySQL and MsSQL use much more memory than PostgreSQL/Sqlite that's why we skip provider tests for those.

@potiuk
Copy link
Member

potiuk commented Aug 1, 2022

Screenshot 2022-08-01 at 22 57 13

@Taragolis
Copy link
Contributor Author

@ferruzzi @vincbeck @o-nikolas WDYT about deprecate switch to default Boto3 Credential Strategy if user specified not existed aws_conn_id?
Right now aws_conn_id=None and aws_conn_id="conn_no_exists" do the same.
However aws_conn_id=None is documented and aws_conn_id="conn_no_exists" not documented.

I thought it would be nice if we warn user that need to specify existed aws_conn_id or explicit set to None.

@vincbeck
Copy link
Contributor

vincbeck commented Aug 2, 2022

I agree, I dont like too having a "hidden"mechanism which takes the default boto3 connection if the connection specified does not exist (or equal None)

@Taragolis Taragolis force-pushed the botocore-region-merge-hook-connection branch from 2d4c9cd to 6c7a4d3 Compare August 4, 2022 13:44
@Taragolis Taragolis force-pushed the botocore-region-merge-hook-connection branch from 6c7a4d3 to 982449f Compare August 4, 2022 16:09
@Taragolis
Copy link
Contributor Author

Add information about deprecation fallback in case of conn_id not exists

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers provider:amazon-aws AWS/Amazon - related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants