Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-32057] Support 1.18 rescale api for applying parallelism overrides #614

Merged
merged 7 commits into from
Jun 25, 2023

Conversation

gyfora
Copy link
Contributor

@gyfora gyfora commented Jun 7, 2023

What is the purpose of the change

The goal here is to use the newly introduced declarative resource management REST API endpoints in Flink 1.18 to execute parallelism override config changes for job vertexes. This allows us to rescale jobs without performing costly full upgrades (restarting all pods and resources). This will make the autoscaler module much more stable and reliable.

The operator already supported something slightly similar for standalone mode and reactive scheduler configuration where on TM replica change we would simply add new replicas.

Brief change log

Add required REST api request/response classes for 1.18

In order to not rely on unreleased Flink 1.18 libraries, we simply copied the request/response classes for the new rest api.
Combined with e2e tests this should be a good approach.

Async scaling logic

The rescale mechanism in Flink 1.18 is asynchronous and might take a non-defined time to complete. To track progress we trigger the scaling through the FlinkService#scale method and afterwards update the lastReconciledSpec and change the ReconciliationState to UPGRADING.

The observer will subsequently detect an in-progress scaling operation and will check the vertex parallelisms using FlinkService#scalingCompleted to move a DEPLOYED reconciliation state.

Note: The rescale api logic is currently only implemented for the native mode. In the standalone mode it would not work as efficiently as the user would need to manually add more TM replicas before the scaling can succeed anyways.

Changes to @SpecDiff annotations

Previously some spec fields were marked for SCALE difftype using the annotation, now it was required to separate what is SCALE / UPGRADE in the different deployment types (Standalone/Native). In native mode we do not support replica/global parallelism changes as scale operations (those rely on standalone + reactive mode).

Required autoscaler changes

We have to improve how we track the job update ts because with in-place rescaling the job can restart without the operator observer noticing (and thus not updating the updateTs). To implement this more robustly we use the Max (last state transition timestamps) from the JobDetailsInfo which we query in every metric collection step. This guarantees that we clear metrics on job restarts / upgrades / failures.

New Resource event and event triggering fixes

The PR contains a fix for the event triggering of spec change events. Currently spec change events are not triggered whenever the job is already in an upgrading state. This is incorrect as new spec changes can happen during an upgrade and those would not be visible to the user. The fix ensures that we trigger spec change events for each different spec generation (whenever the spec changes).

For in-place scaling we introduce a new Scaling event which signals the chosen scaling behaviour to the user.

Verifying this change

  • Manual testing using the autoscaler example on local k8s cluster
  • New unit tests added for:
  • DiffType / SpecDiff changes
  • Native Flink service rescale logic
  • Reconciler changes for triggering rescale and updating status
  • Observer changes for detecting compeleted scale operations
  • New events triggered
  • Configs
  • E2E test added for autoscaler together with verification for in-place scaling for 1.18+

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changes to the CustomResourceDescriptors: no
  • Core observer or reconciler logic that is regularly executed: yes

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? TODO

@gyfora gyfora force-pushed the FLINK-32057 branch 6 times, most recently from 72a05b0 to df9fee2 Compare June 14, 2023 09:58
@gyfora gyfora force-pushed the FLINK-32057 branch 5 times, most recently from 76520f8 to 3d6cc94 Compare June 15, 2023 07:31
Copy link
Contributor

@gaborgsomogyi gaborgsomogyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically look good, just some minors found.

Copy link
Contributor

@mxm mxm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! LGTM

@gyfora
Copy link
Contributor Author

gyfora commented Jun 21, 2023

Addressed your comments @mxm

@gyfora gyfora merged commit 3bb4fcc into apache:main Jun 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants