Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dataproc executor resource config #1160

Merged
merged 5 commits into from
Nov 19, 2020

Conversation

terryyylim
Copy link
Member

Signed-off-by: Terence terencelimxp@gmail.com

What this PR does / why we need it:
Currently one job can allocate too many resources (by default it will take amount of executors equal to partitions), which can break parallelization. This PR allows configuring of resource allocation per one job, so that many jobs can share a cluster.

Which issue(s) this PR fixes:

Fixes #

Does this PR introduce a user-facing change?:

Users can now configure amount of resources to allocate for their spark ingestion jobs.

@woop
Copy link
Member

woop commented Nov 14, 2020

/retest

1 similar comment
@woop
Copy link
Member

woop commented Nov 14, 2020

/retest

staging_location: str,
region: str,
project_id: str,
executor_instances: str,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why positional arguments? Are these required? We don't have defaults?

@woop
Copy link
Member

woop commented Nov 15, 2020

Users can now configure amount of resources to allocate for their spark ingestion jobs.

How do I as a user know how to set this configuration and what the possible options are?

@terryyylim terryyylim force-pushed the dataproc-resource-config branch 2 times, most recently from 66095dc to bbf7ba1 Compare November 17, 2020 02:56
@pyalex
Copy link
Collaborator

pyalex commented Nov 17, 2020

@woop I think it's not for users, but rather for feast admins and those options will be configured on jobservice side

@woop
Copy link
Member

woop commented Nov 17, 2020

@woop I think it's not for users, but rather for feast admins and those options will be configured on jobservice side

Yea I know, but my question is how do I know what configuration can be set? We need to start documenting the configuration options.

@terryyylim
Copy link
Member Author

/retest

tests/e2e/conftest.py Outdated Show resolved Hide resolved
staging_location: str,
region: str,
project_id: str,
executor_instances: str = "2",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we rather rely on defaults in constants.py than here?

@@ -59,6 +62,9 @@ def _dataproc_launcher(config: Config) -> JobLauncher:
config.get(CONFIG_SPARK_STAGING_LOCATION),
config.get(CONFIG_SPARK_DATAPROC_REGION),
config.get(CONFIG_SPARK_DATAPROC_PROJECT),
config.get(CONFIG_SPARK_DATAPROC_EXECUTOR_INSTANCES),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use named arguments when amount of arguments is so high

@terryyylim terryyylim force-pushed the dataproc-resource-config branch 2 times, most recently from f13bde2 to 8c809a4 Compare November 17, 2020 09:31
Signed-off-by: Terence <terencelimxp@gmail.com>
Signed-off-by: Terence <terencelimxp@gmail.com>
Signed-off-by: Terence <terencelimxp@gmail.com>
Signed-off-by: Terence <terencelimxp@gmail.com>
Signed-off-by: Terence <terencelimxp@gmail.com>
@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pyalex, terryyylim

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@pyalex
Copy link
Collaborator

pyalex commented Nov 19, 2020

/lgtm

@feast-ci-bot feast-ci-bot merged commit 4d67fbb into feast-dev:master Nov 19, 2020
pyalex pushed a commit that referenced this pull request Nov 24, 2020
* Add dataproc executor resource config

Signed-off-by: Terence <terencelimxp@gmail.com>

* Add default spark job executor values

Signed-off-by: Terence <terencelimxp@gmail.com>

* Fix e2e tests

Signed-off-by: Terence <terencelimxp@gmail.com>

* Shift spark configurations

Signed-off-by: Terence <terencelimxp@gmail.com>

* Update constants and docstrings

Signed-off-by: Terence <terencelimxp@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants