[Improvement-11857][Spark] Remove the spark version of spark task #11860
Conversation
Related docs will be modified after this PR is reviewed.
Please add the docs in this PR. We suggest that the documents involved in the modification be submitted together.
Sure, but there are many documents that need to be modified; maybe we can first review whether this PR is appropriate.
That sounds good to me.
If there is only one version of Spark on a machine, there is no problem keeping only one. Problems can arise if a machine has two versions. Of course, I don't know much about Spark; I'm just analyzing this problem from a business perspective.
Hi, @fuchanghai, thanks for the reply. I agree with your point.
I think it's better to use the current environment management to manage different versions of Spark. That is the point of environment management.
@SbloodyS Thanks for your suggestion. I totally agree with you. So DS uses …
Yes. That's what I mean. |
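To illustrate the environment-management point above, here is a hedged sketch (the environment names and install paths are hypothetical, not from this PR): one machine can host several Spark versions by giving each DS environment its own `SPARK_HOME`.

```shell
# Hypothetical environment scripts as they might be registered in
# DolphinScheduler's environment management; paths are illustrative.

spark2_env() {
  export SPARK_HOME=/opt/spark-2.4.8
  export PATH="$SPARK_HOME/bin:$PATH"
}

spark3_env() {
  export SPARK_HOME=/opt/spark-3.1.2
  export PATH="$SPARK_HOME/bin:$PATH"
}

# Whichever environment a task selects, it resolves the same launcher
# path, just against a different SPARK_HOME:
spark3_env
echo "$SPARK_HOME/bin/spark-submit"
```

With this scheme the version choice lives in the environment, not in the task definition, which is what the discussion above converges on.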
Hi, @SbloodyS, @fuchanghai. Since #11721 is merged, I've rebased my PR and also modified the related docs. So now this PR uses … So the misleading …
Codecov Report
@@ Coverage Diff @@
## dev #11860 +/- ##
=========================================
Coverage 38.68% 38.68%
Complexity 4006 4006
=========================================
Files 1002 1002
Lines 37213 37213
Branches 4249 4249
=========================================
Hits 14394 14394
Misses 21186 21186
Partials 1633 1633
Please rebase on the latest code; our docs have made some changes.
Can you also add some description in both https://github.com/apache/dolphinscheduler/blob/4dca488cd50b4392d222167c01ae2a79fd295e77/dolphinscheduler-python/pydolphinscheduler/UPDATING.md and https://github.com/apache/dolphinscheduler/blob/3aa9f2ea25ca42112141aad85140a72b0963e2c3/docs/docs/en/guide/upgrade/incompatible.md, because this is an incompatible change for users?
docs/docs/en/guide/task/spark.md (outdated)
| Parameter | Description |
|-----------|-------------|
| Node Name | Set the name of the task. Node names within a workflow definition are unique. |
| Run flag | Indicates whether the node can be scheduled normally. If the task does not need to be executed, you can turn on the prohibit-execution switch. |
| Description | Describes the function of this node. |
| Task priority | When the number of worker threads is insufficient, tasks are executed from high to low priority; tasks with the same priority are executed first-in, first-out. |
| Worker group | The task is assigned to the machines in the worker group for execution. If Default is selected, a worker machine is selected at random. |
| Task group name | The task group in Resources; if not configured, it is not used. |
| Environment Name | Configure the environment in which to run the script. |
| Number of failed retries | The number of times the task is resubmitted after failure. Supports drop-down selection and manual entry. |
| Failure retry interval | The time interval for resubmitting the task after a failure. Supports drop-down selection and manual entry. |
| Timeout alarm | Check Timeout Alarm and Timeout Failure. When the task exceeds the "timeout duration", an alarm email is sent and the task execution fails. |
It seems we already changed the general parameters into separate parameter files; can you rebase to the latest and take a look?
Sure.
Hi, @zhongjiajie, thanks for your comment. I've rebased my PR and added some descriptions in …
Kudos, SonarCloud Quality Gate passed!
LGTM, thanks
@caishunfeng should we release this PR in version 3.1.0?
Purpose of the pull request
close: #11857
Currently, the spark version is misleading.
![截屏2022-09-08 15 24 34](https://user-images.githubusercontent.com/38122586/189061020-9198f272-9239-4f32-a671-530c5b66a05e.png)
This is not the Spark version that DS currently supports. E.g., DS can also run Spark tasks of SPARK-3.x.x. The Spark version selected by the user only determines the environment variables used and the final command executed, as below:
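The original snippet was lost in this excerpt, but the old behavior can be sketched as follows. This is a hypothetical reconstruction, not the actual DS source; the `SPARK_HOME1`/`SPARK_HOME2` variable names follow the convention the old environment configuration used.

```shell
# Hypothetical sketch: the SPARK1/SPARK2 dropdown only chose which
# SPARK_HOME variable prefixed the launcher; nothing else changed.
build_spark_command() {
  case "$1" in
    SPARK1) printf '%s\n' '${SPARK_HOME1}/bin/spark-submit' ;;
    SPARK2) printf '%s\n' '${SPARK_HOME2}/bin/spark-submit' ;;
  esac
}

build_spark_command SPARK2   # -> ${SPARK_HOME2}/bin/spark-submit
```

Since the two branches differ only in which home directory they reference, the dropdown says nothing about which Spark versions DS actually supports, which is the confusion this PR removes.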
And there is also a bug that spark-sql can only be executed by SPARK2; see #11721. So why not just remove the Spark version, like other tasks?
And use {SPARK_HOME}/bin/spark-submit and {SPARK_HOME}/bin/spark-sql instead.
Brief change log
Use {SPARK_HOME}/bin/spark-submit and {SPARK_HOME}/bin/spark-sql.
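As a hedged sketch of the change's effect (the install path is illustrative, not the actual DS code): after this PR the task no longer branches on a version flag and always builds both commands from the single SPARK_HOME of the selected environment.

```shell
# After the change: one set of launcher paths, resolved against
# whatever SPARK_HOME the selected environment provides.
export SPARK_HOME=/opt/spark   # illustrative path

echo "${SPARK_HOME}/bin/spark-submit"
echo "${SPARK_HOME}/bin/spark-sql"
```

Switching Spark versions then means switching environments, with no task-level version field left to get out of sync.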
Verify this pull request
manually tested