Add a config option to print DAG. #4257
Can one of the admins verify this patch?
Can't you already print the same details by calling rdd.toDebugString in your program?
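For reference, the manual approach suggested here could look roughly like the sketch below. The object name, app name, local master and sample data are made up for illustration; only the toDebugString call comes from the comment above.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LineageDemo {
  // Builds a small RDD with one shuffle and returns its lineage as text.
  def lineageOf(sc: SparkContext): String = {
    val counts = sc.parallelize(Seq("a", "b", "a", "c"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    // toDebugString describes this RDD and its recursive dependencies.
    counts.toDebugString
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master is used so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("LineageDemo").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try println(lineageOf(sc)) finally sc.stop()
  }
}
```

This only works when the user owns the code that builds the RDD, which is the limitation raised later in the thread.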
This adds a configuration parameter, "spark.rddDebug.enable", whose default value is false. When the option is true, the DAG tree is printed in the log; when it is false, no DAG info is printed.
I think @ScrapCodes's question is ... is this necessary, given users can easily print the DAG themselves?
@rxin I have noticed that very few users know about
Actually, now that I think about it, users don't always control the RDDs, for example when they use the ML pipeline API or call other libraries. Maybe there is some merit in including this, especially if it is off by default?
Fair, but the issue is that in some cases (e.g. GraphX) the printed representation of the DAG can be many hundreds of lines long. Could that potentially explode the output?
I think that, when running an application, users should only need to change a config option, not modify the binary. The default value of this option is false, so only users who need the DAG info will turn it on. So I think there is little or no chance of the output exploding.
@pwendell
ok to test
Test build #27096 has started for PR 4257 at commit
Test build #27096 has finished for PR 4257 at commit
Test PASSed.
Maybe we can include this, provided that it is off by default. That said, I think a better name for this option is "spark.logLineage".
change config option from "spark.rddDebug.enable" to "spark.logLineage"
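A minimal sketch of how an off-by-default flag like this might gate the output is shown below. The patch itself is not shown in this thread, so the object and helper names are hypothetical; only the option name "spark.logLineage" and its default of false are taken from the discussion.

```scala
import org.apache.spark.SparkConf

object LogLineageSketch {
  // Hypothetical helper, not part of Spark: evaluates `lineage` only when
  // spark.logLineage is true, so the (possibly very long) debug string is
  // never built while the option is off.
  def lineageIfEnabled(conf: SparkConf, lineage: => String): Option[String] =
    if (conf.getBoolean("spark.logLineage", false)) Some(lineage) else None

  def main(args: Array[String]): Unit = {
    // loadDefaults = false keeps the example independent of system properties.
    val off = new SparkConf(false)
    val on  = new SparkConf(false).set("spark.logLineage", "true")

    // In a real job the second argument would be rdd.toDebugString.
    println(lineageIfEnabled(off, "(lineage text)")) // None
    println(lineageIfEnabled(on, "(lineage text)"))  // Some((lineage text))
  }
}
```

Taking the lineage as a by-name parameter speaks to the output-size concern raised above: with the default setting nothing is computed or logged. Users who cannot change the program could presumably set the option at submit time instead, for example with --conf spark.logLineage=true.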
Test build #27149 has started for PR 4257 at commit
@rxin
Test build #27154 has started for PR 4257 at commit
Test build #27154 has finished for PR 4257 at commit
Test FAILed.
Test build #27149 has finished for PR 4257 at commit
Test FAILed.
Test build #27166 has started for PR 4257 at commit
Test build #27166 has finished for PR 4257 at commit
Test PASSed.
Thanks. I've merged this.
Add a config option "spark.rddDebug.enable" to check whether to print DAG info. When "spark.rddDebug.enable" is true, it will print information about DAG in the log.

Author: KaiXinXiaoLei <huleilei1@huawei.com>

Closes #4257 from KaiXinXiaoLei/DAGprint and squashes the following commits:

d9fe42e [KaiXinXiaoLei] change log info
c27ee76 [KaiXinXiaoLei] change log info
83c2b32 [KaiXinXiaoLei] change config option
adcb14f [KaiXinXiaoLei] change the file.
f4e7b9e [KaiXinXiaoLei] add a option to print DAG

(cherry picked from commit 31d435e)
Signed-off-by: Reynold Xin <rxin@databricks.com>