[Feature][Task Plugin] Dolphin zeppelin task plugin improvement plan #9814

Closed
5 of 8 tasks
EricGao888 opened this issue Apr 27, 2022 · 11 comments

Comments

@EricGao888
Member

EricGao888 commented Apr 27, 2022

Search before asking

  • I had searched in the issues and found no similar feature requirement.

Description

By integrating Apache DolphinScheduler with Apache Zeppelin through the Zeppelin task plugin, we aim to give DolphinScheduler users, especially big data engineers, a so-called Big Data Studio experience: users can develop and debug big data tasks interactively in a Zeppelin notebook and schedule them directly from DolphinScheduler with 'one click'. This feature should significantly boost the development efficiency of big data engineers and lower the bar for those who do not have much experience in the big data area.

However, DolphinScheduler currently has only basic integration with Zeppelin, and the Zeppelin task plugin needs more features to achieve this goal. Here are a few points I have come up with so far:

I would like to invite Apache Zeppelin PMC, Jeff Zhang @zjffdu to help with the review.

BTW, the idea of zeppelin task plugin is inspired by Jeff's previous work on Apache Airflow Zeppelin Operator, kudos to Jeff.


Use case

  • Already described above

Related issues

related: #9201 #9798 #5271

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@EricGao888 added the feature and Waiting for reply labels on Apr 27, 2022
@github-actions

Thank you for your feedback. We have received your issue; please wait patiently for a reply.

  • To help us understand your request as quickly as possible, please provide detailed information, version numbers, or pictures.
  • If you haven't received a reply for a long time, you can join our slack and send your question to channel #troubleshooting

@EricGao888
Member Author

EricGao888 commented Apr 27, 2022

Any ideas or suggestions on these features are appreciated!

@caishunfeng removed the Waiting for reply label on Apr 27, 2022
@caishunfeng
Contributor

Looks good to me, but I have the following questions:

Enable note-level zeppelin task

Does this mean that it needs database storage or resource management?

Add custom variables support in zeppelin task plugin.

What's the difference from ds?

@EricGao888
Member Author

Looks good to me, but I have the following questions:

Enable note-level zeppelin task

Does this mean that it needs database storage or resource management?

Add custom variables support in zeppelin task plugin.

What's the difference from ds?

@caishunfeng My bad, I didn't make that clear. When developing big data tasks in Zeppelin, users write a Zeppelin note, which consists of one or more paragraphs. You can run the whole note or just a specific paragraph. Currently the ds Zeppelin task plugin only supports triggering individual Zeppelin paragraphs. Enabling note-level Zeppelin tasks means ds will be able to trigger a whole Zeppelin note. To do that, we just need to call the submitNote method from the Zeppelin Client API. I didn't use this API originally because, at that time, the Zeppelin Client API had no cancelNote method: once ds triggered a note, we would not have been able to cancel it, which could lead to issues. Since the Zeppelin Client API now includes a cancelNote method, we can add this feature to the ds Zeppelin task plugin. It doesn't need database storage or resource management. I hope this explanation makes sense to you. : )
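
To illustrate the difference between the two levels, here is a minimal sketch that only builds (and never sends) the HTTP requests for running and cancelling a note or a single paragraph. It targets the Zeppelin REST API rather than the Java client mentioned above, and the `/api/notebook/job/...` paths are written from my reading of the Zeppelin REST documentation, so please verify them against your Zeppelin version. The function names and the IDs in the usage note are hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request


def _job_url(endpoint, note_id, paragraph_id=None):
    # Assumed path layout: /api/notebook/job/{noteId}[/{paragraphId}]
    url = f"{endpoint.rstrip('/')}/api/notebook/job/{quote(note_id, safe='')}"
    if paragraph_id is not None:
        url += f"/{quote(paragraph_id, safe='')}"
    return url


def run_request(endpoint, note_id, paragraph_id=None):
    """Build a request that starts a whole note, or one paragraph if given."""
    return Request(_job_url(endpoint, note_id, paragraph_id), method="POST")


def cancel_request(endpoint, note_id, paragraph_id=None):
    """Build a request that stops a whole note, or one paragraph if given."""
    return Request(_job_url(endpoint, note_id, paragraph_id), method="DELETE")
```

For example, `run_request("http://localhost:8080", "2ABC")` addresses the whole note, while passing a third argument such as `"p_1"` narrows the same call to one paragraph. The point of pairing the two builders is the one made above: a note-level run is only safe to offer when a matching note-level cancel exists.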


@EricGao888
Member Author

Looks good to me, but I have the following questions:

Enable note-level zeppelin task

Does this mean that it needs database storage or resource management?

Add custom variables support in zeppelin task plugin.

What's the difference from ds?

As for custom variables: yes, I mean exactly those of ds. I just want to combine them with Zeppelin dynamic forms.
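
One possible way to combine the two, as a rough sketch only: treat the ds custom variables as overrides for the dynamic form values of a paragraph and send them as the `params` object of the run request body. The `{"params": {...}}` shape is my understanding of the Zeppelin REST API for running a paragraph and should be checked; the function name and the variable names in the usage note are hypothetical.

```python
import json


def paragraph_params_body(custom_vars, form_defaults=None):
    """Merge ds custom variables over dynamic form defaults into a JSON body.

    Values are sent as strings, since dynamic form inputs are text fields.
    """
    params = dict(form_defaults or {})
    params.update({name: str(value) for name, value in custom_vars.items()})
    # Assumed body shape: {"params": {"formName": "value", ...}}
    return json.dumps({"params": params}, sort_keys=True)
```

Calling `paragraph_params_body({"dt": "2022-04-27"}, {"dt": "1970-01-01", "limit": "10"})` keeps `limit` from the form defaults and replaces `dt` with the value supplied by the workflow.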

@EricGao888
Member Author

BTW, since DolphinScheduler also supports workflow as code, we could add a dolphin-scheduler-workflow-interpreter on the Zeppelin side. With this feature, users would be able to write DolphinScheduler workflow Python scripts interactively in a notebook. I will open a related issue in the Apache Zeppelin community later. Just writing this idea down in case I forget it : ) @dailidong @zhongjiajie

@zhongjiajie
Member

BTW, since DolphinScheduler also supports workflow as code, we could add a dolphin-scheduler-workflow-interpreter on the Zeppelin side. With this feature, users would be able to write DolphinScheduler workflow Python scripts interactively in a notebook. I will open a related issue in the Apache Zeppelin community later. Just writing this idea down in case I forget it : ) @dailidong @zhongjiajie

Sounds good! Thanks for bringing this up!

@EricGao888
Member Author

EricGao888 commented Jun 14, 2022

To make the Zeppelin task plugin more user-friendly, we could add some UI interaction features. For example, once a user fills in the noteId, there could be a button linking to the page of the Zeppelin note with the same noteId. That way, the user could conveniently open and edit the connected note.
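
As a small sketch of what such a button could link to: the helper below builds the note page address from an endpoint and a noteId. The `/#/notebook/{noteId}` layout is how the Zeppelin web UI addresses notes as far as I know, but treat it as an assumption; the function name is hypothetical.

```python
from urllib.parse import quote


def note_page_url(endpoint, note_id):
    """Return the web UI address of a note (assumed layout: /#/notebook/{noteId})."""
    return f"{endpoint.rstrip('/')}/#/notebook/{quote(note_id, safe='')}"
```

With this, a front end would only need the endpoint and the noteId already entered in the task form to render the link.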

@zhongjiajie
Member

To make the Zeppelin task plugin more user-friendly, we could add some UI interaction features. For example, once a user fills in the noteId, there could be a button linking to the page of the Zeppelin note with the same noteId. That way, the user could conveniently open and edit the connected note.

Yes, of course. We have a similar function in the sub_process task, but it jumps to a DolphinScheduler resource instead of a Zeppelin one.

@EricGao888
Member Author

# Url endpoint for zeppelin RESTful API
zeppelin.rest.url=http://localhost:8080

I think it was not a good idea to put the Zeppelin endpoint in common.properties. I will submit a PR to remove it from common.properties and move it into the task parameters. As for a default endpoint, maybe we could add it to a configuration center in the future. See: #10283
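
A rough sketch of the intended lookup order, assuming the endpoint becomes an optional task parameter with a fallback default: the key name `restEndpoint` and both function and constant names are hypothetical, and the default simply reuses the value from the properties snippet above.

```python
# Fallback used when a task does not specify its own endpoint (assumed default).
DEFAULT_ENDPOINT = "http://localhost:8080"


def resolve_endpoint(task_params, default=DEFAULT_ENDPOINT):
    """Prefer the per-task endpoint; fall back to the default when it is unset or blank."""
    value = (task_params.get("restEndpoint") or "").strip()
    return (value or default).rstrip("/")
```

Resolving the endpoint per task is what lets two tasks in the same workflow talk to different Zeppelin servers, which a single entry in common.properties cannot express.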

@zhongjiajie
Member

# Url endpoint for zeppelin RESTful API
zeppelin.rest.url=http://localhost:8080

I think it was not a good idea to put the Zeppelin endpoint in common.properties. I will submit a PR to remove it from common.properties and move it into the task parameters. As for a default endpoint, maybe we could add it to a configuration center in the future. See: #10283

Sure, I agree with that.

EricGao888 added a commit to EricGao888/dolphinscheduler that referenced this issue Jul 11, 2022
EricGao888 added a commit to EricGao888/dolphinscheduler that referenced this issue Jul 13, 2022
zhuangchong pushed a commit that referenced this issue Jul 15, 2022
…asks (#10925)

* [Feature][Task Plugin] Enable users to switch endpoints in zeppelin tasks (#9814)