Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Enable users to switch and install conda env in jupyter task #10337

Merged
merged 3 commits into from
Jun 20, 2022

Conversation

EricGao888
Copy link
Member

Purpose of the pull request

  • Enable users to switch and install conda env in jupyter task. If the conda environment user needs for his/her jupyter task is not too large like bigger than 100M, the user could upload the packed conda environment to resource center and select it when creating jupyter task, jupyter task plugin will automatically install the conda env on target worker before executing jupyter notes.
  • Also, users could upload multiple packed conda environments to resource center and select the one needed for a specific task.
  • This feature will save users quite a lot of time from installing multiple environments on multiple workers and greatly better the experience of using jupyter task plugin

image

Brief change log

  • Already described above.

Verify this pull request

  • Verified by unit tests and manual tests.

@SbloodyS SbloodyS added UI ui and front end related improvement make more easy to user or prompt friendly backend labels Jun 2, 2022
Copy link
Member

@Amy0104 Amy0104 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The front end part LGTM.

@EricGao888
Copy link
Member Author

Discussed with @SbloodyS, it seems using source may cause security issue. I'm trying to find a workaround to avoid using source.

image

Comment on lines +40 to +41
"tar -xzf %s -C jupyter_env && " +
"source jupyter_env/bin/activate";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better to let users to define this path if the user's path is different from this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, env tar will be downloaded from resource center before task execution and removed when task execution completed.

// copy hdfs/minio file to local
List<Pair<String, String>> fileDownloads = downloadCheck(taskExecutionContext.getExecutePath(), taskExecutionContext.getResources());
if (!fileDownloads.isEmpty()){
downloadResource(taskExecutionContext.getExecutePath(), logger, fileDownloads);
}

} finally {
TaskExecutionContextCacheManager.removeByTaskInstanceId(taskExecutionContext.getTaskInstanceId());
taskCallbackService.sendTaskExecuteResponseCommand(taskExecutionContext);
clearTaskExecPath();
}

However, I think it is a good idea to give a choice to define the path and install the env on workers without getting removed after task execution. In this way, the same env will not need to be downloaded every time.

WDYT @SbloodyS @zhongjiajie

Copy link
Member

@SbloodyS SbloodyS Jun 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better. But what i mean is if the user's the compressed tar package path after decompression is inconsistent with this hard code path. If we hard code this. We should put this to the docs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better. But what i mean is if the user's the compressed tar package path after decompression is inconsistent with this hard code path.

Seems if users use conda pack to pack their environment, when they run tar they will get the same directory structure.
image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure about it. Could you take a look? @zhongjiajie

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better. But what i mean is if the user's the compressed tar package path after decompression is inconsistent with this hard code path.

Seems if users use conda pack to pack their environment, when they run tar they will get the same directory structure. image

I test locally, and it is 👍. but should we add some docs to tell use run conda pack to create the conda tarball?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better. But what i mean is if the user's the compressed tar package path after decompression is inconsistent with this hard code path.

Seems if users use conda pack to pack their environment, when they run tar they will get the same directory structure. image

I test locally, and it is 👍. but should we add some docs to tell use run conda pack to create the conda tarball?

@zhongjiajie Sure, that's a good idea. I will add some docs to instruct users on conda pack. BTW, is there any substitute for source jupyter_env/bin/activate? @SbloodyS and I discussed about this and it is possible that some companies ban developers from using source cmd.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I think it is a good idea to give a choice to define the path and install the env on workers without getting removed after task execution. In this way, the same env will not need to be downloaded every time.

but I do not think so, I think they can use shell task to do that. It is odd if we do not remove resources after the task execute finish, using two cases:

  • we only support not deleting resources after task is done: although the worker has already installed the env, but we still have resource configurated in task definition, So it means worker will re-install the conda env to worker. Or maybe users have to change the task definition and remove the resource after the first time they ran the task.(it is odd)
  • We have given users an option about resources install and delete. then have to change the option after they ran the task first time too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I think it is a good idea to give a choice to define the path and install the env on workers without getting removed after task execution. In this way, the same env will not need to be downloaded every time.

but I do not think so, I think they can use shell task to do that. It is odd if we do not remove resources after the task execute finish, using two cases:

  • we only support not deleting resources after task is done: although the worker has already installed the env, but we still have resource configurated in task definition, So it means worker will re-install the conda env to worker. Or maybe users have to change the task definition and remove the resource after the first time they ran the task.(it is odd)
  • We have given users an option about resources install and delete. then have to change the option after they ran the task first time too.

Alright. This makes sense to me. Since the tasks are usually triggered multiple times, considering persisting the env could be complicated and might cause unexpected issues. If users want to persist the env, it is better to use a shell task and only trigger it once at the very beginning.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, that what I mean

@codecov-commenter
Copy link

codecov-commenter commented Jun 15, 2022

Codecov Report

Merging #10337 (2720f4d) into dev (4be3b87) will increase coverage by 0.00%.
The diff coverage is 88.88%.

@@            Coverage Diff            @@
##                dev   #10337   +/-   ##
=========================================
  Coverage     40.87%   40.87%           
- Complexity     4851     4853    +2     
=========================================
  Files           886      886           
  Lines         36032    36039    +7     
  Branches       3998     3999    +1     
=========================================
+ Hits          14727    14731    +4     
- Misses        19848    19854    +6     
+ Partials       1457     1454    -3     
Impacted Files Coverage Δ
...cheduler/plugin/task/jupyter/JupyterConstants.java 0.00% <ø> (ø)
...heduler/plugin/task/jupyter/JupyterParameters.java 91.17% <75.00%> (-2.16%) ⬇️
...phinscheduler/plugin/task/jupyter/JupyterTask.java 66.66% <100.00%> (+1.19%) ⬆️
...org/apache/dolphinscheduler/remote/utils/Host.java 43.47% <0.00%> (-2.18%) ⬇️
...e/dolphinscheduler/remote/NettyRemotingClient.java 51.38% <0.00%> (-1.39%) ⬇️
...er/master/dispatch/host/assign/RandomSelector.java 83.33% <0.00%> (+5.55%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4be3b87...2720f4d. Read the comment docs.

@EricGao888
Copy link
Member Author

Related docs added in the latest commit : )

@zhongjiajie
Copy link
Member

and we still have discussion in #10337 (comment), I wonder whether conda have some command like import same as docker's docker import to import and then conda activate?

@EricGao888
Copy link
Member Author

and we still have discussion in #10337 (comment), I wonder whether conda have some command like import same as docker's docker import to import and then conda activate?

I haven't found such things so far. However, using source for unpacked conda env seems to be the only way I could find on Conda-Pack official page.

image

@zhongjiajie
Copy link
Member

and we still have discussion in #10337 (comment), I wonder whether conda have some command like import same as docker's docker import to import and then conda activate?

I haven't found such things so far. However, using source for unpacked conda env seems to be the only way I could find on Conda-Pack official page.

image

Yeah, I can not find either, I think maybe we should use source to activate the env template, WDYT @EricGao888 @SbloodyS

@EricGao888
Copy link
Member Author

and we still have discussion in #10337 (comment), I wonder whether conda have some command like import same as docker's docker import to import and then conda activate?

I haven't found such things so far. However, using source for unpacked conda env seems to be the only way I could find on Conda-Pack official page.
image

Yeah, I can not find either, I think maybe we should use source to activate the env template, WDYT @EricGao888 @SbloodyS

Yes, IMHO, there is no perfect solution here to avoid using source while providing good user experience at the same time. This is a trade-off. I think we could give user this option but add some notice about source command in docs. If they have concerns about it, they could choose not to use it.

@zhongjiajie
Copy link
Member

SGTM, So please add some docs to hint users about thesource command

@sonarcloud
Copy link

sonarcloud bot commented Jun 18, 2022

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 1 Code Smell

88.9% 88.9% Coverage
0.0% 0.0% Duplication

@EricGao888
Copy link
Member Author

EricGao888 commented Jun 18, 2022

SGTM, So please add some docs to hint users about thesource command

Caveats to source command usage have been added in docs in the latest commit, PTAL @zhongjiajie @SbloodyS Thx!

Copy link
Member

@SbloodyS SbloodyS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The backend part LGTM

Copy link
Member

@zhongjiajie zhongjiajie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, and frontend jus simply add resource usage, merge

@zhongjiajie zhongjiajie merged commit 64cee03 into apache:dev Jun 20, 2022
@zhongjiajie zhongjiajie changed the title [Feature][Task Plugin]Enable users to switch and install conda env in jupyter task [Feature] Enable users to switch and install conda env in jupyter task Jun 20, 2022
@zhongjiajie zhongjiajie added this to the 3.1.0-alpha milestone Jun 20, 2022
hstdream pushed a commit to hstdream/dolphinscheduler that referenced this pull request Jun 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend improvement make more easy to user or prompt friendly UI ui and front end related
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature][Task Plugin] Enable users to switch and install conda env in jupyter task
5 participants