Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mounting dependencies on paths specified by keys #2295

Merged
merged 18 commits into from
Jun 11, 2020
Merged

Mounting dependencies on paths specified by keys #2295

merged 18 commits into from
Jun 11, 2020

Conversation

teetone
Copy link
Collaborator

@teetone teetone commented May 24, 2020

Resolves #1115

  • Support ., which will mount all content on the top-level of the dependencies path
  • Support custom paths via keys
  • We currently don't support paths to be an ancestor of another path since we mount file-systems to be read-only

try:
os.symlink(target, source)
except OSError as e:
# Override the existing symlink if one already exists
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are all these cases necessary now?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially had this helper function because of the FileExistsError when creating symlinks for nested paths. I figured out that a better way to do this is to create all the directories up to the parent of the path and then create a symlink for the path itself.

@@ -446,6 +502,9 @@ def _transition_from_CLEANING_UP(self, run_state):
time.sleep(1)

for dep in run_state.bundle.dependencies:
if dep.child_path == '.':
continue

dep_key = DependencyKey(dep.parent_uuid, dep.parent_path)
if not self.shared_file_system: # No dependencies if shared fs worker
self.dependency_manager.release(run_state.bundle.uuid, dep_key)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this exempt for . dependencies? Seems like it's still a dependency and the only thing that's affected is the mounting behavior.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some more investigation, I found out that I wasn't cleaning up all the content that was mounted at the top. I fixed this in my recent commit.

Copy link
Collaborator

@percyliang percyliang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic looks good overall! I'd try to make the code a bit more modular - the logic for attaching mounting a path within a dependency to some path versus how we interpret .. Note that if we wanted to support having keys foo and foo/bar, then ideally we'd use the same logic for how we deal with .: just mount everything under foo.

if dep.child_path != '.':
# A child path can be a nested path which we need to create directories for
Path(full_child_path).mkdir(parents=True, exist_ok=True)
try:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious how we think about refactoring file operations from using os to pathlib? pathlib seems to be more convenient and compact and easier to read. We might be able to remove a lot of extra code by using pathlib. If we think this is something worth doing, we could file a separate ticket?

Copy link
Collaborator Author

@teetone teetone May 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer pathlib too. It provides more utility than os does.

@teetone
Copy link
Collaborator Author

teetone commented May 27, 2020

Logic looks good overall! I'd try to make the code a bit more modular - the logic for attaching mounting a path within a dependency to some path versus how we interpret .. Note that if we wanted to support having keys foo and foo/bar, then ideally we'd use the same logic for how we deal with .: just mount everything under foo.

@percyliang I fixed the issue I was having with shared filesystems and added test2 to the shared filesystems run of our Travis build. I also did some refactoring and made the logic a bit more modular.

@teetone teetone marked this pull request as ready for review May 27, 2020 12:34
@@ -600,20 +600,33 @@ def resolve_key_targets(self, client, worksheet_uuid, target_specs):
Helper: target_specs is a list of strings which are [<key>]:<target>
Returns: [(key, worker.download_util.BundleTarget), ...]
"""
keys = set()

def is_ancestor(path1, path2):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_ancestor_or_descendant is probably clearer. Note that if path1 == path2, we return false. Is that the right behavior?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be okay because we check for duplicate keys before this check.

@teetone
Copy link
Collaborator Author

teetone commented Jun 6, 2020

@percyliang @candicegjing I modified the PR to support absolute paths. When a key is an absolute path, I ended up removing the leaf to clean up the dependency that is mounted.

Copy link
Collaborator

@percyliang percyliang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

@teetone teetone merged commit 7a87ad1 into master Jun 11, 2020
armantajback added a commit that referenced this pull request Jun 19, 2020
* Fix free disk bytes calculation (#2370)

Closed via #2362

* Fix WorkerManager launching (#2372)

Co-authored-by: Jing Ge <jingge2@illinois.edu>

* Make worker id match output directory and job name (#2369)

* Make worker id match output directory and job name

* Blacken

Co-authored-by: Jing Ge <jingge2@illinois.edu>

* Pass down max work dir size and delete workdir on exit to slurm workers (#2354)

* Pass down max work dir size and delete workdir on exit to slurm workers

* Fix typo.

Co-authored-by: Jing Ge <jingge2@illinois.edu>

* == -> === (#2357)

Co-authored-by: Ashwin Ramaswami <aramaswamis@gmail.com>

* Refactor dialogs into worksheetDialog component (#2263)

* merge openX to one open dialog variable

* refactor error message, delete worksheet message, bundle action fail dialogs into worksheet dialog component

* add check when using enter keyword

Co-authored-by: Ashwin Ramaswami <aramaswamis@gmail.com>

* remove context menu (#2376)

* Update print information (#2382)

Co-authored-by: Jing Ge <stanford@Stanfords-MacBook-Pro.local>

* Add a CLI option to limit the number of jobs allowed to run on a worker  (#2289)

Closed via #2287

* Set cli verbose default to 0 (#2378)

Closed via #2350

* Mounting dependencies on paths specified by keys (#2295)

* Fix time displaying issue (#2381)

* Fix datetime displaying issue

* Add a comment

Co-authored-by: Jing Ge <stanford@Stanfords-MacBook-Pro.local>
Co-authored-by: yipenghe <yipenghe@stanford.edu>

* overdue (#2393)

* fix migration (#2403)

* Make strings more consistent in terms of terminology / organization (#2409)

* tweak dialog prompts

* add cut/copied

* organize keyboard shortcuts

* fix formatting

* remove dummy file

* ignore a d when a dialog is opened

Co-authored-by: yipenghe <yipenghe@stanford.edu>

* Replace sacct with scontrol, since sacct is only available on the slurm head node (#2368)

* Handle sort key is 0 condition on frontend and use correct sort key for new runs (#2412)

* handle sort key is 0 condition

* handle sort key is 0 in getAfterSortKey function

* use this.props.after_sort_key instead

* Clear force delete bit after deletion (#2413)

* clear force delete bit

* clear force delete bit fails

* bump to v0.5.15 (#2416)

* 2417: Fix mkdocs Travis failure (#2420)

* debug

* reenable test

* Fix SlurmWorkerManager overlaunching (#2391)

Co-authored-by: Jing Ge <jingge2@illinois.edu>

* Fix accessing worker information from WorkerInfoAccessor (#2419)

* Don't fail WorkerManager if a network exception is encountered (#2422)

* Rename actionbar => terminal (#2429)



* rename actionbar->terminal

* remove unused line

* Update frontend/src/components/worksheets/Worksheet/Worksheet.js

Co-authored-by: Ashwin Ramaswami <aramaswamis@gmail.com>

* action bar => terminal in comments

Co-authored-by: Ashwin Ramaswami <aramaswamis@gmail.com>

* Remove select all for table (#2428)

* clear force delete bit

* remove select-all checkbox

* remove selectAll handler

* Remove detach from the frontend (#2427)

* clear force delete bit

* remove detach from frontend

* remove constant

* Use GitHub Actions for CI (#2185)

for #2094

Based off of Jane's work in https://github.com/candicegjing/codalab-worksheets

Time: 40 minutes -> 15-20 minutes

Several improvements:

Build images split into three parallel steps - rest-server, worker, frontend
Tests split into around 10 parallel steps - each step runs about 4 tests, and there is a step which runs the UI tests.
If a CLI test gives an error, it archives all Docker containers' logs so that they can be downloaded and inspected.
If the UI test gives an error, it archives the screenshot images so that they can be downloaded and inspected.
Additionally, the publish to pypi process has changed. Now, publishes to pypi happen on every GitHub release as opposed to every tag push. Effectively, this means that the release workflow has changed a bit:
Old workflow:
Wait 40 mins for Travis build on master to complete
Push a tag to master
Wait 40 mins for Travis build on master to complete, which also deploys to pypi
New workflow:
Wait for GitHub Actions build on master to complete
Create a new GitHub release
Wait for action to complete, which only deploys to pypi

* CI: github.head_ref -> github.ref

* CI: properly populate VERSION variable with branch name

* Add exit-after-num-runs=1 to slurm worker manager (#2373)

Closed via #2289

* Fix showing file contents for record item  (#2455)

fix #2446

to test:
create a schema with a field using files:
% schema a
% add hello /stdout

display bundles in record mode
% display record a

* Fix long test startup times by using python:3.6.10-slim-buster for default test runs (#2449)

Fixes #2388 by using python:3.6.10-slim-buster for default test runs.

Following @nelson-liu 's suggestion in #2388 (comment).

This image is still around ~40 MB (as opposed to the 5 GB default-cpu image).

Time changes from ~20 mins -> 10 mins

* Fix github actions on forks by not calling --push (#2457)

This conditional expression was in the old `.travis.yml`, but it isn't there in the new github actions workflow. This PR adds that expression so that `--push` is not called from forks.

Fixes #2456

* Prettier CLI Reference Doc (#2458)

Co-authored-by: Jing Ge <stanford@Stanfords-MacBook-Pro.local>

* Actually fix builds from forks by building docker images when needed (#2461)

* Test worker restart in GHA (#2466)

Resolves #2465

Adding back this test that we used to have in Travis
Appended .log for to err log files

Co-authored-by: Jing Ge <jingge2@illinois.edu>
Co-authored-by: Nelson Liu <nelson-liu@users.noreply.github.com>
Co-authored-by: yipenghe <yipenghe@stanford.edu>
Co-authored-by: Ashwin Ramaswami <aramaswamis@gmail.com>
Co-authored-by: Jing Ge <stanford@Stanfords-MacBook-Pro.local>
Co-authored-by: Tony Lee <tonyh.lee@yahoo.com>
Co-authored-by: Percy Liang <percyliang@gmail.com>
@epicfaace epicfaace deleted the 1115-mount branch June 20, 2020 16:28
adiprerepa pushed a commit that referenced this pull request May 27, 2021
* Fix free disk bytes calculation (#2370)

Closed via #2362

* Fix WorkerManager launching (#2372)

Co-authored-by: Jing Ge <jingge2@illinois.edu>

* Make worker id match output directory and job name (#2369)

* Make worker id match output directory and job name

* Blacken

Co-authored-by: Jing Ge <jingge2@illinois.edu>

* Pass down max work dir size and delete workdir on exit to slurm workers (#2354)

* Pass down max work dir size and delete workdir on exit to slurm workers

* Fix typo.

Co-authored-by: Jing Ge <jingge2@illinois.edu>

* == -> === (#2357)

Co-authored-by: Ashwin Ramaswami <aramaswamis@gmail.com>

* Refactor dialogs into worksheetDialog component (#2263)

* merge openX to one open dialog variable

* refactor error message, delete worksheet message, bundle action fail dialogs into worksheet dialog component

* add check when using enter keyword

Co-authored-by: Ashwin Ramaswami <aramaswamis@gmail.com>

* remove context menu (#2376)

* Update print information (#2382)

Co-authored-by: Jing Ge <stanford@Stanfords-MacBook-Pro.local>

* Add a CLI option to limit the number of jobs allowed to run on a worker  (#2289)

Closed via #2287

* Set cli verbose default to 0 (#2378)

Closed via #2350

* Mounting dependencies on paths specified by keys (#2295)

* Fix time displaying issue (#2381)

* Fix datetime displaying issue

* Add a comment

Co-authored-by: Jing Ge <stanford@Stanfords-MacBook-Pro.local>
Co-authored-by: yipenghe <yipenghe@stanford.edu>

* overdue (#2393)

* fix migration (#2403)

* Make strings more consistent in terms of terminology / organization (#2409)

* tweak dialog prompts

* add cut/copied

* organize keyboard shortcuts

* fix formatting

* remove dummy file

* ignore a d when a dialog is opened

Co-authored-by: yipenghe <yipenghe@stanford.edu>

* Replace sacct with scontrol, since sacct is only available on the slurm head node (#2368)

* Handle sort key is 0 condition on frontend and use correct sort key for new runs (#2412)

* handle sort key is 0 condition

* handle sort key is 0 in getAfterSortKey function

* use this.props.after_sort_key instead

* Clear force delete bit after deletion (#2413)

* clear force delete bit

* clear force delete bit fails

* bump to v0.5.15 (#2416)

* 2417: Fix mkdocs Travis failure (#2420)

* debug

* reenable test

* Fix SlurmWorkerManager overlaunching (#2391)

Co-authored-by: Jing Ge <jingge2@illinois.edu>

* Fix accessing worker information from WorkerInfoAccessor (#2419)

* Don't fail WorkerManager if a network exception is encountered (#2422)

* Rename actionbar => terminal (#2429)



* rename actionbar->terminal

* remove unused line

* Update frontend/src/components/worksheets/Worksheet/Worksheet.js

Co-authored-by: Ashwin Ramaswami <aramaswamis@gmail.com>

* action bar => terminal in comments

Co-authored-by: Ashwin Ramaswami <aramaswamis@gmail.com>

* Remove select all for table (#2428)

* clear force delete bit

* remove select-all checkbox

* remove selectAll handler

* Remove detach from the frontend (#2427)

* clear force delete bit

* remove detach from frontend

* remove constant

* Use GitHub Actions for CI (#2185)

for #2094

Based off of Jane's work in https://github.com/candicegjing/codalab-worksheets

Time: 40 minutes -> 15-20 minutes

Several improvements:

Build images split into three parallel steps - rest-server, worker, frontend
Tests split into around 10 parallel steps - each step runs about 4 tests, and there is a step which runs the UI tests.
If a CLI test gives an error, it archives all Docker containers' logs so that they can be downloaded and inspected.
If the UI test gives an error, it archives the screenshot images so that they can be downloaded and inspected.
Additionally, the publish to pypi process has changed. Now, publishes to pypi happen on every GitHub release as opposed to every tag push. Effectively, this means that the release workflow has changed a bit:
Old workflow:
Wait 40 mins for Travis build on master to complete
Push a tag to master
Wait 40 mins for Travis build on master to complete, which also deploys to pypi
New workflow:
Wait for GitHub Actions build on master to complete
Create a new GitHub release
Wait for action to complete, which only deploys to pypi

* CI: github.head_ref -> github.ref

* CI: properly populate VERSION variable with branch name

* Add exit-after-num-runs=1 to slurm worker manager (#2373)

Closed via #2289

* Fix showing file contents for record item  (#2455)

fix #2446

to test:
create a schema with a field using files:
% schema a
% add hello /stdout

display bundles in record mode
% display record a

* Fix long test startup times by using python:3.6.10-slim-buster for default test runs (#2449)

Fixes #2388 by using python:3.6.10-slim-buster for default test runs.

Following @nelson-liu 's suggestion in #2388 (comment).

This image is still around ~40 MB (as opposed to the 5 GB default-cpu image).

Time changes from ~20 mins -> 10 mins

* Fix github actions on forks by not calling --push (#2457)

This conditional expression was in the old `.travis.yml`, but it isn't there in the new github actions workflow. This PR adds that expression so that `--push` is not called from forks.

Fixes #2456

* Prettier CLI Reference Doc (#2458)

Co-authored-by: Jing Ge <stanford@Stanfords-MacBook-Pro.local>

* Actually fix builds from forks by building docker images when needed (#2461)

* Test worker restart in GHA (#2466)

Resolves #2465

Adding back this test that we used to have in Travis
Appended .log for to err log files

Co-authored-by: Jing Ge <jingge2@illinois.edu>
Co-authored-by: Nelson Liu <nelson-liu@users.noreply.github.com>
Co-authored-by: yipenghe <yipenghe@stanford.edu>
Co-authored-by: Ashwin Ramaswami <aramaswamis@gmail.com>
Co-authored-by: Jing Ge <stanford@Stanfords-MacBook-Pro.local>
Co-authored-by: Tony Lee <tonyh.lee@yahoo.com>
Co-authored-by: Percy Liang <percyliang@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

allow mounting dependencies at not just single level
4 participants