-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mounting dependencies on paths specified by keys #2295
Conversation
codalab/worker/worker_run_state.py
Outdated
try: | ||
os.symlink(target, source) | ||
except OSError as e: | ||
# Override the existing symlink if one already exists |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are all these cases necessary now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially had this helper function because of the FileExistsError
when creating symlinks for nested paths. I figured out that a better way to do this is to create all the directories up to the parent of the path and then create a symlink for the path itself.
@@ -446,6 +502,9 @@ def _transition_from_CLEANING_UP(self, run_state): | |||
time.sleep(1) | |||
|
|||
for dep in run_state.bundle.dependencies: | |||
if dep.child_path == '.': | |||
continue | |||
|
|||
dep_key = DependencyKey(dep.parent_uuid, dep.parent_path) | |||
if not self.shared_file_system: # No dependencies if shared fs worker | |||
self.dependency_manager.release(run_state.bundle.uuid, dep_key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this exempt for .
dependencies? Seems like it's still a dependency and the only thing that's affected is the mounting behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some more investigation, I found out that I wasn't cleaning up all the content that was mounted at the top. I fixed this in my recent commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logic looks good overall! I'd try to make the code a bit more modular - the logic for attaching mounting a path within a dependency to some path versus how we interpret .
. Note that if we wanted to support having keys foo
and foo/bar
, then ideally we'd use the same logic for how we deal with .
: just mount everything under foo
.
codalab/worker/worker_run_state.py
Outdated
if dep.child_path != '.': | ||
# A child path can be a nested path which we need to create directories for | ||
Path(full_child_path).mkdir(parents=True, exist_ok=True) | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious how we think about refactoring file operations from using os
to pathlib
? pathlib
seems to be more convenient and compact and easier to read. We might be able to remove a lot of extra code by using pathlib
. If we think this is something worth doing, we could file a separate ticket?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer pathlib
too. It provides more utility than os
does.
@percyliang I fixed the issue I was having with shared filesystems and added |
codalab/lib/bundle_cli.py
Outdated
@@ -600,20 +600,33 @@ def resolve_key_targets(self, client, worksheet_uuid, target_specs): | |||
Helper: target_specs is a list of strings which are [<key>]:<target> | |||
Returns: [(key, worker.download_util.BundleTarget), ...] | |||
""" | |||
keys = set() | |||
|
|||
def is_ancestor(path1, path2): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_ancestor_or_descendant
is probably clearer. Note that if path1 == path2
, we return false. Is that the right behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be okay because we check for duplicate keys before this check.
@percyliang @candicegjing I modified the PR to support absolute paths. When a key is an absolute path, I ended up removing the leaf to clean up the dependency that is mounted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
* Fix free disk bytes calculation (#2370) Closed via #2362 * Fix WorkerManager launching (#2372) Co-authored-by: Jing Ge <jingge2@illinois.edu> * Make worker id match output directory and job name (#2369) * Make worker id match output directory and job name * Blacken Co-authored-by: Jing Ge <jingge2@illinois.edu> * Pass down max work dir size and delete workdir on exit to slurm workers (#2354) * Pass down max work dir size and delete workdir on exit to slurm workers * Fix typo. Co-authored-by: Jing Ge <jingge2@illinois.edu> * == -> === (#2357) Co-authored-by: Ashwin Ramaswami <aramaswamis@gmail.com> * Refactor dialogs into worksheetDialog component (#2263) * merge openX to one open dialog variable * refactor error message, delete worksheet message, bundle action fail dialogs into worksheet dialog component * add check when using enter keyword Co-authored-by: Ashwin Ramaswami <aramaswamis@gmail.com> * remove context menu (#2376) * Update print information (#2382) Co-authored-by: Jing Ge <stanford@Stanfords-MacBook-Pro.local> * Add a CLI option to limit the number of jobs allowed to run on a worker (#2289) Closed via #2287 * Set cli verbose default to 0 (#2378) Closed via #2350 * Mounting dependencies on paths specified by keys (#2295) * Fix time displaying issue (#2381) * Fix datetime displaying issue * Add a comment Co-authored-by: Jing Ge <stanford@Stanfords-MacBook-Pro.local> Co-authored-by: yipenghe <yipenghe@stanford.edu> * overdue (#2393) * fix migration (#2403) * Make strings more consistent in terms of terminology / organization (#2409) * tweak dialog prompts * add cut/copied * organize keyboard shortcuts * fix formatting * remove dummy file * ignore a d when a dialog is opened Co-authored-by: yipenghe <yipenghe@stanford.edu> * Replace sacct with scontrol, since sacct is only available on the slurm head node (#2368) * Handle sort key is 0 condition on frontend and use correct sort key for new runs (#2412) * handle sort key is 0 condition * handle sort key is 0 in getAfterSortKey function * use this.props.after_sort_key instead * Clear force delete bit after deletion (#2413) * clear force delete bit * clear force delete bit fails * bump to v0.5.15 (#2416) * 2417: Fix mkdocs Travis failure (#2420) * debug * reenable test * Fix SlurmWorkerManager overlaunching (#2391) Co-authored-by: Jing Ge <jingge2@illinois.edu> * Fix accessing worker information from WorkerInfoAccessor (#2419) * Don't fail WorkerManager if a network exception is encountered (#2422) * Rename actionbar => terminal (#2429) * rename actionbar->terminal * remove unused line * Update frontend/src/components/worksheets/Worksheet/Worksheet.js Co-authored-by: Ashwin Ramaswami <aramaswamis@gmail.com> * action bar => terminal in comments Co-authored-by: Ashwin Ramaswami <aramaswamis@gmail.com> * Remove select all for table (#2428) * clear force delete bit * remove select-all checkbox * remove selectAll handler * Remove detach from the frontend (#2427) * clear force delete bit * remove detach from frontend * remove constant * Use GitHub Actions for CI (#2185) for #2094 Based off of Jane's work in https://github.com/candicegjing/codalab-worksheets Time: 40 minutes -> 15-20 minutes Several improvements: Build images split into three parallel steps - rest-server, worker, frontend Tests split into around 10 parallel steps - each step runs about 4 tests, and there is a step which runs the UI tests. If a CLI test gives an error, it archives all Docker containers' logs so that they can be downloaded and inspected. If the UI test gives an error, it archives the screenshot images so that they can be downloaded and inspected. Additionally, the publish to pypi process has changed. Now, publishes to pypi happen on every GitHub release as opposed to every tag push. Effectively, this means that the release workflow has changed a bit: Old workflow: Wait 40 mins for Travis build on master to complete Push a tag to master Wait 40 mins for Travis build on master to complete, which also deploys to pypi New workflow: Wait for GitHub Actions build on master to complete Create a new GitHub release Wait for action to complete, which only deploys to pypi * CI: github.head_ref -> github.ref * CI: properly populate VERSION variable with branch name * Add exit-after-num-runs=1 to slurm worker manager (#2373) Closed via #2289 * Fix showing file contents for record item (#2455) fix #2446 to test: create a schema with a field using files: % schema a % add hello /stdout display bundles in record mode % display record a * Fix long test startup times by using python:3.6.10-slim-buster for default test runs (#2449) Fixes #2388 by using python:3.6.10-slim-buster for default test runs. Following @nelson-liu 's suggestion in #2388 (comment). This image is still around ~40 MB (as opposed to the 5 GB default-cpu image). Time changes from ~20 mins -> 10 mins * Fix github actions on forks by not calling --push (#2457) This conditional expression was in the old `.travis.yml`, but it isn't there in the new github actions workflow. This PR adds that expression so that `--push` is not called from forks. Fixes #2456 * Prettier CLI Reference Doc (#2458) Co-authored-by: Jing Ge <stanford@Stanfords-MacBook-Pro.local> * Actually fix builds from forks by building docker images when needed (#2461) * Test worker restart in GHA (#2466) Resolves #2465 Adding back this test that we used to have in Travis Appended .log for to err log files Co-authored-by: Jing Ge <jingge2@illinois.edu> Co-authored-by: Nelson Liu <nelson-liu@users.noreply.github.com> Co-authored-by: yipenghe <yipenghe@stanford.edu> Co-authored-by: Ashwin Ramaswami <aramaswamis@gmail.com> Co-authored-by: Jing Ge <stanford@Stanfords-MacBook-Pro.local> Co-authored-by: Tony Lee <tonyh.lee@yahoo.com> Co-authored-by: Percy Liang <percyliang@gmail.com>
* Fix free disk bytes calculation (#2370) Closed via #2362 * Fix WorkerManager launching (#2372) Co-authored-by: Jing Ge <jingge2@illinois.edu> * Make worker id match output directory and job name (#2369) * Make worker id match output directory and job name * Blacken Co-authored-by: Jing Ge <jingge2@illinois.edu> * Pass down max work dir size and delete workdir on exit to slurm workers (#2354) * Pass down max work dir size and delete workdir on exit to slurm workers * Fix typo. Co-authored-by: Jing Ge <jingge2@illinois.edu> * == -> === (#2357) Co-authored-by: Ashwin Ramaswami <aramaswamis@gmail.com> * Refactor dialogs into worksheetDialog component (#2263) * merge openX to one open dialog variable * refactor error message, delete worksheet message, bundle action fail dialogs into worksheet dialog component * add check when using enter keyword Co-authored-by: Ashwin Ramaswami <aramaswamis@gmail.com> * remove context menu (#2376) * Update print information (#2382) Co-authored-by: Jing Ge <stanford@Stanfords-MacBook-Pro.local> * Add a CLI option to limit the number of jobs allowed to run on a worker (#2289) Closed via #2287 * Set cli verbose default to 0 (#2378) Closed via #2350 * Mounting dependencies on paths specified by keys (#2295) * Fix time displaying issue (#2381) * Fix datetime displaying issue * Add a comment Co-authored-by: Jing Ge <stanford@Stanfords-MacBook-Pro.local> Co-authored-by: yipenghe <yipenghe@stanford.edu> * overdue (#2393) * fix migration (#2403) * Make strings more consistent in terms of terminology / organization (#2409) * tweak dialog prompts * add cut/copied * organize keyboard shortcuts * fix formatting * remove dummy file * ignore a d when a dialog is opened Co-authored-by: yipenghe <yipenghe@stanford.edu> * Replace sacct with scontrol, since sacct is only available on the slurm head node (#2368) * Handle sort key is 0 condition on frontend and use correct sort key for new runs (#2412) * handle sort key is 0 condition * handle sort key is 0 in getAfterSortKey function * use this.props.after_sort_key instead * Clear force delete bit after deletion (#2413) * clear force delete bit * clear force delete bit fails * bump to v0.5.15 (#2416) * 2417: Fix mkdocs Travis failure (#2420) * debug * reenable test * Fix SlurmWorkerManager overlaunching (#2391) Co-authored-by: Jing Ge <jingge2@illinois.edu> * Fix accessing worker information from WorkerInfoAccessor (#2419) * Don't fail WorkerManager if a network exception is encountered (#2422) * Rename actionbar => terminal (#2429) * rename actionbar->terminal * remove unused line * Update frontend/src/components/worksheets/Worksheet/Worksheet.js Co-authored-by: Ashwin Ramaswami <aramaswamis@gmail.com> * action bar => terminal in comments Co-authored-by: Ashwin Ramaswami <aramaswamis@gmail.com> * Remove select all for table (#2428) * clear force delete bit * remove select-all checkbox * remove selectAll handler * Remove detach from the frontend (#2427) * clear force delete bit * remove detach from frontend * remove constant * Use GitHub Actions for CI (#2185) for #2094 Based off of Jane's work in https://github.com/candicegjing/codalab-worksheets Time: 40 minutes -> 15-20 minutes Several improvements: Build images split into three parallel steps - rest-server, worker, frontend Tests split into around 10 parallel steps - each step runs about 4 tests, and there is a step which runs the UI tests. If a CLI test gives an error, it archives all Docker containers' logs so that they can be downloaded and inspected. If the UI test gives an error, it archives the screenshot images so that they can be downloaded and inspected. Additionally, the publish to pypi process has changed. Now, publishes to pypi happen on every GitHub release as opposed to every tag push. Effectively, this means that the release workflow has changed a bit: Old workflow: Wait 40 mins for Travis build on master to complete Push a tag to master Wait 40 mins for Travis build on master to complete, which also deploys to pypi New workflow: Wait for GitHub Actions build on master to complete Create a new GitHub release Wait for action to complete, which only deploys to pypi * CI: github.head_ref -> github.ref * CI: properly populate VERSION variable with branch name * Add exit-after-num-runs=1 to slurm worker manager (#2373) Closed via #2289 * Fix showing file contents for record item (#2455) fix #2446 to test: create a schema with a field using files: % schema a % add hello /stdout display bundles in record mode % display record a * Fix long test startup times by using python:3.6.10-slim-buster for default test runs (#2449) Fixes #2388 by using python:3.6.10-slim-buster for default test runs. Following @nelson-liu 's suggestion in #2388 (comment). This image is still around ~40 MB (as opposed to the 5 GB default-cpu image). Time changes from ~20 mins -> 10 mins * Fix github actions on forks by not calling --push (#2457) This conditional expression was in the old `.travis.yml`, but it isn't there in the new github actions workflow. This PR adds that expression so that `--push` is not called from forks. Fixes #2456 * Prettier CLI Reference Doc (#2458) Co-authored-by: Jing Ge <stanford@Stanfords-MacBook-Pro.local> * Actually fix builds from forks by building docker images when needed (#2461) * Test worker restart in GHA (#2466) Resolves #2465 Adding back this test that we used to have in Travis Appended .log for to err log files Co-authored-by: Jing Ge <jingge2@illinois.edu> Co-authored-by: Nelson Liu <nelson-liu@users.noreply.github.com> Co-authored-by: yipenghe <yipenghe@stanford.edu> Co-authored-by: Ashwin Ramaswami <aramaswamis@gmail.com> Co-authored-by: Jing Ge <stanford@Stanfords-MacBook-Pro.local> Co-authored-by: Tony Lee <tonyh.lee@yahoo.com> Co-authored-by: Percy Liang <percyliang@gmail.com>
Resolves #1115
.
, which will mount all content on the top-level of the dependencies path