[WIP] dependency: fine grained (user cmd filter) #4363

Closed
wants to merge 11 commits into from

Conversation

casperdcl
Contributor

@casperdcl casperdcl commented Aug 8, 2020

  • modify schema
    • dvc.yaml
    • dvc.lock
  • support running the user cmd (read from dvc.yaml:stages.<stage>.deps.[].<path>.cmd)
  • add to dvc.lock
  • add a way to specify the user cmd filter via the CLI
    • maybe dvc run -d "utils.py:extract_function.py --name check_db"
  • add tests
  • fixes Support function specific dependencies #3439

Note that this implementation

  • uses PARAM_FILTER = "cmd"
  • passes path as a positional argument to the user-defined cmd
  • computes the md5sum of the user-defined cmd output
  • only works on local files (not dirs & not remote paths)
  • only works for dependencies (not outputs)
  • would produce the old behaviour (whole-file hash) if setting cmd: cat
  • was tested using https://github.com/casperdcl/dvc-udf
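
The hashing rule described in the list above (run the user cmd with the path as a positional argument, then md5 the output) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `filtered_md5` is a hypothetical name, the cmd is split with `shlex` rather than run through a shell, and the PR itself writes the filtered output to a temporary file before hashing.

```python
import hashlib
import shlex
import subprocess


def filtered_md5(path, cmd):
    """Hypothetical helper: md5 hex digest of the stdout of `cmd path`."""
    # Append the dependency path as the last positional argument,
    # mirroring `{cmd} {path} | md5sum`.
    args = shlex.split(cmd) + [path]
    # check=True makes a failing filter raise instead of hashing partial output.
    result = subprocess.run(args, stdout=subprocess.PIPE, check=True)
    return hashlib.md5(result.stdout).hexdigest()
```

With `cmd="cat"` the filter passes the file through unchanged, so the digest equals a plain md5 of the file's bytes, which is the "old behaviour" case mentioned in the list.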

schema:

# in dvc.yaml
utils.py:
  cmd: python extract_function.py --name check_db

# in dvc.lock
path: utils.py
cmd: python extract_function.py --name check_db
md5: s0m3h45h # computed via `{cmd} {path} | md5sum`
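
The `extract_function.py` filter named in the schema lives in the dvc-udf repository and is not reproduced here. As a rough idea of what such a filter might do, the sketch below pulls the source of one named function out of a Python file using the standard `ast` module (Python 3.8+). The function name `extract_function` and its behaviour are assumptions for illustration only; a real filter would also need a small CLI wrapper that accepts `--name` and the path as a positional argument and prints the result to stdout.

```python
import ast


def extract_function(source, name):
    """Hypothetical filter core: return the source text of function `name`."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        is_func = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_func and node.name == name:
            # Returns exactly the text spanned by the function definition.
            return ast.get_source_segment(source, node)
    raise KeyError(f"function {name!r} not found")
```

Because only the text of the selected function is emitted, edits elsewhere in the file leave the filter output, and therefore the recorded md5, unchanged.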

testing:

git clone https://github.com/casperdcl/dvc-udf
cd dvc-udf
pip install -r requirements.txt
dvc repro -f -v out  # assuming dvc installed from this PR

@casperdcl casperdcl marked this pull request as draft August 8, 2020 22:26
@casperdcl casperdcl self-assigned this Aug 8, 2020
@casperdcl casperdcl requested a review from efiop August 8, 2020 22:26
@casperdcl casperdcl added enhancement Enhances DVC research ui user interface / interaction labels Aug 8, 2020
@@ -58,6 +62,21 @@ def file_md5(fname, tree=None):
        open_func = open

    if exists_func(fname):
        filtered = None
        if cmd:
Contributor Author


not sure if there are aspects of stage.run.cmd_run which should be used here

Comment on lines +102 to +105
if filtered is not None:
    from dvc.utils.fs import remove

    remove(filtered)
Contributor Author


not sure if this is required - maybe automatically handled elsewhere (i.e. entire tmpdir deleted before exit)

@casperdcl casperdcl force-pushed the user_filter branch 2 times, most recently from a469eb4 to aa4cd31 Compare August 8, 2020 22:47


def _get(stage, p, info):
    if isinstance(p, dict):
        p = list(p.items())
        assert len(p) == 1
Contributor Author


maybe assert not required (should be handled by schema)?

Member


Other CLI commands can create/load dependencies, skipping the schema. Good to have an assert.
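
The exchange above concerns a dependency entry that may be either a plain path string or a single-key mapping of path to options. A minimal sketch of that normalisation, keeping the assert as suggested, might look like the following. `split_dep_entry` is a hypothetical name and does not appear in the PR; the `"cmd"` key follows the `PARAM_FILTER = "cmd"` convention from the description.

```python
def split_dep_entry(entry):
    """Hypothetical helper: return (path, cmd) for one deps entry."""
    if isinstance(entry, dict):
        items = list(entry.items())
        # Guard against entries built outside schema validation.
        assert len(items) == 1, "expected exactly one path per dependency entry"
        path, options = items[0]
        return path, (options or {}).get("cmd")
    # Plain string entries carry no filter command.
    return entry, None
```

The assert duplicates what the schema enforces for dvc.yaml, but it still catches malformed entries created by code paths that never pass through the schema.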

@efiop efiop changed the title dependency: fine grained (user cmd filter) [WIP] dependency: fine grained (user cmd filter) Aug 9, 2020
@efiop
Contributor

efiop commented Mar 23, 2021

Closing for now, we'll get back after dep/out refactor to properly accommodate this.
