Inline project meta #776

Open
dmpetrov opened this issue Jan 3, 2025 · 5 comments
Labels
enhancement (New feature or request), question (Further information is requested)

Comments

@dmpetrov
Member

dmpetrov commented Jan 3, 2025

Description

To make scripts more reusable, it would be great to keep all project metadata in one place.

An alternative approach is to use PEP 723: https://peps.python.org/pep-0723/

# __pyproject__
"""
[project]
requires-python = ">=3.10"
dependencies = [
    "moviepy < 2.0"
]

[tool.datachain.inputs]
input_bucket = "gs://mybucket"
input_dir = "examples/videos/"

[tool.datachain.params]
min_length_sec = 1
cache_input = false

[tool.datachain.attachments]
yolo_model = "gs://mybucket/share/yolov8m_holding_v4.pt"

[tool.datachain.outputs]
result_dataset = "ds://res"
result_dir = "{input_bucket}/temp"
"""

import io
from datachain import DataChain, File, C, inline
...

bucket = inline.get_input("input_bucket")
dir = inline.get_input("input_dir")
result_dataset = inline.get_output("result_dataset")
duration_limit = inline.get_params("min_length_sec")
dmpetrov added the enhancement and question labels on Jan 3, 2025
@ilongin
Contributor

ilongin commented Jan 7, 2025

At first glance, I prefer this approach over script metadata.
We could also add the ability to upload this config file separately in the Studio UI when running scripts, since users may have it saved somewhere and want to reuse it on every script run.
So users would have 3 options:

  1. Inline the config in the script itself, as in the example
  2. Upload the config separately
  3. Add config values manually, as is done now -> the backend would then transform these into the config file format.

WDYT?

@dmpetrov
Member Author

dmpetrov commented Jan 7, 2025

We should prioritize a unified approach that works in both the UI and the CLI. Users should be able to copy-paste from the CLI to SaaS, or the other way around, without even thinking about it 🙂

So it's OK to have customization in the UI (as we have now), but I'd try to deprioritize it and potentially get rid of it completely.

@dmpetrov
Member Author

dmpetrov commented Jan 8, 2025

uv run myscript.py natively supports the comment-based metadata format from PEP 723. It looks really good!

https://www.linkedin.com/posts/julienhuraultanalytics_uv-is-so-cool-i-just-discovered-how-to-activity-7282422214799286272-WV5N

@shcheklein
Member

that looks cool! :)

@dmpetrov
Member Author

I've tested this - works like a charm.

More details:

  • 👍 Installing packages is fast, as usual with uv.
  • 👎 Preparing packages ("compiling") takes a lot of time on the first run.
  • 👍 Caching: subsequent runs require no installation or preparation.

# /// script
# requires-python = ">=3.12"
# dependencies = [
#   "pandas < 2.1.0"
# ]
# ///

import sys
import pandas as pd

print(f"Python version: {sys.version_info}")
print(f"Pandas version: {pd.__version__}")

First run:

$ time uv run test_inline.py
Reading inline script metadata from `test_inline.py`
Installed 6 packages in 114ms
Python version: sys.version_info(major=3, minor=13, micro=1, releaselevel='final', serial=0)
Pandas version: 2.2.3

real    0m17.804s
user    0m2.084s
sys     0m1.277s

Next run:

$ time uv run test_inline.py
Reading inline script metadata from `test_inline.py`
Python version: sys.version_info(major=3, minor=13, micro=1, releaselevel='final', serial=0)
Pandas version: 2.2.3

real    0m0.443s
user    0m0.344s
sys     0m0.087s
