Inject namespace files into the Execution context so that they are accessible from the execution's working directory #2405

Closed
Tracked by #2337
anna-geller opened this issue Oct 27, 2023 · 0 comments · Fixed by #2467

anna-geller commented Oct 27, 2023

The namespaceFiles property

We want to add an optional namespaceFiles property, which can be set either to:

  • a boolean: the property defaults to false; if set to true, all Namespace Files from that namespace are injected into the working directory of that task
  • or a map with include and exclude keys, each holding a list of strings. Those strings are patterns (glob-style in the examples below, such as "scripts/**") used to include or exclude specific files or directories.

WorkingDirectory as the first task to include the namespaceFiles property

First, we want to add that property to the WorkingDirectory task. Then, we can add the same property to other tasks in plugins listed in the section below. Example with the include pattern to only load namespace files from the scripts directory:

id: include_scripts
namespace: dev

tasks:
  - id: wdir
    type: io.kestra.core.tasks.flows.WorkingDirectory
    namespaceFiles:
      include:
        - "scripts/**"
    tasks:
      - id: myscript
        type: io.kestra.plugin.scripts.python.Commands
        commands:
          - python scripts/myscript.py

Simple example loading all namespace files into the task's working directory:

id: include_all_namespace_files
namespace: dev

tasks:
  - id: wdir
    type: io.kestra.core.tasks.flows.WorkingDirectory
    namespaceFiles: true
    tasks:
      - id: myscript
        type: io.kestra.plugin.scripts.python.Commands
        commands:
          - python scripts/myscript.py

Problem to be solved

We want to make it easy to orchestrate entire projects containing:

  • dbt core models and tests
  • SqlMesh
  • custom SQL queries
  • custom Python scripts and entire modules
  • Terraform modules
  • Ansible projects
  • Docker image builds
  • CloudQuery ingestion sync files
  • ...and many more.

Currently, users would need to store those in a Git repository and use a combination of WorkingDirectory and a git.Clone task.
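
For comparison, the current Git-based approach might look roughly like the sketch below. It assumes the io.kestra.plugin.git.Clone task with its url and branch properties; the repository URL, flow id, and script path are hypothetical placeholders, not taken from this issue.

```yaml
id: clone_and_run
namespace: dev

tasks:
  - id: wdir
    type: io.kestra.core.tasks.flows.WorkingDirectory
    tasks:
      # Clone the project into the shared working directory first
      - id: clone
        type: io.kestra.plugin.git.Clone
        url: https://github.com/example-org/example-repo  # hypothetical repository
        branch: main
      # Then run a script from the cloned files
      - id: myscript
        type: io.kestra.plugin.scripts.python.Commands
        commands:
          - python scripts/myscript.py  # hypothetical path inside the repository
```

With this pattern, every flow that needs the project files has to wrap its tasks in a WorkingDirectory and repeat the clone step.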

To make that easier, we want to allow users to store all these projects as Namespace Files tied to a given namespace. Those files can then be used as if they were in a local directory. This makes it possible to orchestrate a Python script stored as a Namespace File in the scripts directory with syntax as simple as the following:

id: users
namespace: dev

tasks:
  - id: getUsers
    type: io.kestra.plugin.scripts.python.Commands
    docker:
      image: ghcr.io/kestra-io/pydata:latest
    commands:
      - python scripts/get_users.py
    namespaceFiles:
      include:
        - "scripts/**" 

Task types that will need the namespaceFiles property

Inject the namespace files only for the following tasks, as only those can benefit from them. The syntax may be implemented as described in issue #2341, but at the task level.

id: myflow
namespace: dev

tasks:
  - id: dbt-build
    type: io.kestra.plugin.dbt.cli.DbtCLI
    runner: DOCKER
    docker:
      image: ghcr.io/kestra-io/dbt-duckdb
    commands:
      - dbt build
    namespaceFiles:
      include:
        - "dbt/**"  # fetch all files in the dbt directory from internal storage
        - dbt_project.yml
        - profiles.yml
        - packages.yml
      exclude:
        - "dbt/*.py"

Where namespace files could be considered in the future, but not for now

I am unsure how useful namespace files would be in compression tasks, but the compression and encryption tasks could potentially take a path to namespace files:

All upload tasks could, in theory, take a file from namespace files, but here too I am not sure how useful that would be, as namespace files are meant for code, not data files:

Singer doesn't need namespace files.
Soda also doesn't.

@github-project-automation github-project-automation bot moved this to Backlog in All issues Oct 27, 2023
@anna-geller anna-geller moved this from Backlog to Ready in All issues Oct 27, 2023
@tchiotludo tchiotludo assigned tchiotludo and unassigned loicmathieu Nov 7, 2023
@tchiotludo tchiotludo moved this from Ready to In Progress in All issues Nov 7, 2023
@tchiotludo tchiotludo moved this from In Progress to Review in All issues Nov 7, 2023
tchiotludo added a commit that referenced this issue Nov 8, 2023
close #2405

Co-authored-by: Anna Geller <anna.m.geller@gmail.com>
@github-project-automation github-project-automation bot moved this from Review to Done in All issues Nov 8, 2023
tchiotludo added a commit to kestra-io/plugin-scripts that referenced this issue Nov 8, 2023
tchiotludo added a commit to kestra-io/plugin-cloudquery that referenced this issue Nov 8, 2023
tchiotludo added a commit to kestra-io/plugin-terraform that referenced this issue Nov 8, 2023
tchiotludo added a commit to kestra-io/plugin-dataform that referenced this issue Nov 8, 2023
tchiotludo added a commit to kestra-io/plugin-sqlmesh that referenced this issue Nov 8, 2023
tchiotludo added a commit to kestra-io/plugin-ansible that referenced this issue Nov 8, 2023
tchiotludo added a commit to kestra-io/plugin-azure that referenced this issue Nov 8, 2023
tchiotludo added a commit to kestra-io/plugin-modal that referenced this issue Nov 8, 2023
tchiotludo added a commit to kestra-io/plugin-dbt that referenced this issue Nov 8, 2023
tchiotludo added a commit to kestra-io/plugin-aws that referenced this issue Nov 8, 2023
tchiotludo added a commit to kestra-io/plugin-gcp that referenced this issue Nov 8, 2023
tchiotludo added a commit to kestra-io/plugin-docker that referenced this issue Nov 14, 2023