Releases: pytask-dev/pytask
v0.5.1
What's Changed
This release contains mostly bug fixes, but the guide for complex task repetitions has also been updated and now includes some learnings from recent projects.
- [automated] Update plugin list by @github-actions in #615
- Fix interaction between provisional nodes and @mark.persist. by @tobiasraabe in #617
- Ensure that root_dir of DirectoryNode is created. by @tobiasraabe in #618
- tests: remove dependence on root folder name by @erooke in #620
- tests: make coiled an optional import by @erooke in #619
- Fix the pull request template. by @tobiasraabe in #621
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #610
- Bump peter-evans/create-pull-request from 6.0.5 to 6.1.0 in the github-actions group by @dependabot in #623
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #622
- Add warning when non-matching files are passed to pytask. by @tobiasraabe in #627
- Fix duplicated collection of task modules. by @tobiasraabe in #628
- Fix display issues with the programmatic interface. by @tobiasraabe in #631
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #629
- Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0 in the github-actions group by @dependabot in #630
- Fix issue with rerunning tasks via pytask.build. by @tobiasraabe in #626
- Redesign the scaling tasks guide. by @tobiasraabe in #616
- Follow-up on #616. by @tobiasraabe in #632
New Contributors
Full Changelog: v0.5.0...v0.5.1
v0.5.0
Highlights
✨ pytask v0.5.0 is released and contains two big features. ✨
🤖 Provisional Nodes and Task Generators
pytask now has mechanisms to define tasks that produce an unknown number of products, called provisional nodes. An example is a task that splits data by rows into chunks of roughly 100MB.
If you want to process each chunk further with a separate task, use task generators to create as many tasks as there are chunks.
Everything is explained in the how-to guide Provisional Nodes and Task Generators.
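The idea can be sketched in plain Python without pytask's API: one step writes an unknown number of chunk files, and a generator afterwards yields one task per chunk it finds. split_into_chunks and generate_chunk_tasks are hypothetical names, and the split uses a fixed number of rows instead of a target size for brevity; the how-to guide describes the actual pytask interface.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterator


def split_into_chunks(rows: list[str], out_dir: Path, rows_per_chunk: int) -> None:
    """Write rows into as many chunk files as needed; the count is not known up front."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for start in range(0, len(rows), rows_per_chunk):
        chunk = rows[start : start + rows_per_chunk]
        path = out_dir / f"chunk_{start // rows_per_chunk:03d}.txt"
        path.write_text("\n".join(chunk))


def generate_chunk_tasks(out_dir: Path) -> Iterator[tuple[str, Callable[[], int]]]:
    """Yield one (name, task) pair per chunk file found after the split has run."""
    for path in sorted(out_dir.glob("chunk_*.txt")):
        # Bind the current path as a default so each task keeps its own chunk.
        def count_rows(chunk_path: Path = path) -> int:
            return len(chunk_path.read_text().splitlines())

        yield f"count_rows[{path.name}]", count_rows


with tempfile.TemporaryDirectory() as tmp:
    chunk_dir = Path(tmp) / "chunks"
    split_into_chunks([f"row {i}" for i in range(25)], chunk_dir, rows_per_chunk=10)
    results = {name: task() for name, task in generate_chunk_tasks(chunk_dir)}
```

The number of generated tasks is only known after the split has run, which is exactly the situation provisional nodes and task generators are meant for.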
🚀 Support for HPC and your favorite cloud provider (AWS, GCP, Azure) using dask / coiled
pytask can build your project on virtual machines from the cloud provider of your choice. Read more about it in the release announcement of pytask-parallel v0.5.0.
Removals
- The markers @pytask.mark.depends_on and @pytask.mark.produces are removed. Please define dependencies and products using the new approaches.
- Instead of @pytask.mark.task, use from pytask import task and @task.
What's Changed
- Fix type hints for Task.execute() and TaskWithoutPath.execute(). by @tobiasraabe in #548
- [automated] Update plugin list by @github-actions in #550
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #549
- Remove depends_on and produces markers. by @tobiasraabe in #551
- Remove @pytask.mark.task and switch to @task. by @tobiasraabe in #552
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #554
- Forbid strings as values for paths in config file. by @tobiasraabe in #553
- Use new-style hook wrappers and require pluggy 1.3. by @tobiasraabe in #555
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #556
- Fix issue with @task(after=...) in notebooks and terminals. by @tobiasraabe in #557
- Fix path in "Node is dir" error message by @egerlach in #561
- Bump the github-actions group with 2 updates by @dependabot in #560
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #559
- Make universal-pathlib 0.2.2 an official dependency. by @tobiasraabe in #566
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #562
- Bump the github-actions group with 1 update by @dependabot in #564
- Fix typos in defining_dependencies_products.md by @ChristianZimpelmann in #563
- Improve handling of task_files. by @tobiasraabe in #568
- Remove hooks related to the DAG. by @tobiasraabe in #569
- Remove redundant calls of PNode.state(). by @tobiasraabe in #571
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #573
- Remove pytask_execute_create_scheduler hook. by @tobiasraabe in #575
- Implement task generators and provisional nodes. by @tobiasraabe in #487
- Fix interaction with --pdb, --trace and tasks that return. by @tobiasraabe in #579
- Simplify the code related to tracebacks. by @tobiasraabe in #581
- [automated] Update plugin list by @github-actions in #584
- Improve typing of capture.py. by @tobiasraabe in #587
- Improve linting. by @tobiasraabe in #586
- Reset class variables of ExecutionReport and Traceback. by @tobiasraabe in #588
- Fix error introduced by #588. by @tobiasraabe in #590
- Invalidate cache to check whether remote files exist. by @tobiasraabe in #591
- Resolve root paths and module names for imported files. by @tobiasraabe in #589
- Use uv to speed up CI. by @tobiasraabe in #567
- Bump the github-actions group with 1 update by @dependabot in #578
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #577
- Recreate PythonNodes every time. by @tobiasraabe in #593
- Publish NodeLoadError. by @tobiasraabe in #594
- Manage project with rye. by @tobiasraabe in #596
- Bump peter-evans/create-pull-request from 6.0.2 to 6.0.3 in the github-actions group by @dependabot in #597
- Replace requests with httpx. by @tobiasraabe in #598
- Stop unwrapping coiled functions. by @tobiasraabe in #595
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #592
- Add fixture for changing cwd. by @tobiasraabe in #599
- Refactor tests using subprocesses. by @tobiasraabe in #600
- Bump peter-evans/create-pull-request from 6.0.3 to 6.0.4 in the github-actions group by @dependabot in #602
- [automated] Update plugin list by @github-actions in #601
- Fix example about capturing warnings in docs. by @tobiasraabe in #603
- Fix PythonNode examples in docs. by @tobiasraabe in #604
- Improve infra and add step for type checking. by @tobiasraabe in #605
- Bump peter-evans/create-pull-request from 6.0.4 to 6.0.5 in the github-actions group by @dependabot in #608
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #607
- Allow tasks to be pending. by @tobiasraabe in #609
- Remove status from pytask_execute_task_log_start. by @tobiasraabe in #611
- Validate the names of data catalogs. by @tobiasraabe in #612
- Improve documentation for data catalogs. by @tobiasraabe in #606
- [automated] Update plugin list by @github-actions in #613
New Contributors
- @egerlach made their first contribution in #561
- @ChristianZimpelmann made their first contribution in #563
Full Changelog: v0.4.5...v0.5.0
v0.4.7
What's Changed
A patch release to fix an error where using --pdb or --trace led to tasks always returning None, which affected tasks that use returns as products.
- Backport #580 to v0.4.7. by @tobiasraabe in #580
Full Changelog: v0.4.6...v0.4.7
v0.4.6
What's Changed
A bug fix release that fixes an error when users used from pytask import mark to mark their tasks.
- Skip collection of MarkGenerator when using from pytask import mark. by @tobiasraabe in #576
Full Changelog: v0.4.5...v0.4.6
v0.4.5
Highlights
pytask is moving towards v0.5, which will remove lots of deprecated features. Please upgrade your syntax or pin pytask to <0.5 in your requirements files.
The release comes with two new features and lots of little improvements.
- pytask can now handle files that are referenced via HTTP(S) URLs or files on popular cloud storage like AWS, all thanks to universal_pathlib. Explanations can be found in this guide.
- It is now easier to extend pytask than before. The new hook_module configuration option allows adding modules that contain hook implementations. This guide offers more explanation.
What's Changed
- CI: remove unneeded install of graphviz on ubuntu by @NickCrews in #515
- Raise error when non-existing task paths are added to the config. by @tobiasraabe in #517
- Do not allow builtin functions as tasks. by @tobiasraabe in #519
- Enhance issue templates. by @tobiasraabe in #522
- Refactor get_file. by @tobiasraabe in #523
- Improve some linter and formatter rules. by @tobiasraabe in #524
- Bump actions/setup-python from 4 to 5 by @dependabot in #527
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #526
- Enable PathNode and PickleNode to deal with URLs, S3, etc. by @tobiasraabe in #525
- Add error message for not collected tasks with @task decorator. by @tobiasraabe in #521
- Improve codecov and coverage. by @tobiasraabe in #528
- Bump sigstore/gh-action-sigstore-python from 2.1.0 to 2.1.1 by @dependabot in #533
- Bump actions/download-artifact from 3 to 4 by @dependabot in #531
- Bump actions/upload-artifact from 3 to 4 by @dependabot in #532
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #530
- Reenable and fix tests with Jupyter. by @tobiasraabe in #535
- Allow task functions to be partialed. by @tobiasraabe in #536
- Fix coverage. by @tobiasraabe in #537
- Change CLI entrypoint and allow passing task function to pytask.build. by @tobiasraabe in #540
- Refactor the plugin manager. by @tobiasraabe in #542
- Implement hook_module config option. by @tobiasraabe in #539
- Update imports in tests. by @tobiasraabe in #543
- Some changes to the docs. by @tobiasraabe in #538
- Require sqlalchemy>=2 and upgrade code. by @tobiasraabe in #544
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #541
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #546
- Release v0.4.5. by @tobiasraabe in #545
Full Changelog: v0.4.4...v0.4.5
v0.4.4
What's Changed
- Fix typing issues with the DataCatalog. by @tobiasraabe in #510
- [automated] Update plugin list by @github-actions in #511
- Improve the documentation. by @tobiasraabe in #509
Full Changelog: v0.4.3...v0.4.4
v0.4.3
What's Changed
This release contains a lot of smaller improvements and bug fixes. Here is a short list.
- #484 raises an error message when a PathNode is used with a directory instead of a file.
- #496 makes pytask even lazier. When a preceding task is executed but produces the same outputs as before, the following task is no longer executed.
- Objects in task modules that overwrite __getattr__ no longer cause problems (#507 was fixed in #508). The same applies to importing Task in task modules.
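The effect of #496 can be illustrated with a small, hypothetical runner (not pytask's implementation) that records the content hash of a task's dependency and skips the task when an upstream rerun rewrites identical bytes. LazyRunner, file_hash, and plot are illustrative names only.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def file_hash(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


class LazyRunner:
    """Hypothetical runner that skips a task when its dependency's content is unchanged."""

    def __init__(self) -> None:
        self.recorded: dict[str, str] = {}  # task name -> dependency hash at last run
        self.executions: list[str] = []

    def run_if_needed(self, name: str, dependency: Path, task: Callable[[Path], None]) -> bool:
        current = file_hash(dependency)
        if self.recorded.get(name) == current:
            return False  # same content as last time: skip the task
        task(dependency)
        self.recorded[name] = current
        self.executions.append(name)
        return True


def plot(path: Path) -> None:
    """Stands in for the downstream task's real work."""


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "data.csv"
    runner = LazyRunner()
    data.write_text("a,b\n1,2\n")  # upstream task writes its product
    first = runner.run_if_needed("task_plot", data, plot)
    data.write_text("a,b\n1,2\n")  # upstream reruns but writes identical content
    second = runner.run_if_needed("task_plot", data, plot)
    data.write_text("a,b\n3,4\n")  # upstream output really changes
    third = runner.run_if_needed("task_plot", data, plot)
```

Here the second call is skipped because the bytes are identical, even though the file was rewritten.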
Complete list of changes
- Simplify the teardown of a task. by @tobiasraabe in #483
- Correctly unconfigure pytask. by @tobiasraabe in #485
- Raise informative error when path nodes point to directories. by @tobiasraabe in #484
- Add default names to PPathNodes. by @tobiasraabe in #486
- Modernize TopologicalSorter. by @tobiasraabe in #458
- Raise error for invalid value in return annotation. by @tobiasraabe in #488
- Refactor and better test products. by @tobiasraabe in #489
- Refactor and better test parsing of dependencies. by @tobiasraabe in #490
- Addition to #489. by @tobiasraabe in #491
- Make pytask even lazier. by @tobiasraabe in #496
- Bump sigstore/gh-action-sigstore-python from 1.2.3 to 2.1.0 by @dependabot in #495
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #494
- Remove unnecessary code from the collection of tasks. by @tobiasraabe in #497
- Fix errors when using Task and TaskWithoutPath in task modules. by @tobiasraabe in #498
- Allow tasks to depend on other tasks. by @tobiasraabe in #493
- Move test dependencies to pyproject.toml by @tobiasraabe in #500
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #499
- Remove MetaNode. by @tobiasraabe in #501
- [automated] Update plugin list by @github-actions in #505
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #504
- Catch objects pretending to be PTask. by @tobiasraabe in #508
Full Changelog: v0.4.2...v0.4.3
v0.4.2
Highlights
This release contains a new feature and some improvements for users.
- 🚀 The new feature is the pytask.DataCatalog that allows users to manage dependencies and products in projects more easily. Read the tutorial to get started. 🚀
- File changes are now detected by hashes instead of modification timestamps. This should prevent accidental executions when working with cloud storage providers like Dropbox or OneDrive and in many other situations. To save runtime, pytask caches the hashes and reuses them as long as the modification timestamp has not changed.
- Nodes now have signatures that separate how nodes are named and displayed from how nodes are identified internally. If you have written a custom node, please update it according to the how-to guide.
- All of pytask's internal files are now stored in a .pytask folder in your project. The file .pytask.sqlite3 is moved to this location as well. Add .pytask to your .gitignore to prevent accidentally committing the folder.
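As a rough, stdlib-only sketch of the hashing strategy described above (not pytask's code), the hypothetical HashCache class below recomputes a file's hash only when its modification timestamp differs from the cached one, while the content hash, not the timestamp, decides whether anything changed.

```python
import hashlib
import os
import tempfile
from pathlib import Path


class HashCache:
    """Hypothetical cache: rehash a file only when its mtime has changed."""

    def __init__(self) -> None:
        self._cache: dict[Path, tuple[int, str]] = {}  # path -> (mtime_ns, sha256)
        self.hash_computations = 0

    def get_hash(self, path: Path) -> str:
        mtime = path.stat().st_mtime_ns
        cached = self._cache.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # timestamp unchanged: reuse the stored hash
        self.hash_computations += 1
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        self._cache[path] = (mtime, digest)
        return digest


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.csv"
    path.write_text("x\n1\n")
    cache = HashCache()
    h1 = cache.get_hash(path)
    h2 = cache.get_hash(path)  # same mtime: served from the cache
    # A sync client may touch the file without changing its content.
    os.utime(path, ns=(0, 1_000_000_000))
    h3 = cache.get_hash(path)  # mtime differs: rehashed, but the digest is identical
```

Because h3 equals h1, a timestamp-only change like the one simulated with os.utime would not count as a modification.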
What's Changed
- Simplify building the plugin manager. by @tobiasraabe in #449
- Rename graph.py to dag_command.py and improve collect_command.py. by @tobiasraabe in #451
- Remove more .svgs and replace them with animations. by @tobiasraabe in #454
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #452
- [automated] Update plugin list by @github-actions in #453
- Add more explanation when PNode.load() fails during execution. by @tobiasraabe in #455
- Refer to source code on Github in API docs. by @tobiasraabe in #456
- Refactor code for format_node_name. by @tobiasraabe in #457
- Add hook to sort __all__. by @tobiasraabe in #459
- Simplify removing internal tracebacks from exceptions with cause. by @tobiasraabe in #460
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #461
- Fix import error for pluggy<1.3. by @tobiasraabe in #462
- Raise error when function is defined outside the loop body. by @tobiasraabe in #463
- Improve pins. by @tobiasraabe in #464
- Test that internal tracebacks are removed by reports. by @tobiasraabe in #465
- Add is_product to PNode.load(). by @tobiasraabe in #472
- Add a data catalog. by @tobiasraabe in #419
- Hash files instead of relying on modification timestamps. by @tobiasraabe in #469
- Move .pytask.sqlite3 to .pytask/. by @tobiasraabe in #470
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #471
- Update PyPI action. by @tobiasraabe in #477
- Add node signatures. by @tobiasraabe in #473
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #476
- Add snapshot tests. by @tobiasraabe in #475
- Switch from black to ruff-format. by @tobiasraabe in #478
- Rework reports and tracebacks. by @tobiasraabe in #474
- Give skips higher precedence than ancestor failed as outcome. by @tobiasraabe in #479
- Remove checks for missing root nodes. by @tobiasraabe in #480
- Improve coverage. by @tobiasraabe in #481
- Fix handling of names and signatures of PythonNodes. by @tobiasraabe in #482
Full Changelog: v0.4.1...v0.4.2
v0.4.1
What's Changed
Of course, it's a mandatory bug fix release after a bigger release.
Using the product annotation, Annotated[..., Product], did not work with multiple products.
- Fix setting the name of PythonNode. by @tobiasraabe in #443
- Move content of setup.cfg to pyproject.toml. by @tobiasraabe in #444
- [automated] Update plugin list by @github-actions in #445
- Fix when multiple product annotations are used. by @tobiasraabe in #448
- Fix PythonNode when used as return. by @tobiasraabe in #446
- Simplify the tree_map code for generating the DAG. by @tobiasraabe in #447
Full Changelog: v0.4.0...v0.4.1
v0.4.0
News
pytask turned three years old in July, which is a fitting occasion to rethink pytask's design and blow the dust off some of its oldest components.
Here are the highlights of v0.4.0 🚀 ⭐
Highlights
New interfaces for products.
Every argument can be declared as a product with the new Product annotation. The path can be passed as a default value.
from pathlib import Path
from pytask import Product
from typing_extensions import Annotated

def task_hello_earth(path: Annotated[Path, Product] = Path("hello_earth.txt")):
    path.write_text("Hello, earth!")
More explanation can be found at https://tinyurl.com/yrezszr4.
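Annotated attaches metadata to a type hint that stays readable at runtime. The sketch below shows how a framework could locate product arguments and their default paths; it uses a stand-in Product class instead of pytask's own object, and find_products is a hypothetical helper, not pytask's collection code.

```python
import inspect
from pathlib import Path
from typing import Annotated, get_args, get_origin, get_type_hints


class Product:
    """Stand-in marker; pytask ships its own Product object."""


def find_products(func) -> dict[str, Path]:
    """Return {argument name: default path} for arguments annotated with Product."""
    hints = get_type_hints(func, include_extras=True)  # keep Annotated metadata
    signature = inspect.signature(func)
    products = {}
    for name, hint in hints.items():
        if name == "return" or get_origin(hint) is not Annotated:
            continue
        # get_args(Annotated[Path, Product]) == (Path, Product)
        if Product in get_args(hint)[1:]:
            products[name] = signature.parameters[name].default
    return products


def task_hello_earth(path: Annotated[Path, Product] = Path("hello_earth.txt")):
    path.write_text("Hello, earth!")


def task_copy(source: Path = Path("in.txt"), target: Annotated[Path, Product] = Path("out.txt")):
    target.write_text(source.read_text())
```

In task_copy, only target carries the marker, so source is treated as an ordinary argument.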
It is also possible to use the return of the task function as a product, which allows wrapping any third-party function as a task function. Read more about it here: https://tinyurl.com/pytask-return.
from pathlib import Path
from typing_extensions import Annotated

def task_hello_earth() -> Annotated[str, Path("hello_earth.txt")]:
    return "Hello, earth!"
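Return products can be read in a similar way. A minimal sketch, assuming the product path is the first metadata item of the return annotation as in the example above: call the function and write its string result to that path. run_and_store is hypothetical and not part of pytask.

```python
import tempfile
from pathlib import Path
from typing import Annotated, get_args, get_type_hints


def run_and_store(func, root: Path) -> Path:
    """Call func and write its string return value to the path in its return annotation."""
    return_hint = get_type_hints(func, include_extras=True)["return"]
    _, target, *_ = get_args(return_hint)  # Annotated[str, Path(...)] -> (str, Path(...))
    path = root / target
    path.write_text(func())
    return path


def task_hello_earth() -> Annotated[str, Path("hello_earth.txt")]:
    return "Hello, earth!"


with tempfile.TemporaryDirectory() as tmp:
    written = run_and_store(task_hello_earth, Path(tmp))
    content = written.read_text()
```

Since the function itself never touches the file system, any third-party function returning a string could be wrapped the same way.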
Every task argument is a dependency
In older pytask versions, only paths were treated as task dependencies. That meant that when other arguments passed to the task changed, the change did not trigger a rerun of the task.
Now, every argument of a task can be a dependency, and you can hash its value if changes should trigger a rerun. It is explained in https://tinyurl.com/pytask-hash.
from pathlib import Path
from typing import Annotated

from pytask import Product
from pytask import PythonNode

def task_example(
    text: Annotated[str, PythonNode(value="Hello, World", hash=True)],
    path: Annotated[Path, Product] = Path("file.txt"),
) -> None:
    path.write_text(text)
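To see why hashing matters for non-path arguments, here is a hypothetical sketch: each value is reduced to a stable digest and compared with the digest from the previous run. pytask's own hashing of PythonNode values may work differently; value_hash and needs_rerun are illustrative names only.

```python
import hashlib
import json


def value_hash(value) -> str:
    """Stable digest for JSON-serializable argument values."""
    payload = json.dumps(value, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def needs_rerun(kwargs: dict, previous: dict[str, str]) -> bool:
    """Rerun if any argument's digest differs from the digest stored last time."""
    return any(previous.get(name) != value_hash(value) for name, value in kwargs.items())


stored = {"text": value_hash("Hello, World")}  # digest recorded after the last run
unchanged = needs_rerun({"text": "Hello, World"}, stored)
changed = needs_rerun({"text": "Hello, Moon"}, stored)
```

The sketch serializes with json and hashes with SHA-256 rather than using the built-in hash(), because hash() of a str is randomized per interpreter process by default and would report a change in every new session.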
A new functional interface
The functional interface for pytask has been reworked and accepts a list of task functions. You can use it within your terminal or a Jupyter notebook. Read this guide to learn more about it: https://tinyurl.com/pytask-functional.
from pathlib import Path
from typing import Annotated

from pytask import build

def create_text() -> Annotated[str, Path("hello_earth.txt")]:
    return "Hello, earth!"

session = build(tasks=[create_text])
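Running several task functions at once requires executing them in dependency order. A toy version of that step, using the standard library's graphlib.TopologicalSorter on a hand-written dependency mapping, is sketched below; toy_build and the wiring are hypothetical, whereas pytask derives the graph from each task's dependencies and products.

```python
from graphlib import TopologicalSorter


def create_text() -> str:
    return "Hello, earth!"


def shout(text: str) -> str:
    return text.upper()


def count_chars(text: str) -> int:
    return len(text)


# Hypothetical wiring: task -> tasks whose return values it consumes, in argument order.
dependencies = {create_text: (), shout: (create_text,), count_chars: (shout,)}


def toy_build(graph: dict) -> dict:
    """Run each task after its predecessors and pass their results as arguments."""
    results = {}
    for func in TopologicalSorter(graph).static_order():
        inputs = [results[dep] for dep in graph[func]]
        results[func] = func(*inputs)
    return results


results = toy_build(dependencies)
```

Tuples rather than sets hold the predecessors so that the argument order stays deterministic when a task consumes several results.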
Custom Nodes through Protocols
In the newest version, nodes (dependencies and products) and tasks follow protocols. It allows for customizations like PickleNodes that store any Python object as a pickle file and inject the object into the task when used as a dependency. It is explained in more detail in this guide: https://tinyurl.com/pytask-custom-nodes.
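A simplified sketch of the protocol idea, assuming a reduced protocol with only a name, load, and save; pytask's real node protocols (such as PNode) have more members, for example a signature and a state method, so SimpleNode and PickleFileNode here are hypothetical stand-ins.

```python
import pickle
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SimpleNode(Protocol):
    """Hypothetical, reduced node protocol."""

    name: str

    def load(self) -> Any: ...

    def save(self, value: Any) -> None: ...


@dataclass
class PickleFileNode:
    """Stores any picklable Python object in a file; no inheritance required."""

    name: str
    path: Path

    def load(self) -> Any:
        return pickle.loads(self.path.read_bytes())

    def save(self, value: Any) -> None:
        self.path.write_bytes(pickle.dumps(value))


with tempfile.TemporaryDirectory() as tmp:
    node = PickleFileNode(name="scores", path=Path(tmp) / "scores.pkl")
    is_node = isinstance(node, SimpleNode)  # structural check, no subclassing
    node.save({"alice": 3, "bob": 5})
    loaded = node.load()
```

PickleFileNode never inherits from SimpleNode; matching the required attributes and methods is enough, which is what makes protocols convenient for user-defined nodes.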
Other notable changes
- Python 3.12 is supported, and support for Python 3.7 is dropped.
- @pytask.mark.depends_on and @pytask.mark.produces are deprecated. There are better options to define dependencies and products explained in https://tinyurl.com/yrezszr4.
- @pytask.mark.task is also deprecated and replaced by from pytask import task and @task.
What's Changed
- Remove Python 3.7 support and add a new action for mamba. by @tobiasraabe in #323
- Replace pony with sqlalchemy>=1.4.36. by @tobiasraabe in #387
- Remove @pytask.mark.parametrize. by @tobiasraabe in #391
- Parse dependencies from all args if depends_on is not used. by @tobiasraabe in #384
- Add products with typing.Annotated. by @tobiasraabe in #394
- Refactor pybaum to _pytask.tree_util. by @tobiasraabe in #395
- Replace pybaum with optree and add paths to PythonNode names. by @tobiasraabe in #396
- Add support for NamedTuple and attrs classes in @pytask.mark.task(kwargs=...). by @tobiasraabe in #397
- Deprecate decorators for depends_on and produces. by @tobiasraabe in #398
- Use protocols instead of ABCs. by @tobiasraabe in #402
- Allow tasks to return products. by @tobiasraabe in #404
- Tracking changes in v0.4.0. by @tobiasraabe in #400
- Bump peter-evans/create-pull-request from 5.0.1 to 5.0.2 by @dependabot in #390
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #388
- Allow to use prefix trees as nodes to parse function returns. by @tobiasraabe in #406
- Remove .value from Node protocol. by @tobiasraabe in #408
- Make .from_annot an optional feature of nodes. by @tobiasraabe in #409
- Allow to pass functions to PythonNode(hash=...). by @tobiasraabe in #410
. by @tobiasraabe in #410 - Add protocols for tasks. by @tobiasraabe in #412
- Remove scripts to generate .svgs. by @tobiasraabe in #413
- Allow more ruff rules. by @tobiasraabe in #414
- A new functional interface. by @tobiasraabe in #411
- Deprecate @pytask.mark.task in favor of @pytask.task. by @tobiasraabe in #417
- Simplify and fix code in dag.py. by @tobiasraabe in #418
- Convert DeprecationWarning to FutureWarning for deprecated decorators. by @tobiasraabe in #420
- Remove deprecation warning for produces. by @tobiasraabe in #421
- Document new interface. by @tobiasraabe in #392
- Fix import_path. by @tobiasraabe in #424
- Publish pytask.tree_util. by @tobiasraabe in #426
- Fix type annotations of task.depends_on and task.produces. by @tobiasraabe in #427
- Document functional interface. by @tobiasraabe in #423
- Update example in README.md. by @tobiasraabe in #428
- Add better error message when node.state() throws error during DAG validation. by @tobiasraabe in #429
- Update parts of the documentation. by @tobiasraabe in #430
- Enable colors in WSL. by @tobiasraabe in #431
- Fix type checking for pytask.mark.x. by @tobiasraabe in #432
- Fix ids of PythonNodes. by @tobiasraabe in #433
- Add support for Python 3.12. by @tobiasraabe in #434
- Fix detection of task functions. by @tobiasraabe in #437
- Clarify some types. by @tobiasraabe in #438
- Refine typing. by @tobiasraabe in #440
Full Changelog: v0.3.2...v0.4.0