Releases: pytask-dev/pytask

v0.5.1

20 Jul 12:14

What's Changed

This release contains mostly bug fixes. The guide for complex task repetitions has also been updated and now includes lessons learned from recent projects.

New Contributors

Full Changelog: v0.5.0...v0.5.1

v0.5.0

26 May 21:23

Highlights

✨ pytask v0.5.0 is released and contains two big features. ✨

🤖 Provisional Nodes and Task Generators

pytask now has mechanisms to define tasks that produce an unknown number of products, which are called provisional nodes. An example is a task that splits data by rows into chunks of roughly 100MB.

If you want to process each chunk of data further with a separate task, use task generators to create as many tasks as there are chunks.

Everything is explained in the how-to guide Provisional Nodes and Task Generators.
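
To illustrate why the number of products is unknown before execution, here is a minimal plain-Python sketch of the chunking idea. It does not use pytask's API. The names split_rows and generate_tasks, and the byte limit, are hypothetical and chosen only for illustration. The actual syntax is covered in the how-to guide.

```python
from typing import Callable, Iterator


def split_rows(rows: list[str], max_bytes: int) -> Iterator[list[str]]:
    """Yield consecutive chunks of rows of at most max_bytes each.

    A single row larger than max_bytes ends up alone in its own chunk.
    """
    chunk: list[str] = []
    size = 0
    for row in rows:
        row_size = len(row.encode("utf-8"))
        # Start a new chunk once adding the row would exceed the limit.
        if chunk and size + row_size > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(row)
        size += row_size
    if chunk:
        yield chunk


def generate_tasks(chunks: list[list[str]]) -> list[Callable[[], int]]:
    """Create one callable per chunk, similar in spirit to a task generator."""

    def make_task(chunk: list[str]) -> Callable[[], int]:
        # Each generated "task" simply counts the rows of its chunk.
        return lambda: len(chunk)

    return [make_task(chunk) for chunk in chunks]


# The number of chunks depends on the data and is only known after splitting.
rows = ["a" * 40, "b" * 40, "c" * 40, "d" * 10]
chunks = list(split_rows(rows, max_bytes=100))
tasks = generate_tasks(chunks)
results = [run() for run in tasks]
```

The point of the sketch is the ordering: the downstream callables can only be created after the splitting step has run, which is the situation task generators are meant for.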

🚀 Support for HPC and your favorite cloud provider (AWS, GCP, Azure) using dask / coiled

pytask can build your project on virtual machines from the cloud provider of your choice. Read more about it in the release announcement of pytask-parallel v0.5.0.

Removals

  • The markers @pytask.mark.depends_on and @pytask.mark.produces are removed. Please define dependencies and products using the new approaches.
  • Instead of @pytask.mark.task use from pytask import task and @task.

What's Changed

New Contributors

Full Changelog: v0.4.5...v0.5.0

v0.4.7

19 Mar 19:22
72f7289

What's Changed

A patch release to fix an error where using --pdb or --trace led to tasks always returning None, which affected tasks that use their return values as products.

Full Changelog: v0.4.6...v0.4.7

v0.4.6

13 Mar 16:08
19325a1

What's Changed

A bug fix release that fixes an error when users used from pytask import mark to mark their tasks.

  • Skip collection of MarkGenerator when using from pytask import mark. by @tobiasraabe in #576

Full Changelog: v0.4.5...v0.4.6

v0.4.5

09 Jan 20:17
c7f3da8
Compare
Choose a tag to compare

Highlights

pytask is moving towards v0.5, which will remove lots of deprecated features. Please upgrade your syntax or pin pytask to <0.5 in your requirements files.

The release comes with two new features and lots of little improvements.

  • pytask can now handle files that are referenced via HTTP(S) URLs or files on popular cloud storage like AWS, all thanks to universal_pathlib. Explanations can be found in this guide.

  • It is now easier to extend pytask than before. The new hook_module configuration option allows adding modules that contain hook implementations. This guide offers more explanation.

What's Changed

Full Changelog: v0.4.4...v0.4.5

v0.4.4

04 Dec 20:43

What's Changed

Full Changelog: v0.4.3...v0.4.4

v0.4.3

01 Dec 12:51
a06d948

What's Changed

This release contains a lot of smaller improvements and bug fixes. Here is a short list.

  • #484 raises an error when a PathNode is used with a directory instead of a file.
  • #496 makes pytask even lazier. When a preceding task is executed and produces the same outputs, the following task will no longer be executed.
  • Objects in task modules that overwrite __getattr__ should no longer cause problems (#507 was fixed in #508). The same applies to importing Task in task modules.
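
The lazier behavior from #496 can be pictured with a small plain-Python sketch. It is not pytask's implementation. The helper names digest and should_run and the file name clean.csv are hypothetical. The idea is that a task is rerun only when the hash of a dependency differs from the recorded one, so a preceding task that rewrites identical content does not trigger it.

```python
import hashlib


def digest(content: str) -> str:
    """Return a SHA-256 hex digest of the given text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def should_run(recorded: dict[str, str], dependencies: dict[str, str]) -> bool:
    """Return True if any dependency hash differs from the recorded hashes."""
    current = {name: digest(content) for name, content in dependencies.items()}
    return current != recorded


# Hashes recorded after a first successful build.
recorded = {"clean.csv": digest("a,b\n1,2\n")}

# The preceding task runs again but writes identical content: no rerun needed.
rerun_same = should_run(recorded, {"clean.csv": "a,b\n1,2\n"})

# The preceding task writes different content: the following task must rerun.
rerun_changed = should_run(recorded, {"clean.csv": "a,b\n1,3\n"})
```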

Complete list of changes

Full Changelog: v0.4.2...v0.4.3

v0.4.2

08 Nov 11:57
7592664

Highlights

This release contains a new feature and some improvements for users.

  • 🚀 The new feature is the pytask.DataCatalog that allows users to manage dependencies and products in projects more easily. Read the tutorial to get started. 🚀
  • File changes are now detected by hashes instead of modification timestamps. This should prevent accidental executions when working with cloud storage providers like Dropbox or OneDrive and in many other situations. To save runtime, pytask reuses a cached hash when the modification timestamp has not changed.
  • Nodes now have signatures that separate how nodes are named and displayed from how nodes are identified internally. If you have written a custom node, please update it according to the how-to guide.
  • All of pytask's internal files are now stored in a .pytask folder in your project. The file .pytask.sqlite3 is moved to this location as well. Add .pytask to your .gitignore to prevent accidentally committing the folder.
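
The combination of content hashes and a timestamp-keyed cache mentioned in the list above can be sketched as follows. This is a minimal illustration under assumptions, not pytask's actual code. The function file_hash and the in-memory _cache are hypothetical names.

```python
import hashlib
import os
import tempfile
from pathlib import Path

# Cache keyed by (path, modification time): the file is only re-read
# when its timestamp has changed since the last hash was computed.
_cache: dict[tuple[str, float], str] = {}


def file_hash(path: Path) -> str:
    """Return the SHA-256 hash of a file, reusing the cache if mtime is unchanged."""
    key = (str(path.resolve()), path.stat().st_mtime)
    if key not in _cache:
        _cache[key] = hashlib.sha256(path.read_bytes()).hexdigest()
    return _cache[key]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.txt"
    path.write_text("hello")
    first = file_hash(path)

    # A sync tool touches the file: new timestamp, same content, same hash.
    os.utime(path, (0, 0))
    second = file_hash(path)

    # The content really changes: the hash changes as well.
    path.write_text("changed")
    os.utime(path, (1, 1))
    third = file_hash(path)
```

Because the decision rests on the hash, a changed timestamp alone (as in the second call) would not count as a file change. The cache only saves the cost of re-reading files whose timestamp is unchanged.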

What's Changed

Full Changelog: v0.4.1...v0.4.2

v0.4.1

11 Oct 08:16
ce6a825

What's Changed

Of course, it's a mandatory bug fix release after a bigger release.

Using the product annotation Annotated[..., Product] did not work with multiple products.

Full Changelog: v0.4.0...v0.4.1

v0.4.0

07 Oct 18:18
b191559

News

pytask turned three years old in July, which is a good occasion to rethink pytask's design and blow the dust off some of its oldest components.

Here are the highlights of v0.4.0 🚀 ⭐

Highlights

New interfaces for products.

Every argument can be declared as a product with the new Product annotation. The path can be passed as a default value.

from pathlib import Path

from pytask import Product
from typing_extensions import Annotated


def task_hello_earth(path: Annotated[Path, Product] = Path("hello_earth.txt")):
    path.write_text("Hello, earth!")

More explanation can be found at https://tinyurl.com/yrezszr4.

It is also possible to use the return of the task function as a product, which allows wrapping any third-party function as a task function. Read more about it here: https://tinyurl.com/pytask-return.

from pathlib import Path

from typing_extensions import Annotated


def task_hello_earth() -> Annotated[str, Path("hello_earth.txt")]:
    return "Hello, earth!"

Every task argument is a dependency

In older pytask versions, only paths were treated as task dependencies. That meant that changes to other arguments passed to the task did not trigger a rerun of the task.

Now, every argument to a task can be a dependency, and you can hash them if they should trigger a rerun. It is explained in https://tinyurl.com/pytask-hash.

from pathlib import Path
from typing import Annotated

from pytask import Product
from pytask import PythonNode


def task_example(
    text: Annotated[str, PythonNode(value="Hello, World", hash=True)],
    path: Annotated[Path, Product] = Path("file.txt"),
) -> None:
    path.write_text(text)

A new functional interface

The functional interface for pytask has been reworked and accepts a list of task functions. You can use it within your terminal or a Jupyter notebook. Read this guide to learn more about it: https://tinyurl.com/pytask-functional.

from pathlib import Path
from typing import Annotated

from pytask import build


def create_text() -> Annotated[str, Path("hello_earth.txt")]:
    return "Hello, earth!"


session = build(tasks=[create_text])

Custom Nodes through Protocols

In the newest version, nodes (dependencies and products) and tasks follow protocols. It allows for customizations like PickleNodes that store any Python object as a pickle file and inject the object into the task when used as a dependency. It is explained in more detail in this guide: https://tinyurl.com/pytask-custom-nodes.
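
The protocol idea can be sketched with the standard library alone. The Node protocol below is a simplified, hypothetical stand-in and not pytask's actual protocol definition, which has its own members and is described in the linked guide. The PickleNode here only shows how a class can satisfy a protocol structurally, without inheriting from a base class.

```python
import hashlib
import pickle
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Optional, Protocol, runtime_checkable


@runtime_checkable
class Node(Protocol):
    """Simplified stand-in for a node protocol (assumption, not pytask's own)."""

    name: str

    def state(self) -> Optional[str]: ...

    def load(self) -> Any: ...

    def save(self, value: Any) -> None: ...


@dataclass
class PickleNode:
    """Store any Python object as a pickle file."""

    name: str
    path: Path

    def state(self) -> Optional[str]:
        # None signals that the node does not exist yet.
        if not self.path.exists():
            return None
        return hashlib.sha256(self.path.read_bytes()).hexdigest()

    def load(self) -> Any:
        return pickle.loads(self.path.read_bytes())

    def save(self, value: Any) -> None:
        self.path.write_bytes(pickle.dumps(value))


with tempfile.TemporaryDirectory() as tmp:
    node = PickleNode(name="numbers", path=Path(tmp) / "numbers.pkl")
    conforms = isinstance(node, Node)  # structural check, no inheritance
    missing_state = node.state()
    node.save({"values": [1, 2, 3]})
    restored = node.load()
```

Any object with the same attributes and methods would pass the check, which is what makes this kind of customization possible without subclassing.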

Other notable changes

  • Python 3.12 is supported, and support for Python 3.7 is dropped.
  • @pytask.mark.depends_on and @pytask.mark.produces are deprecated. Better options for defining dependencies and products are explained in https://tinyurl.com/yrezszr4.
  • @pytask.mark.task is also deprecated and replaced by from pytask import task and @task.

What's Changed

Full Changelog: v0.3.2...v0.4.0