GitHub - fairtracks/omnipy: Omnipy is a high level Python library for type-driven data wrangling and scalable workflow orchestration (under development)

Omnipy is a type-driven Python library for:

- data conversion, parsing and wrangling
- tool and web service interoperability, and
- scalable dataflow orchestration

Why use Omnipy?

Dataflows, Not Workflows

Traditional workflows rely on command-line tools and intermediate files, adding complexity to data pipelines. Omnipy replaces this with dataflows that operate directly in memory or on standard formats like JSON or CSV. Built on Pydantic models, Omnipy enhances data parsing, conversion, and serialization for structured data processing.

"It's Static Typing!"… "It's Dynamic!"… "It's Omnipy!"

Omnipy blends Python’s dynamic typing with runtime type safety. Models behave like native Python structures while ensuring type guarantees without the rigidity of static typing. Defined in Python, Omnipy models can be as general or specific as needed.

Parse, Don’t Validate

Strict validation often breaks pipelines when data is messy. Inspired by "Parse, don't validate", Omnipy eagerly parses input into structured models that retain integrity throughout the pipeline. This approach aligns with the Robustness Principle: "be liberal in what you accept, and conservative in what you send!"
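The difference between parsing and validating can be sketched in plain Python. This is only an illustration of the principle and does not use or reproduce Omnipy's API. The function name `parse_int_list` is hypothetical. Rather than rejecting input that is not already a list of integers, it converts what it can and returns a structure that is guaranteed to hold integers:

```python
def parse_int_list(raw):
    """Parse a sequence of messy values into a list of ints, or raise ValueError.

    Hypothetical helper for illustration only; not part of Omnipy.
    """
    parsed = []
    for position, value in enumerate(raw):
        try:
            # int() accepts ints and numeric strings, including surrounding whitespace
            parsed.append(int(value))
        except (TypeError, ValueError) as exc:
            raise ValueError(f"item {position} ({value!r}) is not an integer") from exc
    return parsed


# Liberal in what is accepted, conservative in what is returned
clean = parse_int_list(["1", 2, " 3 "])
print(clean)
```

Downstream code can then rely on `clean` containing only integers, without re-checking the data at every step.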

Self-Constraining Data Models

Omnipy models aren’t just one-time validators. A Model[list[int]]() behaves like a list but ensures its elements are always integers. Every modification parses data to enforce integrity, rolling back invalid operations automatically.
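To make the idea of a self-constraining model concrete, here is a minimal plain-Python sketch. It is not Omnipy's implementation, and the class `IntList` is hypothetical. It covers only `append` and `extend`, and assumes that "rolling back" means restoring the previous contents when an operation fails partway through:

```python
class IntList(list):
    """Hypothetical list that parses every new element to int (illustration only)."""

    def __init__(self, items=()):
        super().__init__(int(item) for item in items)

    def append(self, item):
        # int() raises before the list is modified, so nothing needs undoing
        super().append(int(item))

    def extend(self, items):
        snapshot = list(self)  # copy of the current contents
        try:
            for item in items:
                super().append(int(item))
        except (TypeError, ValueError):
            self[:] = snapshot  # roll back the partially applied change
            raise


numbers = IntList([1, "2"])
numbers.append("3")
try:
    numbers.extend([4, "five", 6])  # fails on "five"; the 4 is rolled back
except ValueError:
    pass
print(numbers)
```

A complete version would have to guard every mutating method (`insert`, `__setitem__`, `+=`, and so on). Handling that in one place is the kind of work a general-purpose model class takes over.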

Omnify Your Data Pipelines

Omnipy invites you to "omnify" pipelines — break them into reusable, universal components. By defining dataflows and tasks with structured input and output models, Omnipy simplifies reuse and promotes good coding practices, improving maintainability as projects grow.

Catalog of Components for Interoperability

Omnipy includes components for tasks like asynchronous API requests with rate limiting, parsing JSON or tabular data, and flattening nested data into relational tables. Integration with REST APIs and data wrangling/analysis tools like Pandas simplifies interoperability across diverse systems. Expect the catalog to grow as the community expands!

Built to Scale

Omnipy’s hierarchical Dataset structure simplifies batch processing of directory-based data, including parsing, serialization, and metadata handling. With built-in Prefect support, Omnipy scales seamlessly from local experiments to distributed deployment, meeting the demands of projects large and small.

Installing Omnipy

Make sure that your Python version is between 3.10 and 3.12 (Python 3.13 is not yet supported), e.g.:
```
$ python --version
Python 3.10.14
```
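The same check can be done from within Python, for example at the top of a notebook. This is a small sketch, and the function name `is_supported` is hypothetical. The version bounds are taken from the requirement stated above:

```python
import sys

MIN_SUPPORTED = (3, 10)  # lowest supported Python version, per the text above
MAX_SUPPORTED = (3, 12)  # highest supported Python version, per the text above


def is_supported(version=None):
    """Return True if the (major, minor) version is within the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


print(is_supported())
```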
Create and activate a virtual environment for your project, e.g.:
```
$ python -m venv myproject
$ source myproject/bin/activate
```
TIP:
- If you need help with setting up a virtual environment, check out the relevant section in the FastAPI documentation. (Please note that Omnipy does not depend on FastAPI; their documentation is simply excellent!)
- If you are using Omnipy in a Jupyter notebook, you can most likely skip this step.
Install Omnipy using:
```
$ pip install omnipy
```

Getting started

Text to come soon.

Running example scripts

Install omnipy-examples:
- pip install omnipy-examples
Example script:
- omnipy-examples isajson
For help on the command line interface:
- omnipy-examples --help
For help on a particular example:
- omnipy-examples isajson --help

Output of flow runs

By default, the output appears in a timestamped subdirectory of the data directory.

It is recommended to install a file viewer that is capable of browsing tar.gz files. For instance, the "File Expander" plugin in PyCharm is excellent for this.
To unpack the compressed files of a run on the command line (just make sure to replace the datetime string from this example):
```
for f in data/2023_02_03-12_51_51/*.tar.gz; do mkdir -p "${f%.tar.gz}"; tar xfzv "$f" -C "${f%.tar.gz}"; done
```
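If you would rather unpack the files from Python (for instance on a system without a POSIX shell), the shell loop above can be approximated with the standard library. This is a sketch, and the function name `unpack_run` is hypothetical. It assumes, as in the example above, that the archives sit directly inside the run directory:

```python
import tarfile
from pathlib import Path


def unpack_run(run_dir):
    """Extract each *.tar.gz in run_dir into a directory named after the archive."""
    targets = []
    for archive in sorted(Path(run_dir).glob("*.tar.gz")):
        # "data/run/x.tar.gz" is unpacked into "data/run/x/"
        target = archive.with_name(archive.name[: -len(".tar.gz")])
        target.mkdir(exist_ok=True)
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(target)
        targets.append(target)
    return targets
```

Calling `unpack_run("data/2023_02_03-12_51_51")` (with the datetime string replaced by that of your own run) returns the list of directories that were created. As with any archive extraction, only unpack files from runs you trust.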

Run with the Prefect engine

Omnipy is integrated with the powerful Prefect dataflow orchestration library.

To run an example using the Prefect engine:
- omnipy-examples --engine prefect isajson
After completion of some runs, you can check the flow logs and orchestration options in the Prefect UI:
- prefect server start

More info on Prefect configuration will come soon.

