Skip to content

Latest commit

 

History

History
388 lines (317 loc) · 11.6 KB

README.md

File metadata and controls

388 lines (317 loc) · 11.6 KB

utz

("yoots"): utilities I've missed in the Python standard library, Pytest, Pandas, Plotly, …

Install

pip install utz
  • utz has one dependency, stdlb (wild-card standard library imports).
  • "Extras" provide optional deps (e.g. Pandas, Plotly, …).

Use

I usually do this at the top of Jupyter notebooks:

from utz import *

This imports most standard library modules/functions (via stdlb), as well as the utz members below.

Below are a few modules, in rough order of how often I use them:

utz.process: subprocess wrappers; shell out to commands, parse output

from utz.process import *

# Run a command
run('git', 'commit', '-m', 'message')  # Commit staged changes

# Return `list[str]` of stdout lines
lines('git', 'log', '-n5', '--format=%h')  # Last 5 commit SHAs

# Verify exactly one line of stdout, return it
line('git', 'log', '-1', '--format=%h')  # Current HEAD commit SHA

# Return stdout as a single string
output('git', 'log', '-1', '--format=%B')  # Current HEAD commit message

# Check whether a command succeeds, suppress output
check('git', 'diff', '--exit-code', '--quiet')  # `True` iff there are no uncommitted changes

err("This will be output to stderr")

# Execute a "pipeline" of commands
pipeline(['seq 10', 'head -n5'])  # '1\n2\n3\n4\n5\n'

# Diff two command pipelines, e.g. compare lines/words/chars in a gzipped CSV, at Git HEAD vs. worktree:
cmds = ['gunzip -c', 'wc']
file = 'foo.csv.gz'
diff_cmds(
    [f'git show HEAD:{file}', *cmds],
    [f'cat {file}', *cmds]
)

diff_cmds is also exposed as a CLI, diff-x:

# Diff the contents of two `.gz` files
seq 10 | gzip -c > a.gz
seq 2 12 | gzip -c > b.gz
diff-x 'gunzip -c' {a,b}.gz
# 1d0
# < 1
# 10a10,11
# > 11
# > 12

# Pass multiple commands to create a pipeline:
diff-x 'gunzip -c' 'head -n5' {a,b}.gz
# 1d0
# < 1
# 5a5
# > 6

See also: test_process.py.

utz.collections: Collection/list helpers

from utz.collections import *

# Verify a collection has one element, return it
singleton(["aaa"])         # "aaa"
singleton(["aaa", "bbb"])  # error

See also: test_collections.py.

utz.cd: "change directory" contextmanager

from utz import cd
with cd('..'):  # change to parent dir
    ...

utz.fn: decorator/function utilities

utz.decos: compose decorators

from utz import decos
from click import option

common_opts = decos(
    option('-n', type=int),
    option('-v', is_flag=True),
)

@common_opts
def subcmd1(n: int, v: bool):
    ...

@common_opts
def subcmd2(n: int, v: bool):
    ...

utz.call: only pass expected kwargs to functions

from utz import call, wraps
def fn1(a, b):
    ...

@wraps(fn1)
def fn2(a, c, **kwargs):
    ...
kwargs = dict(a=11, b='22', c=33, d=44)
call(fn1, **kwargs)  # passes {a, b}, not {c, d}
call(fn2, **kwargs)  # passes {a, b, c}, not {d}

utz.gzip: deterministic GZip helpers

from utz import deterministic_gzip_open, hash_file
with deterministic_gzip_open('a.gz', 'w') as f:
    f.write('\n'.join(map(str, range(10))))
hash_file('a.gz')  # dfbe03625c539cbc2a2331d806cc48652dd3e1f52fe187ac2f3420dbfb320504

See also: test_gzip.py.

utz.plot: Plotly helpers

Helpers for Plotly transformations I make frequently, e.g.:

from utz import plot
import plotly.express as px
fig = px.bar(x=[1, 2, 3], y=[4, 5, 6])
plot(
    fig,
    name='my-plot',  # Filename stem. will save my-plot.png, my-plot.json, optional my-plot.html
    title=['Some Title', 'Some subtitle'],  # Plot title, followed by "subtitle" line(s) (smaller font, just below)
    bg='white', xgrid='#ccc',  # white background, grey x-gridlines
    hoverx=True,  # show x-values on hover
    x="X-axis title",  # x-axis title or configs
    y=dict(title="Y-axis title", zerolines=True),  # y-axis title or configs
    # ...
)

Example usages: hudcostreets/nj-crashes, ryan-williams/arrayloader-benchmarks.

utz.setup: setup.py helper

utz/setup.py provides defaults for various setuptools.setup() params:

  • name: use parent directory name
  • version: parse from git tag (otherwise from git describe --tags)
  • install_requires: read requirements.txt
  • author_{name,email}: infer from last commit
  • long_description: parse README.md (and set long_description_content_type)
  • description: parse first <p> under opening <h1> from README.md
  • license: parse from LICENSE file (MIT and Apache v2 supported)

For an example, see gsmo==0.0.1 (and corresponding release).

This library also "self-hosts" using utz.setup; see pyproject.toml:

[build-system]
requires = ["setuptools", "utz[setup]==0.4.2", "wheel"]
build-backend = "setuptools.build_meta"

and setup.py:

from utz.setup import setup

extras_require = {
    # …
}

# Various fields auto-populated from git, README.md, requirements.txt, …
setup(
    name="utz",
    version="0.8.0",
    extras_require=extras_require,
    url="https://github.com/runsascoded/utz",
    python_requires=">=3.10",
)

The setup helper can be installed via a pip "extra":

pip install utz[setup]

utz.test: dataclass test cases, raises helper

utz.parametrize: pytest.mark.parametrize wrapper, accepts dataclass instances

from utz import parametrize
from dataclasses import dataclass


def fn(f: float, fmt: str) -> str:
    """Example function, to be tested with ``Case``s below."""
    return f"{f:{fmt}}"


@dataclass
class case:
    """Container for a test-case; float, format, and expected output."""
    f: float
    fmt: str
    expected: str

    @property
    def id(self):
        return f"fmt-{self.f}-{self.fmt}"


@parametrize(
    case(1.23, "0.1f", "1.2"),
    case(123.456, "0.1e", "1.2e+02"),
    case(-123.456, ".0f", "-123"),
)
def test_fn(f, fmt, expected):
    """Example test, "parametrized" by several ``Cases``s."""
    assert fn(f, fmt) == expected

test_parametrize.py contains more examples, customizing test "ID"s, adding parameter sweeps, etc.

utz.raises: pytest.raises wrapper, match a regex or multiple strings

utz.time: now/today helpers

now and today are wrappers around datetime.datetime.now that expose convenient functions:

from utz import now, today
now()     # 2024-10-11T15:43:54Z
today()   # 2024-10-11
now().s   # 1728661583
now().ms  # 1728661585952

Use in conjunction with utz.bases codecs for easy timestamp-nonces:

from utz import b62, now
b62(now().s)   # A18Q1l
b62(now().ms)  # dZ3fYdS
b62(now().us)  # G31Cn073v

Sample values for various units and codecs:

unit b62 b64 b90
s A18RXZ +a/I/7 :?98>
ds R0165M D3KFIY "sJh_?
cs CBp0oXI /TybqKo =8d'#K
ms dZ3no2f M6vLchJ #6cRfBj
us G31ExCseD 360KU8v9V D,f`6&uX

(generated by time-slug-grid.py)

from utz import hash_file
hash_file("path/to/file")  # sha256 by default
hash_file("path/to/file", 'md5')

Misc other modules:

  • o: dict wrapper exposing keys as attrs (e.g.: o({'a':1}).a == 1)
  • docker: DSL for programmatically creating Dockerfiles (and building images from them)
  • bases: encode/decode in various bases (62, 64, 90, …)
  • tmpdir: make temporary directories with a specific basename
  • escape: split/join on an arbitrary delimiter, with backslash-escaping
  • ssh: SSH tunnel wrapped in a context manager
  • backoff: exponential-backoff utility
  • git: Git helpers, wrappers around GitPython
  • pnds: pandas imports and helpers
  • ctxs: compose contextmanagers