Utilities augmenting the Python standard library; processes, Pytest, Pandas, Plotly, …

runsascoded/utz

utz

("yoots"): utilities augmenting the Python standard library; processes, Pytest, Pandas, Plotly, …

Install

pip install utz
  • utz has one dependency, stdlb (wild-card standard library imports).
  • "Extras" provide optional deps (e.g. Pandas, Plotly, …).

Import: from utz import *

Jupyter

I often import utz.* in Jupyter notebooks:

from utz import *

This imports most standard library modules/functions (via stdlb), as well as the utz members below.

Python REPL

You can also import utz.* during Python REPL startup:

cat >~/.pythonrc <<EOF
try:
    from utz import *
    err("Imported utz")
except ImportError:
    err("Couldn't find utz")
EOF
export PYTHONSTARTUP=~/.pythonrc
# Configure for Python REPL in new Bash shells:
echo 'export PYTHONSTARTUP=~/.pythonrc' >> ~/.bashrc

Modules

Here are a few utz modules, in rough descending order of how often I use them:

utz.proc: subprocess wrappers; shell out commands, parse output

from utz.proc import *

# Run a command
run('git', 'commit', '-m', 'message')  # Commit staged changes

# Return `list[str]` of stdout lines
lines('git', 'log', '-n5', '--format=%h')  # Last 5 commit SHAs

# Verify exactly one line of stdout, return it
line('git', 'log', '-1', '--format=%h')  # Current HEAD commit SHA

# Return stdout as a single string
output('git', 'log', '-1', '--format=%B')  # Current HEAD commit message

# Check whether a command succeeds, suppress output
check('git', 'diff', '--exit-code', '--quiet')  # `True` iff there are no uncommitted changes

err("This will be output to stderr")

# Execute a "pipeline" of commands
pipeline(['seq 10', 'head -n5'])  # '1\n2\n3\n4\n5\n'

See also: test_proc.py.
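To show how helpers like these relate to the standard library, here is a rough sketch built on `subprocess`. It is not utz's implementation: the names `lines_sketch`, `line_sketch`, and `check_sketch` are hypothetical, and the real helpers likely accept more options.

```python
import subprocess
import sys


def lines_sketch(*cmd: str) -> list[str]:
    """Run ``cmd``, raise if it fails, and return its stdout split into lines."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.splitlines()


def line_sketch(*cmd: str) -> str:
    """Run ``cmd`` and return its stdout, which must be exactly one line."""
    out = lines_sketch(*cmd)
    if len(out) != 1:
        raise ValueError(f"Expected 1 line of output, found {len(out)}")
    return out[0]


def check_sketch(*cmd: str) -> bool:
    """Return True iff ``cmd`` exits with status 0; discard its output."""
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


# The current interpreter is used as a portable stand-in for `seq`, `git`, etc.
py = sys.executable
print(lines_sketch(py, '-c', 'print(1); print(2)'))   # ['1', '2']
print(line_sketch(py, '-c', 'print("abc")'))          # abc
print(check_sketch(py, '-c', 'raise SystemExit(1)'))  # False
```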

utz.proc.aio: async subprocess wrappers

Async versions of most utz.proc helpers are also available:

from utz.proc.aio import *
import asyncio
from asyncio import gather

async def test():
  _1, _2, _3, nums = await gather(*[
      run('sleep', '1'),
      run('sleep', '2'),
      run('sleep', '3'),
      lines('seq', '10'),
  ])
  return nums

asyncio.run(test())
# ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']

utz.collections: collection/list helpers

from utz.collections import *

# Verify a collection has one element, return it
singleton(["aaa"])         # "aaa"
singleton(["aaa", "bbb"])  # error

# `solo` is an alias for `singleton`; both also work on dicts, verifying and extracting a single "item" pair:
solo({'a': 1})  # ('a', 1)

# Filter by a predicate
solo([2, 3, 4], pred=lambda n: n % 2)  # 3
solo([{'a': 1}, {'b': 2}], pred=lambda o: 'a' in o)  # {'a': 1}

See also: test_collections.py.
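The behavior described above can be approximated in a few lines. This is a sketch under assumptions, not utz's code: `solo_sketch` is a hypothetical name, and the exception type raised by the real helper may differ.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar('T')


def solo_sketch(items: Iterable[T], pred: Optional[Callable[[T], bool]] = None) -> T:
    """Return the only element of ``items`` (optionally, the only one matching ``pred``)."""
    if isinstance(items, dict):
        # Treat a dict as a collection of (key, value) pairs
        items = items.items()
    matches = [item for item in items if pred is None or pred(item)]
    if len(matches) != 1:
        raise ValueError(f"Expected exactly 1 element, found {len(matches)}")
    return matches[0]


print(solo_sketch(["aaa"]))                          # aaa
print(solo_sketch({'a': 1}))                         # ('a', 1)
print(solo_sketch([2, 3, 4], pred=lambda n: n % 2))  # 3
```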

utz.env: os.environ wrapper + contextmanager

from utz import env, os

# Temporarily set env vars
with env(FOO='bar'):
    assert os.environ['FOO'] == 'bar'

assert 'FOO' not in os.environ

The env() contextmanager also supports configurable on_conflict and on_exit kwargs, for handling env vars that were patched, then changed while the context was active.

See also: test_env.py.
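The core set-then-restore behavior can be sketched with `contextlib`. This is a simplified illustration, not utz's implementation: `env_sketch` is a hypothetical name, and it always restores the previous values on exit, ignoring the `on_conflict` and `on_exit` handling mentioned above.

```python
import os
from contextlib import contextmanager


@contextmanager
def env_sketch(**overrides: str):
    """Temporarily set environment variables, restoring prior values on exit."""
    # Remember each variable's prior value (None means "was not set")
    previous = {name: os.environ.get(name) for name in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for name, old in previous.items():
            if old is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = old


os.environ.pop('UTZ_SKETCH_FOO', None)  # Make sure the demo variable starts unset
with env_sketch(UTZ_SKETCH_FOO='bar'):
    assert os.environ['UTZ_SKETCH_FOO'] == 'bar'
assert 'UTZ_SKETCH_FOO' not in os.environ
```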

utz.fn: decorator/function utilities

utz.decos: compose decorators

from utz import decos
from click import option

common_opts = decos(
    option('-n', type=int),
    option('-v', is_flag=True),
)

@common_opts
def subcmd1(n: int, v: bool):
    ...

@common_opts
def subcmd2(n: int, v: bool):
    ...
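
Decorator composition itself is a small amount of code. The sketch below assumes the composed decorator should behave like stacking the decorators in the order given (first argument outermost); whether utz uses the same order is an assumption, and `decos_sketch` and `tag` are hypothetical names.

```python
def decos_sketch(*decorators):
    """Combine decorators so ``@decos_sketch(a, b)`` acts like ``@a`` stacked above ``@b``."""
    def composed(fn):
        # Apply the last decorator first, so the first one ends up outermost
        for deco in reversed(decorators):
            fn = deco(fn)
        return fn
    return composed


def tag(label):
    """Toy decorator factory: wrap a function's return value in ``label(...)``."""
    def deco(fn):
        def wrapper(*args, **kwargs):
            return f"{label}({fn(*args, **kwargs)})"
        return wrapper
    return deco


both = decos_sketch(tag('a'), tag('b'))


@both
def value():
    return 'x'


print(value())  # a(b(x))
```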

utz.call: only pass expected kwargs to functions

from utz import call, wraps
def fn1(a, b):
    ...

@wraps(fn1)
def fn2(a, c, **kwargs):
    ...
kwargs = dict(a=11, b='22', c=33, d=44)
call(fn1, **kwargs)  # passes {a, b}, not {c, d}
call(fn2, **kwargs)  # passes {a, b, c}, not {d}
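
The general idea of filtering kwargs by a function's signature can be sketched with `inspect`. This is a simplified illustration, not utz's implementation: `call_sketch` is a hypothetical name, and it does not reproduce the `wraps`-aware behavior shown for `fn2` above (here, a function that declares `**kwargs` simply receives everything).

```python
import inspect


def call_sketch(fn, *args, **kwargs):
    """Call ``fn``, dropping keyword arguments that its signature does not accept."""
    params = inspect.signature(fn).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        # ``fn`` declares **kwargs, so it can accept every keyword
        return fn(*args, **kwargs)
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    accepted = {
        name: value
        for name, value in kwargs.items()
        if name in params and params[name].kind in keyword_kinds
    }
    return fn(*args, **accepted)


def area(width, height):
    return width * height


print(call_sketch(area, width=2, height=3, depth=4))  # 6
```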

utz.jsn: JsonEncoder for datetimes, dataclasses

from utz import dataclass, Encoder, fromtimestamp, json  # Convenience imports from standard library
epoch = fromtimestamp(0)  # Naive local time; the sample output below was produced in a UTC-5 timezone
print(json.dumps({ 'epoch': epoch }, cls=Encoder))
# {"epoch": "1969-12-31 19:00:00"}
print(json.dumps({ 'epoch': epoch }, cls=Encoder("%Y-%m-%d"), indent=2))
# {
#   "epoch": "1969-12-31"
# }

@dataclass
class A:
    n: int

print(json.dumps(A(111), cls=Encoder))
# {"n": 111}

See test_jsn.py for more examples.
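For comparison, a bare-bones encoder along these lines can be written against the standard library by overriding `json.JSONEncoder.default`. This is a sketch, not utz's `Encoder`: `EncoderSketch` and `Point` are hypothetical names, and the datetime format is fixed rather than configurable.

```python
import json
from dataclasses import asdict, dataclass, is_dataclass
from datetime import datetime


class EncoderSketch(json.JSONEncoder):
    """JSON encoder that also handles datetimes and dataclass instances."""

    def default(self, o):
        if isinstance(o, datetime):
            return o.strftime("%Y-%m-%d %H:%M:%S")
        if is_dataclass(o) and not isinstance(o, type):
            # Dataclass instance (not the class itself): serialize as a dict
            return asdict(o)
        return super().default(o)


@dataclass
class Point:
    x: int
    y: int


doc = {'when': datetime(2024, 1, 2, 3, 4, 5), 'point': Point(1, 2)}
print(json.dumps(doc, cls=EncoderSketch))
# {"when": "2024-01-02 03:04:05", "point": {"x": 1, "y": 2}}
```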

utz.cli: click helpers

utz.cli provides wrappers around click.option for parsing common option formats:

  • @count: "count" options, including optional value mappings (e.g. -v → "info", -vv → "debug")
  • @multi: parse comma-delimited values (or other delimiter), with optional value-parse callback (e.g. -a1,2 -a3 → (1,2,3))
  • @num: parse numeric values, including human-readable SI/IEC suffixes (e.g. 10k → 10_000)
  • @obj: parse dictionaries from multi-value options (e.g. -eFOO=BAR -eBAZ=QUX → dict(FOO="BAR", BAZ="QUX"))
# cli.py
from utz.cli import cmd, count, multi, num, obj
from typing import Literal

@cmd  # Alias for `click.command`
@multi('-a', '--arr', parse=int, help="Comma-separated integers")
@obj('-e', '--env', help='Env vars, in the form `k=v`')
@num('-m', '--max-memory', help='Max memory size (e.g. "100m")')
@count('-v', '--verbosity', values=['warn', 'info', 'debug'], help='0x: "warn", 1x: "info", 2x: "debug"')
def main(
    arr: tuple[int, ...],
    env: dict[str, str],
    max_memory: int,
    verbosity: Literal['warn', 'info', 'debug'],
):
    print(f"{arr} {env} {max_memory} {verbosity}")

if __name__ == '__main__':
    main()

Saving the above as cli.py and running will yield:

python cli.py -a1,2 -a3 -eAAA=111 -eBBB=222 -m10k
# (1, 2, 3) {'AAA': '111', 'BBB': '222'} 10000 warn
python cli.py -m 1Gi -v
# () {} 1073741824 info

See test_cli for more examples.
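The SI/IEC suffix parsing that `@num` performs can be illustrated independently of click. The sketch below is an assumption about the general technique, not utz's parser: `parse_num_sketch` is a hypothetical name, and treating suffixes case-insensitively (so `m` means "mega") is a choice made here to match the examples above.

```python
import re

# Map lowercase suffixes to multipliers: SI powers of 1000, IEC powers of 1024
MULTIPLIERS = {'': 1}
for power, letter in enumerate('kmgt', start=1):
    MULTIPLIERS[letter] = 1000 ** power
    MULTIPLIERS[letter + 'i'] = 1024 ** power


def parse_num_sketch(text: str) -> int:
    """Parse strings like "10k" or "1Gi" into an integer."""
    match = re.fullmatch(r'(\d+(?:\.\d+)?)([a-zA-Z]*)', text.strip())
    if match is None:
        raise ValueError(f"Unrecognized number: {text!r}")
    number, suffix = match.groups()
    try:
        multiplier = MULTIPLIERS[suffix.lower()]
    except KeyError:
        raise ValueError(f"Unrecognized suffix: {suffix!r}") from None
    return int(float(number) * multiplier)


print(parse_num_sketch('10k'))  # 10000
print(parse_num_sketch('1Gi'))  # 1073741824
```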

utz.mem: memray wrapper

Use memray to profile memory allocations, extract stats and peak memory use:

from utz.mem import Tracker
from utz import iec
with (tracker := Tracker()):
    nums = list(sorted(range(1_000_000, 0, -1)))

peak_mem = tracker.peak_mem
print(f'Peak memory use: {peak_mem:,} ({iec(peak_mem)})')
# Peak memory use: 48,530,432 (46.3 MiB)

utz.time: Time timer, now/today helpers

Time: timer class

from utz import Time, sleep

time = Time()
time("step 1")
sleep(1)
time("step 2")
sleep(1)
time()  # "close" "step 2"
print(f'Step 1 took {time["step 1"]:.1f}s, step 2 took {time["step 2"]:.1f}s.')
# Step 1 took 1.0s, step 2 took 1.0s.

# Can also be used as a contextmanager:
with time("run"):
    sleep(1)

print(f'Run took {time["run"]:.1f}s')
# Run took 1.0s
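
The "calling the timer closes the previous step" pattern can be sketched with `time.perf_counter`. This is an illustration only, not utz's `Time` class: `TimerSketch` is a hypothetical name, and the context-manager form shown above is omitted.

```python
from time import perf_counter, sleep


class TimerSketch:
    """Record durations of named steps; each call ends the step in progress."""

    def __init__(self):
        self.durations = {}
        self._current = None   # Name of the step being timed, if any
        self._started = None   # perf_counter() reading when that step began

    def __call__(self, name=None):
        now = perf_counter()
        if self._current is not None:
            # Close the step that was in progress
            self.durations[self._current] = now - self._started
        self._current = name
        self._started = now if name is not None else None

    def __getitem__(self, name):
        return self.durations[name]


timer = TimerSketch()
timer("step 1")
sleep(0.01)
timer("step 2")
sleep(0.01)
timer()  # Close "step 2" without starting a new step
print(sorted(timer.durations))  # ['step 1', 'step 2']
```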

now, today

now and today are wrappers around datetime.datetime.now that expose convenient functions:

from utz import now, today
now()     # 2024-10-11T15:43:54Z
today()   # 2024-10-11
now().s   # 1728661583
now().ms  # 1728661585952

Use in conjunction with utz.bases codecs for easy timestamp-nonces:

from utz import b62, now
b62(now().s)   # A18Q1l
b62(now().ms)  # dZ3fYdS
b62(now().us)  # G31Cn073v

Sample values for various units and codecs:

| unit | b62 | b64 | b90 |
|------|-----|-----|-----|
| s | A2kw7P | +aYIh1 | :Kn>H |
| ds | R7FCrj | D8oM9b | "tn_BH |
| cs | CCp7kK0 | /UpIuxG | =Fc#jK |
| ms | dj4u83i | MFSOKhy | #8;HF8g |
| us | G6cozJjWb | 385u0dp8B | D>$y/9Hr |

(generated by time-slug-grid.py)
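To make the encoding step concrete, here is a minimal base-62 encoder and decoder. The alphabet order (digits, then uppercase, then lowercase) is an assumption made for this sketch, so its output should not be expected to match `utz.b62` character for character; `b62_encode_sketch` and `b62_decode_sketch` are hypothetical names.

```python
import string

# Assumed alphabet; utz's actual base-62 alphabet may be ordered differently
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def b62_encode_sketch(n: int) -> str:
    """Encode a non-negative integer using the 62-character alphabet above."""
    if n < 0:
        raise ValueError("Only non-negative integers are supported")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, remainder = divmod(n, 62)
        digits.append(ALPHABET[remainder])
    return ''.join(reversed(digits))


def b62_decode_sketch(text: str) -> int:
    """Invert ``b62_encode_sketch``."""
    n = 0
    for char in text:
        n = n * 62 + ALPHABET.index(char)
    return n


seconds = 1728661583  # The sample `now().s` value shown earlier
slug = b62_encode_sketch(seconds)
print(len(slug))                           # 6
print(b62_decode_sketch(slug) == seconds)  # True
```

A second-resolution timestamp fits in 6 base-62 characters, which is consistent with the length of the `s` row in the table above.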

utz.size: humanize.naturalsize wrapper

iec wraps humanize.naturalsize, printing IEC-formatted sizes by default, to 3 sigfigs:

from utz import iec
iec(2**30 + 2**29 + 2**28 + 2**27)
# '1.88 GiB'
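
The arithmetic behind that output can be reproduced without `humanize`. The sketch below is not how utz or `humanize.naturalsize` is implemented; `iec_sketch` is a hypothetical name, and its formatting of edge cases (e.g. exact powers of 1024) may differ from the real helper.

```python
import math

UNITS = ['B', 'KiB', 'MiB', 'GiB', 'TiB', 'PiB']


def iec_sketch(num_bytes: int, sigfigs: int = 3) -> str:
    """Format a byte count with IEC (power-of-1024) units, to ~``sigfigs`` significant figures."""
    value = float(num_bytes)
    unit = UNITS[0]
    for unit in UNITS:
        # Stop at the first unit where the value is below 1024 (or at the largest unit)
        if value < 1024 or unit == UNITS[-1]:
            break
        value /= 1024
    if unit == 'B':
        return f"{int(value)} B"
    # Here value >= 1, so log10 is defined; keep enough decimals for `sigfigs` figures
    decimals = max(0, sigfigs - 1 - math.floor(math.log10(value)))
    return f"{value:.{decimals}f} {unit}"


print(iec_sketch(2**30 + 2**29 + 2**28 + 2**27))  # 1.88 GiB
print(iec_sketch(48_530_432))                     # 46.3 MiB
```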

utz.hash_file: hash file contents

from utz import hash_file
hash_file("path/to/file")  # sha256 by default
hash_file("path/to/file", 'md5')
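
A file-hashing helper of this kind is typically a thin layer over `hashlib`. The sketch below is one plausible shape, not utz's code: `hash_file_sketch` and its `chunk_size` parameter are hypothetical. Reading in chunks keeps memory use flat for large files.

```python
import hashlib
import os
import tempfile


def hash_file_sketch(path, algorithm: str = 'sha256', chunk_size: int = 1 << 16) -> str:
    """Return the hex digest of a file's contents, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, 'rb') as f:
        # iter() with a sentinel stops once read() returns b'' (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


# Demo: hash a temporary file and compare against hashing the bytes directly
data = b'hello utz\n' * 10_000
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.bin')
    with open(path, 'wb') as f:
        f.write(data)
    file_digest = hash_file_sketch(path)

print(file_digest == hashlib.sha256(data).hexdigest())  # True
```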

utz.ym: YM (year/month) class

The YM class represents a year/month, e.g. 202401 for January 2024.

from utz import YM
ym = YM(202501)  # Jan '25
assert ym + 1 == YM(202502)  # Add one month
assert YM(202502) - YM(202406) == 8  # subtract months
YM(202401).until(YM(202501))  # 202401, 202402, ..., 202412

# `YM` constructor accepts several representations:
assert all(ym == YM(202401) for ym in [
    YM(202401),
    YM('202401'),
    YM('2024-01'),
    YM(2024, 1),
    YM(y=2024, m=1),
    YM(dict(year=2024, month=1)),
    YM(YM(202401)),
])
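
The month arithmetic above reduces to converting a year/month pair to a single month index and back. Here is a minimal sketch of that idea; `YearMonth` is a hypothetical stand-in, not utz's `YM`, and it supports only a fraction of the constructor forms listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class YearMonth:
    year: int
    month: int  # 1-12

    @classmethod
    def parse(cls, value: int) -> "YearMonth":
        """Build from an integer like 202401."""
        return cls(*divmod(value, 100))

    def _index(self) -> int:
        # Months elapsed since year 0, with January as month 0
        return self.year * 12 + self.month - 1

    def __add__(self, months: int) -> "YearMonth":
        year, month0 = divmod(self._index() + months, 12)
        return YearMonth(year, month0 + 1)

    def __sub__(self, other):
        if isinstance(other, YearMonth):
            return self._index() - other._index()  # Difference in months
        return self + (-other)

    def until(self, end: "YearMonth"):
        """Yield each month from ``self`` up to, but not including, ``end``."""
        current = self
        while current < end:
            yield current
            current = current + 1


assert YearMonth.parse(202501) + 1 == YearMonth(2025, 2)
assert YearMonth(2025, 2) - YearMonth(2024, 6) == 8
assert YearMonth(2024, 12) + 1 == YearMonth(2025, 1)
print(len(list(YearMonth(2024, 1).until(YearMonth(2025, 1)))))  # 12
```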

utz.cd: "change directory" contextmanager

from utz import cd
with cd('..'):  # change to parent dir
    ...
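
A context manager like this can be sketched in a few lines with `os.chdir`. `cd_sketch` is a hypothetical name and this is not utz's implementation; Python 3.11+ also ships a similar `contextlib.chdir`.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def cd_sketch(path):
    """Change the working directory, restoring the previous one on exit."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with cd_sketch(tmp):
        # realpath() normalizes symlinked temp dirs (e.g. on macOS)
        inside = os.path.realpath(os.getcwd()) == os.path.realpath(tmp)
print(inside)                 # True
print(os.getcwd() == start)   # True
```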

utz.gzip: deterministic GZip helpers

from utz import deterministic_gzip_open, hash_file
with deterministic_gzip_open('a.gz', 'w') as f:
    f.write('\n'.join(map(str, range(10))))
hash_file('a.gz')  # dfbe03625c539cbc2a2331d806cc48652dd3e1f52fe187ac2f3420dbfb320504

See also: test_gzip.py.
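Gzip output normally varies between runs because the header embeds a modification time (and, when writing to a named file, the filename). A common way to get reproducible output with the standard library is to pin both, as in this sketch; `deterministic_gzip_bytes` is a hypothetical name, and utz's helper may work differently.

```python
import gzip
import io


def deterministic_gzip_bytes(data: bytes) -> bytes:
    """Gzip ``data`` with a fixed mtime and no embedded filename."""
    buffer = io.BytesIO()
    with gzip.GzipFile(filename='', mode='wb', fileobj=buffer, mtime=0) as gz:
        gz.write(data)
    return buffer.getvalue()


payload = '\n'.join(map(str, range(10))).encode()
first = deterministic_gzip_bytes(payload)
second = deterministic_gzip_bytes(payload)
print(first == second)                     # True
print(gzip.decompress(first) == payload)   # True
```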

utz.plot: Plotly helpers

Helpers for Plotly transformations I make frequently, e.g.:

from utz import plot
import plotly.express as px
fig = px.bar(x=[1, 2, 3], y=[4, 5, 6])
plot(
    fig,
    name='my-plot',  # Filename stem; saves my-plot.png and my-plot.json, and optionally my-plot.html
    title=['Some Title', 'Some subtitle'],  # Plot title, followed by "subtitle" line(s) (smaller font, just below)
    bg='white', xgrid='#ccc',  # white background, grey x-gridlines
    hoverx=True,  # show x-values on hover
    x="X-axis title",  # x-axis title or configs
    y=dict(title="Y-axis title", zerolines=True),  # y-axis title or configs
    # ...
)

Example usages: hudcostreets/nj-crashes, ryan-williams/arrayloader-benchmarks.

utz.setup: setup.py helper

utz/setup.py provides defaults for various setuptools.setup() params:

  • name: use parent directory name
  • version: parse from git tag (otherwise from git describe --tags)
  • install_requires: read requirements.txt
  • author_{name,email}: infer from last commit
  • long_description: parse README.md (and set long_description_content_type)
  • description: parse first <p> under opening <h1> from README.md
  • license: parse from LICENSE file (MIT and Apache v2 supported)

For an example, see gsmo==0.0.1 (and corresponding release).

This library also "self-hosts" using utz.setup; see pyproject.toml:

[build-system]
requires = ["setuptools", "utz[setup]==0.4.2", "wheel"]
build-backend = "setuptools.build_meta"

and setup.py:

from utz.setup import setup

extras_require = {
    # …
}

# Various fields auto-populated from git, README.md, requirements.txt, …
setup(
    name="utz",
    version="0.8.0",
    extras_require=extras_require,
    url="https://github.com/runsascoded/utz",
    python_requires=">=3.10",
)

The setup helper can be installed via a pip "extra":

pip install utz[setup]

utz.test: dataclass test cases, raises helper

utz.parametrize: pytest.mark.parametrize wrapper, accepts dataclass instances

from utz import parametrize
from dataclasses import dataclass


def fn(f: float, fmt: str) -> str:
    """Example function, to be tested with ``case``s below."""
    return f"{f:{fmt}}"


@dataclass
class case:
    """Container for a test-case; float, format, and expected output."""
    f: float
    fmt: str
    expected: str

    @property
    def id(self):
        return f"fmt-{self.f}-{self.fmt}"


@parametrize(
    case(1.23, "0.1f", "1.2"),
    case(123.456, "0.1e", "1.2e+02"),
    case(-123.456, ".0f", "-123"),
)
def test_fn(f, fmt, expected):
    """Example test, "parametrized" by several ``case``s."""
    assert fn(f, fmt) == expected

test_parametrize.py contains more examples, customizing test "ID"s, adding parameter sweeps, etc.
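One way to picture the wrapper is as a translation from dataclass instances to the `argnames`, `argvalues`, and `ids` that `pytest.mark.parametrize` expects. The sketch below performs only that translation and does not import pytest; `parametrize_args_sketch` and `FormatCase` are hypothetical names, and utz's actual implementation may differ.

```python
from dataclasses import dataclass, fields


def parametrize_args_sketch(*cases):
    """Derive (argnames, argvalues, ids) from dataclass instances of one type."""
    names = [field.name for field in fields(cases[0])]
    values = [tuple(getattr(case, name) for name in names) for case in cases]
    # Use each case's `id` attribute if it defines one; None lets pytest pick an ID
    ids = [getattr(case, "id", None) for case in cases]
    return names, values, ids


@dataclass
class FormatCase:
    f: float
    fmt: str
    expected: str

    @property
    def id(self):
        return f"fmt-{self.f}-{self.fmt}"


names, values, ids = parametrize_args_sketch(
    FormatCase(1.23, "0.1f", "1.2"),
    FormatCase(-123.456, ".0f", "-123"),
)
print(names)  # ['f', 'fmt', 'expected']
print(ids)    # ['fmt-1.23-0.1f', 'fmt--123.456-.0f']
```

With pytest installed, the three values could then be passed as `pytest.mark.parametrize(names, values, ids=ids)`.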

utz.raises: pytest.raises wrapper, match a regex or multiple strings

Misc other modules:

  • bases: encode/decode in various bases (62, 64, 90, …)
  • escape: split/join on an arbitrary delimiter, with backslash-escaping; utz.esc escapes a specific character in a string.
  • ctxs: compose contextmanagers
  • o: dict wrapper exposing keys as attrs (e.g.: o({'a':1}).a == 1)
  • docker: DSL for programmatically creating Dockerfiles (and building images from them)
  • tmpdir: make temporary directories with a specific basename
  • ssh: SSH tunnel wrapped in a context manager
  • backoff: exponential-backoff utility
  • git: Git helpers, wrappers around GitPython
  • pnds: pandas imports and helpers

Examples / Users

Some repos that use utz:
