utz ("yoots"): utilities augmenting the Python standard library; processes, Pytest, Pandas, Plotly, …
- Install
- Import: from utz import *
- Modules
  - utz.proc: subprocess wrappers; shell out commands, parse output
  - utz.collections: collection/list helpers
  - utz.env: os.environ wrapper + contextmanager
  - utz.fn: decorator/function utilities
  - utz.jsn: JsonEncoder for datetimes, dataclasses
  - utz.cli: click helpers
  - utz.mem: memray wrapper
  - utz.time: Time timer, now/today helpers
  - utz.size: humanize.naturalsize wrapper
  - utz.hash_file: hash file contents
  - utz.ym: YM (year/month) class
  - utz.cd: "change directory" contextmanager
  - utz.gzip: deterministic GZip helpers
  - utz.plot: Plotly helpers
  - utz.setup: setup.py helper
  - utz.test: dataclass test cases, raises helper
  - utz.docker, utz.tmpdir, etc.
- Examples / Users
pip install utz
- utz has one dependency, stdlb (wild-card standard library imports).
- "Extras" provide optional deps (e.g. Pandas, Plotly, …).
I often import utz.* in Jupyter notebooks:
from utz import *
This imports most standard library modules/functions (via stdlb), as well as the utz members below.
You can also import utz.* during Python REPL startup:
cat >~/.pythonrc <<EOF
try:
    from utz import *
    err("Imported utz")
except ImportError:
    # `err` comes from utz, so it is unavailable when the import fails
    import sys
    print("Couldn't find utz", file=sys.stderr)
EOF
export PYTHONSTARTUP=~/.pythonrc
# Configure for Python REPL in new Bash shells:
echo 'export PYTHONSTARTUP=~/.pythonrc' >> ~/.bashrc
Here are a few utz modules, in rough descending order of how often I use them:
utz.proc: subprocess wrappers; shell out commands, parse output
from utz.proc import *
# Run a command
run('git', 'commit', '-m', 'message') # Commit staged changes
# Return `list[str]` of stdout lines
lines('git', 'log', '-n5', '--format=%h') # Last 5 commit SHAs
# Verify exactly one line of stdout, return it
line('git', 'log', '-1', '--format=%h') # Current HEAD commit SHA
# Return stdout as a single string
output('git', 'log', '-1', '--format=%B') # Current HEAD commit message
# Check whether a command succeeds, suppress output
check('git', 'diff', '--exit-code', '--quiet') # `True` iff there are no uncommitted changes
err("This will be output to stderr")
# Execute a "pipeline" of commands
pipeline(['seq 10', 'head -n5']) # '1\n2\n3\n4\n5\n'
See also: test_proc.py.
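For readers who want a feel for what these helpers do, the following is a minimal standard-library sketch of `lines`, `line`, and `check`-style wrappers. It is not utz's implementation: the `*_sketch` names are hypothetical, and details such as error types and output handling are assumptions.

```python
import subprocess
import sys


def lines_sketch(*cmd: str) -> list[str]:
    """Run `cmd` and return its stdout as a list of lines (raises on non-zero exit)."""
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return out.splitlines()


def line_sketch(*cmd: str) -> str:
    """Run `cmd`, verify it printed exactly one line, and return that line."""
    result = lines_sketch(*cmd)
    if len(result) != 1:
        raise ValueError(f"Expected 1 line, found {len(result)}")
    return result[0]


def check_sketch(*cmd: str) -> bool:
    """Return True iff `cmd` exits with status 0; its output is discarded."""
    proc = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return proc.returncode == 0


# The current Python interpreter serves as a portable stand-in for shell commands
py = sys.executable
print(lines_sketch(py, "-c", "print(1); print(2)"))   # ['1', '2']
print(line_sketch(py, "-c", "print('abc')"))          # abc
print(check_sketch(py, "-c", "raise SystemExit(1)"))  # False
```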
utz.proc.aio: async subprocess wrappers
Async versions of most utz.proc helpers are also available:
from utz.proc.aio import *
import asyncio
from asyncio import gather
async def test():
    _1, _2, _3, nums = await gather(*[
        run('sleep', '1'),
        run('sleep', '2'),
        run('sleep', '3'),
        lines('seq', '10'),
    ])
    return nums

asyncio.run(test())
# ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
utz.collections: collection/list helpers
from utz.collections import *
# Verify a collection has one element, return it
singleton(["aaa"]) # "aaa"
singleton(["aaa", "bbb"]) # error
# `solo` is an alias for `singleton`; both also work on dicts, verifying and extracting a single "item" pair:
solo({'a': 1}) # ('a', 1)
# Filter by a predicate
solo([2, 3, 4], pred=lambda n: n % 2) # 3
solo([{'a': 1}, {'b': 2}], pred=lambda o: 'a' in o) # {'a': 1}
See also: test_collections.py.
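The behavior described above (verify exactly one element, optionally after filtering, with dicts treated as item pairs) can be sketched in a few lines. This is an illustrative approximation, not utz's code; `solo_sketch` is a hypothetical name and the `ValueError` is an assumed error type.

```python
from typing import Callable, Iterable, Optional


def solo_sketch(items: Iterable, pred: Optional[Callable] = None):
    """Return the only element of `items`, optionally after filtering by `pred`."""
    if isinstance(items, dict):
        # Dicts are treated as collections of (key, value) pairs
        items = items.items()
    matches = [item for item in items if pred is None or pred(item)]
    if len(matches) != 1:
        raise ValueError(f"Expected exactly 1 element, found {len(matches)}")
    return matches[0]


print(solo_sketch(["aaa"]))                            # aaa
print(solo_sketch({"a": 1}))                           # ('a', 1)
print(solo_sketch([2, 3, 4], pred=lambda n: n % 2))    # 3
```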
utz.env: os.environ wrapper + contextmanager
from utz import env, os
# Temporarily set env vars
with env(FOO='bar'):
    assert os.environ['FOO'] == 'bar'
assert 'FOO' not in os.environ
The env() contextmanager also supports configurable on_conflict and on_exit kwargs, for handling env vars that were patched, then changed while the context was active.
See also test_env.py.
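A rough standard-library equivalent of the basic set-and-restore behavior might look like the sketch below. It deliberately omits the on_conflict and on_exit handling, whose semantics are not described here; `env_sketch` and the `UTZ_SKETCH_DEMO` variable are hypothetical names.

```python
import os
from contextlib import contextmanager


@contextmanager
def env_sketch(**updates: str):
    """Temporarily set environment variables, restoring the previous state on exit."""
    missing = object()  # sentinel meaning "the variable was not set before"
    previous = {key: os.environ.get(key, missing) for key in updates}
    os.environ.update(updates)
    try:
        yield
    finally:
        for key, old in previous.items():
            if old is missing:
                os.environ.pop(key, None)  # variable did not exist: remove it
            else:
                os.environ[key] = old      # variable existed: restore old value


os.environ.pop("UTZ_SKETCH_DEMO", None)
with env_sketch(UTZ_SKETCH_DEMO="bar"):
    assert os.environ["UTZ_SKETCH_DEMO"] == "bar"
assert "UTZ_SKETCH_DEMO" not in os.environ
```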
utz.fn: decorator/function utilities
from utz import decos
from click import option
common_opts = decos(
option('-n', type=int),
option('-v', is_flag=True),
)
@common_opts
def subcmd1(n: int, v: bool):
    ...

@common_opts
def subcmd2(n: int, v: bool):
    ...

from utz import call, wraps

def fn1(a, b):
    ...

@wraps(fn1)
def fn2(a, c, **kwargs):
    ...
kwargs = dict(a=11, b='22', c=33, d=44)
call(fn1, **kwargs) # passes {a, b}, not {c, d}
call(fn2, **kwargs) # passes {a, b, c}, not {d}
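The kwarg-filtering idea behind `call` can be approximated with `inspect.signature`. The sketch below covers only the plain-function case (like `fn1`); it does not model the `wraps`-aware behavior shown for `fn2`, and it does not treat `**kwargs` parameters specially. `call_sketch` is a hypothetical name, not utz's implementation.

```python
import inspect


def call_sketch(fn, **kwargs):
    """Call `fn` with only the keyword arguments that appear in its signature."""
    params = inspect.signature(fn).parameters
    accepted = {name: value for name, value in kwargs.items() if name in params}
    return fn(**accepted)


def fn1(a, b):
    return (a, b)


kwargs = dict(a=11, b="22", c=33, d=44)
print(call_sketch(fn1, **kwargs))  # (11, '22'); `c` and `d` are dropped
```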
utz.jsn: JsonEncoder for datetimes, dataclasses
from utz import dataclass, Encoder, fromtimestamp, json # Convenience imports from standard library
epoch = fromtimestamp(0)
print(json.dumps({ 'epoch': epoch }, cls=Encoder))
# {"epoch": "1969-12-31 19:00:00"}
print(json.dumps({ 'epoch': epoch }, cls=Encoder("%Y-%m-%d"), indent=2))
# {
# "epoch": "1969-12-31"
# }
@dataclass
class A:
    n: int

print(json.dumps(A(111), cls=Encoder))
# {"n": 111}
See test_jsn.py for more examples.
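For comparison, a comparable encoder can be built on `json.JSONEncoder` from the standard library. The sketch below is an approximation under assumptions: `EncoderSketch` is a hypothetical class, the default format string is a guess based on the output shown above, and it does not support the `Encoder("%Y-%m-%d")` call style.

```python
import json
from dataclasses import asdict, dataclass, is_dataclass
from datetime import datetime


class EncoderSketch(json.JSONEncoder):
    """Serialize datetimes (via strftime) and dataclass instances (via asdict)."""

    fmt = "%Y-%m-%d %H:%M:%S"  # assumed default format

    def default(self, o):
        if isinstance(o, datetime):
            return o.strftime(self.fmt)
        if is_dataclass(o) and not isinstance(o, type):
            return asdict(o)
        return super().default(o)  # raises TypeError for unsupported types


@dataclass
class A:
    n: int


print(json.dumps({"t": datetime(2024, 1, 2, 3, 4, 5)}, cls=EncoderSketch))
# {"t": "2024-01-02 03:04:05"}
print(json.dumps(A(111), cls=EncoderSketch))
# {"n": 111}
```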
utz.cli provides wrappers around click.option for parsing common option formats:
- @count: "count" options, including optional value mappings (e.g. -v → "info", -vv → "debug")
- @multi: parse comma-delimited values (or other delimiter), with optional value-parse callback (e.g. -a1,2 -a3 → (1,2,3))
- @num: parse numeric values, including human-readable SI/IEC suffixes (e.g. 10k → 10_000)
- @obj: parse dictionaries from multi-value options (e.g. -eFOO=BAR -eBAZ=QUX → dict(FOO="BAR", BAZ="QUX"))
# cli.py
from utz.cli import cmd, count, multi, num, obj
from typing import Literal
@cmd # Alias for `click.command`
@multi('-a', '--arr', parse=int, help="Comma-separated integers")
@obj('-e', '--env', help='Env vars, in the form `k=v`')
@num('-m', '--max-memory', help='Max memory size (e.g. "100m")')
@count('-v', '--verbosity', values=['warn', 'info', 'debug'], help='0x: "warn", 1x: "info", 2x: "debug"')
def main(
    arr: tuple[int, ...],
    env: dict[str, str],
    max_memory: int,
    verbosity: Literal['warn', 'info', 'debug'],
):
    print(f"{arr} {env} {max_memory} {verbosity}")

if __name__ == '__main__':
    main()
Saving the above as cli.py and running will yield:
python cli.py -a1,2 -a3 -eAAA=111 -eBBB=222 -m10k
# (1, 2, 3) {'AAA': '111', 'BBB': '222'} 10000 warn
python cli.py -m 1Gi -v
# () {} 1073741824 info
See test_cli for more examples.
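To illustrate the kind of suffix parsing `@num` performs (`10k` → 10000, `1Gi` → 1073741824), here is a small standalone sketch. The supported suffixes, the case-insensitive matching, and the name `parse_size_sketch` are all assumptions made for illustration; utz's actual parser may differ.

```python
import re

# Assumed suffix tables: SI suffixes are powers of 1000, IEC suffixes powers of 1024
SI = {"k": 10**3, "m": 10**6, "g": 10**9, "t": 10**12}
IEC = {"ki": 2**10, "mi": 2**20, "gi": 2**30, "ti": 2**40}


def parse_size_sketch(text: str) -> int:
    """Parse strings like '10k' or '1Gi' into an integer."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([A-Za-z]*)", text.strip())
    if not match:
        raise ValueError(f"Unrecognized size: {text!r}")
    number, suffix = match.groups()
    suffix = suffix.lower()  # assumption: suffixes are case-insensitive
    if not suffix:
        multiplier = 1
    elif suffix in IEC:
        multiplier = IEC[suffix]
    elif suffix in SI:
        multiplier = SI[suffix]
    else:
        raise ValueError(f"Unknown suffix: {suffix!r}")
    return int(float(number) * multiplier)


print(parse_size_sketch("10k"))  # 10000
print(parse_size_sketch("1Gi"))  # 1073741824
```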
Use memray to profile memory allocations, extract stats and peak memory use:
from utz.mem import Tracker
from utz import iec
with (tracker := Tracker()):
    nums = list(sorted(range(1_000_000, 0, -1)))

peak_mem = tracker.peak_mem
print(f'Peak memory use: {peak_mem:,} ({iec(peak_mem)})')
# Peak memory use: 48,530,432 (46.3 MiB)
utz.time: Time timer, now/today helpers
from utz import Time, sleep
time = Time()
time("step 1")
sleep(1)
time("step 2")
sleep(1)
time() # "close" "step 2"
print(f'Step 1 took {time["step 1"]:.1f}s, step 2 took {time["step 2"]:.1f}s.')
# Step 1 took 1.0s, step 2 took 1.0s.
# Can also be used as a contextmanager:
with time("run"):
    sleep(1)
print(f'Run took {time["run"]:.1f}s')
# Run took 1.0s
now and today are wrappers around datetime.datetime.now that expose convenient functions:
from utz import now, today
now() # 2024-10-11T15:43:54Z
today() # 2024-10-11
now().s # 1728661583
now().ms # 1728661585952
Use in conjunction with utz.bases codecs for easy timestamp-nonces:
from utz import b62, now
b62(now().s) # A18Q1l
b62(now().ms) # dZ3fYdS
b62(now().us) # G31Cn073v
Sample values for various units and codecs:
| unit | b62 | b64 | b90 |
|---|---|---|---|
| s | A2kw7P | +aYIh1 | :Kn>H |
| ds | R7FCrj | D8oM9b | "tn_BH |
| cs | CCp7kK0 | /UpIuxG | =Fc#jK |
| ms | dj4u83i | MFSOKhy | #8;HF8g |
| us | G6cozJjWb | 385u0dp8B | D>$y/9Hr |

(generated by time-slug-grid.py)
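The general technique behind these slugs is positional encoding of an integer timestamp in a larger base. The sketch below shows base-62 encoding and decoding with the standard library. The alphabet ordering (`0-9A-Za-z`) is an assumption and may not match utz.bases, so the strings it produces can differ from the samples above; the `*_sketch` names are hypothetical.

```python
import string

# Assumed alphabet order: digits, then uppercase, then lowercase (62 symbols)
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def b62_encode_sketch(n: int) -> str:
    """Encode a non-negative integer as a base-62 string."""
    if n < 0:
        raise ValueError("Only non-negative integers are supported")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, remainder = divmod(n, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))  # most significant digit first


def b62_decode_sketch(s: str) -> int:
    """Invert `b62_encode_sketch`."""
    n = 0
    for char in s:
        n = n * 62 + ALPHABET.index(char)
    return n


print(b62_encode_sketch(61))  # z
print(b62_encode_sketch(62))  # 10
```

A seconds-resolution Unix timestamp fits in six base-62 characters, which is why the "s" row above is six characters wide.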
utz.size: humanize.naturalsize wrapper
iec wraps humanize.naturalsize, printing IEC-formatted sizes by default, to 3 sigfigs:
from utz import iec
iec(2**30 + 2**29 + 2**28 + 2**27)
# '1.88 GiB'
utz.hash_file: hash file contents
from utz import hash_file
hash_file("path/to/file") # sha256 by default
hash_file("path/to/file", 'md5')
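A helper like this is typically a thin layer over `hashlib`. The following sketch reads the file in chunks so large files do not need to fit in memory; `hash_file_sketch` and its `chunk_size` parameter are hypothetical, and returning a hex digest is an assumption.

```python
import hashlib
import os
import tempfile


def hash_file_sketch(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 16) -> str:
    """Return the hex digest of the file at `path`, reading it in chunks."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


# Demo on a temporary file
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.txt")
    with open(demo_path, "wb") as f:
        f.write(b"hello")
    sha256_hex = hash_file_sketch(demo_path)
    md5_hex = hash_file_sketch(demo_path, "md5")

print(sha256_hex == hashlib.sha256(b"hello").hexdigest())  # True
```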
utz.ym: YM (year/month) class
The YM class represents a year/month, e.g. 202401 for January 2024.
from utz import YM
ym = YM(202501) # Jan '25
assert ym + 1 == YM(202502) # Add one month
assert YM(202502) - YM(202406) == 8 # subtract months
YM(202401).until(YM(202501)) # 202401, 202402, ..., 202412
# `YM` constructor accepts several representations:
assert all(ym == YM(202401) for ym in [
YM(202401),
YM('202401'),
YM('2024-01'),
YM(2024, 1),
YM(y=2024, m=1),
YM(dict(year=2024, month=1)),
YM(YM(202401)),
])
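The month arithmetic shown above reduces to converting a `YYYYMM` integer into a running month count. The functions below are a standalone sketch of that idea (not the `YM` class itself); all three names are hypothetical.

```python
def ym_to_index(ym: int) -> int:
    """Convert a YYYYMM integer to a count of months since year 0."""
    year, month = divmod(ym, 100)
    return year * 12 + (month - 1)


def index_to_ym(index: int) -> int:
    """Invert `ym_to_index`."""
    year, month0 = divmod(index, 12)
    return year * 100 + month0 + 1


def add_months(ym: int, n: int) -> int:
    """Return the YYYYMM integer `n` months after `ym` (n may be negative)."""
    return index_to_ym(ym_to_index(ym) + n)


print(add_months(202501, 1))                       # 202502
print(add_months(202412, 1))                       # 202501 (rolls over the year)
print(ym_to_index(202502) - ym_to_index(202406))   # 8
```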
utz.cd: "change directory" contextmanager
from utz import cd
with cd('..'):  # change to parent dir
    ...
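The pattern is a short `contextlib` wrapper around `os.chdir`. The sketch below shows the general shape (Python 3.11+ also ships `contextlib.chdir`); `cd_sketch` is a hypothetical name and not utz's code.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def cd_sketch(path):
    """Change the working directory to `path`, restoring the original on exit."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)  # restored even if the body raises


start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with cd_sketch(tmp):
        inside = os.path.realpath(os.getcwd())
    entered_tmp = inside == os.path.realpath(tmp)

print(entered_tmp, os.getcwd() == start)  # True True
```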
utz.gzip: deterministic GZip helpers
from utz import deterministic_gzip_open, hash_file
with deterministic_gzip_open('a.gz', 'w') as f:
    f.write('\n'.join(map(str, range(10))))

hash_file('a.gz') # dfbe03625c539cbc2a2331d806cc48652dd3e1f52fe187ac2f3420dbfb320504
See also: test_gzip.py.
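GZip output is normally non-deterministic because the header embeds a modification time (and often a filename). One common way to get byte-identical output, sketched below with the standard library, is to pin `mtime=0` and omit the filename. This illustrates the general technique and is not necessarily how utz implements it; `deterministic_gzip_bytes_sketch` is a hypothetical name.

```python
import gzip
import io


def deterministic_gzip_bytes_sketch(data: bytes) -> bytes:
    """Compress `data` with a fixed header so repeated runs give identical bytes."""
    buffer = io.BytesIO()
    # mtime=0 pins the header timestamp; filename="" omits the filename field
    with gzip.GzipFile(filename="", mode="wb", fileobj=buffer, mtime=0) as gz:
        gz.write(data)
    return buffer.getvalue()


payload = "\n".join(map(str, range(10))).encode()
first = deterministic_gzip_bytes_sketch(payload)
second = deterministic_gzip_bytes_sketch(payload)
print(first == second)                    # True
print(gzip.decompress(first) == payload)  # True
```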
Helpers for Plotly transformations I make frequently, e.g.:
from utz import plot
import plotly.express as px
fig = px.bar(x=[1, 2, 3], y=[4, 5, 6])
plot(
fig,
name='my-plot', # Filename stem; saves my-plot.png, my-plot.json, and optionally my-plot.html
title=['Some Title', 'Some subtitle'], # Plot title, followed by "subtitle" line(s) (smaller font, just below)
bg='white', xgrid='#ccc', # white background, grey x-gridlines
hoverx=True, # show x-values on hover
x="X-axis title", # x-axis title or configs
y=dict(title="Y-axis title", zerolines=True), # y-axis title or configs
# ...
)
Example usages: hudcostreets/nj-crashes, ryan-williams/arrayloader-benchmarks.
utz.setup: setup.py helper
utz/setup.py provides defaults for various setuptools.setup() params:
- name: use parent directory name
- version: parse from git tag (otherwise from git describe --tags)
- install_requires: read requirements.txt
- author_{name,email}: infer from last commit
- long_description: parse README.md (and set long_description_content_type)
- description: parse first <p> under opening <h1> from README.md
- license: parse from LICENSE file (MIT and Apache v2 supported)
For an example, see gsmo==0.0.1 (and corresponding release).
This library also "self-hosts" using utz.setup; see pyproject.toml:
[build-system]
requires = ["setuptools", "utz[setup]==0.4.2", "wheel"]
build-backend = "setuptools.build_meta"
and setup.py:
from utz.setup import setup
extras_require = {
    # …
}

# Various fields auto-populated from git, README.md, requirements.txt, …
setup(
    name="utz",
    version="0.8.0",
    extras_require=extras_require,
    url="https://github.com/runsascoded/utz",
    python_requires=">=3.10",
)
The setup helper can be installed via a pip "extra":
pip install utz[setup]
utz.test: dataclass test cases, raises helper
utz.parametrize: pytest.mark.parametrize wrapper, accepts dataclass instances
from utz import parametrize
from dataclasses import dataclass
def fn(f: float, fmt: str) -> str:
    """Example function, to be tested with ``case``s below."""
    return f"{f:{fmt}}"

@dataclass
class case:
    """Container for a test-case; float, format, and expected output."""
    f: float
    fmt: str
    expected: str

    @property
    def id(self):
        return f"fmt-{self.f}-{self.fmt}"

@parametrize(
    case(1.23, "0.1f", "1.2"),
    case(123.456, "0.1e", "1.2e+02"),
    case(-123.456, ".0f", "-123"),
)
def test_fn(f, fmt, expected):
    """Example test, "parametrized" by several ``case``s."""
    assert fn(f, fmt) == expected
test_parametrize.py contains more examples, customizing test "ID"s, adding parameter sweeps, etc.
utz.docker, utz.tmpdir, etc.
Misc other modules:
- bases: encode/decode in various bases (62, 64, 90, …)
- escape: split/join on an arbitrary delimiter, with backslash-escaping; utz.esc escapes a specific character in a string.
- ctxs: compose contextmanagers
- o: dict wrapper exposing keys as attrs (e.g.: o({'a':1}).a == 1)
- docker: DSL for programmatically creating Dockerfiles (and building images from them)
- tmpdir: make temporary directories with a specific basename
- ssh: SSH tunnel wrapped in a context manager
- backoff: exponential-backoff utility
- git: Git helpers, wrappers around GitPython
- pnds: pandas imports and helpers
Some repos that use utz: