Yarm: Yet Another Report Maker.
Yarm makes it easy for you to create recurring reports by:
- Importing multiple spreadsheets and CSVs into a temporary database.
- Offering easy options for common data cleaning tasks (e.g.
replace
,slugify_columns
,pivot
) - Running SQL queries (or, for pandas fans, custom Python code) on all this data.
- Exporting the results as a new spreadsheet, CSV, or even SQLite database.
- All configured in a simple YAML file for easy reuse. Download fresh data,
yarm run
, and you're done.
-
Collect your XLSX and/or CSV data files into a directory for this report.
-
Initialize a new YAML config file:
$ yarm new
-
Edit the YAML config file (see below).
-
Run the report:
$ yarm run
- Send the output spreadsheet to your boss/client/head of state. Was it really that easy?
-
Collect fresh data. Save it over the old files.
-
Run the report.
$ yarm run -f
-
Send the output spreadsheet.
-
Take the afternoon off.
Please see the extensive documentation for more details and features.
You configure a report in a single YAML file.
Each query becomes a separate sheet in your output spreadsheet.
This example config file is moderately complex. Your report can be much simpler; you might have only one or two tables and a single query. (Or you might have ten queries, each with a custom postprocess function...)
---
output:
dir: Output
basename: Sales_Report
# Optional input options (more are available):
input:
slugify_columns: true
lowercase_columns: true
# Set up your data sources:
tables_config:
# CSV file: the easiest data source.
products:
- path: Products.csv
# Spreadsheet: You need both the path and the sheet name.
orders:
- path: Orders.xlsx
sheet: Orders
# You can import different sheets as separate tables.
order_details:
- path: Orders.xlsx
sheet: Order Details
# You can combine multiple data sources into a single table,
# as long as their columns can be merged.
tax:
- path: Sales Tax Rates Northeast.xlsx
sheet: NY
- path: Sales Tax Rates Northeast.xlsx
sheet: PA
- path: TAXES_SOUTH.csv
# Set up your output spreadsheet:
queries:
- name: Order Details with Product Names
sql: SELECT * FROM order_details as od JOIN products as p ON od.product_id = p.id;
- name: Orders With Sales Tax
sql: >
SELECT orders.*,
tax.rate
FROM orders
JOIN tax
ON orders.billing_state = tax.state
;
# These query results will need a Python function to complete this sheet:
postprocess: calculate_tax
# But first, we can do simple regex replacements right here:
replace:
billing_state:
Virginia: VA
West Virginia: WV
# Since we need that custom function calculate_tax(), we'll
# write it in a separate Python file.
import:
- path: custom.py
Read more about basic configuration and advanced options.
If the power of SQL and make-it-easy options like slugify_columns
aren't enough for you, you can write a custom postprocess function for any query you like.
Yarm is currently in alpha. Core features are working and thoroughly documented.
I rely on yarm
for my own recurring reports.
If you are desperate to stop doing a recurring report by hand, give yarm a try.
If something breaks, or if you have any suggestions or comments, please file an issue. I'd love to hear what you think.
For upcoming features, see the Roadmap.
- Python 3.7 or later
- A terminal
- One or more spreadsheets that you want to query
- Something to do with all this impending free time...
You can install yarm via pip from PyPI:
$ pip install yarm
But since yarm is a command line tool, you may prefer the excellent pipx:
$ pipx install yarm
Complete, extensive documentation is at yarm.readthedocs.io.
Dive right in.
This tool has a clear focus: Make it easy to run and rerun reports from the command line that query multiple sources of tabular data.
Once you set up the initial configuration file, the workflow for future reports is simple. Download fresh data over the old files, then rerun the report.
This means that yarm
is not a tool for data exploration.
True, you may still want yarm
to prepare your data for exploration. Once you get used to listing a few data sources, setting a few options, and spitting out a nice, clean SQLite database or set of CSV files to play with, you may get hooked.
But for iterative tinkering with your data, you're going to need other tools.
-
sqlitebrowser: An excellent GUI for exploring your SQLite database. I sometimes use this to figure out my queries before I save them into my config file.
-
Jupyter Lab: If you find your SQL queries getting more and more arcane and complex, it's probably time to learn pandas, and that means unleashing the power of this interactive "notebook". Some reports are so complex that they really deserve to be run step-by-step, with immediate output after every command. Jupyter Lab makes that absurdly easy... and repeatable.
-
SQL Notebook: A newer offering that I haven't used yet, but it looks like an interesting GUI combination of sqlitebrowser and a "Jupyter-style notebook interface". Could be very powerful.
-
For quick, one-off data manipulations on the command line, you can reach for excellent tools like jq for JSON, mlr for CSV, and even htmlq for HTML. But once the command gets long and complex enough that you want to save it to a script, you might start missing SQL queries and
yarm
features likeslugify_columns: true
.
Contributions are very welcome. To learn more, see the Contributor Guide.
Distributed under the terms of the Apache 2.0 license, Yarm is free and open source software.
If you encounter any problems, please file an issue along with a detailed description.
This project was generated from @cjolowicz's Hypermodern Python Cookiecutter template.