03 Sep 19:04

e42f499

0.133.1

Highlights

¹ This release doubles down on Polars' capabilities: as a matter of policy, we now track the latest Polars upstream. If you think qsv has a torrid release schedule, you should see Polars. They're constantly fixing bugs and adding new features and optimizations!
To keep up, we've added Polars revision info to the --version output, and the --envlist option now includes Polars-relevant env vars. We've also added support for the POLARS_BACKTRACE_IN_ERR env var to control whether Polars backtraces are included in error messages.
We also removed the to parquet subcommand as it's redundant with the Polars-powered sqlp's ability to create parquet files. This removes the HUGE duckdb dependency, which should markedly shorten compile times and shrink binaries.



Other highlights include:

New edit command that allows you to edit CSV files.
The count command's --width option now includes record width stats beyond max length (avg, median, min, variance, stddev & MAD).
The fixlengths command now has --quote and --escape options.
The stats command adds a sort_order streaming statistic.
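As a rough illustration of the record width statistics that count --width now reports, here is a minimal Python sketch. It is not qsv's implementation. It assumes a record's width is the UTF-8 byte length of its fields plus one delimiter between fields, that variance and stddev are population values, and that MAD means median absolute deviation. The function name record_width_stats is hypothetical.

```python
import csv
import io
import statistics


def record_width_stats(csv_text: str) -> dict:
    """Summarize record widths for the data rows of a CSV string."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    # Assumption: width = UTF-8 bytes of all fields plus one delimiter between fields.
    widths = [
        sum(len(field.encode("utf-8")) for field in row) + max(len(row) - 1, 0)
        for row in reader
    ]
    if not widths:
        return {}
    median = statistics.median(widths)
    return {
        "max": max(widths),
        "min": min(widths),
        "avg": statistics.fmean(widths),
        "median": median,
        # Assumption: population (not sample) variance and standard deviation.
        "variance": statistics.pvariance(widths),
        "stddev": statistics.pstdev(widths),
        # Assumption: MAD is the median absolute deviation from the median.
        "mad": statistics.median([abs(w - median) for w in widths]),
    }


sample = "id,name\n1,Ann\n22,Robert\n333,Li\n"
width_stats = record_width_stats(sample)
```

The three sample records have widths 5, 9 and 6 under these assumptions, so the median is 6 and the MAD is 1.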

NOTE: 0.133.0 was skipped because of a dev dependency conflict with the csvs_convert crate, preventing us from publishing 0.133.0 to crates.io. This has been resolved in 0.133.1.

Added

count: expanded --width options, adding record width stats beyond max length (avg, median, min, variance, stddev & MAD). Also added --json output when using --width #2099
edit: add qsv edit command by @rzmk in #2074
fixlengths: added --quote and --escape options #2104
stats: add sort_order streaming statistic #2101
polars: add polars revision info to --version output e60e44f
polars: added Polars relevant env vars to --envlist option 0ad68fe
polars: add & document POLARS_BACKTRACE_IN_ERR env var f9cc559
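The sort_order streaming statistic added to stats can be pictured as a single pass that only remembers the previous value. The Python sketch below is an illustration under assumptions, not qsv's code: the result labels and the treatment of empty or constant columns are guesses made for the example.

```python
def sort_order(values):
    """Classify a stream as Ascending, Descending, or Unsorted in one pass."""
    ascending = descending = True
    first = True
    previous = None
    for value in values:
        if not first:
            if value < previous:
                ascending = False
            elif value > previous:
                descending = False
            if not ascending and not descending:
                return "Unsorted"  # both orders ruled out, stop early
        previous = value
        first = False
    # Assumption: empty or constant streams are reported as Ascending.
    return "Ascending" if ascending else "Descending"


orders = [
    sort_order(["a", "b", "c"]),
    sort_order(["c", "b", "a"]),
    sort_order(["b", "a", "c"]),
]
```

Because only the previous value is kept, memory use stays constant no matter how many rows are scanned, which is what makes a statistic like this "streaming".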

Changed

Optimize polars optflags #2089
deps: bump polars 0.42.0 to latest upstream at time of release 3b7af51
bump polars to latest upstream, removing smartstring #2091
build(deps): bump actions/setup-python from 5.1.1 to 5.2.0 by @dependabot in #2094
build(deps): bump flate2 from 1.0.32 to 1.0.33 by @dependabot in #2085
build(deps): bump flexi_logger from 0.28.5 to 0.29.0 by @dependabot in #2086
build(deps): bump indexmap from 2.4.0 to 2.5.0 by @dependabot in #2096
build(deps): bump jsonschema from 0.18.0 to 0.18.1 by @dependabot in #2084
build(deps): bump serde from 1.0.208 to 1.0.209 by @dependabot in #2082
build(deps): bump serde_json from 1.0.125 to 1.0.127 by @dependabot in #2079
build(deps): bump sysinfo from 0.31.2 to 0.31.3 by @dependabot in #2077
build(deps): bump qsv-stats from 0.18.0 to 0.19.0 by @dependabot in #2100
build(deps): bump tokio from 1.39.3 to 1.40.0 by @dependabot in #2095
apply select clippy lint suggestions
updated several indirect dependencies
made various doc and usage text improvements
pin Rust nightly to 2024-08-26 from 2024-07-26, aligning with Polars pinned nightly

Fixed

Ensure portable binaries are "added" to the publish zip archive, instead of replacing all the binaries with just the portable version. Fixes #2083. 34ad206

Removed

removed to parquet subcommand as it's redundant with sqlp's ability to create parquet files. This also removes the HUGE duckdb dependency, which should markedly make compile times shorter and binaries much smaller #2088
removed smartstring dependency now that Polars has its own compact inlined string type 47f047e
removed to parquet benchmark

Full Changelog: 0.132.0...0.133.1

¹ ChatGPT prompt: Using the logos for the Polars project and the qsv project as a baseline, can you create a version with the cowboy riding a polar bear instead?

Contributors

dependabot and rzmk


21 Aug 10:34

jqnatividad

0.132.0

d644e83

0.132.0

Highlights

With this release, we finally finish the stats caching refactor started in 0.131.0, replacing the binary encoded stats cache with a simpler JSONL cache. The stats cache stores the necessary statistical metadata to make several key commands smarter & faster. Per the benchmarks:

frequency is 6x faster (frequency_index_stats_mode_auto).
Not only is it faster, it also no longer needs to compile a hashmap for columns with ALL unique values (e.g. ID columns). In practice, this lets it handle "real-world" datasets of any size, unless every column has ALL unique values, in which case the entire CSV will have to fit into memory.
tojsonl is 2.67x faster (tojsonl_index)
schema is two orders of magnitude (100x) faster!!! (schema_index)

The stats cache also provides the foundation for even more "smart" features and commands in the future. It also has the side-benefit of adding a way to produce stats in JSONL format that can be used for other purposes beyond qsv.

The search, searchset, and replace commands now also have a --literal option that allows you to search for and replace strings that contain regex special/reserved characters without having to escape them. This is especially useful with URL columns, which often contain characters like ?, :, - and .
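To show why a literal mode matters, here is a small Python sketch using the standard re module. It is only an analogy for what --literal does, not qsv's implementation, and the helper find_matches and the sample URLs are made up for the example.

```python
import re


def find_matches(values, pattern, literal=False):
    """Return the values that contain pattern, optionally treating it as plain text."""
    if literal:
        pattern = re.escape(pattern)  # neutralize ?, ., * and other metacharacters
    regex = re.compile(pattern)
    return [v for v in values if regex.search(v)]


urls = [
    "https://example.com/a?id=1",
    "https://example.com/aXid=1",
    "https://example.com/id=1",
]
# As a regex, "a?id=1" means "an optional a, then id=1", so every URL matches.
as_regex = find_matches(urls, "a?id=1")
# As a literal, only the URL containing the exact text "a?id=1" matches.
as_literal = find_matches(urls, "a?id=1", literal=True)
```

Without escaping, the ? silently changes the meaning of the pattern instead of raising an error, which is the kind of surprise a literal option avoids.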

Added

search, searchset & replace: add --literal option #2060 & 7196053
slice: added usage text examples 04afaa3
publish: added workflow to build "portable" binaries with CPU features disabled
contrib(completions): add --literal for search and searchset by @rzmk in #2061
contrib(completions): add --literal completion to replace by @rzmk in #2062
add more polars metadata in --version info #2073
docs: added more info to SECURITY.md 609d4df
docs: expanded Goals/Non-Goals 54998e3
docs: added Installation "Option 0" quick start bf5bf82
added search --literal benchmark

Changed

stats, schema, frequency & tojsonl: stats caching refactor, replacing binary encoded stats cache with a simpler JSONL cache #2055
rename stats --stats-json option to stats --stats-jsonl #2063
changed "broken pipe" error to a warning 7353275
docs: update multithreading and caching sections of PERFORMANCE.md 5e6bc45
deps: switch to our qsv-optimized fork of csv crate 3fc1e82
deps: bump polars from 0.41.3 to 0.42.0 #2051
build(deps): bump actix-web from 4.8.0 to 4.9.0 by @dependabot in #2041
build(deps): bump flate2 from 1.0.31 to 1.0.32 by @dependabot in #2071
build(deps): bump indexmap from 2.3.0 to 2.4.0 by @dependabot in #2049
build(deps): bump reqwest from 0.12.6 to 0.12.7 by @dependabot in #2070
build(deps): bump rust_decimal from 1.35.0 to 1.36.0 by @dependabot in #2068
build(deps): bump serde from 1.0.205 to 1.0.206 by @dependabot in #2043
build(deps): bump serde from 1.0.206 to 1.0.207 by @dependabot in #2047
build(deps): bump serde from 1.0.207 to 1.0.208 by @dependabot in #2054
build(deps): bump serde_json from 1.0.122 to 1.0.124 by @dependabot in #2045
build(deps): bump serde_json from 1.0.124 to 1.0.125 by @dependabot in #2052
apply select clippy lint suggestions
updated several indirect dependencies
made various usage text improvements

Fixed

stats: fix --output delimiter inferencing based on file extension #2065
make process_input helper handle stdin better #2058
docs: fix completions for --stats-jsonl and qsv pro installation text update by @rzmk in #2072
docs: added Note about why luau feature is disabled in musl binaries - ffa2bc5 & 27d0f8e

Removed

Removed bincode dependency now that we're using JSONL stats cache #2055 babd92b

Full Changelog: 0.131.1...0.132.0

Contributors

dependabot and rzmk


09 Aug 14:44

jqnatividad

0.131.1

c60b99d

0.131.1

Changed

deps: bump polars to latest upstream post py-1.41.1 release at the time of this release
build(deps): bump filetime from 0.2.23 to 0.2.24 by @dependabot in #2038

Fixed

frequency: change --stats-mode default to none from auto.
This is because of a big performance regression when using --stats-mode auto on datasets with columns with ALL unique values.
See #2040 for more info.

Full Changelog: 0.131.0...0.131.1

Contributors

dependabot


09 Aug 01:03

jqnatividad

0.131.0

c26cd0b

0.131.0

Highlights

Refactored frequency to make it smarter and faster.
frequency's core algorithm essentially compiles an in-memory hashmap to determine the frequency of each unique value for each column. It does this using multi-threaded, multi-I/O techniques to make it blazing fast.
However, for columns with ALL unique values (e.g. ID columns), this takes a comparatively long time and consumes a lot of memory as it essentially compiles a hashmap of the ENTIRE column, with a hashmap entry for each column value with a count of 1.
Now, with the new --stats-mode option (enabled by default), frequency can compile the dataset in a more intelligent way by looking up a column's cardinality in the stats cache.
If the cardinality of a column is equal to the CSV's rowcount (indicating a column with ALL unique values), it short-circuits frequency calculations for that column - dramatically reducing the time and memory requirements for the ID column as it eliminates the need to maintain a hashmap for it.
Practically speaking, this makes frequency able to handle "real-world" datasets of any size.
To ensure frequency is as fast as possible, be sure to index and compute stats for your datasets beforehand.
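The short-circuit described above can be sketched in a few lines of Python. This is a simplified, single-threaded illustration and not the actual frequency code: the cardinality dictionary stands in for a lookup in the stats cache, and the placeholder label for all-unique columns is hypothetical.

```python
import csv
import io
from collections import Counter


def frequency(csv_text, cardinality, rowcount):
    """Build per-column frequency tables, skipping columns known to be all-unique."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    # A column whose cached cardinality equals the row count holds only unique values.
    all_unique = {c for c in columns if cardinality.get(c) == rowcount}
    # No hashmap is created for all-unique columns.
    counters = {c: Counter() for c in columns if c not in all_unique}
    for row in reader:
        for column, counter in counters.items():
            counter[row[column]] += 1
    result = {c: dict(counter) for c, counter in counters.items()}
    for c in all_unique:
        result[c] = {"<ALL_UNIQUE>": rowcount}  # hypothetical placeholder label
    return result


data = "id,state\n1,NY\n2,NJ\n3,NY\n"
cached_cardinality = {"id": 3, "state": 2}  # stand-in for the stats cache
freq = frequency(data, cached_cardinality, rowcount=3)
```

The id column never gets a counter, so its memory cost no longer grows with the number of rows.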
Setting the stage for Datapusher+ v1 and...
The "itches we've been scratching" the past few months have been informed by our work at several clients towards the release of Datapusher+ 1.0 and qsv pro 1.0 (more info below) - both targeted for release this month.
DP+ is our third-gen, high-speed data ingestion/registration tool for CKAN that uses qsv as its data wrangling/analysis engine. It will enable us to reinvent the way data is ingested into CKAN - with exponentially faster data ingestion, metadata inferencing, data validation, computed metadata fields, and more!
We're particularly excited how qsv will allow us to compute and infer high-quality metadata for datasets (with a focus on inferring optional recommended DCAT-US v3 metadata fields) in "near real-time", while dataset publishers are still entering metadata. This will be a game-changer for CKAN administrators and data publishers!
...qsv pro 1.0
qsv pro is datHere's enterprise-grade data wrangling/curation workbench that’s planned for v1.0 release this month.
Building the core functionality of qsv pro's Workflow feature is one of the primary reasons for a v1.0 release.
We feel qsv pro may be a game-changer for data wranglers and data curators who need to work with spreadsheets and large datasets to view statistical data and metadata while also performing complex data wrangling operations in a user-friendly way without having to write code.

Added

docs: added Shell Completion section 556a2ff
docs: add 🪄 emoji in legend to indicate "automagical" commands 2753c90
Add building deb package (WIP) by @tino097 in #2029
Added GitHub workflow to test debian package (WIP) by @tino097 in #2032
tests: added false positive to _typos.toml configuration d576af2
added more benchmarks
added more tests

Changed

fetch & fetchpost: remove expired diskcache entries on startup 9b6ab5d
frequency: smarter frequency compilation with new --stats-mode option #2030
json: refactored for maintainability & performance 62e9216 and 4e44b18
improved self-update messages 5c874e0 and 0aa0b13
contrib(completions): frequency updates & remove bashly/fish by @rzmk in #2031
Debian package update by @tino097 in #2017
publish: optimized enabled CPU features when building release binaries in all GitHub Actions "publishing" workflows
publish: ensure latest Python patch release is used when building qsvpy binary variants 2ab03a0 and ec6f486
tests: also enabled CPU features in CI tests
docs: wordsmith qsv "elevator pitch" cc47fe6
docs: point to https://100.dathere.com in Whirlwind tour fc49aef
deps: bump polars to latest upstream post py-1.41.1 release at the time of this release
build(deps): bump bytes from 1.6.1 to 1.7.0 by @dependabot in #2018
build(deps): bump bytes from 1.7.0 to 1.7.1 by @dependabot in #2021
build(deps): bump flate2 from 1.0.30 to 1.0.31 by @dependabot in #2027
build(deps): bump indexmap from 2.2.6 to 2.3.0 by @dependabot in #2020
build(deps): bump jaq-parse from 1.0.2 to 1.0.3 by @dependabot in #2016
build(deps): bump redis from 0.26.0 to 0.26.1 by @dependabot in #2023
build(deps): bump regex from 1.10.5 to 1.10.6 by @dependabot in #2025
build(deps): bump serde_json from 1.0.121 to 1.0.122 by @dependabot in #2022
build(deps): bump sysinfo from 0.30.13 to 0.31.0 by @dependabot in #2019
build(deps): bump sysinfo from 0.31.0 to 0.31.2 by @dependabot in #2024
build(deps): bump tempfile from 3.11.0 to 3.12.0 by @dependabot in #2033
build(deps): bump serde from 1.0.204 to 1.0.205 by @dependabot in #2036
apply select clippy suggestions
updated several indirect dependencies
made various usage text improvements
bumped MSRV to 1.80.1

Fixed

sqlp & joinp: fixed .ssv.sz output auto-compression support 5397f6c & d86ba63
docs: fix link by @uncenter in #2026
tests: correct misnamed test 8ae6000
tests: fix flaky reverse property tests d86ba63

Removed

docs: "Quicksilver" is the name of the logo horse, not how you pronounce "qsv" e4551ae

New Contributors

@uncenter made their first contribution in #2026

Full Changelog: 0.130.0...0.131.0

Contributors

tino097, dependabot, and 2 other contributors


29 Jul 19:07

jqnatividad

0.130.0

1d4b2bd

0.130.0

Following the 0.129.0 release - the largest release to date, 0.130.0 continues to polish qsv as a data-wrangling engine, packing new features, fixes, and improvements, previewing upcoming features in qsv pro 1.0. Here are a few highlights:

Highlights

Added .ssv (semicolon separated values) automatic support. Semicolon separated values are now automatically detected and supported by qsv. Though not as common as CSV, SSV is used in some regions and industries, so qsv now supports it.
Added cargo deb compatibility. In preparation for the release of DataPusher+ 1.0, we're now making it easier to upgrade qsvdp so CKAN administrators can install and upgrade it easily using apt-get install qsvdp or apt-get upgrade qsvdp.
DP+ is our next-gen, high-speed data ingestion tool for CKAN that uses qsv as its analysis engine. It's not only a robust, fast, validating data pump that guarantees high-quality data, it also does extended analysis to infer and automatically derive high-quality metadata - what we call "automagical metadata".
Upgraded to the latest Polars upstream at the py-polars-1.3.0 tag. Polars tops the TPC-H Benchmark and is several orders of magnitude faster than traditional dataframe libraries (cough - 🐼 pandas). qsv proudly rides the 🐻‍❄️ Polars bear to get subsecond response times even with very large datasets!
qsv v0.130.0 shell completions files are available for download here. With shell completions, pressing tab in a compatible shell provides suggestions for various qsv commands, subcommands, and options that you can choose from. Supported shells include bash, zsh, powershell, fish, nushell, fig, and elvish. View tips on how to install completions for the bash shell here.
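For the .ssv support mentioned in the highlights above, one simple way to picture extension-based delimiter selection is the Python sketch below. It is an assumption-laden illustration rather than qsv's logic: the DELIMITERS table and the helper names are invented for the example, and compressed variants are ignored.

```python
import csv
import io
from pathlib import Path

# Extension-to-delimiter table based on the formats named in these notes.
DELIMITERS = {".csv": ",", ".tsv": "\t", ".tab": "\t", ".ssv": ";"}


def delimiter_for(path: str) -> str:
    """Pick a delimiter from the file extension, defaulting to a comma."""
    return DELIMITERS.get(Path(path).suffix.lower(), ",")


def parse(path: str, text: str):
    """Parse text using the delimiter implied by the file name."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter_for(path)))


rows = parse("prices.ssv", "item;price\ncoffee;3,50\n")
```

The sample row also hints at why semicolons are popular in some regions: a decimal comma in 3,50 would otherwise collide with a comma delimiter.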

Added

apply: add base62 encode/decode operations #2013
headers: add --just-count option #2004
json: add --select option #1990
searchset: add --not-one flag by @rzmk in #1994
Added .ssv (semicolon separated values) automatic support #1987
Added cargo deb compatibility by @tino097 in #1991
contrib(completions): add --just-count for headers by @rzmk in #2006
contrib(completions): add --select for json by @rzmk in #1992
added several benchmarks
added more tests
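For readers unfamiliar with base62, the encoding behind the new apply operations, here is a minimal Python sketch of encoding and decoding non-negative integers. The digit alphabet (0-9, then A-Z, then a-z) is an assumption, as different base62 implementations order the characters differently, and this is not the code used by apply.

```python
# Assumption: digits first, then uppercase, then lowercase letters.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_encode(number: int) -> str:
    """Encode a non-negative integer as a base62 string."""
    if number < 0:
        raise ValueError("only non-negative integers are supported")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def base62_decode(text: str) -> int:
    """Decode a base62 string back to an integer."""
    number = 0
    for char in text:
        # str.index raises ValueError for characters outside the alphabet.
        number = number * 62 + ALPHABET.index(char)
    return number


encoded = base62_encode(123456789)
decoded = base62_decode(encoded)
```

Base62 output uses only letters and digits, which makes it convenient for short identifiers that must survive URLs and file names unescaped.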

Changed

diff: allow selection of --key and --sort-columns by name, not just by index #2010
fetch & fetchpost: replace deprecated Redis execute command 75cbe2b
stats: more intelligent --infer-len option c6a0e64
validate: return delimiter detected upon successful CSV validation #1977
bump polars to latest upstream at py-polars-1.3.0 tag #2009
deps: bump csvs_convert from 0.8.12 to 0.8.13 d1d0800
build(deps): bump cached from 0.52.0 to 0.53.0 by @dependabot in #1983
build(deps): bump cached from 0.53.0 to 0.53.1 by @dependabot in #1986
build(deps): bump postgres from 0.19.7 to 0.19.8 by @dependabot in #1985
build(deps): bump pyo3 from 0.22.1 to 0.22.2 by @dependabot in #1979
build(deps): bump redis from 0.25.4 to 0.26.0 by @dependabot in #1995
build(deps): bump serde_json from 1.0.120 to 1.0.121 by @dependabot in #2011
build(deps): bump simple-expand-tilde from 0.1.7 to 0.4.0 by @dependabot in #1984
build(deps): bump tokio from 1.38.0 to 1.38.1 by @dependabot in #1973
build(deps): bump tokio from 1.38.1 to 1.39.1 by @dependabot in #1988
build(deps): bump xxhash-rust from 0.8.11 to 0.8.12 by @dependabot in #1997
apply select clippy suggestions
updated several indirect dependencies
made various usage text improvements
pin Rust nightly to 2024-07-26

Fixed

diff: clarify --key usage examples, resolves #1998 by @rzmk in #2001
json: refactored so it didn't need to use threads to spawn qsv select to order the columns. Had to do this as sometimes intermediate output was sent to stdout before the final output was ready 0f25def
py: replace row with col in usage text by @allen-chin in #2008
reverse: fix indexed bug #2007
validate: properly auto-detect tab delimiter when file extension is TSV or TAB #1975
fix panic when process_input helper fn receives unexpected input from stdin 152fec4

Removed

docs: remove *nix only message for foreach by @rzmk in #1972

New Contributors

@tino097 made their first contribution in #1991
@allen-chin made their first contribution in #2008

Full Changelog: 0.129.1...0.130.0

To stay updated with datHere's latest news and updates (including qsv pro, datHere's CKAN DMS, and analyze.dathere.com), subscribe to the newsletter here: dathere.com/newsletter

Contributors

tino097, allen-chin, and 2 other contributors


15 Jul 10:56

jqnatividad

0.129.1

756bfba

0.129.1

This is a small patch release to fix some publishing issues, update tab completion, and to fix minor CI errors.
See 0.129.0 release notes to get the details on qsv's biggest release to date!

Changed

clipboard: add error handling based on clipboard::Error by @rzmk in #1970
contrib(completions): add all commands (except applydp & generate) by @rzmk in #1971
Temporarily suppressed some CI tests that were flaky on GH macOS Apple Silicon action runners. They previously worked fine on self-hosted macOS Apple Silicon action runners that are temporarily unavailable.

Full Changelog: 0.129.0...0.129.1

Contributors

rzmk


14 Jul 10:06

jqnatividad

0.129.0

640a32c

0.129.0

This release is the biggest one ever!

Packed with new features, improvements, and previews of upcoming qsv pro features, here are a few highlights:

📌 Highlights

Meet @rzmk - qsv pro's software engineer now also co-maintains qsv!

@rzmk has contributed to projects in the qsv ecosystem including qsv's describegpt, prompt, json, and clipboard commands; qsv's tab completion support; qsv.dathere.com including its online configurator and benchmarks page; 100.dathere.com with its qsv lessons and exercises; and qsv pro the spreadsheet data wrangling desktop app (along with its promo site). @rzmk now also co-maintains qsv!

With @rzmk now also co-maintaining qsv, our data-wrangling portfolio's roadmap may get more intriguing as @rzmk's work on qsv pro, 100.dathere.com, and other initiatives can result in contributions to qsv as we've seen in this release. Perhaps some aims may be put towards AI; "automagical" metadata inferencing; DCAT 3; and expanded recipe support with the accelerated evolution of qsv pro as an enterprise-grade Data-Wrangling/Data Curation Workbench.

Polars v0.41.3 - numerous sqlp and joinp improvements

sqlp: expanded SQL support
- Natural Join support
- DuckDB-like COLUMNS SQL function to select columns that match a pattern
- ORDER BY ALL support
- Support POSTGRESQL ^@ ("starts with"), ~~,~~*,!~~,!~~* ("like", "ilike") string-matching operators
- Support for SQL SELECT * ILIKE wildcard syntax
- Support SQL temporal functions STRFTIME and STRPTIME
sqlp: added --streaming option

New command qsv prompt - Use a file dialog for qsv file input and output

Be more interactive with qsv by using a file dialog to select a file for input and output.

Here are a few key highlights:

Start with qsv prompt when piping commands to provide a file as input from an open file dialog and pipe it into another command, for example: qsv prompt | qsv stats.
End with qsv prompt -f when piping commands to save the output to a file you choose with a save file dialog.

There are other options too, so feel free to explore more with qsv prompt --help.

This will allow you to create qsv pipelines that are more "user-friendly" and distribute them to non-technical users. It's not as flexible as qsv pro's full-blown GUI, but it's a start!

New command qsv json - Convert JSON data to CSV and optionally provide a jq-like filter

The new json command allows you to convert non-nested JSON data to CSV. If your data is not in the expected format, try using the --jaq option to provide a jq-like filter. See qsv json --help for more information and examples.

Here are a few key highlights:

Specify the path to a JSON file to attempt conversion to CSV with qsv json <filepath>.
Attempt conversion of JSON to CSV data from stdin, for example: qsv slice <filepath.csv> --json | qsv json.
Write the output to a file with the --output <filepath> (or -o for short) option.
Use the --jaq <filter> option to try converting nested or complex JSON data into the intended format before parsing to CSV.

You may learn more by running qsv json --help.

Along with the jsonl command, we now have more options to convert JSON to CSV with qsv!

New command qsv clipboard - Provide input from your clipboard and save output to your clipboard

Provide your clipboard content using qsv clipboard and save output to your clipboard by piping into qsv clipboard --save (or -s for short).

100.dathere.com - Try out lessons and exercises with qsv from your browser!

You may run qsv commands from your browser without having to install it locally at 100.dathere.com.

Screenshots: within the lesson (in-page) using Thebe; in a Jupyter Lab environment.

Thanks to Jupyter Book, datHere has released a website available at 100.dathere.com where you may explore lessons and exercises with qsv by running them within the web page, in a Jupyter Lab environment, or locally after following the provided installation instructions. There are multiple exercises planned, but feel free to try out the first few available lessons/exercises by visiting 100.dathere.com and star the source code's repository here.

New multi-shell completions draft (bash, zsh, powershell, fish, nushell, fig, elvish)

There's a draft of more qsv shell completion support including 7 different shells! The plan is to add the rest of the commands in this implementation since we can use one codebase to generate the 7 shell completion script files. Feel free to try out the various shell completions in the examples folder from contrib/completions to verify if the examples work (as of today's release date only qsv count and qsv clipboard may be available) and also contribute to adding the rest of the completions if you know a bit of Rust.

The existing Bash shell completions for v0.129.0 and fish shell completions draft are available for now as the multi-shell completions draft is being developed.

Screenshots: Bash completions demo; Fish completions demo.

With shell completions enabled, you may identify qsv commands more easily when pressing the tab key on your keyboard in certain positions using the relevant Bash or fish shell from your terminal. You may follow the instructions from 100.dathere.com here to learn how to install the Bash completions and under the Usage section here for fish shell completions. Note that the fish shell completions are incomplete and both of the implementations may be replaced by the multi-shell completions implementation once complete.

qsvpro.dathere.com - Preview: Download spreadsheets from a compatible CKAN instance into the qsv pro Workflow

This is a preview of a feature, meaning it is planned for an upcoming release but may change by the time it is released.

In addition to importing local spreadsheet files and uploading to a CKAN instance, this new feature allows users to select a locally registered CKAN instance where they have the create_dataset permission to download a spreadsheet file from their CKAN instance and load the new local spreadsheet file into the Workflow. qsv pro's Workflow would therefore have both upload and download capability to and from a compatible CKAN instance.

qsvpro.dathere.com - Preview: Attempt SQL query generation from natural language with a compatible LLM API instance

This is a preview of a feature, meaning it is planned for an upcoming release but may change by the time it is released.
Also note that this video is sped up as you may see by...

Contributors

dependabot and rzmk


25 May 22:33

jqnatividad

0.128.0

98c9d95

0.128.0

[0.128.0] - 2024-05-25

❤️ csv,conf,v8 Edition 🎉
🏇🏽 ¡Ándale! ¡Ándale! ¡Arriba! ¡Arriba! 💨

Yii-hah! We're Mexico bound as we head to csv,conf,v8 to present and share qsv with fellow data-makers and wranglers from all over!

And we've packed a lot into this release for the occasion:

search got a lot of love as it now powers qsv pro's new search feature to get near-instant search results even on large datasets.
stats - the ❤️ of qsv, now has several cache fine-tuning options with --cache-threshold. It now also computes max_precision for floats and is_ascii for strings. It also has a new --round 9999 sentinel value to suppress rounding of statistics.
schema & tojsonl are now faster thanks to stats --cache-threshold autoindex & cache creation/deletion logic.
We upgraded Polars to 0.40.0 to unlock additional capabilities in the count, joinp & sqlp commands.
count now has an additional blazing fast counting mode using Polars' read_csv() table function.
frequency gets some micro-optimizations for even faster frequency analysis.
luau is now bundled with luau 0.625 from 0.622. We also upgraded the bundled LuaDate library from 2.2.0 to 2.2.1. All of this, while making it ~10% faster!
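The max_precision and is_ascii statistics that stats now computes can be approximated with the Python sketch below. It is a guess at the general idea, not qsv's implementation: precision is taken as the number of digits after the decimal point in the text form of each value, and scientific notation is not handled.

```python
def max_precision(values):
    """Largest number of digits after the decimal point among numeric strings."""
    longest = 0
    for value in values:
        _, dot, fraction = value.partition(".")
        if dot:  # only values that actually contain a decimal point count
            longest = max(longest, len(fraction))
    return longest


def is_ascii(values):
    """True when every string in the column contains only ASCII characters."""
    return all(value.isascii() for value in values)


precision = max_precision(["1.5", "2.125", "3"])
ascii_only = is_ascii(["Puebla", "qsv"])
has_non_ascii = not is_ascii(["¡Ándale!"])
```

Both values can be updated one row at a time, so neither requires holding the column in memory.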

Overall, qsv manages to keep its performance edge despite the addition of new capabilities and features. We'll give a whirlwind tour of qsv and these updates in our talk at csv,conf,v8.

We'll also preview what we've been calling the People's APPI - our "Answering People/Policymaker Interface" in qsv pro.

This is a new way to interact with qsv that's more conversational and less command-line-y using a natural language interface. It's a way to make qsv more accessible to more people, especially those who are not comfortable with the command line.

We're excited to share all these qsv innovations with the csv,conf,v8 community and the wider world! Nos vemos en Puebla!

¡Ándele! ¡Ándele! ¡Epa! ¡Epa! ¡Epa!

Added

count: additional Polars-powered counting mode using read_csv() SQL table function 05c5809
input: add --quote-style option df3c8f1
joinp: add --coalesce option 8d142e5
search: add --preview-match option #1785
search: add --json output option #1790
search: add "match-only" --flag option mode #1799
search: add --not-one flag for not using exit code 1 when no match by @rzmk in #1810
sqlp: add --decimal-comma option #1832
stats: add --cache-threshold option #1795
stats: add --cache-threshold autoindex creation/deletion logic #1809
stats: add additional mode to --cache-threshold 63fdc55
stats: now computes max_precision for floats #1815
stats: add --round 9999 sentinel value support to suppress rounding #1818
stats: add is_ascii column #1824
added new benchmarks for search command 58d73c3

Changed

count: document three count modes 3d5a333
describegpt: update --max-tokens type for LLMs with larger context sizes by @rzmk #1841
excel: use simpler range::headers() to get headers 069acbf
frequency: ensure --other-sorted works with --other-text 7430ad7
frequency: microoptimize hot loop d9c01e1, 7c9f925 and
luau: improve usage text cb6b4d9
luau: we now bundle luau 0.625 from 0.622 4060975
luau: update vendored LuaDate library from 2.2.0 to 2.2.1 #1840
schema: adjust to reflect stats --cache-threshold option 92fed86
slice: move json output helpers to util 1f44b48
tojsonl: refactor boolcheck helper 74d5f5a
docs: cross-reference split & partition commands #1828
contrib(bashly): update completions.bash for qsv v0.127.0 by @rzmk in #1776
contrib(bashly): update completions.bash for qsv v0.128.0 by @rzmk in #1838
deps: upgrade to polars 0.40.0 #1831
build(deps): bump actix-web from 4.5.1 to 4.6.0 by @dependabot in #1825
build(deps): bump anyhow from 1.0.82 to 1.0.83 by @dependabot in #1798
build(deps): bump anyhow from 1.0.83 to 1.0.85 by @dependabot in #1823
build(deps): bump anyhow from 1.0.85 to 1.0.86 by @dependabot in #1826
build(deps): bump cached from 0.50.0 to 0.51.0 by @dependabot in #1789
build(deps): bump cached from 0.51.0 to 0.51.1 by @dependabot in #1793
build(deps): bump cached from 0.51.1 to 0.51.2 by @dependabot in #1802
build(deps): bump cached from 0.51.2 to 0.51.3 by @dependabot in #1805
build(deps): bump crossbeam-channel from 0.5.12 to 0.5.13 by @dependabot in #1827
build(deps): bump csvs_convert from 0.8.9 to 0.8.10 by @dependabot in #1808
build(deps): bump data-encoding from 2.5.0 to 2.6.0 by @dependabot in #1780
build(deps): bump file-format from 0.24.0 to 0.25.0 by @dependabot in #1807
build(deps): bump flate2 from 1.0.28 to 1.0.29 by @dependabot in #1778
build(deps): bump flate2 from 1.0.29 to 1.0.30 by @dependabot in #1784
build(deps): bump hashbrown from 0.14.3 to 0.14.5 by @dependabot in #1781
build(deps): bump itertools from 0.12.1 to 0.13.0 by @dependabot in #1822
deps: bump forked jsonschema from 0.17.1 to 0.18.0 f02620f
build(deps): bump mimalloc from 0.1.41 to 0.1.42 by @dependabot in #1829
build(deps): bump mlua from 0.9.7 to 0.9.8 by @dependabot in #1821
build(deps): bump qsv-stats from 0.16.0 to 0.17.1 by @dependabot in #1813
build(deps): bump qsv-stats from 0.17.1 to 0.17.2 by @dependabot in #1814
build(deps): bump qsv-stats from 0.17.2 to 0.18.0 by @dependabot in #1816
build(deps): bump ryu from 1.0.17 to 1.0.18 by @dependabot in #1801
build(deps): bump semver from 1.0.22 to 1.0.23 by @dependabot in #1800
build(deps): bump serde from 1.0.198 to 1.0.199 by @dependabot in #1777
build(deps): bump serde from 1.0.199 to 1.0.200 by @dependabot in #1787
build(deps): bump serde from 1.0.200 to 1.0.201 by @dependabot in #1804
build(deps): bump serde from 1.0.201 to 1.0.202 by @dependabot in #1817
build(deps): bump serde_json from 1.0.116 to 1.0.117 by @dependabot in #1806
build(deps): bump serial_test from 3.1.0 to 3.1.1 by @dependabot in #1779
build(deps): bump simple-expand-tilde from 0.1.5 to 0.1.6 by @dependabot in #1811
build(deps): bump sysinfo from 0.30.11 to 0.30.12 by @dependabot in https://github.com/jq...

Contributors

dependabot and rzmk


25 Apr 09:53

jqnatividad

0.127.0

cf4c180

0.127.0

📊 Enhanced Frequency Analysis 📊

This is a quick release adding several frequency enhancements for more detailed frequency analysis. The frequency command now includes a percentage column, calculates other values, and supports limiting unique counts and negative limits.
These options provide additional context for Datapusher+, qsv-pro and describegpt so their metadata inferences are more accurate and comprehensive.

Previously, for a 775-row CSV file containing one column named state with entries for all 50 states, frequency only showed¹:

qsv frequency freq_state_example.csv | qsv table
field  value  count
state  NY     100
state  NJ     70
state  CA     60
state  MA     55
state  FL     45
state  TX     43
state  NM     40
state  AZ     39
state  NV     38
state  MI     35

Now, there is a new percentage column and an "Other" row that totals the remaining values, both of which have configurable options:

qsv frequency freq_state_example.csv | qsv table
field  value       count  percentage
state  NY          100    12.90323
state  NJ          70     9.03226
state  CA          60     7.74194
state  MA          55     7.09677
state  FL          45     5.80645
state  TX          43     5.54839
state  NM          40     5.16129
state  AZ          39     5.03226
state  NV          38     4.90323
state  MI          35     4.51613
state  Other (40)  250    32.25806
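
The arithmetic behind the two new columns can be sketched in Python. This is only an illustration of the calculation, not qsv's implementation: the function name `frequency_table` is hypothetical, and the counts for the 40 states outside the top 10 are made up so that they add up to the 250 rows shown in the "Other (40)" line.

```python
from collections import Counter

def frequency_table(values, limit=10, decimals=5):
    """Return (value, count, percentage) rows for the top `limit` values,
    plus an "Other (n)" row that totals the n remaining unique values."""
    counts = Counter(values)
    total = sum(counts.values())
    ranked = counts.most_common()  # sorted by count, descending
    top, rest = ranked[:limit], ranked[limit:]
    rows = [(v, c, round(100 * c / total, decimals)) for v, c in top]
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other ({len(rest)})", other,
                     round(100 * other / total, decimals)))
    return rows

# Top-10 counts taken from the example output.
top10 = {"NY": 100, "NJ": 70, "CA": 60, "MA": 55, "FL": 45,
         "TX": 43, "NM": 40, "AZ": 39, "NV": 38, "MI": 35}
# Hypothetical counts for the other 40 states (10 x 7 + 30 x 6 = 250 rows).
others = {f"S{i:02d}": (7 if i < 10 else 6) for i in range(40)}

# Expand the counts into a 775-value column, then summarize it again.
column = [state for state, n in {**top10, **others}.items() for _ in range(n)]
rows = frequency_table(column)
for value, count, pct in rows:
    print(f"{value:<11} {count:<6} {pct}")
```

Each percentage is the count divided by the 775 total rows, so the percentage column, including the "Other" row, sums to roughly 100.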

This release is also out of cycle to address a big performance regression in the excel command caused by unnecessary formula info retrieval for the --error-format option introduced in 0.126.0. This has been fixed, and the excel command is now back to its speedy self.

Added

frequency: added percentage column; other values calculation, implementing #1774 #1775
benchmarks: added new frequency and excel benchmarks b83ad3a

Changed

contrib(bashly): update completions.bash for qsv v0.126.0 by @rzmk in #1771
build(deps): bump mimalloc from 0.1.39 to 0.1.41 by @dependabot in #1772
build(deps): bump qsv-stats from 0.14.0 to 0.15.0 by @dependabot in #1773
updated several indirect dependencies
applied select clippy recommendations

Fixed

excel: fixed performance regression because qsv was unnecessarily getting formula info (an expensive operation) for --error-format option even when not required 772af34
renamed 0.126.0 sqlp_vs_duckdb benchmark results so they're next to each other for easy direct comparison. 7bcd59e.
Per the benchmarks, sqlp is 2.87 times faster than duckdb v0.10.2 for a simple aggregation (0.066 secs vs 0.19 secs), and 1.42 times faster for an "expensive" aggregation (0.143 secs vs 0.203 secs).

Full Changelog: 0.126.0...0.127.0

¹ With its default --limit setting of 10, frequency only shows the top 10 unique values in the column, sorted by occurrence.

Contributors

dependabot and rzmk

22 Apr 15:35

jqnatividad

0.126.0

ecd0ac7

🤖 Expanded Metadata Inferencing 🤖

describegpt headlines this release with its new ability to work with local Large Language Models (LLMs) through popular tools that serve them over APIs, such as Ollama and Jan. This broadens the tool's utility in diverse AI environments. Beyond OpenAI, qsv can now use other popular LLMs like Llama 3, Mistral, and Gemma. It also unlocks expanded metadata inferencing capabilities in qsv pro.

Several commands got additional options: cat with --no-headers support in the rowskey subcommand; excel with new options like --error-format and short --metadata mode; and foreach with a --dry-run option. frequency also got new options, including --unq-limit for limiting unique counts, support for negative limits, and a --lmt-threshold option for compiling comprehensive frequencies below a threshold. slice now supports negative indices and new JSON output options, providing more flexibility in data slicing.

This is all rounded out with sqlp improvements, including support for single-line comments in SQL scripts and a special SKIP_INPUT value to skip input preprocessing when using table functions directly in Polars SQL (e.g. read_csv() and read_parquet()) - all while increasing performance thanks to the Polars engine being upgraded to 0.39.2.
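
As a rough picture of what single-line comment support involves, the Python sketch below drops whole-line `--` comments from a SQL script before it would be handed to the query engine. It is an assumption-laden illustration, not how sqlp is implemented: the helper `strip_sql_comments` and the `sales` table are hypothetical, and the sketch deliberately ignores trailing comments at the end of a statement line.

```python
def strip_sql_comments(script: str) -> str:
    """Drop lines that consist only of a `--` comment; keep everything else."""
    kept = [line for line in script.splitlines()
            if not line.lstrip().startswith("--")]
    return "\n".join(kept)

# Hypothetical SQL script with two single-line comments.
script = """\
-- total sales by state
SELECT state, SUM(amount) AS total
FROM sales
-- largest first
GROUP BY state ORDER BY total DESC;
"""

cleaned = strip_sql_comments(script)
print(cleaned)
```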

New Features

cat: Added --no-headers support to the rowskey subcommand.
describegpt: Added compatibility with local Large Language Models (LLMs) served by tools such as Ollama and Jan, broadening the tool's utility in diverse AI environments.
excel: Introduced new options in the excel command: --error-format for better error handling and a short --metadata JSON mode.
foreach: added a --dry-run option, allowing users to preview the results of scripts without executing them.
frequency: New options added such as --unq-limit for limiting unique counts; support for negative limits to only show frequencies >= abs(negative limit); and a --lmt-threshold option to allow the compilation of comprehensive frequencies below the threshold - all providing more detailed control over frequency analysis.
slice: Support for negative indices to slice from the end and new JSON output options.
sqlp: Now supports single-line comments and includes a special SKIP_INPUT value for more efficient data loading. The Polars engine has also been upgraded to 0.39.2, providing enhanced performance and stability.
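
The negative-limit behavior described for frequency above can be sketched in Python as follows. This is a minimal illustration rather than qsv's code: the function name `apply_limit` is hypothetical, and treating a limit of 0 as "no limit" is an assumption that the notes above do not state.

```python
def apply_limit(ranked, limit):
    """Apply a frequency limit to (value, count) pairs sorted by count, descending.

    limit > 0: keep the top `limit` values.
    limit < 0: keep only values whose count is >= abs(limit).
    limit == 0: keep everything (assumption, not stated in the notes).
    """
    if limit > 0:
        return ranked[:limit]
    if limit < 0:
        return [(v, c) for v, c in ranked if c >= abs(limit)]
    return ranked

ranked = [("NY", 100), ("NJ", 70), ("CA", 60), ("MA", 55), ("FL", 45)]
print(apply_limit(ranked, 2))    # the two most frequent values
print(apply_limit(ranked, -60))  # values that occur at least 60 times
```

A positive limit fixes how many rows come back, while a negative one fixes the minimum count, so the number of rows returned depends on the data.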

Changes and Optimizations

Performance Enhancements: Microoptimizations in datefmt and validate commands, and increased default length for --infer-len in sqlp for improved performance.
Dependency Updates: Numerous updates including bumping Luau, jql-runner, pyo3, and other dependencies to enhance stability and security.
Benchmarks Added: New performance benchmarks for sqlp vs duckdb added to ensure there are no performance regressions between releases. Right now, sqlp is faster than duckdb in most cases (thanks to Polars - see the latest TPC-H benchmarks), but we want to make sure that we keep it that way.

Security and Robustness

Security Fixes: Updated rustls to fix a specific CVE, and other minor fixes to enhance the security and robustness of network and data processing features.
Bug Fixes: Various bug fixes including improvements in error formatting in excel and robustness in fetch and fetchpost commands.

Added

cat: add --no-headers support to rowskey subcommand #1762
describegpt: add compatibility for other (local) LLMs (Ollama, Jan, etc.) by @rzmk in #1761
excel: add --error-format option #1721
excel: add --metadata short JSON mode #1738
foreach: add --dry-run option #1740
frequency: add --unq-limit option #1763
frequency: add support for negative --limits #1765
frequency: add --lmt-threshold option #1766
slice: add support for negative --index option values #1726
slice: implement --json output option #1729
sqlp: added support for single-line comments in SQL scripts bb52bce
sqlp: added SKIP_INPUT special value to short-circuit input processing if the user wants to load input files directly using table functions (e.g. read_csv(), read_parquet(), etc.) fe850ad
validate: add --valid-output option #1730
contrib: add sample Bashly completions implementation by @rzmk in #1731
benchmarks: added sqlp vs duckdb benchmarks.
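
For the negative --index support in slice listed above, one way to picture the idea is the Python sketch below, which maps a negative index to a position counted from the end. It assumes that -1 refers to the last record (the Python convention) and that the record count is known up front. The helper name `resolve_index` is hypothetical and this is not qsv's implementation.

```python
def resolve_index(index: int, row_count: int) -> int:
    """Map a possibly negative index to a zero-based record position.

    Assumption: -1 is the last record, -2 the one before it, and so on.
    """
    pos = index + row_count if index < 0 else index
    if not 0 <= pos < row_count:
        raise IndexError(f"index {index} is out of range for {row_count} records")
    return pos

records = ["a", "b", "c", "d", "e"]
print(records[resolve_index(0, len(records))])   # first record
print(records[resolve_index(-1, len(records))])  # last record
print(records[resolve_index(-2, len(records))])  # second-to-last record
```

Resolving a negative index requires knowing how many records there are, which is why the sketch takes `row_count` as an argument.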

Changed

datefmt: microoptimize formatting 0ee27e7
joinp: adapt to breaking change in Polars 0.39 for lazyframe sort c625ca9
sqlp: change --infer-len option default from 250 to 1000 for increased performance da1d215
validate: microoptimize to_json_instance() c2e4a1c
bump Luau from 0.616 to 0.622 9216ec3
build(deps): bump jql-runner from 7.1.6 to 7.1.7 by @dependabot in #1711
build(deps): bump pyo3 from 0.21.0 to 0.21.1 by @dependabot in #1712
build(deps): bump pyo3 from 0.21.1 to 0.21.2 by @dependabot in #1750
build(deps): bump strsim from 0.11.0 to 0.11.1 by @dependabot in #1715
build(deps): bump sysinfo from 0.30.7 to 0.30.8 by @dependabot in #1716
build(deps): bump sysinfo from 0.30.8 to 0.30.9 by @dependabot in #1732
build(deps): bump sysinfo from 0.30.9 to 0.30.10 by @dependabot in #1735
build(deps): bump sysinfo from 0.30.10 to 0.30.11 by @dependabot in #1755
build(deps): bump redis from 0.25.2 to 0.25.3 by @dependabot in #1720
build(deps): bump mlua from 0.9.6 to 0.9.7 by @dependabot in #1724
build(deps): bump reqwest from 0.12.2 to 0.12.3 by @dependabot in #1725
build(deps): bump reqwest from 0.12.3 to 0.12.4 by @dependabot in #1759
build(deps): bump anyhow from 1.0.81 to 1.0.82 by @dependabot in #1733
build(deps): bump robinraju/release-downloader from 1.9 to 1.10 by @dependabot in #1734
build(deps): bump chrono from 0.4.37 to 0.4.38 by @dependabot in #1744
bump polars from 0.38 to 0.39 #1745
build(deps): bump polars from 0.39.0 to 0.39.1 by @dependabot in #1746
build(deps): bump polars from 0.39.1 to 0.39.2 by @dependabot in #1752
build(deps): bump qsv-dateparser from 0.12.0 to 0.12.1 by @dependabot in #1747
build(deps): bump serde_json from 1.0.115 to 1.0.116 by @dependabot in #1749
build(deps): bump serde from 1.0.197 to 1.0.198 by @dependabot in #1751
build(deps): bump rustls from 0.22.3 to 0.22.4 by @dependabot in #1758
build(deps): bump simple-expand-tilde from 0.1.4 to 0.1.5 by @dependabot in #1767
build(deps): bump serial_test from 3.0.0 to 3.1.0 by @dependabot in #1768
build(deps): bump actions/setup-python from 5.0.0 to 5.1.0 by @dependabot in #1769
applied select clippy recommendations
updated several indirect dependencies
added several benchmarks for new/changed commands
pin Rust nightly to 2024-04-15 - the same nightly that Polars 0.39 is pinned to
bumped MSRV to 1.77.2

Fixed

Make init_logger more robust #1717
count: empty CSVs count as zero also for polars. Fixes #1741 #1742
excel: fix #1682 by adding --error-format option #1689
fetch & fetchpost: more robust JSON response validation ebc7287
slice: use write! macro to get rid of GH Advanced Security lint c739097
sqlp: fixed docopt defaults that were not being parsed correctly fe850ad
deps: bump h2 from 0.4.3 to 0.4.4 ...

Contributors

dependabot and rzmk

Releases: dathere/qsv
