Releases: neilotoole/sq
v0.48.4
v0.48.3
Small bugfix release.
Fixed
- #415: The JSON ingester could fail due to a bug when a JSON blob landed on the edge of a buffer.
- The JSON ingester wasn't able to handle the case where a post-sampling JSON field had a different kind from the kind determined by the sampling process. For example, let's say the sample size was 1000, and the field zip was determined to be of kind int, because values 0-1000 were all parseable as integers. But then the 1001st value was BX123, which obviously is not an integer. sq will now see the non-integer value, and alter the ingest DB schema to a compatible kind, e.g. text. This flexibility is powerful, but it does come at the cost of slower ingest speed. But that's a topic for another release.
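
To make the kind-widening fix above concrete, here is a minimal Python sketch of the general idea: detect a column kind from a sample, then widen it when a later value doesn't fit. The function names (kind_of, detect_kind, ingest) and the two-kind model (int and text) are assumptions made for illustration; this is not sq's actual implementation.

```python
def kind_of(value):
    """Return "int" if the value parses as an integer, else "text"."""
    try:
        int(str(value))
        return "int"
    except ValueError:
        return "text"


def detect_kind(sample):
    """Pick the narrowest kind that fits every sampled value."""
    return "int" if all(kind_of(v) == "int" for v in sample) else "text"


def ingest(values, sample_size=1000):
    """Ingest values, widening the column kind if a later value doesn't fit.

    Returns (final_kind, number_of_schema_alterations).
    """
    kind = detect_kind(values[:sample_size])
    alterations = 0
    for v in values[sample_size:]:
        if kind == "int" and kind_of(v) != "int":
            # Hypothetical stand-in for altering the ingest DB column type.
            kind = "text"
            alterations += 1
    return kind, alterations


# 1000 integer-like values, then a value that is not an integer.
zips = [str(n) for n in range(1000)] + ["BX123"]
print(ingest(zips))  # ('text', 1)
```

Checking every post-sample value against the detected kind is what makes this approach slower than trusting the sample.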
v0.48.1
This release features significant improvements to sq diff.
Added
- Previously sq diff --data diffed every row, which could get crazy with a large table. Now the command stops after N differences, where N is controlled by the --stop (-n) flag, or the new config option diff.stop. The default stop-after value is 3; set to 0 to show all differences.

    # Stop on first difference
    $ sq diff @prod.actor @staging.actor --data --stop 1

    # Stop after 5 differences, using the -n shorthand flag
    $ sq diff @prod.actor @staging.actor --data -n5
- #353: The performance of sq diff has been significantly improved. There's still more to do.
- Previously, sq diff --data compared the rendered (text) representation of each value. This could lead to inaccurate results, for example when two timestamp values were in different time zones but the text rendering omitted the time zone. Now, sq diff --data compares the raw values, not the rendered text. Note in particular with time values that both time and location components are compared.
- sq can now handle a SQLite DB on stdin. This is useful for testing, or for working with SQLite DBs in a pipeline.

    $ cat sakila.db | sq '.actor | .first_name, .last_name'

  It's also surprisingly handy in daily life, because there are sneaky SQLite DBs all around us. Let's see how many text messages I've sent and received over the years:

    $ cat ~/Library/Messages/chat.db | sq '.message | count'
    count
    215439

  I'm sure that number makes me an amateur with these millennials 👴🏻.

  Note that you'll need to enable macOS Full Disk Access to read the chat.db file.
- sq now allows you to use true and false literals in queries. Which, in hindsight, does seem like a bit of an oversight 😳. (Although previously you could usually get away with using 1 and 0.)

    $ sq '.people | where(.is_alive == false)'
    name        is_alive
    Kubla Khan  false

    $ sq '.people | where(.is_alive == true)'
    name         is_alive
    Kaiser Soze  true
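
The two sq diff --data changes described above (stopping after N differences, and comparing raw values rather than rendered text) can be pictured with a small Python sketch. Everything here is an assumption for illustration: the names raw_key and diff_rows are hypothetical, rows are plain tuples, and the comparison rule is only one way to honor "both time and location components are compared". It is not how sq implements its diff.

```python
from datetime import datetime, timedelta, timezone
from itertools import zip_longest


def raw_key(value):
    """Comparison key for a raw value.

    For timezone-aware datetimes, include the UTC offset so that both the
    instant and the location component must match.
    """
    if isinstance(value, datetime) and value.tzinfo is not None:
        return (value, value.utcoffset())
    return value


def diff_rows(rows_a, rows_b, stop=3):
    """Yield (index, row_a, row_b) for differing rows; stop=0 means no limit."""
    found = 0
    for i, (a, b) in enumerate(zip_longest(rows_a, rows_b)):
        missing = a is None or b is None
        if missing or [raw_key(v) for v in a] != [raw_key(v) for v in b]:
            yield i, a, b
            found += 1
            if stop and found >= stop:
                return


# Two timestamps whose rendered text matches when the zone is omitted.
t_utc = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
t_plus2 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone(timedelta(hours=2)))
fmt = "%Y-%m-%d %H:%M"
print(t_utc.strftime(fmt) == t_plus2.strftime(fmt))    # True: text hides the zone
print(len(list(diff_rows([(1, t_utc)], [(1, t_plus2)]))))  # 1: raw values differ
```

The default stop=3 mirrors the documented default for diff.stop, and stop=0 reports every difference.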
Changed
- ☢️ Previously, sq diff only exited non-zero on an error. Now, sq diff exits 0 when no differences are found, exits 1 if differences are found, and exits 2 on any error. This aligns with the behavior of GNU diff: "Exit status is 0 if inputs are the same, 1 if different, 2 if trouble."
- Minor fiddling with the color scheme for some command output.
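
Scripts that call sq diff may need updating for the new exit codes described above. The following Python sketch shows one way a wrapper could interpret them. The helper names (classify_diff_exit, run_and_classify) are hypothetical, and the child processes are stand-ins that simply exit with a given status, so the example runs without sq installed.

```python
import subprocess
import sys


def classify_diff_exit(returncode):
    """Map a diff-style exit status to a label: 0 same, 1 different, else error."""
    if returncode == 0:
        return "same"
    if returncode == 1:
        return "different"
    return "error"


def run_and_classify(cmd):
    """Run a command and classify its exit status."""
    return classify_diff_exit(subprocess.run(cmd).returncode)


# Stand-in commands that exit with status 0, 1, and 2 respectively.
for status in (0, 1, 2):
    cmd = [sys.executable, "-c", f"import sys; sys.exit({status})"]
    print(status, run_and_classify(cmd))
```

In real use, cmd would be an sq diff invocation such as the ones shown earlier in these notes. A script that previously treated any non-zero exit as a failure would now need to treat 1 as "differences found" rather than as an error.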
v0.47.4
Patch release with changes to flags. See the earlier v0.47.0 release for recent headline features.
Added
- By default, sq prints source locations with the password redacted. This is a sensible default, but there are legitimate reasons to access the unredacted connection string. Thus, there is a new global flag --no-redact (and a corresponding redact config option).

    # Default behavior: password is redacted
    $ sq src -v
    @sakila/pg12  postgres  postgres://sakila:xxxxx@192.168.50.132/sakila

    # Unredacted
    $ sq src -v --no-redact
    @sakila/pg12  postgres  postgres://sakila:p_ssW0rd@192.168.50.132/sakila
- Previously, if an error occurred when verbose was true, and error.format was text, sq would print a stack trace to stderr. This was poor default behavior, flooding the user terminal, so the default is now no stack trace. To restore the previous behavior, use the new -E (--error.stack) flag, or set the error.stack config option.
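
As an illustration of the password redaction described above, here is a short Python sketch that masks the password in a URL-style source location. The function name redact and the mask parameter are assumptions; the xxxxx mask matches the example output shown above, but the way sq performs redaction internally is not documented here.

```python
from urllib.parse import urlsplit, urlunsplit


def redact(location, mask="xxxxx"):
    """Return the location with any password in the userinfo part masked."""
    parts = urlsplit(location)
    # netloc looks like "user:password@host:port"; split off the userinfo.
    userinfo, _, hostport = parts.netloc.rpartition("@")
    user, sep, _password = userinfo.partition(":")
    if not sep:
        return location  # no password present, nothing to mask
    netloc = f"{user}:{mask}@{hostport}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(redact("postgres://sakila:p_ssW0rd@192.168.50.132/sakila"))
# postgres://sakila:xxxxx@192.168.50.132/sakila
```

A flag like --no-redact would then simply skip this step when printing the location.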
Changed
- The --src.schema flag (as used in sq inspect, sq sql, and the root sq cmd) now accepts the form --src.schema=CATALOG. (note the trailing "." after CATALOG). This is in addition to the existing allowed forms SCHEMA and CATALOG.SCHEMA. The new trailing-dot form is effectively equivalent to CATALOG.CURRENT_SCHEMA.

    # Inspect using the default schema in the "sales" catalog
    $ sq inspect --src.schema=sales.
- The --src.schema flag is now validated. Previously, if you provided a non-existing catalog or schema value, sq would silently ignore it and use the defaults. This could mislead the user into thinking that they were getting valid results from the non-existent catalog or schema. Now an error is returned.
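
The three accepted --src.schema forms and the new validation can be sketched in Python as follows. This is only an illustration under stated assumptions: parse_src_schema and resolve_src_schema are hypothetical names, the catalog and schema names are made up, quoted identifiers containing dots are ignored, and the current schema is assumed to be a single name that also applies in the other catalog.

```python
def parse_src_schema(value):
    """Split a --src.schema value into (catalog, schema).

    None means "use the current catalog or schema".
    """
    if not value or value.startswith("."):
        raise ValueError(f"invalid --src.schema value: {value!r}")
    catalog, dot, schema = value.partition(".")
    if not dot:
        return None, catalog          # SCHEMA
    return catalog, schema or None    # CATALOG.SCHEMA, or CATALOG. (trailing dot)


def resolve_src_schema(value, known, current):
    """Resolve and validate a value.

    known maps catalog name -> set of schema names (made-up data here);
    current is the (catalog, schema) pair currently in effect.
    """
    catalog, schema = parse_src_schema(value)
    catalog = catalog or current[0]
    schema = schema or current[1]
    if catalog not in known:
        raise LookupError(f"catalog does not exist: {catalog}")
    if schema not in known[catalog]:
        raise LookupError(f"schema does not exist: {catalog}.{schema}")
    return catalog, schema


known = {"sakila": {"public"}, "sales": {"public", "archive"}}
current = ("sakila", "public")
print(resolve_src_schema("sales.", known, current))         # ('sales', 'public')
print(resolve_src_schema("sales.archive", known, current))  # ('sales', 'archive')
```

Raising an error for an unknown catalog or schema, instead of silently falling back to the defaults, corresponds to the validation change described above.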
v0.47.3
Minor bug fix release. See the earlier v0.47.0 release for recent headline features.
Fixed
- Shell completion for bash only worked for top-level commands, not for subcommands, flags, args, etc. This bug was due to an unnoticed behavior change in an imported library 🤦‍♂️. It's now fixed, and tests have been added.
Changed
- Shell completion now initially suggests only sources within the active group. Previously, all sources were suggested, potentially flooding the user with irrelevant suggestions. However, if the user continues to input a source handle that is outside the active group, completion will suggest all matching sources. This behavior is controlled via the new config option shell-completion.group-filter.
v0.47.2
v0.47.1
v0.47.0
This is a significant release, focused on improving i/o, responsiveness, and performance. The headline features are caching of ingested data for document sources such as CSV or Excel, and download caching for remote document sources. There are a lot of under-the-hood changes, so please open an issue if you encounter any weirdness.
Added
- Long-running operations (such as data ingestion, or file download) now result in a progress bar being displayed. Display of the progress bar is controlled by the new config options progress and progress.delay. You can also use the --no-progress flag to disable the progress bar.
  - 👉 The progress bar is rendered on stderr and is always zapped from the terminal when command output begins. It won't corrupt the output.
- #307: Ingested document sources (such as CSV or Excel) now make use of an ingest cache DB. Previously, ingestion of document source data occurred on each sq command. It is now a one-time cost; subsequent use of the document source utilizes the cache DB. That is, until the source document changes: then the ingest cache DB is invalidated and the source is ingested again. This is a significantly improved experience for large document sources.
- There are several new commands to interact with the cache (although you shouldn't need to):
  - sq cache enable and sq cache disable control cache usage. You can also instead use the new ingest.cache config option.
  - sq cache clear clears the cache.
  - sq cache location prints the cache location on disk.
  - sq cache stat shows stats about the cache.
  - sq cache tree shows a tree view of the cache.
- #24: The download mechanism for remote document sources (e.g. a CSV file at https://sq.io/testdata/actor.csv) has been completely overhauled. Previously, sq would re-download the remote file on every command. Now, the remote file is downloaded and cached locally. Subsequent sq invocations check for staleness of the cached download, and re-download if necessary.
- As part of the download revamp, new config options have been introduced:
  - http.request.timeout is the timeout for the initial response from the server, and http.response.timeout is the timeout for reading the entire response body. We separate these two timeouts because it's possible that the server responds quickly, but then for a large file, the download takes too long.
  - https.insecure-skip-verify controls whether HTTPS connections verify the server's certificate. This is useful for remote files served with a self-signed certificate.
  - download.cache controls whether remote files are cached locally.
  - download.refresh.ok-on-err controls whether sq should continue with a stale cached download if an error occurred while trying to refresh the download. This is a sort of "Airplane Mode" for remote document sources: sq continues with the cached download when the network is unavailable.
- There are two more new config options introduced as part of the above work.
  - cache.lock.timeout controls the time that sq will wait for a lock on the cache DB. The cache lock is introduced for when you have multiple sq commands running concurrently, and you want to avoid them stepping on each other.
  - Similarly, config.lock.timeout controls the timeout for acquiring the (newly-introduced) lock on sq's config file. This helps prevent issues with multiple sq processes mutating the config concurrently.
- sq's own logs were previously output in JSON format. Now there's a new log.format config option that permits setting the log format to json or text. The text format is more human-friendly, and is now the default.
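
The ingest cache behavior described above (#307), where ingestion is a one-time cost until the source document changes, can be pictured with the following Python sketch. It is an assumption-laden illustration: the IngestCache class, the content-hash fingerprint, and the in-memory dictionary are all hypothetical stand-ins, and the release notes do not say how sq actually detects that a source document has changed.

```python
import hashlib
import os
import tempfile


def fingerprint(path):
    """Hash the file's bytes; any content change yields a new fingerprint."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


class IngestCache:
    """In-memory stand-in for an on-disk ingest cache DB."""

    def __init__(self):
        self._entries = {}      # path -> (fingerprint, ingested rows)
        self.ingest_count = 0   # how many times the expensive ingest ran

    def rows(self, path):
        fp = fingerprint(path)
        entry = self._entries.get(path)
        if entry is not None and entry[0] == fp:
            return entry[1]                  # cache hit: no re-ingest
        with open(path, newline="") as f:    # miss or stale entry: ingest again
            rows = [line.rstrip("\r\n").split(",") for line in f]
        self.ingest_count += 1
        self._entries[path] = (fp, rows)
        return rows


def demo():
    """Return the ingest count after each of three reads of a small CSV file."""
    counts = []
    cache = IngestCache()
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.csv")
        with open(path, "w") as f:
            f.write("a,b\n1,2\n")
        cache.rows(path)
        counts.append(cache.ingest_count)   # first read ingests
        cache.rows(path)
        counts.append(cache.ingest_count)   # second read hits the cache
        with open(path, "a") as f:
            f.write("3,4\n")
        cache.rows(path)
        counts.append(cache.ingest_count)   # changed file is ingested again
    return counts


print(demo())  # [1, 1, 2]
```

The same check-then-reuse pattern applies in spirit to the cached downloads of remote document sources, with a staleness check against the server in place of a local fingerprint.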
Changed
Fixed
- Opening a DB connection now correctly honors conn.open-timeout.
v0.46.1
Fixed
- sq sometimes failed to read from stdin if piped input was slow to arrive. This is now fixed.
v0.46.0
Added
- #338: While sq has had group_by for some time, somehow the having mechanism was never implemented. That's fixed.

    $ sq '.payment | .customer_id, sum(.amount) | group_by(.customer_id) | having(sum(.amount) > 200)'
    customer_id  sum(.amount)
    526          221.55
    148          216.54
- #340: The group_by function now has a synonym gb, and order_by now has synonym ob. These synonyms are experimental 🧪. The motivation is to reduce typing, especially the underscore (_) in both function names, but it's not clear that the loss of clarity is worth it. Maybe synonyms group and order might be better? Feedback welcome.

    # Previously
    $ sq '.payment | .customer_id, sum(.amount) | group_by(.customer_id) | order_by(.customer_id)'

    # Now
    $ sq '.payment | .customer_id, sum(.amount) | gb(.customer_id) | ob(.customer_id)'
- #340: sq inspect: added flag shorthand -C for --catalogs and -S for --schemata. These were the only inspect flags without shorthand.
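
For readers more familiar with SQL, the having query shown above for #338 filters groups after aggregation, much like a SQL HAVING clause. The Python sketch below runs a comparable query against an in-memory SQLite table. The table contents are made-up sample data, and the SQL is an assumption about the general shape of such a query, not the exact SQL that sq generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO payment VALUES (?, ?)",
    [(526, 150.0), (526, 75.0), (148, 210.0), (7, 50.0), (7, 60.0)],
)

# Group payments per customer, then keep only groups whose total exceeds 200.
rows = conn.execute(
    """
    SELECT customer_id, SUM(amount)
    FROM payment
    GROUP BY customer_id
    HAVING SUM(amount) > 200
    ORDER BY customer_id
    """
).fetchall()

for customer_id, total in rows:
    print(customer_id, total)
```

Customer 7 totals 110 and is filtered out by the HAVING condition, while a plain where clause could not express this because it is evaluated before the aggregation.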