sample: add random sampling of remote CSVs without downloading the entire CSV first using HTTP range requests #2140
When sampling a remote CSV, qsv first has to download the file into a tempfile before it can start sampling. Even with the new `--max-size` option, we're limited to sampling only the downloaded portion.

For servers that support HTTP range requests (most modern servers do) and report the file size in the `Content-Length` header, do the sampling with range requests instead.
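A minimal sketch of the range-request path, written in Rust with only the standard library. All function names are hypothetical, not existing qsv APIs, and the HTTP client is left out: the caller is assumed to read `Accept-Ranges` and `Content-Length` from a `HEAD` response, then send each planned `Range` header in a `GET`. Because `Accept-Ranges` is optional, a stricter check could send a probe range request and look for a `206 Partial Content` status. The small SplitMix64 generator only stands in for a proper RNG crate.

```rust
/// Hypothetical helper: return the file size if the server advertised
/// `Accept-Ranges: bytes` and a usable `Content-Length`, else `None`
/// (meaning: fall back to downloading into a tempfile).
fn supports_range_sampling(accept_ranges: Option<&str>, content_length: Option<&str>) -> Option<u64> {
    let ranges_ok = accept_ranges
        .map(|v| v.trim().eq_ignore_ascii_case("bytes"))
        .unwrap_or(false);
    if !ranges_ok {
        return None;
    }
    content_length?.trim().parse::<u64>().ok().filter(|&n| n > 0)
}

/// Minimal SplitMix64 PRNG so the sketch needs no external crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical helper: pick `samples` random inclusive byte ranges of at most
/// `chunk_len` bytes, each starting at or after `data_start` (past the header).
/// The modulo adds a tiny bias, acceptable for a sketch.
fn plan_ranges(content_length: u64, data_start: u64, samples: usize, chunk_len: u64, seed: u64) -> Vec<(u64, u64)> {
    if data_start >= content_length || chunk_len == 0 {
        return Vec::new();
    }
    let span = content_length - data_start;
    let mut rng = SplitMix64(seed);
    (0..samples)
        .map(|_| {
            let start = data_start + rng.next_u64() % span;
            // HTTP byte ranges are inclusive, so the last valid byte is len - 1.
            let end = start.saturating_add(chunk_len - 1).min(content_length - 1);
            (start, end)
        })
        .collect()
}

/// Format an inclusive byte range as an HTTP `Range` header value.
fn range_header((start, end): (u64, u64)) -> String {
    format!("bytes={start}-{end}")
}

/// A random offset usually lands mid-record, so skip past the first '\n'
/// and return the next complete line without its "\n" or "\r\n".
/// Assumes no quoted field contains an embedded newline.
fn first_complete_line(chunk: &[u8]) -> Option<&[u8]> {
    let start = chunk.iter().position(|&b| b == b'\n')? + 1;
    let rest = &chunk[start..];
    let len = rest.iter().position(|&b| b == b'\n')?;
    let line = &rest[..len];
    Some(line.strip_suffix(b"\r").unwrap_or(line))
}
```

Two caveats on this design: splitting on raw newlines breaks on quoted fields that span lines, and taking "the record after a random byte offset" is not uniform over records, because each record is picked with probability proportional to the length of the record before it. A `None` from `first_complete_line` just means the chunk was too short and a larger `chunk_len` is needed.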
This should let qsv sample very large CSV files quickly, since nothing has to be downloaded to a temporary file first.
When implementing this, make sure to download the first N rows (default: 1000?) so we can get the header and infer the schema.
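One way to get the header and the first N rows is to request a prefix of the file (`Range: bytes=0-K`) and split it. Below is a std-only Rust sketch with hypothetical names, not qsv code. It assumes the caller chooses K, and it keeps only rows whose terminating newline falls inside the buffer, so a row cut off by the range boundary is dropped. If fewer than N rows come back, the caller could retry with a larger K. `header_end` marks where data rows begin, which is handy for keeping later random ranges out of the header.

```rust
/// Header, leading data rows, and the byte offset of the first data row.
struct Prefix<'a> {
    header: &'a [u8],
    rows: Vec<&'a [u8]>,
    header_end: usize,
}

/// Strip a trailing '\r' left over from "\r\n" line endings.
fn trim_cr(line: &[u8]) -> &[u8] {
    line.strip_suffix(b"\r").unwrap_or(line)
}

/// Hypothetical helper: split a prefix buffer into the header line and up to
/// `n` complete data rows. Returns `None` if the header itself is incomplete.
/// Assumes no quoted field contains an embedded newline.
fn split_prefix(buf: &[u8], n: usize) -> Option<Prefix<'_>> {
    let header_len = buf.iter().position(|&b| b == b'\n')?;
    let header = trim_cr(&buf[..header_len]);
    let header_end = header_len + 1;

    let mut rows = Vec::with_capacity(n);
    let mut pos = header_end;
    // Collect rows only while a terminating '\n' exists; a trailing partial
    // row truncated by the range boundary is discarded.
    while rows.len() < n {
        match buf[pos..].iter().position(|&b| b == b'\n') {
            Some(len) => {
                rows.push(trim_cr(&buf[pos..pos + len]));
                pos += len + 1;
            }
            None => break,
        }
    }
    Some(Prefix { header, rows, header_end })
}
```

If the prefix happens to cover the whole file and the file lacks a final newline, this sketch also drops the last row; a real implementation would more likely feed the bytes to a proper CSV reader, which handles both that case and quoted newlines.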