Leverage the power of polars across qsv commands #1649
Replies: 4 comments 4 replies
-
What prevents you from using that sqlp query?
-
Great motivation for an Open Source project…
-
This is a great idea, @13minutes-yt! Some quick timings:
# unindexed taxi.csv
/usr/bin/time qsv count taxi.csv
7433139
0.89 real 0.81 user 0.05 sys
# sqlp count is 9x faster!
/usr/bin/time qsv sqlp taxi.csv "select count(*) from _t_1"
len
7433128
(1, 1)
0.10 real 0.61 user 0.09 sys
# create an index
/usr/bin/time qsv index taxi.csv
0.89 real 0.81 user 0.06 sys
# indexed count is still faster
/usr/bin/time qsv count taxi.csv
7433139
0.01 real 0.00 user 0.00 sys
So if the
Curiously though, sqlp returns a different count... I'll have to dig into that and report it to the Polars project if it's a bug...
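For reference, the ratios implied by these timings can be checked with a little shell arithmetic. This is only a sketch: the numbers are copied from the runs in this comment, the variable and function names are made up for illustration, and nothing here invokes qsv.

```shell
# Wall-clock ("real") times in seconds, copied from the runs above.
plain_count=0.89     # qsv count, no index
sqlp_count=0.10      # qsv sqlp "select count(*) from _t_1"
indexed_count=0.01   # qsv count, with index

# Hypothetical helper: ratio of two timings, to one decimal place.
speedup() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f", a / b }'; }

sqlp_speedup=$(speedup "$plain_count" "$sqlp_count")
indexed_speedup=$(speedup "$plain_count" "$indexed_count")

# Gap between the row count from qsv count and the one from sqlp.
row_gap=$((7433139 - 7433128))

echo "sqlp vs unindexed count:    ${sqlp_speedup}x"
echo "indexed vs unindexed count: ${indexed_speedup}x"
echo "rows not counted by sqlp:   ${row_gap}"
```

Note that the index itself took about as long to build as one unindexed count, so it only pays off when the file is counted more than once.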
-
It seems the different count is due to CSV parsing. I ran the Polars CLI with the default output (table) and it gives the correct count, while with CSV output (polars -o csv) it gave a discrepancy.
-
Is it possible to have
qsv count taxi.csv
check whether the polars feature is installed and, if so, use Polars instead, for example:
qsv sqlp -Q taxi.csv "select count(*) from read_csv('taxi.csv')"
It would make it much faster.
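A rough sketch of what such a dispatch could look like from the outside, as a shell wrapper rather than anything inside qsv itself. It rests on assumptions this thread does not confirm: that the index sits next to the file as `<file>.idx`, and that a Polars-enabled build mentions "polars" in its `qsv --version` output. The function name `pick_count_cmd` is hypothetical, and the function only prints the command it would choose.

```shell
# Sketch only: choose a row-count command for a CSV file.
# Assumptions: the index lives at "<file>.idx", and a polars-enabled
# build mentions "polars" in its `qsv --version` output.
pick_count_cmd() {
  csv=$1       # path to the CSV file
  version=$2   # version/feature string, e.g. "$(qsv --version)"

  if [ -f "$csv.idx" ]; then
    # With an index, plain `qsv count` was the fastest run in the timings above.
    printf '%s\n' "qsv count $csv"
  else
    case $version in
      *polars*) printf '%s\n' "qsv sqlp $csv \"select count(*) from _t_1\"" ;;
      *)        printf '%s\n' "qsv count $csv" ;;
    esac
  fi
}
```

Calling `pick_count_cmd taxi.csv "$(qsv --version)"` would print one of the two commands. The sqlp route is not a drop-in replacement as it stands: in the runs above its output carries a `len` header and a shape line, and it returned a different count than `qsv count`.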