added 'column numbers everywhere' to csvcut #1209
Force-pushed from 04189d6 to a8eaadb.
@jpmckinney I put it for |
@jpmckinney fixed the |
@jpmckinney for some reason https://github.com/wireservice/csvkit/actions/runs/6371290078/job/17361039598#step:4:186 |
culprit here: wireservice/agate-sql#40 |
@jpmckinney I pinned agate-sql to |
Sorry @jpmckinney, I should have added a test for this feature earlier. Coverage should go up again with the last commit. |
Hmm, what about putting this in csvlook? |
I think having the option to combine it with large tables is intriguing, e.g. One way I have come to love this feature is by "collecting" relevant columns in the following way:
In that way I can quickly collect all relevant "ipsum" columns in a long and wide table and eventually just leave the (see above)
|
As implemented, this seems more like a new utility – like csvprefix – than a logical addition to csvcut. (Changing
For your example, you can achieve the same with existing tools:
$ csvgrep -m ipsum -c 1-75 -a test.csv | tail +2 | csvgrep -n | grep ipsum
45: evil, ipsum, with commas,
66: ipsum
74: ipsum |
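For readers who want the same lookup without a shell pipeline, here is a minimal Python sketch of the idea: parse one CSV row and report the 1-based column numbers whose cells contain a search string. It uses only the standard library `csv` module, not csvkit; the function name `matching_columns` and the sample row are made up for illustration.

```python
import csv
import io


def matching_columns(row, needle):
    """Return (1-based column number, value) pairs for cells containing needle."""
    return [(n, cell) for n, cell in enumerate(row, start=1) if needle in cell]


# Hypothetical single-row sample; the quoted cell contains commas on purpose.
sample = 'a,,"evil, ipsum, with commas,",,ipsum\n'
row = next(csv.reader(io.StringIO(sample)))

for number, value in matching_columns(row, "ipsum"):
    print(f"{number}: {value}")
```

Because the row is parsed by a CSV reader first, commas inside quoted cells do not shift the column numbers, which is the problem with counting commas by hand.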
I agree with you; my example was not comprehensive enough (single line only). Let's extend it to the following (still not big enough, but I hope it gets the point across)
Then I can see at a glance (without working through the data line by line with
Then I can easily add |
Maybe it fits more naturally to
|
csvgrep streams the data, so it can't remove non-matching columns. To work on multiple lines, you can add xargs. I use
$ csvgrep -m ipsum -c 1-75 -a test.csv | csvformat -M $'\x1e' | xargs -d $'\x1e' -I_ sh -c 'echo _ | csvcut -n' | grep ipsum
46: ipsum
69: ipsum
77: ipsum
5: ipsum
54: ipsum |
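The multi-row case can also be sketched in Python, again with only the standard library and not csvkit. This version streams the rows once and counts, per 1-based column number, how many data rows contain the search string, which gives the "which columns are relevant" overview discussed above. The function name `columns_with_matches` and the sample data are assumptions for illustration.

```python
import csv
import io
from collections import Counter


def columns_with_matches(lines, needle):
    """Count, per 1-based column number, how many data rows contain needle."""
    reader = csv.reader(lines)
    header = next(reader)  # first row is treated as the header
    counts = Counter()
    for row in reader:
        for number, cell in enumerate(row, start=1):
            if needle in cell:
                counts[number] += 1
    return header, counts


# Hypothetical sample with a header row and three data rows.
data = io.StringIO(
    "id,a,b,c\n"
    "1,,ipsum,\n"
    "2,lorem,,\n"
    "3,,dolor ipsum,ipsum\n"
)
header, counts = columns_with_matches(data, "ipsum")

for number in sorted(counts):
    print(f"{number}: {header[number - 1]} ({counts[number]} rows)")
```

The resulting column numbers could then be passed to `csvcut -c` to keep only the relevant columns.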
Okay, point taken, so the conclusion would be to add a new tool named
|
Is there an issue with using
csvkit is in maintenance mode, so only existing commands are being maintained. Additional commands would have to be created as independent Python packages. |
Hm, I'm not an expert, and there's probably a way to fix it, but I could not get it to run with:
At the same time, it revealed that it may not be the best idea either as it tries to execute content as code:
|
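The hazard described here, data being re-parsed as shell code once it is substituted into a command string, is a general one. The following Python sketch is not the command from this thread; it only illustrates one common way around the problem, which is to hand the row to `sh` as a positional parameter (`$1`) instead of pasting it into the script text. The function name `echo_via_shell_arg` is hypothetical, and the sketch assumes a POSIX `sh` is available.

```python
import subprocess


def echo_via_shell_arg(row: str) -> str:
    """Print row through sh without letting the shell interpret its content."""
    result = subprocess.run(
        # The script text is fixed; the row arrives as $1 and is only expanded
        # inside double quotes, so it is never parsed as shell syntax.
        ["sh", "-c", 'printf "%s\\n" "$1"', "sh", row],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


# A row whose content would run commands if it were pasted into the script.
hostile = "a,$(echo pwned),`id`"
print(echo_via_shell_arg(hostile))
```

The same principle applies to an `xargs ... sh -c` pipeline: keep the script constant and pass the data as an argument.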
You just need to quote the replstr in xargs:
|
That again returns invalid results for me
(the file only has 75 columns) |
Ah, yeah, need to use single quotes:
|
Even more robust:
|
The last one does the trick. Thanks for adding it to the docs; I will need to look this up every time now. Still believe that the |
This would have been better as an issue, but I'll close this PR for now, and perhaps if there is any interest from other users they will still find the PR discussion. |
I agree with you, @jpmckinney, that this would have been better as an issue. Thanks for your time! |
I frequently run into wide CSV files where some interesting string or text sits somewhere in a long line of empty fields or numbers.
Example:
We want to approach this without manually counting the commas.
The new addition provides this, and in combination with grep the result will be highlighted:
This is not a breaking change, as it keeps all columns where they are. It alters all the fields, but its sole purpose is exploration.
Also, please let me know if you know of a better way, or any way at all, to achieve the above without this modification.
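As a rough illustration of the "column numbers everywhere" idea, the sketch below prefixes every cell with its 1-based column number so that a later `grep` shows where each match sits. It is not the code from this PR: the function name `number_cells` and the `N:value` output format are assumptions, and it uses the standard library `csv` module instead of csvkit internals.

```python
import csv
import io
import sys


def number_cells(rows, template="{n}:{value}"):
    """Yield rows in which every cell is prefixed with its 1-based column number."""
    for row in rows:
        yield [template.format(n=n, value=value) for n, value in enumerate(row, start=1)]


# Hypothetical two-row input; columns stay where they are, only cell text changes.
source = io.StringIO("x,,ipsum\n1,,\n")
writer = csv.writer(sys.stdout)
writer.writerows(number_cells(csv.reader(source)))
```

Empty cells are numbered as well, so the position of every field stays visible even in sparse rows.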