feature request: Add -x option to uniq, to set on which fields not to group #1456

Closed
aborruso opened this issue Dec 23, 2023 · 7 comments · Fixed by #1457

Comments

@aborruso
Contributor

Hi @johnkerl,
sometimes I have a CSV file with nearly one hundred fields, and I need to run uniq, grouping by 97 of them.

It would be convenient not to have to write out all 97 fields, but only the three I do not want to group by.

Something like

mlr --c2p uniq -x -g color,shape 

to group by all fields other than color and shape.

Thank you

@johnkerl johnkerl changed the title feature request: add -x option to uniq, to set on which fields not to group feature request: Add -x option to uniq, to set on which fields not to group Dec 23, 2023
@johnkerl johnkerl self-assigned this Dec 23, 2023
@johnkerl
Owner

@aborruso on #1457 it's not mlr uniq -x -g a,b,c but rather simply mlr uniq -x a,b,c

@aborruso
Contributor Author

You are my hero, thank you!

But is there a way to fix the fields to group by and still have all the input fields in the output?

@aborruso
Contributor Author

Probably my latest question makes no sense

@johnkerl
Owner

@aborruso actually I don't understand ... it sounds like you're describing the existing mlr uniq -g ... can you give some example input and output data for your mlr --c2p uniq -x color,shape ?

@johnkerl
Owner

johnkerl commented Dec 23, 2023

@aborruso given this input:

$ mlr --c2p cat docs/src/example.csv
color  shape    flag  k  index quantity rate
yellow triangle true  1  11    43.6498  9.8870
red    square   true  2  15    79.2778  0.0130
red    circle   true  3  16    13.8103  2.9010
red    square   false 4  48    77.5542  7.4670
purple triangle false 5  51    81.2290  8.5910
red    square   false 6  64    77.1991  9.5310
purple triangle false 7  65    80.1405  5.8240
yellow circle   true  8  73    63.9785  4.2370
yellow circle   true  9  87    63.5058  8.3350
purple square   false 10 91    72.3735  8.2430

and this output:

$ mlr --c2p uniq -x flag,k,index,quantity,rate docs/src/example.csv
color  shape
yellow triangle
red    square
red    circle
purple triangle
yellow circle
purple square
$ mlr --c2p uniq -c -x flag,k,index,quantity,rate docs/src/example.csv
color  shape    count
yellow triangle 1
red    square   3
red    circle   1
purple triangle 2
yellow circle   2
purple square   1
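
For readers without Miller at hand, the grouping shown in the two outputs above can be approximated in plain Python. This is only a sketch of the semantics, not how Miller implements it; the helper name `uniq_excluding` is made up for illustration, and the CSV text is the example data from above retyped with commas.

```python
import csv
import io
from collections import Counter

# The example data from this thread, retyped as CSV.
CSV_TEXT = """\
color,shape,flag,k,index,quantity,rate
yellow,triangle,true,1,11,43.6498,9.8870
red,square,true,2,15,79.2778,0.0130
red,circle,true,3,16,13.8103,2.9010
red,square,false,4,48,77.5542,7.4670
purple,triangle,false,5,51,81.2290,8.5910
red,square,false,6,64,77.1991,9.5310
purple,triangle,false,7,65,80.1405,5.8240
yellow,circle,true,8,73,63.9785,4.2370
yellow,circle,true,9,87,63.5058,8.3350
purple,square,false,10,91,72.3735,8.2430
"""


def uniq_excluding(records, excluded):
    """Count distinct combinations of every field NOT named in `excluded`.

    Hypothetical helper: it mimics the idea of grouping by all fields
    except the listed ones, with a per-group count.
    """
    counts = Counter()
    for rec in records:
        # The group key is made of all remaining (name, value) pairs.
        key = tuple((name, value) for name, value in rec.items()
                    if name not in excluded)
        counts[key] += 1
    return counts


records = list(csv.DictReader(io.StringIO(CSV_TEXT)))
counts = uniq_excluding(records, {"flag", "k", "index", "quantity", "rate"})

# Groups appear in first-seen order, as in the output above.
for key, n in counts.items():
    print(dict(key), n)
```

The point of the sketch is that the group key is derived by subtraction: with nearly one hundred fields, only the few excluded names have to be listed.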

If we were to show all fields -- what would we show? The uniqifying aspect means to aggregate rows based on fields they have in common. For example, there are three rows with color=red and shape=square -- but different values for other columns. If we were to show, say, the index column, what would we show there? The value 15, or 48, or 64? Or an array consisting of [15,48,64]?
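
To make the last option concrete, here is a small Python sketch of the "array" choice: each (color, shape) group keeps a list of every index value it absorbed. It is an illustration of the question only, not a description of anything mlr uniq does; the variable names are invented, and the rows are the color, shape, and index columns of the example data above.

```python
from collections import defaultdict

# (color, shape, index) taken from the example data in this thread.
ROWS = [
    ("yellow", "triangle", 11),
    ("red", "square", 15),
    ("red", "circle", 16),
    ("red", "square", 48),
    ("purple", "triangle", 51),
    ("red", "square", 64),
    ("purple", "triangle", 65),
    ("yellow", "circle", 73),
    ("yellow", "circle", 87),
    ("purple", "square", 91),
]

# Collect every index value seen for each (color, shape) group.
index_by_group = defaultdict(list)
for color, shape, index in ROWS:
    index_by_group[(color, shape)].append(index)

for (color, shape), indexes in index_by_group.items():
    print(color, shape, indexes)
```

For the red/square group this yields the list [15, 48, 64], which shows why a single-valued index column has no obvious answer once rows are merged.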

@johnkerl
Owner

johnkerl commented Dec 23, 2023

Sorry about the close -- my merging PR #1457 auto-did that ... :(

@aborruso
Contributor Author

That's perfect, thanks again, and sorry for the confusion caused by my earlier question.

@johnkerl johnkerl removed the active label Jan 21, 2024