Define sort! for AbstractDataFrame and fix issues of kwargs in sorting functions #2946

bkamins · 2021-11-23T18:51:25Z

@nalimilan - do you see any risks in making sort! more flexible and allowing SubDataFrame?
(I will add tests if we agree on the design)

I have also proposed being more careful and throwing an error if columns alias each other but are not identical; we might decide not to add this extra check.

bkamins · 2021-11-24T08:49:33Z

@nalimilan - I have added tests. It should be ready for review.

In general, I think what I propose is better. The previous method already had quadratic complexity in the number of columns, but I think it is safer to error immediately when unsafe aliasing is used.

nalimilan · 2021-11-24T22:01:04Z

Looks good!

> In general, I think what I propose is better. The previous method already had quadratic complexity in the number of columns, but I think it is safer to error immediately when unsafe aliasing is used.

I just wonder whether we should optimistically check and permute each column, and undo the changes if needed (as it should be super rare). But maybe that wouldn't make a difference for performance.


bkamins · 2021-11-24T22:09:33Z

> But maybe that wouldn't make a difference for performance.

This is the point. It would only improve performance if we get an error, which should be super rare.

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

nalimilan · 2021-11-25T08:48:05Z

> This is the point. It would only improve performance if we get an error, which should be super rare.

Wait, isn't it the opposite? It could be faster when no error happens, but slower when one does, because we would need to restore already permuted columns to their original state. But I agree the gain should be small anyway.

bkamins · 2021-11-25T09:25:17Z

> undo the changes if needed

But how would you detect that you need to undo this operation? The cost of this detection is quadratic in the number of columns, so if we want to detect the problem we can do it up front (and this is what I do now). Additionally, we need to detect exact aliases and avoid permuting them more than once (this was the behavior that we already had).

The situation is different in, e.g., push!, where we can indeed cheaply detect a problem post factum (we can just check whether the column lengths are correct). However, sort! only shuffles values, so we have no easy way to detect the problem afterwards.
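The two-pass approach described above (check every pair of columns first, then permute) can be sketched as follows. This is only an illustration under stated assumptions, not the code from this PR: `permute_columns!` is a hypothetical helper that takes a plain vector of column vectors, and it relies on `Base.mightalias`, which is an unexported Base function.

```julia
# Hypothetical helper, not the DataFrames.jl implementation.
# `cols` is a 1-based vector of column vectors; `perm` is a permutation of row indices.
function permute_columns!(cols, perm)
    n = length(cols)
    skip = falses(n)
    # Pass 1: scan all column pairs before touching any data (quadratic in n).
    for i in 1:n, j in (i + 1):n
        if cols[i] === cols[j]
            skip[j] = true  # exact alias: only the first occurrence is permuted
        elseif Base.mightalias(cols[i], cols[j])
            throw(ArgumentError("columns $i and $j share memory but are not identical"))
        end
    end
    # Pass 2: every check passed, so reorder each distinct column.
    for i in 1:n
        skip[i] && continue
        col = cols[i]
        col .= col[perm]  # col[perm] is materialized first, then copied back
    end
    return cols
end

a = [3, 1, 2]
b = ["c", "a", "b"]
permute_columns!(AbstractVector[a, b, a], sortperm(a))  # `a` appears twice, permuted once

x = [3, 1, 2]
try
    # A view overlapping `x` is a partial alias, so the call fails in pass 1.
    permute_columns!(AbstractVector[x, view(x, 1:2)], [2, 1, 3])
catch err
    println("rejected: ", err.msg)  # `x` is left unmodified
end
```

Because the check runs before any column is modified, a failure leaves all data untouched and no rollback is needed.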

nalimilan · 2021-11-25T09:44:50Z

I just meant we could have a single loop over columns with the contents of the two loops you have now, and in case an error happens we would roll back any already applied changes. But forget that, it's probably not worth it.

bkamins · 2021-11-25T11:17:59Z

I ended up having to standardize everything. Following the rules we set in the 1.0 release, all kwargs now have to be either scalars or vectors (tuples are not allowed, as we had announced). I have also improved docstrings, test coverage, and error checking.

nalimilan · 2021-11-25T20:29:42Z

src/abstractdataframe/sort.jl

@@ -14,6 +14,24 @@
 #                  which allows a user to specify column specific orderings
 #                  with "order(column, rev=true, ...)"

+function _check_sort_args(lt, by, rev, order)


Why not put this in function signatures instead? People should be used to the kind of MethodError that is printed. If we start checking the type of all arguments manually the codebase is going to get quite large. :-)

bkamins · 2021-11-25

Totally agreed. For some reason I started copying the old design. Now all kwargs have proper type restrictions.
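As an illustration of restricting keyword arguments in the signature instead of checking them by hand, here is a minimal sketch. `describe_sort_kwargs` is a hypothetical stand-in, and the exact type unions are assumptions chosen to match the "scalars or vectors, no tuples" rule mentioned above. Note that for an annotated keyword argument Julia reports a mismatch as a `TypeError`.

```julia
# Hypothetical stand-in for the keyword handling of a sorting function.
# Each keyword accepts either a scalar or a vector; tuples match neither branch.
function describe_sort_kwargs(; rev::Union{Bool, AbstractVector{Bool}}=false,
                              by::Union{Function, AbstractVector{<:Function}}=identity)
    rev_kind = rev isa Bool ? "scalar" : "vector"
    by_kind = by isa Function ? "scalar" : "vector"
    return (rev=rev_kind, by=by_kind)
end

describe_sort_kwargs(rev=true)                           # scalar form
describe_sort_kwargs(rev=[true, false], by=[abs, identity])  # vector form

try
    describe_sort_kwargs(rev=(true, false))  # tuple: rejected by the annotation
catch err
    println("rejected with ", typeof(err))
end
```

The rejection happens before the function body runs, so no manual validation helper is needed.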

nalimilan · 2021-11-25T20:31:14Z

src/abstractdataframe/sort.jl

+`cols` selects no columns, check whether `df` is sorted on all columns (this
+behaviour is deprecated and will change in future versions).


This text (that we just added) needs to be adapted a bit depending on the function.

bkamins · 2021-11-25

Right, fixed.


bkamins · 2021-11-25T21:30:30Z

@nalimilan - let me know when you think it is OK and I will do a final check and merge (there is so much copy-paste in this PR that I want to double check everything before merging).


bkamins · 2021-11-26T10:15:14Z

Thank you!

Define sort! for AbstractDataFrame

d37c8ac

bkamins added the feature label Nov 23, 2021

bkamins added this to the 1.x milestone Nov 23, 2021

add tests

b377f37

bkamins marked this pull request as ready for review November 24, 2021 08:48

bkamins requested a review from nalimilan November 24, 2021 08:48

add NEWS.md

8b2a361

nalimilan approved these changes Nov 24, 2021


Update src/abstractdataframe/sort.jl

3c521d8

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

fix accepted arguments in sort and improve docstring

627751c

bkamins changed the title ~~Define sort! for AbstractDataFrame~~ Define sort! for AbstractDataFrame and fix issues of kwargs in sorting functions Nov 25, 2021

bkamins modified the milestones: 1.x, 1.3 Nov 25, 2021

bkamins added the bug label Nov 25, 2021

bkamins added 2 commits November 25, 2021 13:46

fix docs

00abc8d

Merge branch 'main' into bk/improve_sort!

51e900d

nalimilan reviewed Nov 25, 2021


bkamins and others added 3 commits November 25, 2021 22:11

add type restrictions to kwargs

d2ba454

Apply suggestions from code review

e1d6d2a

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

make no cols passed case properly documented

014ccd3

nalimilan reviewed Nov 25, 2021


Apply suggestions from code review

71d59c8

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

bkamins mentioned this pull request Nov 25, 2021

Add reverse prototype #2944

Merged

nalimilan approved these changes Nov 25, 2021


bkamins and others added 2 commits November 26, 2021 00:05

Update src/abstractdataframe/sort.jl

60e0bba

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

fix tests

e44bfd7

bkamins merged commit 421db4d into main Nov 26, 2021

bkamins deleted the bk/improve_sort! branch November 26, 2021 10:15
