Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow DataFrame(matrix, names, copycols=false) #2860

Closed
knuesel opened this issue Sep 6, 2021 · 9 comments · Fixed by #2859
Closed

Allow DataFrame(matrix, names, copycols=false) #2860

knuesel opened this issue Sep 6, 2021 · 9 comments · Fixed by #2859
Milestone

Comments

@knuesel
Copy link
Contributor

knuesel commented Sep 6, 2021

The documentation suggests that e.g. DataFrame(rand(2,2), [:a, :b], copycols=false) should be supported:

DataFrame(columns::AbstractVecOrMat,
names::AbstractVector{<:Union{AbstractVector, Symbol}};
makeunique::Bool=false, copycols::Bool=true)

but it fails with got unsupported keyword argument "copycols".

It would be nice to have this as a simpler version of

A = rand(2,2)
DataFrame(collect(eachcol(A)), [:a, :b], copycols=false)
@bkamins bkamins added the feature label Sep 6, 2021
@bkamins bkamins added this to the 1.3 milestone Sep 6, 2021
@bkamins bkamins added the doc label Sep 6, 2021
@bkamins
Copy link
Member

bkamins commented Sep 6, 2021

I am not sure we want it. I would rather fix the documentation that we do not allow it.

The reason is that DataFrame(collect(eachcol(A)), [:a, :b], copycols=false) will be surprising for most of users as the columns will be views which, while valid, are tricky to work with.

What is your use case?

@nalimilan - do you have any opinion here?

@nalimilan
Copy link
Member

Interesting corner case! I'd say that we could allow copycols=false if somebody has a use case for it. But I wouldn't add it just for completeness, as the result could indeed be surprising.

@bkamins
Copy link
Member

bkamins commented Sep 6, 2021

Agreed - that is why I asked for the use case. Again - most users find the behavior of DataFrames.jl confusing if the columns are views, as you cannot e.g. push! or append!.

@knuesel
Copy link
Contributor Author

knuesel commented Sep 6, 2021

I find the views reasonable since the user is basically asking for them by writing copycols=false. I expected it to work because the other "natural" ways to use views are not restricted (for example DataFrame([:a, :b] .=> eachcol(A), copycols=false)). But it's true there is a difference since here the views would be created by DataFrames.jl.

My use case: I was loading data from "large" Excel files (1 million rows). The column names are actually on the 4th row, and some columns must be discarded, so I get the actual data using XLSX.readxlsx(file)[sheetname][range] which returns a matrix. I was trying to wrap this in a data frame without copy. Full disclosure: I'll probably end up copying the columns anyway to narrow the types.

@bkamins
Copy link
Member

bkamins commented Sep 6, 2021

I was trying to wrap this in a data frame without copy.

Here is the crucial point of the question. Why do you want to avoid copying (apart from the fact that "you can"). Is there some reason to do it (the only I can see is to save memory, as copying should be fast).

@knuesel
Copy link
Contributor Author

knuesel commented Sep 6, 2021

Indeed it was to save memory: some notebooks get very memory intensive as memory won't be reclaimed from temporary variables unless they're manually reassigned, and mybinder.org instances for example are only 1-2 GB...

@bkamins
Copy link
Member

bkamins commented Sep 6, 2021

OK - I think we can have this change. I will add it to #2859

@bkamins bkamins linked a pull request Sep 6, 2021 that will close this issue
@bkamins
Copy link
Member

bkamins commented Sep 6, 2021

Added in #2859. Can you please test?

@knuesel
Copy link
Contributor Author

knuesel commented Sep 7, 2021

@bkamins sorry for the delay, I confirm it's now working for me on the main branch. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants