Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add DataFrame.transpose/2 #820

Merged
merged 3 commits into from
Jan 15, 2024

Conversation

lkarthee
Copy link
Member

df = Explorer.DataFrame.new(a: [32.0, 33.0], b: [1, 2], c: ["a", "b"])
DF.print(df)
dft = DF.transpose(df, include_header: true, header_name: "names", column_names: ["x", "y"]) #, 

+--------------------------------------------+
| Explorer DataFrame: [rows: 3, columns: 3]  |
+--------------+--------------+--------------+
|    names     |      x       |      y       |
|   <string>   |   <string>   |   <string>   |
+==============+==============+==============+
| a            | 32.0         | 33.0         |
+--------------+--------------+--------------+
| b            | 1            | 2            |
+--------------+--------------+--------------+
| c            | a            | b            |
+--------------+--------------+--------------+

Comment on lines 4217 to 4218
* `:include_header` - if `true` column names are added as first column or header column. (default: `false`)
* `:header_name` - when `include_header` is `true`, sets name of the header column. (default: `column`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need both options. We can have a :headeroption. It may be false (default), true (default to column) or a string with the header name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should :column_names be columns ? I see it as a common parameter for many DF functions ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have changed :column_names to column. I can revert it..

header =
case opts[:header] do
false -> false
nil -> nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
nil -> nil

Copy link
Member Author

@lkarthee lkarthee Jan 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid to_column_name converting nil to string, I added that clause -> nil -> nil ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When it would ever be nil? The accepted values should be Boolean or string, no?

Copy link
Member Author

@lkarthee lkarthee Jan 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Atoms are also accepted- they are converted to strings by to_column_name(). So when user passes nil, it will be converted to "nil". Just to guard it I put that condition. Is nil to "nil" conversion is acceptable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handling atoms makes sense. Would an ArgumentError be more appropriate for header: nil?

Also I think guards on the to_column_name clause might be helpful.

Copy link
Member Author

@lkarthee lkarthee Jan 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we disallow nil, user may pass "nil". df[nil] is valid, even though it looks funny.

iex> dft = DF.transpose(df, header: "nil", columns: ["x", "y"])
iex> dft[nil]
#Explorer.Series<
  Polars[3]
  string ["a", "b", "c"]
>

Not much we can do here. I have accepted Jose code suggestion and removed the statement.

Co-authored-by: José Valim <jose.valim@gmail.com>
assert DF.shape(df) == {2, 3}

assert_raise RuntimeError,
"Polars Error: lengths don't match: Length of new column names must be the same as the row count",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we want to validate what we can on Explorer side. We only let Polars raise when the error requires traversing the series itself. Can you validate this one in Elixir too? Thank you!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in #823

Copy link
Member

@josevalim josevalim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me! @cigrainger should decide on the API (name and options). :)

Copy link
Member

@cigrainger cigrainger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this can be tough to do with pivot_wider. It's a bit niche, but all good. I'm cool with transpose, and I think you guys got to the ideal set of opts w/ header and columns. I'm happy with this!

@josevalim josevalim merged commit 82ae428 into elixir-explorer:main Jan 15, 2024
4 checks passed
@josevalim
Copy link
Member

💚 💙 💜 💛 ❤️

@lkarthee lkarthee deleted the add_transpose_func branch January 17, 2024 06:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants