pyo3_runtime.PanicException: DataType: [] not supported in writing to csv #6038
Comments
CSVs should not contain list values. Flatten your data or use another format, such as Arrow, Parquet, or JSON.
What are you referencing when saying so?
We follow RFC 4180 (https://datatracker.ietf.org/doc/html/rfc4180) as the closest thing to a reference on what is allowed in CSV. CSV is not a format well suited for nested data. You can encode your data in a string column and serialize it later, but that is not something we will support. It is best to use formats designed to work with nested data or, if you want to use CSV, transform your table to long format. We can improve the error and suggest other formats.
I thought so. But this RFC doesn't say anything about not allowing lists. Just trying to understand the reason...
CSV is ill suited for nested data, so we do not support it. We like to focus on and improve the data structures that are well suited for a certain task. There are good alternatives: JSON, Parquet, IPC.
You can serialize your lists by first converting them to lists of strings and joining them with a delimiter that is different from your column delimiter and does not appear in your list data:

In [73]: df.with_columns([pl.col("list").cast(pl.List(pl.Utf8)).arr.join(";")])
Out[73]:
shape: (1, 2)
┌─────────┬──────┐
│ text ┆ list │
│ --- ┆ --- │
│ str ┆ str │
╞═════════╪══════╡
│ sample1 ┆ 1;2 │
└─────────┴──────┘
In [74]: df.with_columns([pl.col("list").cast(pl.List(pl.Utf8)).arr.join(";")]).write_csv()
Out[74]: 'text,list\nsample1,1;2\n'
In [75]: pl.read_csv(b'text,list\nsample1,1;2\n', sep=",").with_columns([pl.col("list").str.split(";").cast(pl.List(pl.Int64))])
Out[75]:
shape: (1, 2)
┌─────────┬───────────┐
│ text ┆ list │
│ --- ┆ --- │
│ str ┆ list[i64] │
╞═════════╪═══════════╡
│ sample1 ┆ [1, 2] │
└─────────┴───────────┘
Thanks, but only if I were the same person who will read the files :) I was looking for a Pandas replacement and Polars is very attractive. Alas, I can't use it "the Polars way" in terms of final output. And having a conversion to Pandas just to get lists into CSV seems... odd.
Why don't you use a file format that is designed for nested data?
Legacy support. Anyway, I can live with Pandas.
With Pandas you would also not be able to read that CSV data back as a list column, since it writes the list out as a plain string:

import io
import pandas as pd
import polars as pl
In [111]: df.to_pandas().to_csv()
Out[111]: ',text,list\n0,sample1,[1 2]\n'
In [125]: df_pd = pd.read_csv(io.StringIO(df.to_pandas().to_csv()))
In [126]: df_pd
Out[126]:
Unnamed: 0 text list
0 0 sample1 [1 2]
In [127]: df_pd["list"][0]
Out[127]: '[1 2]'
I know! That's the point. I can't do that in Polars without Pandas.
You see that the datatype read by pandas is a string.
Sure. Why? Is there any way to convert it the same way using Polars? Because it's not so obvious...

>>> df.with_columns([pl.col("list").str.decode("utf8")])
...
ValueError: encoding must be one of {'hex', 'base64'}, got utf8
So it looks like this or something similar is running under the hood during CSV export:

>>> df.with_columns([(pl.col("list") + "")])
...
pyo3_runtime.PanicException: this operation is not implemented/valid for this dtype: List(Int64)

When a more appropriate way would be:

>>> df.apply(lambda t: (t[0], str(t[1])))
shape: (1, 2)
┌──────────┬──────────┐
│ column_0 ┆ column_1 │
│ --- ┆ --- │
│ str ┆ str │
╞══════════╪══════════╡
│ sample1 ┆ [1, 2] │
└──────────┴──────────┘
>>>

But running internally, without UDFs.
In [129]: df.with_columns([(pl.lit("[") + pl.col("list").cast(pl.List(pl.Utf8)).arr.join(" ") + pl.lit("]")).alias("list")])
Out[129]:
shape: (1, 2)
┌─────────┬───────┐
│ text ┆ list │
│ --- ┆ --- │
│ str ┆ str │
╞═════════╪═══════╡
│ sample1 ┆ [1 2] │
└─────────┴───────┘

Or, a bit nicer, wrapped in a function:

def list_to_str(df, list_col_name):
return df.with_column(
(
pl.lit("[") + pl.col(list_col_name).cast(pl.List(pl.Utf8)).arr.join(" ") + pl.lit("]")
).alias(list_col_name)
)
In [135]: df.pipe(list_to_str, "list")
Out[135]:
shape: (1, 2)
┌─────────┬───────┐
│ text ┆ list │
│ --- ┆ --- │
│ str ┆ str │
╞═════════╪═══════╡
│ sample1 ┆ [1 2] │
└─────────┴───────┘
That doesn't solve the issue with nested lists, but at least it demonstrates why it is not so trivial to resolve.
I faced the same problem. Using lists is common, but Polars makes it hard. Maybe Polars can take this into account to make itself better.
This would be valuable to me so that I can serialize my dataframe for use in COPY operations into Postgres (I sometimes work with array and JSON columns). I'm not aware of any easy way to do this, or of any implementation of a Polars dataframe to the Postgres binary COPY format. The function above mostly works for me to get into a CSV, though, so thanks! (Also, apologies for commenting on a closed issue.)
Polars version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.
Issue description
Trying to export a DataFrame containing a List-typed column causes the exception.
Reproducible example
Expected behavior
Similar to what you got from Pandas:
Installed versions