pyo3_runtime.PanicException: DataType: [] not supported in writing to csv #6038
Comments
CSVs should not contain list values. Flatten your data or use another format, such as Arrow, Parquet, or JSON.
What are you referencing when saying so?
We follow RFC 4180 (https://datatracker.ietf.org/doc/html/rfc4180) as the closest thing to a reference on what is allowed in CSV. CSV is not a format well suited for nested data. You can encode your data in a string column and serialize it later, but that is not something we will support. It is best to use formats designed to work with nested data or, if you want to use CSV, transform your table to long format. We can improve the error and suggest other formats.
I thought so. But this RFC doesn't say anything about not allowing lists. Just trying to understand the reason...
CSV is ill suited for nested data, so we do not support it. We like to focus on and improve the data structures that are well suited for a certain task. There are good alternatives: JSON, Parquet, IPC.
You can serialize your lists by first converting them to lists of strings and joining them with a delimiter that is different from your column delimiter and does not appear in your list data:

In [73]: df.with_columns([pl.col("list").cast(pl.List(pl.Utf8)).arr.join(";")])
Out[73]:
shape: (1, 2)
┌─────────┬──────┐
│ text ┆ list │
│ --- ┆ --- │
│ str ┆ str │
╞═════════╪══════╡
│ sample1 ┆ 1;2 │
└─────────┴──────┘
In [74]: df.with_columns([pl.col("list").cast(pl.List(pl.Utf8)).arr.join(";")]).write_csv()
Out[74]: 'text,list\nsample1,1;2\n'
In [75]: pl.read_csv(b'text,list\nsample1,1;2\n', sep=",").with_columns([pl.col("list").str.split(";").cast(pl.List(pl.Int64))])
Out[75]:
shape: (1, 2)
┌─────────┬───────────┐
│ text ┆ list │
│ --- ┆ --- │
│ str ┆ list[i64] │
╞═════════╪═══════════╡
│ sample1 ┆ [1, 2] │
└─────────┴───────────┘
Thanks, but only if I were the same person who will read the files :) I was looking for a Pandas replacement and Polars is very attractive. Alas, I can't use it "the Polars way" in terms of final output. And having a conversion to Pandas just to get lists into CSV seems... odd.
Why don't you use a file format that is designed for nested data?
Legacy support. Anyway, I can live with Pandas.
With Pandas you would also not be able to read that CSV data back as a list column, since it writes the list out as a plain string:

import io
import pandas as pd
import polars as pl
In [111]: df.to_pandas().to_csv()
Out[111]: ',text,list\n0,sample1,[1 2]\n'
In [125]: df_pd = pd.read_csv(io.StringIO(df.to_pandas().to_csv()))
In [126]: df_pd
Out[126]:
Unnamed: 0 text list
0 0 sample1 [1 2]
In [127]: df_pd["list"][0]
Out[127]: '[1 2]'
I know! That's the point. I can't do that in Polars without Pandas.
You see that the datatype read by pandas is a string.
Sure. Why? Is there any way to convert it the same way using Polars? Because it's not so obvious...

>>> df.with_columns([pl.col("list").str.decode("utf8")])
...
ValueError: encoding must be one of {'hex', 'base64'}, got utf8
So it looks like this or something similar is running under the hood during CSV export:

>>> df.with_columns([(pl.col("list") + "")])
...
pyo3_runtime.PanicException: this operation is not implemented/valid for this dtype: List(Int64)

When a more appropriate way would be:

>>> df.apply(lambda t: (t[0], str(t[1])))
shape: (1, 2)
┌──────────┬──────────┐
│ column_0 ┆ column_1 │
│ --- ┆ --- │
│ str ┆ str │
╞══════════╪══════════╡
│ sample1 ┆ [1, 2] │
└──────────┴──────────┘
>>>

But running internally, without UDFs.
In [129]: df.with_columns([(pl.lit("[") + pl.col("list").cast(pl.List(pl.Utf8)).arr.join(" ") + pl.lit("]")).alias("list")])
Out[129]:
shape: (1, 2)
┌─────────┬───────┐
│ text ┆ list │
│ --- ┆ --- │
│ str ┆ str │
╞═════════╪═══════╡
│ sample1 ┆ [1 2] │
└─────────┴───────┘

Or, a bit nicer, wrapped in a function:

def list_to_str(df, list_col_name):
return df.with_column(
(
pl.lit("[") + pl.col(list_col_name).cast(pl.List(pl.Utf8)).arr.join(" ") + pl.lit("]")
).alias(list_col_name)
)
In [135]: df.pipe(list_to_str, "list")
Out[135]:
shape: (1, 2)
┌─────────┬───────┐
│ text ┆ list │
│ --- ┆ --- │
│ str ┆ str │
╞═════════╪═══════╡
│ sample1 ┆ [1 2] │
└─────────┴───────┘
That doesn't solve the issue with nested lists, but at least it demonstrates why it is not so trivial to resolve.
I faced the same problem. Using lists is common, but Polars makes it hard. Maybe Polars can take this into account to make itself better.
This would be valuable to me so that I can serialize my dataframe for use in COPY operations into Postgres (I sometimes work with array and JSON columns). I'm not aware of any easy way to do this, or of any implementation of a Polars dataframe to the Postgres binary COPY format. The function above mostly works for me to get into a CSV, though, so thanks! (Also, apologies for commenting on a closed issue.)
Polars version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.
Issue description
Trying to export a DataFrame containing a List-typed column causes the exception.
Reproducible example
Expected behavior
Similar to what you got from Pandas:
Installed versions