read_csv(parse_dates=True) is broken for quoted dates #6474

Closed
2 tasks done
stinodego opened this issue Jan 26, 2023 · 4 comments · Fixed by #6854
Labels
bug Something isn't working python Related to Python Polars

Comments

@stinodego
Member

Polars version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Issue description

I noticed this test during test suite cleanup. It passes when doing result.frame_equal(expected), but fails on assert_frame_equal.

The date column is not parsed correctly and is returned as a string.

Reproducible example

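import textwrap
from datetime import date

import polars as pl
from polars.testing import assert_frame_equal

csv = textwrap.dedent(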
    """a,b
"2022-01-01",1
"2022-01-02",2
"""
)
result = pl.read_csv(csv.encode(), parse_dates=True)
print(result)  # column 'a' is a string column, not date

# ┌────────────┬─────┐
# │ a          ┆ b   │
# │ ---        ┆ --- │
# │ str        ┆ i64 │
# ╞════════════╪═════╡
# │ 2022-01-01 ┆ 1   │
# │ 2022-01-02 ┆ 2   │
# └────────────┴─────┘

expected = pl.DataFrame({"a": [date(2022, 1, 1), date(2022, 1, 2)], "b": [1, 2]})
assert_frame_equal(result, expected)  # Fails

Expected behavior

I expect the column to be parsed as a date.

Installed versions

master branch

@stinodego stinodego added bug Something isn't working python Related to Python Polars labels Jan 26, 2023
@MarcoGorelli
Collaborator

MarcoGorelli commented Feb 12, 2023

Looks like this works as expected in the latest release:

In [11]: csv = textwrap.dedent(
    ...:     """a,b
    ...: "2022-01-01",1
    ...: "2022-01-02",2
    ...: """
    ...: )
    ...: result = pl.read_csv(csv.encode(), parse_dates=True)
    ...: print(result)  # column 'a' is a string column, not date
    ...:
shape: (2, 2)
┌────────────┬─────┐
│ a          ┆ b   │
│ ---        ┆ --- │
│ date       ┆ i64 │
╞════════════╪═════╡
│ 2022-01-01 ┆ 1   │
│ 2022-01-02 ┆ 2   │
└────────────┴─────┘

@stinodego
Member Author

stinodego commented Feb 12, 2023

Strange, it's still broken for me.

There is a test called test_quoted_date which is currently skipped - does it pass for you if you unskip it? If so, you can make a mini-PR removing the skip marker.

@MarcoGorelli
Collaborator

> Strange, it's still broken for me.

🤔 how odd, this works for me on both 0.16.3 and on master, for both Python 3.7 and 3.11

I'm running on Ubuntu (via WSL2). Any idea what else this could be down to?

@MarcoGorelli
Collaborator

got it, it's just due to the indentation in the test
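
The thread does not spell out the mechanism, so the following is only a sketch of one plausible reading. `textwrap.dedent` strips only whitespace common to every line, so if the header `a,b` sits right after the opening quotes inside an indented test body, nothing is removed and each quoted date keeps its leading spaces. The function names below are hypothetical and are not taken from the Polars test suite.

```python
import textwrap


def indented_csv() -> str:
    # Hypothetical stand-in for the test body: the header follows the opening
    # quotes directly, so it has zero indentation while the data rows carry
    # the function's indentation. The common margin is therefore empty.
    return textwrap.dedent(
        """a,b
        "2022-01-01",1
        "2022-01-02",2
        """
    )


def dedented_csv() -> str:
    # The backslash moves the header onto its own indented line, so every
    # line shares the same margin and dedent can strip it.
    return textwrap.dedent(
        """\
        a,b
        "2022-01-01",1
        "2022-01-02",2
        """
    )


# Compare the first data row of each variant.
print(repr(indented_csv().splitlines()[1]))
print(repr(dedented_csv().splitlines()[1]))
```

In the first variant the data row still begins with spaces before the opening quote, which would explain why the snippet pasted at top level in the issue parses as a date while the indented copy in the test does not.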
