SNOW-1797580: Integer columns contain Na after filtering when using to pandas in local testing #2598

Open
frederiksteiner opened this issue Nov 11, 2024 · 4 comments
Labels
bug: Something isn't working
status-triage_done: Initial triage done, will be further handled by the driver team

Comments

@frederiksteiner
Contributor

  1. What version of Python are you using?

    Python 3.11.8

  2. What operating system and processor architecture are you using?

    Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.35

  3. What are the component versions in the environment (pip freeze)?

    snowflake-connector-python==3.12.3
    snowflake-snowpark-python==1.24.0

  4. What did you do?

    import snowflake.snowpark.functions as spf
    from snowflake.snowpark import Session

    if __name__ == "__main__":
        # Local testing mode: the session runs without a Snowflake connection.
        conn_params = {
            "schema": "SCHEMA",
            "local_testing": True,
        }
        session = Session.builder.configs(conn_params).create()

        data = [
            [1, False],
            [1, False],
            [1, False],
            [2, True],
        ]
        schema = ["INT_COL", "BOOL_COL"]
        df = session.create_dataframe(data, schema)
        df = df.with_column("INT_COL", spf.cast("INT_COL", "int"))

        # Keep only the rows where BOOL_COL is True (a single row here).
        filtered = df.filter(spf.col("BOOL_COL"))

        pd_df = filtered.to_pandas()    # INT_COL is NaN in this result
        collected = filtered.collect()  # reference result to compare against

  5. What did you expect to see?

    pd_df should contain the same data as collected. Instead, INT_COL is NaN in the pandas DataFrame. I have already found the cause and will open a PR as soon as possible.
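
The expectation above can be written as a small check. This is only a sketch: `find_mismatches` and `is_missing` are hypothetical helpers, not part of Snowpark, and the two lists are hand-written stand-ins for what `[row.as_dict() for row in collected]` and `pd_df.to_dict("records")` would produce in the reproduction.

```python
import math


def is_missing(value):
    """Return True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def find_mismatches(collected_rows, pandas_records):
    """Compare rows pairwise and return (row_index, column, expected, actual) tuples."""
    mismatches = []
    for i, (expected, actual) in enumerate(zip(collected_rows, pandas_records)):
        for column, expected_value in expected.items():
            actual_value = actual.get(column)
            # Two missing values count as equal.
            if is_missing(expected_value) and is_missing(actual_value):
                continue
            if is_missing(actual_value) or expected_value != actual_value:
                mismatches.append((i, column, expected_value, actual_value))
    return mismatches


# Hand-written stand-ins (assumption) for the collect() and to_pandas() results.
collected_rows = [{"BOOL_COL": True, "INT_COL": 2}]
pandas_records = [{"BOOL_COL": True, "INT_COL": float("nan")}]

print(find_mismatches(collected_rows, pandas_records))
```

With the values reported in this issue, the check flags INT_COL in row 0. Once both results agree, it should return an empty list.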

@frederiksteiner frederiksteiner added bug Something isn't working needs triage Initial RCA is required labels Nov 11, 2024
@github-actions github-actions bot changed the title Integer columns contain Na after filtering when using to pandas in local testing SNOW-1797580: Integer columns contain Na after filtering when using to pandas in local testing Nov 11, 2024
@sfc-gh-sghosh sfc-gh-sghosh self-assigned this Nov 12, 2024
@sfc-gh-sghosh

Hello @frederiksteiner,

Thanks for raising the issue. We are able to reproduce it: with local_testing, the pandas DataFrame has NaN for INT_COL.

       BOOL_COL  INT_COL
    0      True      NaN

Whereas with a regular session it is:

       BOOL_COL  INT_COL
    0      True        2

We will work on it. If you already have the PR, please let us know and we will review and process it accordingly.

Regards,
Sujan

@sfc-gh-sghosh sfc-gh-sghosh added status-triage_done Initial triage done, will be further handled by the driver team and removed needs triage Initial RCA is required labels Nov 12, 2024
@frederiksteiner
Contributor Author

Yes, the PR is ready for review.

@frederiksteiner
Contributor Author

Any updates on this?

@sfc-gh-sghosh

Hello @frederiksteiner,

The team is looking into PR #2599. We will update you here.

Regards,
Sujan
