fix: Decompress moved out of schema initialization #15550

Merged (3 commits) on Apr 10, 2024

leoforney
Contributor

#14192

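read_csv did not decompress a compressed file unless schema was None. Decompression is now done outside schema initialization, so it runs whether or not a schema is passed.

Tested with the following script: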

import polars as pl
import gzip
import os

df = pl.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "age": [25, 32, 37, 45, 29],
    "city": ["New York", "Los Angeles", "Chicago", "Houston", "Phoenix"]
})

temp_filename = 'tempfile.csv'
df.write_csv(temp_filename)

compressed_filename = 'dataframe.csv.gz'
with open(temp_filename, 'rb') as temp_file:
    with gzip.open(compressed_filename, 'wb') as gzip_file:
        gzip_file.writelines(temp_file)

os.remove(temp_filename)

print("DataFrame is successfully exported to a gzipped CSV file.")

schema = {
    "id": pl.Int64,
    "name": pl.Utf8,
    "age": pl.Int64,
    "city": pl.Utf8,
}

df_reopened = pl.read_csv(
    compressed_filename,
    schema=schema,
    has_header=True,
)

print(df_reopened)
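
To make the failure mode concrete, here is a minimal standard-library sketch of the idea in the PR title: decompress first, then decide how to obtain the schema. It is an illustration only, not the Polars implementation (which lives in Rust). The names `maybe_decompress` and `read_csv_sketch` are hypothetical, and keeping every value as a string stands in for real schema inference.

```python
from __future__ import annotations

import csv
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def maybe_decompress(data: bytes) -> bytes:
    """Return the decompressed payload if `data` is gzip, else `data` unchanged."""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data


def read_csv_sketch(data: bytes, schema: dict[str, type] | None = None) -> list[dict]:
    # Decompress up front, independent of whether a schema was supplied.
    # If this call lived only inside the `schema is None` branch below,
    # a compressed file read with an explicit schema would be parsed as raw
    # gzip bytes, which is the kind of bug described above.
    text = maybe_decompress(data).decode("utf-8")
    rows = list(csv.DictReader(io.StringIO(text)))

    if schema is None:
        # Hypothetical stand-in for schema inference: keep every value as str.
        return rows

    # Apply the caller-supplied schema by casting each column.
    return [{name: cast(row[name]) for name, cast in schema.items()} for row in rows]


if __name__ == "__main__":
    packed = gzip.compress(b"id,name\n1,Alice\n2,Bob\n")
    print(read_csv_sketch(packed, schema={"id": int, "name": str}))
```

Because decompression happens before the branch on `schema`, compressed and uncompressed inputs give the same result with or without a schema.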

@leoforney leoforney marked this pull request as draft April 9, 2024 02:17
@leoforney leoforney marked this pull request as ready for review April 9, 2024 02:17

codspeed-hq bot commented Apr 9, 2024

CodSpeed Performance Report

Merging #15550 will not alter performance

Comparing leoforney:main (99029d7) with main (b91dedb)

Summary

✅ 22 untouched benchmarks

@ritchie46
Member

Thanks, can you add a test on the python side as well?

@stinodego stinodego changed the title Decompress moved out of schema initialization fix: Decompress moved out of schema initialization Apr 9, 2024
@github-actions github-actions bot added the fix (Bug fix), python (Related to Python Polars), and rust (Related to Rust Polars) labels Apr 9, 2024
@leoforney
Contributor Author

Added!

codecov bot commented Apr 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.16%. Comparing base (44f1097) to head (99029d7).
Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #15550      +/-   ##
==========================================
- Coverage   81.16%   81.16%   -0.01%     
==========================================
  Files        1367     1367              
  Lines      175307   175318      +11     
  Branches     2527     2530       +3     
==========================================
+ Hits       142296   142303       +7     
- Misses      32534    32541       +7     
+ Partials      477      474       -3     


Member

@ritchie46 ritchie46 left a comment

Thanks!

@ritchie46 ritchie46 merged commit c758416 into pola-rs:main Apr 10, 2024
24 checks passed