BUG: DataFrame to_csv compression with 'zip' use zipfilename as archive name #39465

Closed
3 tasks done
CyberQin opened this issue Jan 29, 2021 · 5 comments · Fixed by #44445
Labels: Bug, IO CSV (read_csv, to_csv)
Milestone: 1.4

Comments

@CyberQin

CyberQin commented Jan 29, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


import pandas as pd

a = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [4, 5, 6, 7, 8]})
a.to_csv('myfile.csv.zip')

Problem description

With the gzip method, the function works fine: a.to_csv('myfile.csv.gz') creates a gzip file containing a file named myfile.csv.
The "zip" method misbehaves: a.to_csv('myfile.csv.zip') creates a zip file whose archive member is named myfile.csv.zip.
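Why the gzip case already behaves as expected can be seen with the standard library alone: when Python's gzip module is given a path, it writes that path's basename, with a trailing ".gz" removed, into the optional FNAME field of the gzip header. The sketch below reads that field back. The helper gzip_stored_name is hypothetical (not part of pandas), and the assumption that pandas passes the output path through to the gzip module is an assumption about pandas internals, not something confirmed in this thread.

```python
import gzip
import os
import struct
import tempfile

# Flag bits of the gzip member header (RFC 1952, section 2.3.1)
FEXTRA = 0x04
FNAME = 0x08


def gzip_stored_name(path):
    """Hypothetical helper: return the FNAME field of a gzip file, or None."""
    with open(path, "rb") as f:
        header = f.read(10)  # fixed-size part of the member header
        if header[:2] != b"\x1f\x8b":
            raise ValueError(f"{path!r} is not a gzip file")
        flags = header[3]
        if flags & FEXTRA:
            # Skip the optional extra field: 2-byte little-endian length + data
            (xlen,) = struct.unpack("<H", f.read(2))
            f.read(xlen)
        if not flags & FNAME:
            return None
        # FNAME is a zero-terminated Latin-1 string
        name = bytearray()
        while True:
            byte = f.read(1)
            if byte in (b"", b"\x00"):
                break
            name += byte
        return name.decode("latin-1")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "myfile.csv.gz")
    # gzip.open stores the basename of the path, minus ".gz", in the header
    with gzip.open(path, "wb") as f:
        f.write(b"a,b\n1,4\n")
    stored = gzip_stored_name(path)

print(stored)  # myfile.csv
```

A zip archive has no such convention built in: the writer must choose each member's name explicitly, which is where the ".zip" suffix leaks through.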

CyberQin added the Bug and Needs Triage labels Jan 29, 2021
@twoertwein
Member

> The "zip" method misbehaves: a.to_csv('myfile.csv.zip') creates a zip file whose archive member is named myfile.csv.zip.

Do you want to create a PR to remove the ".zip" suffix? If you need a workaround:
a.to_csv("test.csv.zip", compression={"method": "zip", "archive_name": "test.csv"})
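The archive_name key matters because a zip file, unlike gzip, stores each member under a name chosen explicitly by the writer. A stdlib-only sketch of that mechanism follows; write_csv_zip is a hypothetical helper standing in for what the archive_name option controls, not pandas code.

```python
import os
import tempfile
import zipfile


def write_csv_zip(path, csv_text, archive_name):
    """Hypothetical helper: store csv_text in a zip at path as member archive_name."""
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # The member name is exactly the arcname passed here; zipfile never
        # derives it from the archive's own filename.
        zf.writestr(archive_name, csv_text)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.csv.zip")
    write_csv_zip(path, "a,b\n1,4\n2,5\n", archive_name="test.csv")
    with zipfile.ZipFile(path) as zf:
        members = zf.namelist()
        content = zf.read("test.csv").decode()

print(members)  # ['test.csv']
```

If the member name were taken from the zip file's own name instead, the member would be "test.csv.zip", which matches the behavior reported above.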

> and a strange case shows that a large DataFrame will report a duplicate archive name

Are you using 1.2.1? See #38714.
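The "duplicate archive name" report is consistent with the UserWarning that Python's zipfile module emits when the same member name is written more than once into one archive. A stdlib-only illustration, assuming (not confirmed in this thread) that the large-DataFrame case arises from writing the CSV into the archive through several separate writestr calls:

```python
import io
import warnings
import zipfile

buffer = io.BytesIO()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is recorded
    with zipfile.ZipFile(buffer, "w") as zf:
        # Two chunks written as separate members under the same name
        zf.writestr("myfile.csv", "a,b\n1,4\n")
        zf.writestr("myfile.csv", "2,5\n")

# ZipFile does not close a file object it was handed, so buffer is reusable
with zipfile.ZipFile(buffer) as zf:
    names = zf.namelist()

messages = [str(w.message) for w in caught]
print(names)  # ['myfile.csv', 'myfile.csv']
```

Both entries end up in the archive, so a reader that picks one member by name sees only part of the data; writing the whole CSV as a single member avoids this.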

@CyberQin
Author

CyberQin commented Feb 7, 2021

@twoertwein I'm now using 1.2.1. I opened a PR (#39647) to fix the archive-name problem, but not the large DataFrame problem.

@twoertwein
Member

Do you still have the issue with the large DataFrame when using 1.2.1?

@WillAyd
Member

WillAyd commented Mar 12, 2021

Is this different than #26023?

@CyberQin
Author

> Is this different than #26023?

Same purpose, but my PR #40387 uses "myfile" as the archive_name when "myfile.zip" is the zip file name, just like gzip or xz. It won't automatically append ".csv" to "myfile":
"myfile.csv.zip"->"myfile.csv"
"myfile.zip"->"myfile"
"myfile.csv.csv.zip"->"myfile.csv.csv"
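The mapping above amounts to "drop a trailing .zip from the file's basename and keep everything else". A minimal sketch of that rule; infer_archive_name is a hypothetical helper, not the code from PR #40387, and returning the basename unchanged when there is no ".zip" suffix is an assumption.

```python
import os


def infer_archive_name(zip_path):
    """Hypothetical helper: derive the zip member name from the zip file's path.

    Drops a trailing ".zip" from the basename (mirroring the gzip/xz
    convention) and never appends ".csv".
    """
    base = os.path.basename(zip_path)
    if base.endswith(".zip"):
        return base[: -len(".zip")]
    return base  # assumption: no ".zip" suffix means use the name as-is


for name in ["myfile.csv.zip", "myfile.zip", "myfile.csv.csv.zip"]:
    print(name, "->", infer_archive_name(name))
# myfile.csv.zip -> myfile.csv
# myfile.zip -> myfile
# myfile.csv.csv.zip -> myfile.csv.csv
```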

lithomas1 removed the Needs Triage label Apr 22, 2021
jreback added this to the 1.4 milestone Nov 17, 2021