ENH: Enable ExcelWriter to construct in-memory sheets #10376

Merged: 1 commit, Jun 20, 2015


bashtage
Contributor

Add support for StringIO/BytesIO to ExcelWriter
Add vbench support for writing excel files
Add support for serializing lists/dicts to strings
Fix bug when reading blank excel sheets
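A minimal sketch of the in-memory workflow that the first item enables. It assumes a recent pandas with openpyxl installed. In the 2015-era API the writer was finalized with `writer.save()`, which pandas 2.0 removed in favour of `close()` or a `with` block, so the sketch uses the latter:

```python
from io import BytesIO

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# Write the workbook into an in-memory buffer instead of a file on disk.
bio = BytesIO()
with pd.ExcelWriter(bio, engine="openpyxl") as writer:
    df.to_excel(writer, sheet_name="Sheet1", index=False)

# Raw .xlsx bytes, e.g. for an HTTP response or a database blob.
data = bio.getvalue()

# Read the bytes back to check the round trip.
result = pd.read_excel(BytesIO(data), sheet_name="Sheet1", engine="openpyxl")
print(result)
```

An .xlsx file is a zip archive, so `data` starts with the `PK` zip signature.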

closes #8188
closes #7074
closes #6403
closes #7171

@bashtage bashtage force-pushed the improve-excel branch 2 times, most recently from 96d6fe4 to 7197f8f on June 17, 2015 22:39
@@ -249,10 +249,9 @@ Optional Dependencies
* `statsmodels <http://statsmodels.sourceforge.net/>`__
* Needed for parts of :mod:`pandas.stats`
* `openpyxl <http://packages.python.org/openpyxl/>`__, `xlrd/xlwt <http://www.python-excel.org/>`__
* openpyxl version 1.6.1 or higher, but lower than 2.0.0
Contributor

did you mean to take this out?

Contributor Author

Yes, both 1.x and 2.x are clearly supported and tested in excel.py and test_excel.py. I suppose the person who added the OpenPyXL 2.x code path forgot to update this line.

Contributor

Hmm, IIRC there were some issues with versions >=2.0.0 and <2.0.3, or something like that (e.g. some style sheet issues). But we're past that now, so OK.

@jreback jreback added the Bug, Performance, and IO Excel labels Jun 18, 2015
@jreback jreback added this to the 0.17.0 milestone Jun 18, 2015

if sheet.nrows == 0:
    return DataFrame()

if header is not None:
Contributor

What about sheet.nrows == 1 (where the only row is the header)? Do we construct an empty frame with the correct columns?

Contributor Author

Yes, I added a test for this case and it works correctly already.

Contributor

yep, saw that below. thanks.
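Both cases from this thread can be checked from the outside with a quick round trip. This sketch assumes a recent pandas plus openpyxl, and builds the blank workbook directly with `openpyxl.Workbook()`, which starts with one empty sheet:

```python
from io import BytesIO

import openpyxl
import pandas as pd

# Case 1: a completely blank sheet (the sheet.nrows == 0 guard).
blank = BytesIO()
openpyxl.Workbook().save(blank)
blank.seek(0)
empty = pd.read_excel(blank, engine="openpyxl")

# Case 2: a sheet that holds only the header row (nrows == 1).
header_only = BytesIO()
pd.DataFrame(columns=["a", "b"]).to_excel(header_only, index=False, engine="openpyxl")
header_only.seek(0)
cols_only = pd.read_excel(header_only, engine="openpyxl")

print(empty.shape)
print(list(cols_only.columns), len(cols_only))
```

The first read should give a frame with no rows or columns; the second should keep the header as column labels with zero data rows.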

@@ -2184,6 +2184,38 @@ argument to ``to_excel`` and to ``ExcelWriter``. The built-in engines are:

.. _io.clipboard:
Contributor

Move your doc to before this reference (the label belongs to the next section).

@bashtage bashtage force-pushed the improve-excel branch 3 times, most recently from 38b74a9 to 465ceb0 on June 19, 2015 18:15
@bashtage
Contributor Author

@jreback The vbench for openpyxl is very slow, around 300 s on my computer. This is about 10x slower than the next slowest test (sql). Should I reduce the DataFrame size for this test?

@jreback
Contributor

jreback commented Jun 19, 2015

yes, a vbench should be < 1s if at all possible, so reduce the size on these.
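In that spirit, a write benchmark can stay well under a second by timing a deliberately small frame. The sketch below is illustrative only: it uses `timeit` rather than the vbench harness, and both the 1,000 x 5 frame size and the helper name `write_excel` are assumptions, not the values used in this PR:

```python
import timeit
from io import BytesIO

import numpy as np
import pandas as pd

# Deliberately small frame so a single write stays fast.
df = pd.DataFrame(np.random.randn(1000, 5), columns=list("abcde"))


def write_excel(engine):
    """Hypothetical helper: write df to an in-memory .xlsx with one engine."""
    bio = BytesIO()
    df.to_excel(bio, engine=engine)
    return bio


# Average over a few runs to smooth out noise.
runs = 3
elapsed = timeit.timeit(lambda: write_excel("openpyxl"), number=runs) / runs
print(f"openpyxl: {elapsed:.3f} s per write")
```

Writing to a `BytesIO` also keeps disk I/O out of the measurement, so the number mostly reflects the engine itself.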

@bashtage bashtage force-pushed the improve-excel branch 2 times, most recently from e64196c to 07fd4d6 on June 19, 2015 21:06
@bashtage
Contributor Author

I think it is finished once the builds are green. I snuck in one more Excel-related issue (testing xlwt on Py34).

Openpyxl is very slow compared to xlsxwriter (5x slower).

@jreback
Contributor

jreback commented Jun 19, 2015

Ahh, xlwt just got ported to Python 3. Finally! Excellent.

Yes, openpyxl does have this perf issue (which is why xlsxwriter is the default), though its author is working on making it much better.

ping when green.
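Given that gap, code that only writes files may want to choose the engine explicitly. A minimal sketch, assuming either xlsxwriter or openpyxl is installed; `pick_xlsx_engine` is a hypothetical helper, and recent pandas already prefers xlsxwriter for `.xlsx` output when it is importable:

```python
import importlib.util
from io import BytesIO

import pandas as pd


def pick_xlsx_engine():
    """Hypothetical helper: prefer xlsxwriter (faster writes), else openpyxl."""
    for engine in ("xlsxwriter", "openpyxl"):
        if importlib.util.find_spec(engine) is not None:
            return engine
    raise ImportError("no .xlsx writer engine is installed")


engine = pick_xlsx_engine()
bio = BytesIO()
pd.DataFrame({"x": [1, 2, 3]}).to_excel(bio, engine=engine)
print(engine, len(bio.getvalue()))
```

Checking with `importlib.util.find_spec` avoids importing a package just to test whether it exists.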

@bashtage
Contributor Author

@jreback ready

@@ -2182,6 +2184,40 @@ argument to ``to_excel`` and to ``ExcelWriter``. The built-in engines are:

df.to_excel('path_to_file.xlsx', sheet_name='Sheet1')

Contributor

Can you add a reference label here (so we can link to it)?

@jreback
Contributor

jreback commented Jun 20, 2015

minor doc comment. ping after you have pushed for merging.

@bashtage
Contributor Author

@jreback Added those.

Add support for StringIO/BytesIO to ExcelWriter
Add vbench support for writing excel files
Add support for serializing lists/dicts to strings
Fix bug when reading blank excel sheets
Added xlwt to Python 3.4 builds

closes pandas-dev#8188
closes pandas-dev#7074
closes pandas-dev#6403
closes pandas-dev#7171
closes pandas-dev#6947
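The "serializing lists/dicts to strings" item in this commit message means that object cells Excel cannot store natively are written out as text. A small round-trip sketch, assuming a recent pandas with openpyxl, where the writer falls back to `str()` for values it has no native cell type for; the column name `c` is illustrative:

```python
from io import BytesIO

import pandas as pd

# An object column holding a list and a dict; neither maps to an Excel cell type.
df = pd.DataFrame({"c": [[1, 2], {"k": 1}]})

bio = BytesIO()
df.to_excel(bio, index=False, engine="openpyxl")
bio.seek(0)

# The values come back as the strings that were written, not as Python objects.
result = pd.read_excel(bio, engine="openpyxl")
print(result["c"].tolist())
```

Recovering the original objects would need an explicit parse step (for example `ast.literal_eval`) after reading.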
@bashtage
Contributor Author

Rebased

jreback added a commit that referenced this pull request Jun 20, 2015
ENH: Enable ExcelWriter to construct in-memory sheets
@jreback jreback merged commit 2fea54a into pandas-dev:master Jun 20, 2015
@jreback
Contributor

jreback commented Jun 20, 2015

@bashtage thanks for this!

Labels: Bug, IO Excel, Performance
2 participants