Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow path to be an StringIO object and skip extension check for that case #5992

Closed
wants to merge 12 commits into from

Conversation

fennnec
Copy link

@fennnec fennnec commented Jan 18, 2014

this is to allow in-memory creation of excel-files from pandas

… case to allow in-memory creation of excel-files from pandas
@jreback
Copy link
Contributor

jreback commented Jan 18, 2014

what is the use case for this?

@fennnec
Copy link
Author

fennnec commented Jan 18, 2014

i have a large django database and i do my exports for reporting in csv and excel formats via django-pandas (and thus via pandas)
i don't need to store the file anywhere on my server but just use a stringbuffer to stream the file to the client requesting it via wrapping it in a django HTTPResponse

xlsxwriter provides this functionality and only the file-extension check inside pandas prevented me from using the in-memory-only version of xlsxwriter

@jreback
Copy link
Contributor

jreback commented Jan 18, 2014

ahh...makes sense, ok need some tests for this then (simple as create a frame, save it to a buffer, prob save the buffer, then read it back and compare); I don't think that you can READ from a buffer as not sure that makes any sense (you would have to have xlrd stream it to you, not even sure if that is possible).

@fennnec
Copy link
Author

fennnec commented Jan 18, 2014

here is the code i use currently:

response = HttpResponse(content_type='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet') # set contenttype
stringio = StringIO()
ew = ExcelWriter(stringio, engine='xlsxwriter', **{'options': {'in-memory': True}})
dataframe.to_excel(ew, sheet_name='keywordanalyse', engine='xmlxwriter')
ew.save()
response.content = stringio.getvalue()
return response

i don't know why the explicit call to save() is necessary but without it the xslx comes back empty.

I'll try to provide what you asked for, too

@fennnec
Copy link
Author

fennnec commented Jan 18, 2014

here is a simple test i just did:

 from pandas import DataFrame
 from pandas.io.excel import ExcelWriter
 from StringIO import  StringIO
 buf = StringIO()
 data = {'a': [1,2,3], 'b':[4,5,6], 'c':['wassup', 'wassuuuuuuuup', 'wassuuuuuuuuuuuuuuuuuuup']}
 df = DataFrame(data)
 df

Out[6]:
<> a b c
0 1 4 wassup
1 2 5 wassuuuuuuuup
2 3 6 wassuuuuuuuuuuuuuuuuuuup

 ew = ExcelWriter(buf, engine='xlsxwriter', **{'options': {'in-memory': True}})
 df.to_excel(ew, sheet_name='test', engine='xmlxwriter')
 ew.save()
 xlsxstring = buf.getvalue()
 f = open('text.xlsx', 'w')
 f.write(xlsxstring)
 f.close()

then i verified the outcome via opening the file in excel (see screenshot):
text_xlsx-3

@jreback
Copy link
Contributor

jreback commented Jan 18, 2014

what I mean is pls add the tests to the PR

their are a lot of existing tests in pandas/io/tests/test_excel.py

@fennnec
Copy link
Author

fennnec commented Jan 18, 2014

ok, I'll try my best although i have never written tests in python

@jreback
Copy link
Contributor

jreback commented Jan 18, 2014

@fennnec just copy paste the test format and modify to use your change.

The ideal way is to write the test first, have it fail, then apply your change and have it fix the issue (and not break anything else!)

see here:
https://github.com/pydata/pandas/wiki/Testing

@fennnec
Copy link
Author

fennnec commented Jan 19, 2014

i tested it in my application by hand

@jreback
Copy link
Contributor

jreback commented Jan 19, 2014

@fennnec we can only accept patches that have associated tests
and that pass Travis

your patch won't work even with local testing as their is a syntax error

@fennnec
Copy link
Author

fennnec commented Jan 19, 2014

well, it works locally here

test_stringio_writer (__main__.XlsxWriterTests) ... ok
test_stringio_writer (__main__.XlsxWriterTests_NoMerge) ... ok

so i don't know what the problem with another setup is

@@ -18,6 +18,7 @@
import pandas.compat as compat
import pandas.core.common as com
from warnings import warn
from StringIO import StringIO
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to do: from pandas.compat import StringIO in order to pass python3

@fennnec
Copy link
Author

fennnec commented Jan 22, 2014

could you perhaps point me in the right direction for the python 3 errors, since i don't really understand what's going wrong there on the save method

@jreback
Copy link
Contributor

jreback commented Jan 22, 2014

IIRC you have to use io.BytesIO (and not StringIO) as its opened in binary mode. @jtratner ?

Michael Gaber and others added 6 commits January 22, 2014 17:04
@fennnec
Copy link
Author

fennnec commented Jan 23, 2014

passes travis now

@jreback
Copy link
Contributor

jreback commented Jan 23, 2014

gr8, pls add a release note (reference this PR) in 0.13.1, API changes (edit doc/source/release.rst)

then squash to a single commit

see here: https://github.com/pydata/pandas/wiki/Using-Git

@jtratner / @y-p ok with this?

@ghost
Copy link

ghost commented Jan 23, 2014

This was done in a half-assed manner and has all the appearence of a thrown-over-the-wall patch.

👎

skip extension check for that case to allow in-memory creation of
excel-files from pandas dataframes.
add a testcase and reference the PR in release notes
@ghost
Copy link

ghost commented Jan 23, 2014

@jreback, don't let me put you in a difficult position. If you're satisfied that this
is solid after the review, Merge away.

@jreback
Copy link
Contributor

jreback commented Jan 23, 2014

@fennnec

  • needs additional tests for each of the different excel writers; if it doesn't work with the other writers then needs to say this in the docstring
  • docstring update
  • io.rst update
  • whatsnew entry

@cpcloud
Copy link
Member

cpcloud commented Jan 23, 2014

There's also a ton of merge commits.

@jreback
Copy link
Contributor

jreback commented Feb 16, 2014

needs to be rebased on current master

@jreback
Copy link
Contributor

jreback commented Mar 9, 2014

@fennnec needs a rebase

@jreback
Copy link
Contributor

jreback commented Apr 6, 2014

closing this. @fennnec if you want to resubmit this we can take a look again

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO Excel read_excel, to_excel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants