Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix typo in tutorial_data_conversion.rst #69

Merged
merged 1 commit into from
Jan 14, 2017
Merged

Conversation

rmzelle
Copy link
Contributor

@rmzelle rmzelle commented Jan 13, 2017

By the way, could you add some example code on how to exactly open a multiple sheet Excel workbook and save its sheets to CSV files? It's not really clear to me what the right incantation is for pyexcel.save_book_as.

I'm currently using pyexcel.save_as(file_name=inputfilepath, dest_file_name=outputfilepath), but that only saves the first sheet.

@codecov-io
Copy link

codecov-io commented Jan 13, 2017

Current coverage is 98.58% (diff: 100%)

Merging #69 into master will not change coverage

@@             master        #69   diff @@
==========================================
  Files            61         61          
  Lines          5601       5601          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
  Hits           5522       5522          
  Misses           79         79          
  Partials          0          0          

Powered by Codecov. Last update 4906d51...bef96ee

@chfw chfw merged commit 4d4b6db into pyexcel:master Jan 14, 2017
@chfw
Copy link
Member

chfw commented Jan 14, 2017

Surely, will do. Stay tuned.

@rmzelle rmzelle deleted the patch-1 branch January 14, 2017 23:50
chfw added a commit that referenced this pull request Jan 16, 2017
@chfw
Copy link
Member

chfw commented Jan 16, 2017

Does this section, What would happen if I save a multi sheet book into “csv” file help clarify save_book_as? If not clear, could you elaborate a bit? I can write more.

@rmzelle
Copy link
Contributor Author

rmzelle commented Jan 16, 2017

Thanks. That helps. I'm still a bit confused by pyexcel.save_book_as(bookdict=content, dest_file_name="myfile.csv") at http://pyexcel.readthedocs.io/en/latest/tutorial04.html#what-would-happen-if-i-save-a-multi-sheet-book-into-csv-file, since http://pyexcel.readthedocs.io/en/latest/generated/pyexcel.save_book_as.html?highlight=save_book_as doesn't mention a bookdict parameter.

For my little project, I'm hoping to convert specific sheets from .xlsx/.xlsm files to CSV, so I can put the data under version control. If I can't select a specific source sheet with pyexcel.save_as or pyexcel.Book(content).save_as("myfile.csv"), maybe I can do something like:

import pyexcel
sheet = pyexcel.get_sheet(file_name="example.xlsx", sheet_name="Sheet2")
sheet.save_as("example-sheet2.csv")

? Or, if I want to save all sheets in an Excel file to individual CSV files,

import pyexcel
book = pyexcel.get_book(file_name="example.xlsx")
sheets = book.to_dict()
for key, item in sheets.items():
    sheet = { key: sheets[key] }
    sheet.save_as(key + ".csv")

? Does that look reasonable?

@chfw
Copy link
Member

chfw commented Jan 16, 2017

the last line "keywords – additional keywords can be found at pyexcel.get_book()" would bring you to the get_book()'s parameters :).

for 1st part of your code, you can try:

import pyexcel as p
p.save_as(file_name="example.xlsx", sheet_name="Sheet2", dest_file_name="example-sheet2.csv")

for 2nd part of your code:

import pyexcel as p
book = p.get_book(file_name="example.xlsx")
for sheet in book:
    sheet.save_as(sheet.name+'.csv')

or:

import pyexcel as p
p.save_book_as(file_name="example.xlsx", dest_file_name="example.csv")

What's more, alternatively you could choose pyexcel-cli, which is not well documented so far but you can use help message to get around. For 1st part of your code, you could do this:

pyexcel transcode --sheet-name Sheet2 example.xlsx example-sheet2.csv

for 2nd part of your code, you could the following instead:

pyexcel transcode example.xlsx example.csv

For the help with pyexcel-cli, you can do pyexcel transcode --help

@rmzelle
Copy link
Contributor Author

rmzelle commented Jan 16, 2017

Awesome, thanks so much. I'll try out those suggestions.

chfw added a commit that referenced this pull request Jan 16, 2017
…ok_as. will find out later how to manage the same doc string in multiple places.
@rmzelle
Copy link
Contributor Author

rmzelle commented Jan 18, 2017

Sorry to turn this into a support ticket, but one more question: do you know if pyexcel offers a faster Excel-to-CSV conversion than pyexcel.save_book_as()? It took 5 minutes to convert a 3.2 Mb .xlsx Excel file (35k rows, 7 columns) to CSV with the line below:

pyexcel.save_book_as(file_name="example.xlsx", dest_file_name="example.csv")

Is that kind of performance to be expected? Doing the conversion in Excel 2010 only takes a few seconds. (i7-5600U, 16 GB RAM, Python 3.4.4 64-bits)

@chfw
Copy link
Member

chfw commented Jan 18, 2017

To be honest, the performance evaluation is not systematically done on readers.

So far pyexcel-xls and pyexcel-xlsx can read xlsx file. The latter one may be faster as it uses read-only mode. You can try switch the library and see. Pip unstall one and pip install the other.

@chfw
Copy link
Member

chfw commented Jan 19, 2017

I wondered if pyexcel-xlsx made a difference or not. Meanwhile you may also try out xlsx2csv, csvkit as well.

@rmzelle
Copy link
Contributor Author

rmzelle commented Jan 20, 2017

I wondered if pyexcel-xlsx made a difference or not. Meanwhile you may also try out xlsx2csv, csvkit as well.

I haven't had time to take another look, but I was already using "pyexcel-xlsx", I think (I installed "pyexcel" and "pyexcel-xlsx" with pip, but only imported "pyexcel" in my script).

Thanks a lot for the links to those other two libraries. I hadn't seen those before. Especially the description of xlsx2csv ("it is fast, and works for huge xlsx files") looks promising.

@chfw
Copy link
Member

chfw commented Jan 20, 2017

I am sorry to hear that. pyexcel-xlsx uses openpyxl in read-only mode, which is the best the 3rd party library could do. If you do not mind, could you please update me on the performance of xlsx2csv?

@chfw
Copy link
Member

chfw commented Dec 5, 2017

@rmzelle , could you please try using pyexcel-xlsxr? which I hope it should match the performance of xlsx2csv.

@rmzelle
Copy link
Contributor Author

rmzelle commented Dec 5, 2017

@rmzelle , could you please try using pyexcel-xlsxr? which I hope it should match the performance of xlsx2csv.

@chfw, thanks. I don't know when I will have time to return to this project and test your new tool (sorry). FWIW, I did get good performance with xlsx2csv.py.

@chfw
Copy link
Member

chfw commented Dec 5, 2017

Are you dealing with some public dataset? I may do the performance benchmarking myself.

@rmzelle
Copy link
Contributor Author

rmzelle commented Dec 5, 2017

No, sorry, it's a private database.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants