in2csv: Empty CSV reading a specific xlsx file #1129
Comments
Can you run with
There might be some suppressed error output. |
@jpmckinney no error output. I have only the first cell :( |
Hmm, okay, I will have to test with the file. |
I also experienced the same issue. Only the very first cell is outputted. |
It seems related to the unnamed columns. Try using the pandas library to read your file with this code:
import pandas as pd
xlsx = pd.read_excel('file.xlsx', sheet_name=0)
xlsx.to_csv('file.csv', index=False)
You will get something like this:
The reason could be this https://stackoverflow.com/a/67605541/511514 I don't have a solution yet. |
@raphox Can you share an XLSX file that causes the error? |
Ah, csvkit was trusting the file's metadata, which for @aborruso's file set the max columns and rows to 1. I made a commit to never trust the file, and to reset dimensions instead. I assume Pandas did the same. |
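To check whether a given file has this kind of mismatch, one option is to compare the dimension the worksheet declares with the cells it actually contains. The following is a minimal sketch using only the Python standard library. It assumes the sheet is stored at `xl/worksheets/sheet1.xml` inside the XLSX archive (common, but not guaranteed) and that each cell carries an `r` reference attribute. The function name `declared_and_actual` is made up for this illustration and is not part of csvkit or agate-excel.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML worksheet parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def col_to_index(letters):
    """Convert a column label such as 'A' or 'AB' to a 1-based index."""
    n = 0
    for ch in letters:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def declared_and_actual(xlsx_path, sheet_member="xl/worksheets/sheet1.xml"):
    """Return (declared dimension ref or None, (max_row, max_col) seen in the cells).

    sheet_member is an assumed path inside the archive; adjust it if the
    workbook stores its first sheet elsewhere.
    """
    with zipfile.ZipFile(xlsx_path) as zf:
        root = ET.fromstring(zf.read(sheet_member))

    dim = root.find("m:dimension", NS)
    declared = dim.get("ref") if dim is not None else None

    max_row = max_col = 0
    for cell in root.iterfind("m:sheetData/m:row/m:c", NS):
        # Cell references look like 'C3'; cells without one are skipped.
        match = re.fullmatch(r"([A-Z]+)(\d+)", cell.get("r", ""))
        if match:
            max_col = max(max_col, col_to_index(match.group(1)))
            max_row = max(max_row, int(match.group(2)))
    return declared, (max_row, max_col)
```

If the declared value is something like `A1` while the cells extend over many rows and columns, a reader that trusts the declared dimension will stop after the first cell, which matches the symptom reported here.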
I could reproduce the same issue using this code:
import agate
import agateexcel  # importing agateexcel adds Table.from_xlsx to agate
table = agate.Table.from_xlsx('file.xlsx', sheet=0)
table.to_csv('file.csv')
The pandas library and csvkit are using the agate library. |
thank you very much |
Sorry, I cannot share my file. It comes from a customer. Using your commit, I got this error message:
|
@raphox Pandas does not use agate.
Is it impossible to create a file (e.g. by modifying your customer file) to reproduce the issue?
Indeed - it looks like setting (For example, csvkit's examples/test.xlsx has data in a column that lacks a header. Setting https://openpyxl.readthedocs.io/en/stable/optimized.html#worksheet-dimensions |
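The openpyxl page linked above describes two calls on read-only worksheets: `calculate_dimension()`, which reports the range the file claims to use, and `reset_dimensions()`, which discards that claim so that all cells are read. Here is a minimal sketch of how they might be combined. Treating `A1:A1` as the suspicious value is an assumption made for this example, and `read_all_rows` is a hypothetical helper, not the code used by csvkit or agate-excel.

```python
from openpyxl import load_workbook


def read_all_rows(path, sheet_index=0):
    """Return every row of a sheet as a tuple of values.

    If the file declares a single-cell dimension (an assumed sign of bad
    metadata), the declared size is discarded before reading.
    """
    wb = load_workbook(path, read_only=True)
    try:
        ws = wb.worksheets[sheet_index]
        try:
            suspicious = ws.calculate_dimension() == "A1:A1"
        except ValueError:
            # No size is declared at all; openpyxl then reads every cell anyway.
            suspicious = False
        if suspicious:
            ws.reset_dimensions()
        return [row for row in ws.iter_rows(values_only=True)]
    finally:
        # Read-only workbooks keep the file open until closed explicitly.
        wb.close()
```

A sheet that really contains a single cell is still read correctly with this approach; the only cost of the reset is that openpyxl has to scan the sheet instead of trusting its declared size.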
Sorry, I got confused. The I can not share the customer's file, and when I try to remove the sensitive data and save the file using Microsoft Excel on MacOS, the problem disappears. But I got another weird behavior trying to convert this file sales_with_header_in_row_two_google.xlsx. I used the Google Docs to generate it based on my customer's file. The result using this file is:
Look at the original blank cells |
Using pandas:
Result in
|
I am making progress: Using Pandas:
I got the same results, using in2csv for the Google Doc, but using the customer's original file I got this:
|
To skip that useless first row, use Your comments are no longer about the issue, and are support questions (rather than bug reports). |
Thanks @jpmckinney My goal was just to convert an XLSX file to a CSV version. My objective is to have a small version of the file before importing it into my web application using Ruby on Rails. I don't want to worry about the header or skip rows. In my case, I used the custom pandas script to do this:
Ruby code:
system(
  Rails.root.join('bin/excel_to_csv_by_panda.py').to_s,
  excel_file_path,
  excel_sheet_name,
  csv_file_path
) |
I just tried to use the --reset-dimensions parameter, and despite it being csvkit 1.3.0 and not yet documented, it does not seem to recognize the parameter (nor is it in the help menu - though it's in the readthedocs doc as a footer). I'm assuming the below issue is because the commit you mention is ahead of the latest 1.3.0 release. Digging into this a bit further, I see 2 issues:
Hope that helps, James! |
@steve-estes This feature is not yet part of a release. You can see it listed under "Unreleased" in the changelog: https://csvkit.readthedocs.io/en/latest/changelog.html You can install csvkit from GitHub using:
|
This is a separate issue. I haven't seen such warnings with any files, personally. Please open a new issue, and upload a file that causes the warning.
Where did you find this suggestion? |
I will open a new issue about the warning
While tinkering around with command-line python, and having the module tell me as much.
By following the referenced code path, it can be seen here on the openpyxl dev repository. |
Thank you, @steve-estes! I've released a new version of agate-excel that recalculates the dimensions (instead of only resetting them to With this change, it would be possible to have Arguably, CSV Kit could favor user experience by instead making a flag to opt out of resetting dimensions, e.g. |
Agreed I see no reason to make this opt-out rather than opt-in, given the performance implications. If we want to be a little bit fancy, one thing agate-excel could do, as a future item, is just first check whether the |
Good suggestion! I've made that change to agate-excel, so it'll occur automatically via csvkit. I didn't add a warning, as I agree that there is pretty much no chance that converting a single cell is the behavior desired by the user. |
Hi,
Using in2csv to convert this "governative" COVID-19 file, I get a nearly empty CSV that contains only one column label.
I run
and I get only the value
iss_date
as the result. If I open it with LibreOffice or other CLI apps, I get the real values.