read_excel error "Internal and external indices on axis 1 do not match." When multiple empty columns on first line #2404

Closed
devin-petersohn opened this issue Nov 12, 2020 · 5 comments · Fixed by #2526
Labels: bug 🦗 Something isn't working

Comments

@devin-petersohn
Collaborator

devin-petersohn commented Nov 12, 2020

This error comes from a mismatch between Modin and pandas in how columns are named. The pandas naming convention is not completely consistent, so further investigation is required.
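
For context, here is a small sketch of the pandas naming behavior in question. It uses `read_csv` on an in-memory string as a stand-in so that no workbook is needed; that substitution is an assumption, and `read_excel` may differ in details. The point is only that pandas fills blank header cells with positional `Unnamed: N` labels.

```python
import io

import pandas

# Stand-in for a sheet whose header row has one label and two blank cells.
csv_text = "a,,\n1,2,3\n"

frame = pandas.read_csv(io.StringIO(csv_text))
column_labels = list(frame.columns)
print(column_labels)  # ['a', 'Unnamed: 1', 'Unnamed: 2']
```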

Error stacktrace:

>>> pd.read_excel('test_emptyline.xlsx')
UserWarning: Parallel `read_excel` is a new feature! Please email bug_reports@modin.org if you run into any problems.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\foo\modin\pandas\dataframe.py", line 183, in __repr__
    result = repr(self._build_repr_df(num_rows, num_cols))
  File "c:\foo\modin\pandas\base.py", line 168, in _build_repr_df
    return self.iloc[indexer]._query_compiler.to_pandas()
  File "c:\foo\modin\backends\pandas\query_compiler.py", line 233, in to_pandas
    return self._modin_frame.to_pandas()
  File "c:\foo\modin\engines\base\frame\data.py", line 2063, in to_pandas
    f"Internal and external indices on axis {axis} do not match.",
  File "c:\foo\modin\error_message.py", line 63, in catch_bugs_and_request_email
    " caused this error.\n{}".format(extra_log)
Exception: Internal Error. Please email bug_reports@modin.org with the traceback and command that caused this error.
Internal and external indices on axis 1 do not match.
@devin-petersohn devin-petersohn added the bug 🦗 Something isn't working label Nov 12, 2020
@devin-petersohn devin-petersohn self-assigned this Nov 12, 2020
@devin-petersohn
Collaborator Author

As a temporary workaround, setting df.columns = df.columns fixes the mismatch, but the columns will not be named "Unnamed: X". If these names are discarded later, it will not matter.
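
A minimal sketch of that workaround follows. The helper name `resync_columns` is made up for illustration, and plain pandas is used as a stand-in so the snippet runs without Modin installed; with Modin the call would be applied to the frame returned by `read_excel`, as shown in the comments.

```python
import pandas  # stand-in so the sketch runs without Modin installed


def resync_columns(df):
    """Reassign the column labels to themselves (hypothetical helper).

    Per the comment above, on an affected Modin frame this self-assignment
    clears the internal/external index mismatch.
    """
    df.columns = df.columns
    return df


# With Modin the call would look like this (file name taken from the report):
#   import modin.pandas as pd
#   df = resync_columns(pd.read_excel("test_emptyline.xlsx"))
#   print(df)

frame = resync_columns(pandas.DataFrame([[1, 2]], columns=["a", "b"]))
print(list(frame.columns))
```

On a plain pandas frame this is a no-op, which is why it is safe to leave in place once the bug is fixed.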

@aregm aregm assigned vnlitvinov and unassigned devin-petersohn Dec 3, 2020
@vnlitvinov
Collaborator

@devin-petersohn do you have a reproducer or at least a sample? It's unclear to me how to reproduce the issue.

@devin-petersohn
Collaborator Author

This should be reproducible with an Excel file whose first line is blank.
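
One way to produce such a file is sketched below. The exact layout of the original `test_emptyline.xlsx` is not given in this thread, so the cell values, the helper name `write_blank_first_row_workbook`, and the use of `openpyxl` are all assumptions; the resulting file is a guess at a reproducer and is not confirmed to trigger the error.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def write_blank_first_row_workbook(path):
    """Write a small .xlsx whose first row is left completely empty."""
    workbook = Workbook()
    sheet = workbook.active
    # Row 1 is deliberately untouched; data starts on row 2.
    for row_number, values in enumerate([(1, 2, 3), (4, 5, 6)], start=2):
        for column_number, value in enumerate(values, start=1):
            sheet.cell(row=row_number, column=column_number, value=value)
    workbook.save(path)
    return path


sample_path = write_blank_first_row_workbook(
    os.path.join(tempfile.mkdtemp(), "test_emptyline.xlsx")
)

# Read the file back to confirm that row 1 really is empty.
check_sheet = load_workbook(sample_path).active
first_row = [check_sheet.cell(row=1, column=c).value for c in range(1, 4)]
print(first_row)
```

The generated path can then be passed to `pd.read_excel` under Modin to check whether the traceback above appears.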

@vnlitvinov
Collaborator

Yeah, got that showing up using a cooked-up excel file. Thanks!

@vnlitvinov
Collaborator

Note: this happens only with "parallel Excel reader" (i.e. with MODIN_ENGINE != Python).
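
Given that, a possible stopgap is to select the Python engine so the parallel reader is not used. The sketch below sets the `MODIN_ENGINE` environment variable before Modin is imported; the helper name `select_modin_engine` is hypothetical, and the Modin lines are commented out so the snippet runs without Modin installed.

```python
import os


def select_modin_engine(engine="Python"):
    """Set MODIN_ENGINE for this process (hypothetical helper).

    Call this before `import modin.pandas` so the setting is picked up.
    """
    os.environ["MODIN_ENGINE"] = engine
    return os.environ["MODIN_ENGINE"]


selected_engine = select_modin_engine("Python")
print(selected_engine)

# Where Modin is installed (file name taken from the report):
#   import modin.pandas as pd
#   df = pd.read_excel("test_emptyline.xlsx")
```

This trades away parallel execution, so it is only useful for confirming the diagnosis or for small files.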

vnlitvinov added a commit to vnlitvinov/modin that referenced this issue Dec 9, 2020
Signed-off-by: Vasilij Litvinov <vasilij.n.litvinov@intel.com>
vnlitvinov added a commit to vnlitvinov/modin that referenced this issue Dec 9, 2020
…mes if some are empty

Signed-off-by: Vasilij Litvinov <vasilij.n.litvinov@intel.com>