to_latex() errors if index level name is not string #19981

Closed
toobaz opened this issue Mar 4, 2018 · 1 comment · Fixed by #20797
Labels
IO LaTeX (to_latex), Output-Formatting (__repr__ of pandas objects, to_string)
Comments

@toobaz
Member

toobaz commented Mar 4, 2018

Code Sample, a copy-pastable example if possible

In [2]: pd.DataFrame([[1, 2, 3]]*2).set_index([0, 1]).to_latex()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-22b8b94dc46c> in <module>()
----> 1 pd.DataFrame([[1, 2, 3]]*2).set_index([0, 1]).to_latex()

/home/nobackup/repo/pandas/pandas/core/generic.py in to_latex(self, buf, columns, col_space, header, index, na_rep, formatters, float_format, sparsify, index_names, bold_rows, column_format, longtable, escape, encoding, decimal, multicolumn, multicolumn_format, multirow)
   2152                            encoding=encoding, multicolumn=multicolumn,
   2153                            multicolumn_format=multicolumn_format,
-> 2154                            multirow=multirow)
   2155 
   2156         if buf is None:

/home/nobackup/repo/pandas/pandas/io/formats/format.py in to_latex(self, column_format, longtable, encoding, multicolumn, multicolumn_format, multirow)
    709 
    710         if hasattr(self.buf, 'write'):
--> 711             latex_renderer.write_result(self.buf)
    712         elif isinstance(self.buf, compat.string_types):
    713             import codecs

/home/nobackup/repo/pandas/pandas/io/formats/format.py in write_result(self, buf)
    984                          .replace('}', '\\}').replace('~', '\\textasciitilde')
    985                          .replace('^', '\\textasciicircum').replace('&', '\\&')
--> 986                          if (x and x != '{}') else '{}') for x in row]
    987             else:
    988                 crow = [x if x else '{}' for x in row]

/home/nobackup/repo/pandas/pandas/io/formats/format.py in <listcomp>(.0)
    984                          .replace('}', '\\}').replace('~', '\\textasciitilde')
    985                          .replace('^', '\\textasciicircum').replace('&', '\\&')
--> 986                          if (x and x != '{}') else '{}') for x in row]
    987             else:
    988                 crow = [x if x else '{}' for x in row]

AttributeError: 'int' object has no attribute 'replace'

Problem description

to_latex() assumes that MultiIndex level names, if any, are strings.
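
The failure mode can be reproduced without pandas. The sketch below is not the pandas source: `escape_cell` is a hypothetical helper that only mimics the chained `.replace()` style visible in the traceback, and coercing with `str()` is one possible way around the error, not necessarily the change made in the fix.

```python
def escape_cell(x):
    """Escape two LaTeX specials with chained replace calls (illustrative only)."""
    return x.replace('_', '\\_').replace('&', '\\&')


# A header row where the index level names are integers, as in the report.
row = [0, 1, 'a_b']

try:
    [escape_cell(x) for x in row]
    failed = False
except AttributeError:
    # int has no .replace(), which is the error shown in the traceback
    failed = True

# Coercing each cell to str before escaping avoids the AttributeError.
safe_row = [escape_cell(str(x)) for x in row]
```

With this coercion, integer level names are rendered through their `str()` form, which matches the output expected for string names `'0'` and `'1'`.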

Related to #18669

As a general comment, I think to_latex() would benefit a lot from a refactoring that abstracts away some of the (non-)escaping subtleties. There should be a way to avoid a separate replace() for each specific command, setting placeholders instead which are all transformed at once at the end. This would also allow users to insert content that must not be escaped (e.g. LaTeX formulae as labels) without setting escape=False, which in many cases is an imperfect workaround.
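
A rough sketch of what such a single-pass approach might look like, using only the standard library. Everything here is hypothetical and not part of pandas: the `_LATEX_ESCAPES` table, the `Raw` marker type, and `escape_latex` are names invented for illustration, and the table covers only a handful of characters.

```python
import re

# Hypothetical escape table; a real one would need to cover more cases.
_LATEX_ESCAPES = {
    '\\': r'\textbackslash{}',
    '&': r'\&',
    '%': r'\%',
    '$': r'\$',
    '#': r'\#',
    '_': r'\_',
    '{': r'\{',
    '}': r'\}',
    '~': r'\textasciitilde{}',
    '^': r'\textasciicircum{}',
}
# One alternation pattern, so every special character is handled in one pass.
_PATTERN = re.compile('|'.join(re.escape(ch) for ch in _LATEX_ESCAPES))


class Raw(str):
    """Hypothetical marker: text that is already valid LaTeX and must not be escaped."""


def escape_latex(value):
    """Return value as a LaTeX-safe string; Raw instances pass through untouched."""
    if isinstance(value, Raw):
        return str(value)
    # str() first, so non-string labels such as integers are accepted.
    return _PATTERN.sub(lambda m: _LATEX_ESCAPES[m.group(0)], str(value))
```

Because the substitution happens in one pass, braces introduced by a replacement (as in `\textbackslash{}`) are never escaped again, which chained replace() calls only avoid through careful ordering. A label such as `Raw('$x^2$')` would be emitted verbatim while other cells in the same table are still escaped.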

Expected Output

In [2]: pd.DataFrame([[1, 2, 3]]*2, columns=list('012')).set_index(['0', '1']).to_latex()
Out[2]: '\\begin{tabular}{llr}\n\\toprule\n  &   &  2 \\\\\n0 & 1 &    \\\\\n\\midrule\n1 & 2 &  3 \\\\\n  &   &  3 \\\\\n\\bottomrule\n\\end{tabular}\n'

Output of pd.show_versions()

INSTALLED VERSIONS

commit: 0bfb61b
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-5-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8

pandas: 0.23.0.dev0+422.g0bfb61b21.dirty
pytest: 3.2.3
pip: 9.0.1
setuptools: 36.7.0
Cython: 0.25.2
numpy: 1.14.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.5.6
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.0dev
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: 2.3.0
xlrd: 1.0.0
xlwt: 1.3.0
xlsxwriter: 0.9.6
lxml: 4.1.1
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1

@shangyian
Contributor

Definitely agree that to_latex() could do with some refactoring to clean up some of the replace calls. I'll give that a try.

shangyian mentioned this issue Mar 7, 2018
jreback added the Output-Formatting and IO LaTeX labels Mar 7, 2018
jreback added this to the 0.23.0 milestone Mar 7, 2018
jreback modified the milestones: 0.23.0, Next Major Release Apr 14, 2018
jreback modified the milestones: Next Major Release, 0.23.0 Apr 24, 2018
toobaz pushed a commit that referenced this issue Apr 25, 2018
3 participants