Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Considerable slowdown in writing autodoc pages with sphinx 1.6.x #4230

Closed
jorisvandenbossche opened this issue Nov 10, 2017 · 23 comments
Closed
Milestone

Comments

@jorisvandenbossche
Copy link
Contributor

For the pandas docs, we currenlty pin sphinx to 1.5.x because we previously had problems with 1.6. I was now investigating what the problem exactly is preventing us to upgrade (pandas-dev/pandas#16705), and the main problem is a considerable slowdown in the writing phase.

Problem

Building the pandas docs went from ca 20 min to > 50 min, when switching from sphinx 1.5.6 to 1.6.5.

It is especially our api pages (autodoc pages for methods/attributes generated by autosummary) that are very slow in writing. In the progress output I can clearly see that each generated docstring page takes a noticeble 0.5 à 1.5 seconds to write (and since we have many of those pages (> 1000), this takes a long time)

Procedure to reproduce the problem

Building the pandas docs, although this is not a small reproducible step.

Does anybody have an idea where to start looking or how to best debug this?

Environment info

  • OS: Linux (Ubuntu 16.04)
  • Python version: 3.6
  • Sphinx version: 1.6.5
@tk0miya
Copy link
Member

tk0miya commented Nov 12, 2017

Hmm, I feel this is a very difficult issue. We only provides -vvv option to increase verbosity for debugging.
Can I see the logs both 1.5 and 1.6?

Note: Sadly, I failed to build pandas document on my local. It is very difficult to prepare the environment to me.

@jorisvandenbossche
Copy link
Contributor Author

Yes, the pandas docs are quite huge and need a complicated environment to fully build. But I will make a smaller test docs version that should be easier to test (and to provide logs for).
Thanks for the response!

@jorisvandenbossche
Copy link
Contributor Author

I made a test repo with a small subset of our docs: https://github.com/jorisvandenbossche/pandas-test-docs. This should be easier to test out.

Running this now with different versions of sphinx, will post the logs shortly.

@jorisvandenbossche
Copy link
Contributor Author

Below the logs for building the rest repo with 1.5 vs 1.6 (for the rest they were run in exactly the same environment, only the sphinx version differed). There are some warnings, but those can be ignored (unrelated issues we are trying to solve).
So the time difference here is 37s vs 1min22 (the relative difference is bigger on the full docs, possibly because here is more overhead compared to the actual docs building).

With sphinx 1.6.5 (log-sphinx165.txt):

joris@joris-XPS-13-9350:~/scipy/pandas-test-docs$ time python make.py html > log-sphinx165.txt
/home/joris/miniconda3/envs/test-sphinx16/lib/python3.6/site-packages/pandas/core/generic.py:docstring of pandas.Series.truncate:18: WARNING: Bullet list ends without a blank line; unexpected unindent.
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.argmax.rst: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.argmin.rst: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.from_array.rst: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.valid.rst: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.view.rst: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.to_pickle.rst:24: WARNING: Citation [R2] is not referenced.
/home/joris/miniconda3/envs/test-sphinx16/lib/python3.6/site-packages/pandas/core/generic.py:docstring of pandas.Series.mask:79: WARNING: undefined label: indexing.where_mask (if the link has no caption the label must precede a section header)
/home/joris/miniconda3/envs/test-sphinx16/lib/python3.6/site-packages/pandas/core/series.py:docstring of pandas.Series.reindex:207: WARNING: undefined label: basics.reindexing (if the link has no caption the label must precede a section header)
/home/joris/miniconda3/envs/test-sphinx16/lib/python3.6/site-packages/pandas/core/series.py:docstring of pandas.Series.rename:10: WARNING: undefined label: basics.rename (if the link has no caption the label must precede a section header)
/home/joris/miniconda3/envs/test-sphinx16/lib/python3.6/site-packages/pandas/core/generic.py:docstring of pandas.Series.where:79: WARNING: undefined label: indexing.where_mask (if the link has no caption the label must precede a section header)
/home/joris/miniconda3/envs/test-sphinx16/lib/python3.6/site-packages/pandas/core/generic.py:docstring of pandas.Series.xs:35: WARNING: undefined label: advanced.mi_slicers (if the link has no caption the label must precede a section header)
/home/joris/miniconda3/envs/test-sphinx16/lib/python3.6/site-packages/sphinx/application.py:445: RemovedInSphinx17Warning: app.status_iterator() is now deprecated. Use sphinx.util.status_iterator() instead.
  RemovedInSphinx17Warning)
WARNING: html_static_path entry '/home/joris/scipy/pandas-test-docs/source/_static' does not exist

real	1m22.369s
user	1m18.452s
sys	0m0.608s

With sphinx 1.5.6 (log-sphinx156.txt):

joris@joris-XPS-13-9350:~/scipy/pandas-test-docs$ time python make.py html > log-sphinx156.txt
WARNING: [autosummary] failed to import 'pandas.Series.index': no module named pandas.Series.index
/home/joris/miniconda3/envs/test-sphinx16/lib/python3.6/site-packages/pandas/core/generic.py:docstring of pandas.Series.truncate:18: WARNING: Bullet list ends without a blank line; unexpected unindent.
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.T.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.argmax.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.argmin.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.asobject.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.at.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.axes.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.base.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.blocks.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.data.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.dtype.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.dtypes.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.empty.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.flags.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.from_array.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.ftype.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.ftypes.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.hasnans.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.iat.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.iloc.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.imag.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.is_copy.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.is_monotonic.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.is_monotonic_decreasing.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.is_monotonic_increasing.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.is_unique.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.itemsize.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.ix.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.loc.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.name.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.nbytes.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.ndim.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.real.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.shape.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.size.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.strides.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.valid.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.values.rst:: WARNING: document isn't included in any toctree
/home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.view.rst:: WARNING: document isn't included in any toctree
/home/joris/miniconda3/envs/test-sphinx16/lib/python3.6/site-packages/pandas/core/series.py:docstring of pandas.Series.iloc:21: WARNING: undefined label: indexing.integer (if the link has no caption the label must precede a section header)
/home/joris/miniconda3/envs/test-sphinx16/lib/python3.6/site-packages/pandas/core/series.py:docstring of pandas.Series.ix:18: WARNING: undefined label: advanced (if the link has no caption the label must precede a section header)
/home/joris/miniconda3/envs/test-sphinx16/lib/python3.6/site-packages/pandas/core/series.py:docstring of pandas.Series.loc:21: WARNING: undefined label: indexing.label (if the link has no caption the label must precede a section header)
/home/joris/miniconda3/envs/test-sphinx16/lib/python3.6/site-packages/pandas/core/generic.py:docstring of pandas.Series.mask:79: WARNING: undefined label: indexing.where_mask (if the link has no caption the label must precede a section header)
/home/joris/miniconda3/envs/test-sphinx16/lib/python3.6/site-packages/pandas/core/series.py:docstring of pandas.Series.reindex:207: WARNING: undefined label: basics.reindexing (if the link has no caption the label must precede a section header)
/home/joris/miniconda3/envs/test-sphinx16/lib/python3.6/site-packages/pandas/core/series.py:docstring of pandas.Series.rename:10: WARNING: undefined label: basics.rename (if the link has no caption the label must precede a section header)
/home/joris/miniconda3/envs/test-sphinx16/lib/python3.6/site-packages/pandas/core/generic.py:docstring of pandas.Series.where:79: WARNING: undefined label: indexing.where_mask (if the link has no caption the label must precede a section header)
/home/joris/miniconda3/envs/test-sphinx16/lib/python3.6/site-packages/pandas/core/generic.py:docstring of pandas.Series.xs:35: WARNING: undefined label: advanced.mi_slicers (if the link has no caption the label must precede a section header)
WARNING: html_static_path entry '/home/joris/scipy/pandas-test-docs/source/_static' does not exist

real	0m34.111s
user	0m30.296s
sys	0m0.372s

@jorisvandenbossche
Copy link
Contributor Author

Looking at the difference in the logs in the writing phase, I notice that with sphinx 1.5.6 the following events are emitted:

...
writing output... [  1%] generated/pandas.Series.abs
[app] emitting event: 'doctree-resolved'(<document: <section "pandas.series.abs"...>>, 'generated/pandas.Series.abs')
[app] emitting event: 'html-page-context'('generated/pandas.Series.abs', 'page.html', {'embedded': False, 'project': 'pandas', 'release': '0.
....

while with sphinx 1.6.5 it is:

...
writing output... [  1%] generated/pandas.Series.abs
[app] emitting event: 'doctree-resolved'(<document: <section "pandas.series.abs"...>>, 'generated/pandas.Series.abs')
...

So without the 'html-page-context' event.

I am not familiar with this event, but this might indicate it is not reusing a template page but each time building the full page from scratch with sphinx 1.6.5? Were there changes related to this? (do we need to update some settings?)

@jorisvandenbossche
Copy link
Contributor Author

Trying to figure out further what is happening, I patched _write_serial in Builder to time the different steps.

With 1.5.6:

 preparing documents... done
writing output... [  0%] api
[app] emitting event: 'doctree-resolved'(<document: <target...><section "api reference; api"...>>, 'api')
[app] emitting event: 'html-page-context'('api', 'page.html', {'embedded': False, 'project': 'pandas', 'release': '0.21.0', 'version': '0.21.
Time to write: 0.151 s
writing output... [  0%] generated/pandas.Series
[app] emitting event: 'missing-reference'(<sphinx.environment.BuildEnvironment object at 0x7f73d636d358>, <pending_xref: <literal...>>, <lite
[app] emitting event: 'missing-reference'(<sphinx.environment.BuildEnvironment object at 0x7f73d636d358>, <pending_xref: <literal...>>, <lite
[app] emitting event: 'missing-reference'(<sphinx.environment.BuildEnvironment object at 0x7f73d636d358>, <pending_xref: <literal...>>, <lite
[app] emitting event: 'missing-reference'(<sphinx.environment.BuildEnvironment object at 0x7f73d636d358>, <pending_xref: <literal...>>, <lite
[app] emitting event: 'missing-reference'(<sphinx.environment.BuildEnvironment object at 0x7f73d636d358>, <pending_xref: <literal...>>, <lite
[app] emitting event: 'missing-reference'(<sphinx.environment.BuildEnvironment object at 0x7f73d636d358>, <pending_xref: <literal...>>, <lite
[app] emitting event: 'missing-reference'(<sphinx.environment.BuildEnvironment object at 0x7f73d636d358>, <pending_xref: <literal...>>, <lite
[app] emitting event: 'doctree-resolved'(<document: <section "pandas.series"...>>, 'generated/pandas.Series')
[app] emitting event: 'html-page-context'('generated/pandas.Series', 'page.html', {'embedded': False, 'project': 'pandas', 'release': '0.21.0
Time to write: 0.145 s
writing output... [  1%] generated/pandas.Series.T
[app] emitting event: 'doctree-resolved'(<document: <section "pandas.series.t"...>>, 'generated/pandas.Series.T')
[app] emitting event: 'html-page-context'('generated/pandas.Series.T', 'page.html', {'embedded': False, 'project': 'pandas', 'release': '0.21
Time to write: 0.070 s
writing output... [  1%] generated/pandas.Series.abs
[app] emitting event: 'doctree-resolved'(<document: <section "pandas.series.abs"...>>, 'generated/pandas.Series.abs')
[app] emitting event: 'html-page-context'('generated/pandas.Series.abs', 'page.html', {'embedded': False, 'project': 'pandas', 'release': '0.
Time to write: 0.072 s
writing output... [  2%] generated/pandas.Series.add
[app] emitting event: 'doctree-resolved'(<document: <section "pandas.series.add"...>>, 'generated/pandas.Series.add')
[app] emitting event: 'html-page-context'('generated/pandas.Series.add', 'page.html', {'embedded': False, 'project': 'pandas', 'release': '0.
Time to write: 0.069 s
writing output... [  2%] generated/pandas.Series.add_prefix
[app] emitting event: 'doctree-resolved'(<document: <section "pandas.series.add_prefix"...>>, 'generated/pandas.Series.add_prefix')
[app] emitting event: 'html-page-context'('generated/pandas.Series.add_prefix', 'page.html', {'embedded': False, 'project': 'pandas', 'releas
Time to write: 0.073 s
writing output... [  2%] generated/pandas.Series.add_suffix
[app] emitting event: 'doctree-resolved'(<document: <section "pandas.series.add_suffix"...>>, 'generated/pandas.Series.add_suffix')
[app] emitting event: 'html-page-context'('generated/pandas.Series.add_suffix', 'page.html', {'embedded': False, 'project': 'pandas', 'releas
Time to write: 0.172 s
writing output... [  3%] generated/pandas.Series.agg
[app] emitting event: 'doctree-resolved'(<document: <section "pandas.series.agg"...>>, 'generated/pandas.Series.agg')
[app] emitting event: 'html-page-context'('generated/pandas.Series.agg', 'page.html', {'embedded': False, 'project': 'pandas', 'release': '0.
Time to write: 0.114 s
writing output... [  3%] generated/pandas.Series.aggregate
[app] emitting event: 'doctree-resolved'(<document: <section "pandas.series.aggregate"...>>, 'generated/pandas.Series.aggregate')
[app] emitting event: 'html-page-context'('generated/pandas.Series.aggregate', 'page.html', {'embedded': False, 'project': 'pandas', 'release
Time to write: 0.111 s

...

writing output... [100%] index
[app] emitting event: 'doctree-resolved'(<document: <comment...><section "pandas: powerful python data analy ...>, 'index')
[app] emitting event: 'html-page-context'('index', 'page.html', {'embedded': False, 'project': 'pandas', 'release': '0.21.0', 'version': '0.2
Time to write: 0.044 s

Time to write FULL: 20.005 s

With 1.6.5:

preparing documents... done
writing output... [  0%] api
[app] emitting event: 'doctree-resolved'(<document: <target...><section "api reference; api"...>>, 'api')
Time to write: 0.382 s
writing output... [  0%] generated/pandas.Series
[app] emitting event: 'doctree-resolved'(<document: <section "pandas.series"...>>, 'generated/pandas.Series')
Time to write: 0.647 s
writing output... [  1%] generated/pandas.Series.abs
[app] emitting event: 'doctree-resolved'(<document: <section "pandas.series.abs"...>>, 'generated/pandas.Series.abs')
Time to write: 0.253 s
writing output... [  1%] generated/pandas.Series.add
[app] emitting event: 'doctree-resolved'(<document: <section "pandas.series.add"...>>, 'generated/pandas.Series.add')
Time to write: 0.359 s
writing output... [  2%] generated/pandas.Series.add_prefix
[app] emitting event: 'doctree-resolved'(<document: <section "pandas.series.add_prefix"...>>, 'generated/pandas.Series.add_prefix')
Time to write: 0.255 s
writing output... [  2%] generated/pandas.Series.add_suffix
[app] emitting event: 'doctree-resolved'(<document: <section "pandas.series.add_suffix"...>>, 'generated/pandas.Series.add_suffix')
Time to write: 0.251 s
writing output... [  3%] generated/pandas.Series.agg
[app] emitting event: 'doctree-resolved'(<document: <section "pandas.series.agg"...>>, 'generated/pandas.Series.agg')
Time to write: 0.354 s
writing output... [  3%] generated/pandas.Series.aggregate
[app] emitting event: 'doctree-resolved'(<document: <section "pandas.series.aggregate"...>>, 'generated/pandas.Series.aggregate')
Time to write: 0.262 s

...

writing output... [100%] index
[app] emitting event: 'doctree-resolved'(<document: <comment...><section "pandas: powerful python data analy ...>, 'index')
Time to write: 0.308 s

/home/joris/miniconda3/envs/test-sphinx16/lib/python3.6/site-packages/pandas/core/generic.py:docstring of pandas.Series.mask:79: WARNING: undefined label: indexing.where_mask (if the link has no caption the label must precede a section header)
/home/joris/miniconda3/envs/test-sphinx16/lib/python3.6/site-packages/pandas/core/series.py:docstring of pandas.Series.reindex:207: WARNING: undefined label: basics.reindexing (if the link has no caption the label must precede a section header)
/home/joris/miniconda3/envs/test-sphinx16/lib/python3.6/site-packages/pandas/core/series.py:docstring of pandas.Series.rename:10: WARNING: undefined label: basics.rename (if the link has no caption the label must precede a section header)
/home/joris/miniconda3/envs/test-sphinx16/lib/python3.6/site-packages/pandas/core/generic.py:docstring of pandas.Series.where:79: WARNING: undefined label: indexing.where_mask (if the link has no caption the label must precede a section header)
/home/joris/miniconda3/envs/test-sphinx16/lib/python3.6/site-packages/pandas/core/generic.py:docstring of pandas.Series.xs:35: WARNING: undefined label: advanced.mi_slicers (if the link has no caption the label must precede a section header)
Time to write FULL: 62.404 s

So with sphinx 1.6.5 writing a single page consistently takes quite some more time.

@jorisvandenbossche
Copy link
Contributor Author

I just tested this with sphinx master, and apparently this is solved there. It actually is faster than 1.5.6 (the writing part went down from 20s in 1.5.6 to 4s with master, while it was 60s on 1.6.5).

Is there a timeline for the 1.7 release?
Or does anybody have an idea which commit could have fixed this in order to possibly backport it to 1.6.x? (I don't directly see something in the release notes)

@stephenfin
Copy link
Contributor

@jorisvandenbossche I can't speak as to plans for 1.7 nor what patches might have resolved this. However, could I suggest using git-bisect to figure out what patch actually repaired things here? If we do this, backporting is easily done.

@TomAugspurger
Copy link

TomAugspurger commented Jan 23, 2018

I had a try at this as well, with https://github.com/jorisvandenbossche/pandas-test-docs but didn't see the improvement @jorisvandenbossche found. Writing the API doc pages was as slow in with 1.7b as with 1.6.*, which is slower than 1.5.*

Version time
1.5 31.47
1.7b 63.23

I'll try to profile the slowdown next week.

@stephenfin
Copy link
Contributor

stephenfin commented Jan 23, 2018

@TomAugspurger Are you sure you're starting from a fresh environment each time? You should be deleting the entire build directory including doctrees.

If you do have time to profile this, it would be much appreciated.

@tk0miya tk0miya added this to the 1.7 milestone Jan 28, 2018
tk0miya added a commit to tk0miya/sphinx that referenced this issue Jan 30, 2018
``docutils.utils.new_document()`` is much slower than an instantiation
of document node.  This adds a much faster alternative of the function
which uses caches internally.
@tk0miya
Copy link
Member

tk0miya commented Jan 30, 2018

I investigated what happened in writing (in Builder._write_serial()):

    def _write_serial(self, docnames):
        # type: (Sequence[unicode]) -> None
        with logging.pending_warnings():
            import cProfile
            p = cProfile.Profile()
            count = 0
            for docname in status_iterator(docnames, 'writing output... ', "darkgreen",
                                           len(docnames), self.app.verbosity):
                if count == 0:
                    p.enable()
                doctree = self.env.get_and_resolve_doctree(docname, self)
                self.write_doc_serialized(docname, doctree)
                self.write_doc(docname, doctree)
                if count == 0:
                    p.disable()
                    p.dump_stats('sphinx.prof')
                count += 1

As a result, I got following profile:

Tue Jan 30 11:40:55 2018    sphinx.prof

         634449 function calls (615413 primitive calls) in 4.841 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        4    0.005    0.001    3.428    0.857 /usr/local/lib/python3.5/dist-packages/sphinx/environment/adapters/toctree.py:50(resolve)
        1    0.000    0.000    3.343    3.343 /usr/local/lib/python3.5/dist-packages/sphinx/environment/__init__.py:806(get_and_resolve_doctree)
      8/4    0.010    0.001    3.341    0.835 /usr/local/lib/python3.5/dist-packages/sphinx/environment/adapters/toctree.py:118(_entries_from_toctree)
      210    0.008    0.000    3.171    0.015 /usr/local/lib/python3.5/dist-packages/sphinx/environment/adapters/toctree.py:327(process_only_nodes)
      211    0.007    0.000    3.167    0.015 /usr/local/lib/python3.5/dist-packages/sphinx/transforms/__init__.py:89(apply_transforms)
      217    0.009    0.000    3.145    0.014 /usr/local/lib/python3.5/dist-packages/docutils/utils/__init__.py:420(new_document)
      218    0.007    0.000    3.055    0.014 /usr/local/lib/python3.5/dist-packages/docutils/frontend.py:575(__init__)
      218    0.114    0.001    2.945    0.014 /usr/local/lib/python3.5/dist-packages/docutils/frontend.py:607(populate_from_components)
    10936    0.238    0.000    2.772    0.000 /usr/lib/python3.5/optparse.py:994(add_option)
    10936    0.453    0.000    2.411    0.000 /usr/lib/python3.5/optparse.py:563(__init__)
...

This result says new_document() is too heavy. Certainly, it is called on SphinxTransformer.apply().

Furthermore, I also tested how heavy the function is.
This is a comparison it with instantiation of document node.

tkomiya@deneb> python3 -m timeit 'from docutils import nodes; nodes.document(None, None)'
100000 loops, best of 3: 12.8 usec per loop
tkomiya@deneb> python3 -m timeit 'from docutils.utils import new_document; new_document("")'
1000 loops, best of 3: 933 usec per loop

I think this is a cause of slowdown.

@tk0miya
Copy link
Member

tk0miya commented Jan 30, 2018

Note: This is my dockerfile for this issue:

FROM tk0miya/sphinx-html

RUN git clone https://github.com/jorisvandenbossche/pandas-test-docs
WORKDIR /docs/pandas-test-docs

RUN pip3 install -U pandas IPython git+https://github.com/numpy/numpydoc.git
RUN pip3 install -U nbconvert
RUN apt update && apt install -y pandoc
RUN pip3 install -U nbconvert nbsphinx
RUN pip3 install -U --pre sphinx

tk0miya added a commit to tk0miya/sphinx that referenced this issue Jan 30, 2018
``docutils.utils.new_document()`` is much slower than an instantiation
of document node.  This adds a much faster alternative of the function
which uses caches internally.
tk0miya added a commit that referenced this issue Jan 30, 2018
Fix #4230: slowdown in writing document with sphinx 1.6
@tk0miya
Copy link
Member

tk0miya commented Jan 30, 2018

On my local, #4523 resolves this problem. Could you try it please?
(I will release 1.7.0b2 including #4523 tomorrow)

Please let me know if it is still slow.

@jorisvandenbossche
Copy link
Contributor Author

@tk0miya Thanks a lot for looking into this. I will try it out.

@jorisvandenbossche
Copy link
Contributor Author

jorisvandenbossche commented Jan 30, 2018

I tried this out, but I unfortunately cannot test it, as it appears that sphinx dev version is not building our docs correctly. Almost all API doc pages are empty (and thus do build faster .. but without being good comparison).

The reason seems to be that in the docstring pages generated by autosummary, all occurences of .. automethod:: Series.abs are changed in .. autodata:: abs, which then fails to fetch the docstrings, resulting in empty pages.

@jorisvandenbossche
Copy link
Contributor Author

So trying to look through the log output of sphinx 1.7.
I see the following (when reading the source of generated/pandas.Series):

[autodoc] from pandas import Series.abs
[autodoc] => <module 'pandas' from '/home/joris/miniconda3/envs/sphinx-test/lib/python3.6/site-packages/pandas/__init__.py'>
[autodoc] getattr(_, 'Series')
[autodoc] => <class 'pandas.core.series.Series'>
[autodoc] getattr(_, 'abs')
[autodoc] => <function NDFrame.abs at 0x7fa3d669e598>
[app] emitting event: 'autodoc-process-docstring'('method', 'pandas.Series.abs', <function NDFrame.abs at 0x7fa3d669e598>, {}, ['Return an object wit

It clearly mentions 'method' in that last line, but, it writes a file with autodata, resulting in an error in the next step when reading the generated (in the above step) file 'generated/pandas.Series.abs':

reading sources... [  1%] generated/pandas.Series.abs
[app] emitting event: 'env-purge-doc'(<sphinx.environment.BuildEnvironment object at 0x7fa3c796c160>, 'generated/pandas.Series.abs')
[app] emitting event: 'source-read'('generated/pandas.Series.abs', ['pandas.Series.abs\n=================\n\n.. currentmodule:: pandas.
[autodoc] /home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.abs.rst:6: input:
.. autodata:: abs
[autodoc] from pandas.Series import abs
autodoc: failed to import data 'abs' from module 'pandas.Series'; the following exception was raised:
No module named 'pandas.Series'
[app] emitting event: 'doctree-read'(<document: <section "pandas.series.abs"...>>,)

As a comparison, the equivalent bits of the log file for sphinx 1.5.6:

[autodoc] from pandas import Series.abs
[autodoc] import pandas
[autodoc] => <module 'pandas' from '/home/joris/miniconda3/envs/sphinx-test/lib/python3.6/site-packages/pandas/__init__.py'>
[autodoc] getattr(_, 'Series')
[autodoc] => <class 'pandas.core.series.Series'>
[autodoc] getattr(_, 'abs')
[autodoc] => <function NDFrame.abs at 0x7f1a4ab169d8>
[app] emitting event: 'autodoc-process-signature'('method', 'pandas.Series.abs', <function NDFrame.abs at 0x7f1a4ab169d8>, {}, '()', None)
[app] emitting event: 'autodoc-process-docstring'('method', 'pandas.Series.abs', <function NDFrame.abs at 0x7f1a4ab169d8>, {}, ['Return an object wit
[app] emitting event: 'autodoc-process-docstring'('method', 'pandas.Series.abs', <function NDFrame.abs at 0x7f1a4ab169d8>, {}, ['', 'Return an object
reading sources... [  1%] generated/pandas.Series.abs
[app] emitting event: 'env-purge-doc'(<sphinx.environment.BuildEnvironment object at 0x7f1a3d415d30>, 'generated/pandas.Series.abs')
[app] emitting event: 'source-read'('generated/pandas.Series.abs', ['pandas\\.Series\\.abs\n===================\n\n.. currentmodule:: p
[autodoc] /home/joris/scipy/pandas-test-docs/source/generated/pandas.Series.abs.rst:6: input:
.. automethod:: Series.abs
[autodoc] from pandas import Series.abs
[autodoc] import pandas
[autodoc] => <module 'pandas' from '/home/joris/miniconda3/envs/sphinx-test/lib/python3.6/site-packages/pandas/__init__.py'>
[autodoc] getattr(_, 'Series')
[autodoc] => <class 'pandas.core.series.Series'>
[autodoc] getattr(_, 'abs')
[autodoc] => <function NDFrame.abs at 0x7f1a4ab169d8>
[app] emitting event: 'autodoc-process-signature'('method', 'pandas.Series.abs', <function NDFrame.abs at 0x7f1a4ab169d8>, {}, '()', None)
[app] emitting event: 'autodoc-process-docstring'('method', 'pandas.Series.abs', <function NDFrame.abs at 0x7f1a4ab169d8>, {}, ['Return an object wit
[autodoc] output:

.. py:method:: Series.abs()
   :module: pandas

[... docstring]
   ..
       !! processed by numpydoc !!
[app] emitting event: 'doctree-read'(<document: <section "pandas.series.abs"...>>,)

@tk0miya
Copy link
Member

tk0miya commented Jan 31, 2018

It seems generated HTML files does not change between 1.7.0b1 and latest.
It might be related with #4198 (also included in 1.7.0b1).

@tk0miya
Copy link
Member

tk0miya commented Jan 31, 2018

This is my dockerfile for the investigation.

FROM tk0miya/sphinx-html

RUN git clone https://github.com/jorisvandenbossche/pandas-test-docs
WORKDIR /docs/pandas-test-docs

RUN pip3 install -U pandas IPython git+https://github.com/numpy/numpydoc.git
RUN pip3 install -U nbconvert
RUN apt update && apt install -y pandoc
RUN pip3 install -U nbconvert nbsphinx
RUN pip3 install -U --pre sphinx
RUN python3 make.py
RUN mv build/html build/html-1.7.0b1

RUN pip3 install -U git+https://github.com/sphinx-doc/sphinx@1.7-release
RUN python3 make.py
RUN diff -uw build/html-1.7.0b1 build/html

And below is a result of diff between 1.7.0b1 and 1.7-release branch (HEAD).

Step 13/13 : RUN diff -uw build/html-1.7.0b1 build/html
 ---> Running in 7b2ceab2daf3
diff -uw build/html-1.7.0b1/.buildinfo build/html/.buildinfo
--- build/html-1.7.0b1/.buildinfo	2018-01-31 00:37:31.000000000 +0000
+++ build/html/.buildinfo	2018-01-31 01:48:41.000000000 +0000
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 20a4fe914e0b0a97e6060ad24319465c
+config: 7e3bdaf278985470d06ab73081203946
 tags: 645f666f9bcd5a90fca523b33c5a78b7
Common subdirectories: build/html-1.7.0b1/_sources and build/html/_sources
Common subdirectories: build/html-1.7.0b1/_static and build/html/_static
Common subdirectories: build/html-1.7.0b1/generated and build/html/generated
The command '/bin/sh -c diff -uw build/html-1.7.0b1 build/html' returned a non-zero code: 1

@jorisvandenbossche
Copy link
Contributor Author

@tk0miya shall I open a new issue for the API docs problem?

@tk0miya
Copy link
Member

tk0miya commented Jan 31, 2018

ah, sorry. I closed this unintentionally (on merging to master).
anyway I think this issue was fixed by #4523.
so could you file a new issue please? I'll take a look later.

Thanks,

@jorisvandenbossche
Copy link
Contributor Author

I re-did timings with 1.5.6, 1.6.6, and 1.7 (release branch):

Version time
1.5 35 - 37 s
1.6 1m20 - 1m28
1.7b 45 - 47s

So current 1.7 release branch is indeed a lot faster as 1.6, although not fully as fast as 1.5 (I think that will enable us to use 1.7 for building the pandas docs)

@tk0miya
Copy link
Member

tk0miya commented Feb 3, 2018

Good to know.
I think next tuning point is autodoc module. But it is too complicated to me. So it might be in 2.0 or 3.0.
Anyway, thank you for reporting. Please let me know if you have another trouble. I'll do my best :-)

@TomAugspurger
Copy link

TomAugspurger commented Feb 3, 2018 via email

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 19, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants