Newest html5lib 0.999999999 breaks rendering #334

Closed
OktarinTentakel opened this issue Jul 15, 2016 · 9 comments
Labels: bug (Existing features not working as expected), crash (Problems preventing documents from being rendered)

Comments

@OktarinTentakel

OktarinTentakel commented Jul 15, 2016

Tested on OS X according to install instructions with Python 2.7 and 3.4

Every rendering exits with:

Traceback (most recent call last):
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/wsgiref/handlers.py", line 85, in run
    self.result = application(self.environ, self.start_response)
  File "/Users/aw-sebastianschlapkohl/own-projects/weasyprint-test/venv/lib/python2.7/site-packages/weasyprint/navigator.py", line 143, in app
    return make_response(render_template(url))
  File "/Users/aw-sebastianschlapkohl/own-projects/weasyprint-test/venv/lib/python2.7/site-packages/weasyprint/navigator.py", line 65, in render_template
    html = HTML(url)
  File "/Users/aw-sebastianschlapkohl/own-projects/weasyprint-test/venv/lib/python2.7/site-packages/weasyprint/__init__.py", line 92, in __init__
    namespaceHTMLElements=False)
  File "/Users/aw-sebastianschlapkohl/own-projects/weasyprint-test/venv/lib/python2.7/site-packages/html5lib/html5parser.py", line 35, in parse
    return p.parse(doc, **kwargs)
  File "/Users/aw-sebastianschlapkohl/own-projects/weasyprint-test/venv/lib/python2.7/site-packages/html5lib/html5parser.py", line 235, in parse
    self._parse(stream, False, None, *args, **kwargs)
  File "/Users/aw-sebastianschlapkohl/own-projects/weasyprint-test/venv/lib/python2.7/site-packages/html5lib/html5parser.py", line 85, in _parse
    self.tokenizer = _tokenizer.HTMLTokenizer(stream, parser=self, **kwargs)
  File "/Users/aw-sebastianschlapkohl/own-projects/weasyprint-test/venv/lib/python2.7/site-packages/html5lib/_tokenizer.py", line 36, in __init__
    self.stream = HTMLInputStream(stream, **kwargs)
  File "/Users/aw-sebastianschlapkohl/own-projects/weasyprint-test/venv/lib/python2.7/site-packages/html5lib/_inputstream.py", line 151, in HTMLInputStream
    return HTMLBinaryInputStream(source, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'encoding'

Going back two versions (seven 9s after the dot oO) fixes this.

@SimonSapin
Member

This might be

https://github.com/html5lib/html5lib-python/blob/master/CHANGES.rst#user-content-09999999910b9

Replace the charset keyword argument on parse and related methods with a set of keyword arguments: override_encoding, transport_encoding, same_origin_parent_encoding, likely_encoding, and default_encoding.

Although that says charset and the traceback says encoding.

Anyway, I think WeasyPrint should be updated for the newer html5lib and the requirements in setup.py changed accordingly. CC @liZe

@liZe liZe closed this as completed in f1019b8 Jul 15, 2016
@liZe
Member

liZe commented Jul 15, 2016

The argument was encoding, and we now have to pick one from (override | transport | same_origin_parent | likely | default)_encoding. We should read the documentation, but, well, there's no documentation, of course. So, it's default_encoding now, and if it breaks anything in WeasyPrint, we'll randomly change that 😉.

@liZe liZe added the bug Existing features not working as expected label Jul 15, 2016
@liZe liZe added the crash Problems preventing documents from being rendered label Jul 15, 2016
@liZe
Member

liZe commented Jul 15, 2016

It doesn't work, because it crashes when input is unicode (we can't blame them for that).
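A minimal sketch of the guard this implies, assuming (as the comment above reports) that html5lib rejects encoding keywords when the input is already decoded text. The helper name `parser_encoding_kwargs` is hypothetical and not WeasyPrint's actual code. It is written for Python 3, where text is `str`, and it only decides which keyword arguments to forward; it does not call html5lib itself.

```python
def parser_encoding_kwargs(source, **encoding_kwargs):
    """Return the encoding kwargs that are safe to forward for `source`.

    Hypothetical helper: text input is already decoded, so no encoding
    hint is forwarded; byte input keeps every hint that is not None.
    """
    if isinstance(source, str):
        # Already decoded: an encoding hint has no meaning here and,
        # per this thread, makes the parser raise.
        return {}
    return {
        name: value
        for name, value in encoding_kwargs.items()
        if value is not None
    }


# Text input: nothing is forwarded.
print(parser_encoding_kwargs('<p>text</p>', default_encoding='utf-8'))
# Byte input: the hint is kept.
print(parser_encoding_kwargs(b'<p>bytes</p>', default_encoding='utf-8'))
```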

@SimonSapin
Member

The terminology is based on https://html.spec.whatwg.org/multipage/#determining-the-character-encoding

  • override_encoding is probably appropriate for the encoding parameter of WeasyPrint’s HTML class.
  • transport_encoding is the one from HTTP Content-Type. (The encoding key in "URL fetchers" return values.)
  • same_origin_parent_encoding refers to “parent” in the sense that an <iframe> element can introduce a child HTML document. This can be ignored since we don’t implement <iframe>, or the cross-origin policy.
  • likely_encoding, the spec says “if the user agent has information on the likely encoding for this page”. Safe to declare we don’t.
  • default_encoding, the spec suggests UTF-8 in a "controlled environment", but I don’t think WeasyPrint can make that kind of assumption about how it’s used. Then “In other environments, the default encoding is typically dependent on the user's locale”, which is kinda terrible since it can lead to “works on my machine” sites that are broken for users with a different locale. But that’s what many browsers do today. I’ve heard of ideas/experiments to use the site’s top-level domain instead, but I don’t know where that’s at. Let’s not bother with any of this.

So I think we should use override_encoding and transport_encoding instead of default_encoding.
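The mapping proposed in the list above can be sketched as follows. This is an illustration, not WeasyPrint's actual code: the function name and its parameters `user_encoding` and `protocol_encoding` are hypothetical, while the two keyword names come from the html5lib changelog entry quoted earlier in this thread.

```python
def html5lib_encoding_kwargs(user_encoding=None, protocol_encoding=None):
    """Map two encoding sources to html5lib's new keyword arguments.

    Hypothetical helper; both parameters are optional.
    """
    kwargs = {}
    if user_encoding is not None:
        # Explicitly requested by the caller of the HTML class.
        kwargs['override_encoding'] = user_encoding
    if protocol_encoding is not None:
        # Reported by the transport, e.g. the HTTP Content-Type charset
        # returned by a URL fetcher.
        kwargs['transport_encoding'] = protocol_encoding
    # same_origin_parent_encoding, likely_encoding and default_encoding
    # are deliberately left unset, following the reasoning above.
    return kwargs


print(html5lib_encoding_kwargs(protocol_encoding='iso-8859-1'))
print(html5lib_encoding_kwargs(user_encoding='utf-8',
                               protocol_encoding='iso-8859-1'))
```

For byte input, the resulting dict would then be forwarded as `html5lib.parse(source, **kwargs)`.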

@gsnedders How does this sound?

liZe added a commit that referenced this issue Jul 15, 2016
@liZe
Member

liZe commented Jul 15, 2016

I've used override for our encoding parameter and transport for our protocol (given by the URL fetcher). Tests pass, but I'm not really sure…

@gsnedders

@SimonSapin that sounds right

And yes, @liZe, docs are one of the two big things blocking 1.0 now…

@liZe
Member

liZe commented Jul 15, 2016

@gsnedders No offense, that's a problem for other projects too 😉. Thanks for your hard work!

jsonn referenced this issue in jsonn/pkgsrc Jan 15, 2017
Version 0.34
------------

Released on 2016-12-21.

Bug fixes:

* `#398 <https://github.com/Kozea/WeasyPrint/issues/398>`_:
  Honor the presentational_hints option for PDFs.
* `#399 <https://github.com/Kozea/WeasyPrint/pull/399>`_:
  Avoid CairoSVG-2.0.0rc* on Python 2.
* `#396 <https://github.com/Kozea/WeasyPrint/issues/396>`_:
  Correctly close files open by mkstemp.
* `#403 <https://github.com/Kozea/WeasyPrint/issues/403>`_:
  Cast the number of columns into int.
* Fix multi-page multi-columns and add related tests.


Version 0.33
------------

Released on 2016-11-28.

New features:

* `#393 <https://github.com/Kozea/WeasyPrint/issues/393>`_:
  Add tests on MacOS.
* `#370 <https://github.com/Kozea/WeasyPrint/issues/370>`_:
  Enable @font-face on MacOS.

Bug fixes:

* `#389 <https://github.com/Kozea/WeasyPrint/issues/389>`_:
  Always update resume_at when splitting lines.
* `#394 <https://github.com/Kozea/WeasyPrint/issues/394>`_:
  Don't build universal wheels.
* `#388 <https://github.com/Kozea/WeasyPrint/issues/388>`_:
  Fix logic when finishing block formatting context.


Version 0.32
------------

Released on 2016-11-17.

New features:

* `#28 <https://github.com/Kozea/WeasyPrint/issues/28>`_:
  Support @font-face on Linux.
* Support CSS fonts level 3 almost entirely, including OpenType features.
* `#253 <https://github.com/Kozea/WeasyPrint/issues/253>`_:
  Support presentational hints (optional).
* Support break-after, break-before and break-inside for pages and columns.
* `#384 <https://github.com/Kozea/WeasyPrint/issues/384>`_:
  Major performance boost.

Bug fixes:

* `#368 <https://github.com/Kozea/WeasyPrint/issues/368>`_:
  Respect white-space for shrink-to-fit.
* `#382 <https://github.com/Kozea/WeasyPrint/issues/382>`_:
  Fix the preferred width for column groups.
* Handle relative boxes in column-layout boxes.

Documentation:

* Add more and more documentation about Windows installation.
* `#355 <https://github.com/Kozea/WeasyPrint/issues/355>`_:
  Add fonts requirements for tests.


Version 0.31
------------

Released on 2016-08-28.

New features:

* `#124 <https://github.com/Kozea/WeasyPrint/issues/124>`_:
  Add MIME sniffing for images.
* `#60 <https://github.com/Kozea/WeasyPrint/issues/60>`_:
  CSS Multi-column Layout.
* `#197 <https://github.com/Kozea/WeasyPrint/pull/197>`_:
  Add hyphens at line breaks activated by a soft hyphen.

Bug fixes:

* `#132 <https://github.com/Kozea/WeasyPrint/pull/132>`_:
  Fix Python 3 compatibility on Windows.

Documentation:

* `#329 <https://github.com/Kozea/WeasyPrint/issues/329>`_:
  Add documentation about installation on Windows.


Version 0.30
------------

Released on 2016-07-18.

WeasyPrint now depends on html5lib-0.999999999.

Bug fixes:

* Fix Acid2
* `#325 <https://github.com/Kozea/WeasyPrint/issues/325>`_:
  Cutting lines is broken in page margin boxes.
* `#334 <https://github.com/Kozea/WeasyPrint/issues/334>`_:
  Newest html5lib 0.999999999 breaks rendering.


Version 0.29
------------

Released on 2016-06-17.

Bug fixes:

* `#263 <https://github.com/Kozea/WeasyPrint/pull/263>`_:
  Don't crash with floats with percents in positions.
* `#323 <https://github.com/Kozea/WeasyPrint/pull/323>`_:
  Fix CairoSVG 2.0 pre-release dependency in Python 2.x.


Version 0.28
------------

Released on 2016-05-16.

Bug fixes:

* `#189 <https://github.com/Kozea/WeasyPrint/issues/189>`_:
  ``white-space: nowrap`` still wraps on hyphens
* `#305 <https://github.com/Kozea/WeasyPrint/issues/305>`_:
  Fix crashes on some tables
* Don't crash when transform matrix isn't invertible
* Don't crash when rendering ratio-only SVG images
* Fix margins and borders on some tables


Version 0.27
------------

Released on 2016-04-08.

New features:

* `#295 <https://github.com/Kozea/WeasyPrint/pull/295>`_:
  Support the 'rem' unit.
* `#299 <https://github.com/Kozea/WeasyPrint/pull/299>`_:
  Enhance the support of SVG images.

Bug fixes:

* `#307 <https://github.com/Kozea/WeasyPrint/issues/307>`_:
  Fix the layout of cells larger than their tables.

Documentation:

* The website is now on GitHub Pages, the documentation is on Read the Docs.
* `#297 <https://github.com/Kozea/WeasyPrint/issues/297>`_:
  Rewrite the CSS chapter of the documentation.
@mwangikinuthia

so what should i do if this error arises?

@liZe
Member

liZe commented Jan 9, 2018

so what should i do if this error arises?

Use the latest versions of WeasyPrint and html5lib.
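
To see which versions are actually installed before upgrading, a small check like the following may help. It is a sketch that assumes Python 3.8 or later (for `importlib.metadata`), and `installed_version` is a hypothetical helper name. According to the changelog quoted in this thread, WeasyPrint 0.30 is the release that starts depending on html5lib 0.999999999, so an older WeasyPrint combined with that html5lib is the pairing that produces the error.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report what is installed in the current environment.
for name in ('WeasyPrint', 'html5lib'):
    print(name, installed_version(name) or 'not installed')
```

Both packages can then be upgraded together, for example with `pip install --upgrade WeasyPrint html5lib`.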
