In some HTML elements, attribute names need to be case-sensitive to take effect, for example, the viewBox attribute within the <svg> element. #297

Open
wants to merge 1 commit into
base: master

@changyy changyy commented Dec 17, 2023

#294

As the EPUB format has developed in recent years, more and more books use SVG for image layout in XHTML. Currently, the name of the SVG viewBox attribute is converted to lowercase. In testing with Apple Books and Chrome, <svg viewbox="0 0 960 1080"> has no effect until it is changed to <svg viewBox="0 0 960 1080">.

The issue originates in the Python lxml package: after processing with html.document_fromstring, attribute names come back in lowercase. That is harmless for ordinary HTML attributes, whose names are case-insensitive, but SVG attribute names are case-sensitive, so the lowercased form is not recognized in XHTML content.
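The effect can be reproduced without lxml. The following is a minimal sketch using only the standard library: `html.parser` stands in for lxml's HTML parser here (it also reports attribute names in lowercase), and `xml.etree.ElementTree` shows that an XML consumer treats `viewbox` and `viewBox` as different attributes. The `AttrCollector` class is a hypothetical helper written for this illustration and is not part of ebooklib.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class AttrCollector(HTMLParser):
    """Hypothetical helper: records attribute names as the HTML parser reports them."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        self.names.extend(name for name, _value in attrs)


# Stand-in for lxml's HTML parser: the stdlib HTML parser lowercases names too.
collector = AttrCollector()
collector.feed('<svg viewBox="0 0 960 1080"></svg>')
html_names = collector.names

# XML keeps the case as written and treats the two spellings as unrelated.
svg = ET.fromstring('<svg viewbox="0 0 960 1080"/>')
lowercase_value = svg.get("viewbox")
camelcase_value = svg.get("viewBox")  # no attribute with this spelling exists
```

Once the name has been lowercased by an HTML parser and written back out as XHTML, a reader looking for `viewBox` finds nothing, which matches the behaviour observed in the tests described above.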

This change adds a pass over html_tree in the parse_html_string function in ebooklib/utils.py that restores the mixed-case spelling of the attributes that need it.
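As a rough illustration of what such a pass can look like, here is a sketch written against the standard library's `xml.etree.ElementTree` so that it runs without extra dependencies. It is not the code from this pull request: the function name `restore_attribute_case` and the two-entry `SVG_ATTRIBUTE_CASE` excerpt are assumptions made for the example, and the assumption that the same loop carries over to lxml elements rests on their similar `attrib` mapping.

```python
import xml.etree.ElementTree as ET

# Illustrative excerpt only; the real list is derived from the SVG attribute index.
SVG_ATTRIBUTE_CASE = {
    "viewbox": "viewBox",
    "preserveaspectratio": "preserveAspectRatio",
}


def restore_attribute_case(root, mapping=SVG_ATTRIBUTE_CASE):
    """Rename lowercased attributes in place and return how many were renamed."""
    renamed = 0
    for element in root.iter():
        # Copy the keys first, because the mapping is modified inside the loop.
        for name in list(element.attrib):
            fixed = mapping.get(name)
            if fixed is not None:
                element.attrib[fixed] = element.attrib.pop(name)
                renamed += 1
    return renamed


tree = ET.fromstring(
    '<html><body><svg viewbox="0 0 960 1080" width="960"/></body></html>'
)
count = restore_attribute_case(tree)
svg = tree.find("body/svg")
```

Attributes that are not in the mapping, such as `width`, are left untouched, so the pass is safe to run on documents that contain no SVG at all.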

The list of attributes comes from: https://www.w3.org/TR/SVG/attindex.html

JavaScript code used to collect the list from that page:

```
// Map each lowercased attribute name to its original mixed-case spelling.
const targetList = {};
document.querySelectorAll("body > table > tbody > tr").forEach((trElement) => {
    const target = trElement.querySelector("th > span > a > span");
    // Only names containing an uppercase letter are changed by lowercasing.
    if (target && target.textContent && /[A-Z]/.test(target.textContent)) {
        targetList[target.textContent.toLowerCase()] = target.textContent;
    }
});
JSON.stringify(targetList, null, 2);
```
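For readers more at home in Python, the same filtering step can be sketched as a dictionary comprehension. The `names` list below is a small hypothetical sample standing in for the names scraped from the attribute index; the script above remains the source of the real list.

```python
import json

# Hypothetical sample of attribute names as they appear in the index.
names = ["viewBox", "preserveAspectRatio", "width", "gradientUnits"]

# Keep only names that contain an uppercase letter, keyed by their lowercase form.
case_map = {
    name.lower(): name
    for name in names
    if any(char.isupper() for char in name)
}

# Same shape as the JSON.stringify(..., null, 2) result.
as_json = json.dumps(case_map, indent=2)
```

All-lowercase names such as `width` are skipped because lowercasing does not alter them, which keeps the lookup table limited to the attributes that actually need repair.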
