Test removing <meta charset="UTF-8"/> in known reading systems to see if it is really necessary #470
Comments
I don't see any problems with that. |
Hi @martinpub and @AndersEkl Making this change should not be that complicated, and I don't have any strong opinions on the matter. I know from experience that not having an explicit definition of the charset can lead to many headaches for the developers and users of readers. Hopefully, the reading systems using these files are reading them as XML instead of HTML, but I don't think there is any guarantee that this is the case. So the safest thing to do is to figure out why these metadata tags are removed in the processing. Perhaps other important information is lost as well. Best regards |
Thanks for your input @kalaspuffar. I assumed even an HTML(5) parser would interpret the data without this line as UTF-8, but perhaps that's not always true? In that case, I agree with checking the processing tool. Let's leave this open for now. |
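To make the concern about a missing charset concrete, here is a minimal Python sketch. It assumes a hypothetical parser that falls back to windows-1252 when no encoding is declared; whether any particular reading system actually does this has not been established in this thread.

```python
# Sample text with Nordic characters, stored as UTF-8 bytes (as in an EPUB).
text = "Blåbærsyltetøy"
raw = text.encode("utf-8")

# A parser that knows the bytes are UTF-8 recovers the text unchanged.
as_utf8 = raw.decode("utf-8")

# ASSUMPTION: a parser that guesses windows-1252 instead reads every
# two-byte UTF-8 sequence as two separate characters (mojibake).
as_legacy = raw.decode("windows-1252")

print(as_utf8)
print(as_legacy)
```

The bytes are identical in both cases; only the assumed encoding differs, which is why an explicit declaration can matter for parsers that do not default to UTF-8.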
Decided on validation group meeting on June 26 to keep this requirement. |
This issue was raised in the calibre editor bug reporting system, where the main developer pointed out the redundancy of such a declaration. As this piece of information is not mentioned in the guidelines, I argue that it be removed from the validation ruleset. |
@josteinaj If you have the time, I would be interested in your input on this issue :-) |
@martinpub sure I agree with @kalaspuffar: #470 (comment) According to the standards, there's no extra information in this meta tag.
I think this issue is mainly about what the purpose of the markup is. For production purposes, the meta tag doesn't give any extra information. However, for compliance with all reading systems, especially older ones, the meta tag might be necessary. (Whoops, accidentally closed. I reopened it again…) |
Thanks @josteinaj! I think we can conclude that the requirement of the meta tag should be removed, as it is redundant. |
I accidentally closed and posted my incomplete comment, sorry. The issue is what the purpose of the markup is, whether it's meant for production, or also for distribution and compliance with all reading systems. |
We don't have EPUB 3 in distribution yet, so I don't really know. But it seems to me that if a reading system supports EPUB 3.2, then it should support XHTML(5) content documents. And the XML declaration will be the appropriate place to declare the character encoding. So either we can:
I would argue for proceeding with 2, if we currently do not have any known issues with reading systems (or other systems where parsing of the contents of the EPUB 3 packages is at play). |
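On the point that the XML declaration is where an XHTML content document declares its encoding, the following sketch uses Python's standard `xml.etree.ElementTree` to show an XML parser honouring the declaration and defaulting to UTF-8 when there is none. The sample fragments are made up for illustration, and this says nothing about how a reading system behaves if it parses the file as HTML.

```python
import xml.etree.ElementTree as ET

# The XML declaration names ISO-8859-1, and the bytes are encoded to match.
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><p>blåbær</p>'
).encode("iso-8859-1")

# No declaration at all: an XML parser treats the bytes as UTF-8.
utf8_doc = "<p>blåbær</p>".encode("utf-8")

declared = ET.fromstring(latin1_doc).text
defaulted = ET.fromstring(utf8_doc).text

print(declared, defaulted)
```

In both cases the text is recovered without any meta element, which is the redundancy argument when the document really is processed as XML.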
Hi @martinpub When we have these discussions, I'm always cautious about creating issues for the end-users. In this case, we want to introduce a change that has a marginal impact on the producers and a marginal impact on the size and readability of the EPUB document, but that might introduce an issue for an end-user whose reading system refuses to read the file outright. So I would vote for 1; making a change "just because" is not a good idea, in my opinion. Best regards |
I'd vote for 1 as well. Reading systems that claim to be compliant with EPUB 3.2 (or any reading system really) will have an HTML rendering engine built in. In many cases this is Chromium or another web engine. If the rendering engine chooses to parse the document as HTML instead of XHTML, then we should have the meta tag. To make sure that the rendering engine uses XHTML and not HTML, we need to at least use the xhtml file extension instead of html, and possibly also declare the XHTML doctype. There might be other requirements for having the rendering engine choose XHTML over HTML as well, I'm not sure. Some reading systems might even just go straight for an HTML rendering engine and assume that it won't cause problems (which in most cases it won't). For instance, I don't know what e-readers (Kindle, Kobo, etc.) or some mobile apps do. When we distribute an HTML version of our books, we use the html file extension instead of xhtml, as we've had problems with xhtml in the past (it was probably an Internet Explorer thing, I don't quite remember). |
Thanks for your comments @kalaspuffar @josteinaj. I think I agree, I'm just impatient to get our workflow going smoothly :-) Let's leave this open for now and return to next steps at the validation meeting. |
Title changed from "<meta charset="UTF-8"/> optional in head of content documents?" to "Test removing <meta charset="UTF-8"/> in known reading systems to see if it is really necessary"
Adjusted the headline of this issue to suggestion 1 in my comment #470 (comment). |
Decision from validator group meeting on October 15: Martin to test. |
As can be seen in the quoted part of the RelaxNG schema file, the line <meta charset="UTF-8"/> is strictly required as the first child of content documents' <head>.
At MTM we are experiencing issues with EPUB editors removing that line when the document is processed. Investigating this, I'm starting to think that the strict requirement of this line is perhaps not well motivated in the 2020-1 validator edition.
1. It is not documented in the guidelines.
2. It can be considered redundant in an XHTML 5 setting, given that
   a. UTF-8 is the default encoding for HTML5, and
   b. for the XML serialization of HTML5 used in EPUB 3, the character encoding of the document will be recorded in the XML declaration (<?xml version="1.0" encoding="UTF-8"?>).
See also the example given in the HTML5 spec. Also, the EPUB 3.2 Content Doc specification does not mention any requirement to use the meta tag to specify the encoding of the document.
If you agree that this is redundant information, my suggestion is to make this optional in the validator for the 2020-1 guidelines. Ping @AndersEkl @kalaspuffar.
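For checking what an editor has done to a file, a small script along these lines could report which encoding declarations a content document carries. This is only a sketch: the function name `encoding_declarations` and the sample document are hypothetical, and it is not the validator's RelaxNG rule, just an approximation of "meta charset as first child of head" using the Python standard library.

```python
import re
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

def encoding_declarations(doc: bytes) -> dict:
    """Report the encoding named in the XML declaration and in a
    <meta charset> that is the first child of <head> (None if absent)."""
    # Encoding pseudo-attribute of the XML declaration, if present.
    m = re.match(rb"<\?xml[^>]*\bencoding=[\"']([A-Za-z0-9._-]+)[\"']", doc)
    xml_enc = m.group(1).decode("ascii") if m else None

    # First child element of <head> in the XHTML namespace.
    head = ET.fromstring(doc).find(f"{{{XHTML_NS}}}head")
    first = head[0] if head is not None and len(head) else None
    is_meta = first is not None and first.tag == f"{{{XHTML_NS}}}meta"
    meta_enc = first.get("charset") if is_meta else None

    return {"xml_declaration": xml_enc, "meta_charset_first_child": meta_enc}

# Hypothetical content document after an editor stripped the meta element.
sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<html xmlns="http://www.w3.org/1999/xhtml">'
    b"<head><title>Chapter</title></head><body/></html>"
)
report = encoding_declarations(sample)
print(report)
```

Running this over files before and after processing in an editor would show whether only the meta element is dropped or whether the XML declaration is affected too.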
nordic-epub3-dtbook-migrator/src/main/resources/xml/schema/2020-1/nordic-html5.rng
Lines 531 to 538 in c9e59ff