From 78f8712dccad773a51dc5eef31c02d523e994570 Mon Sep 17 00:00:00 2001
From: KITAITI Makoto
Date: Sun, 29 Sep 2024 15:57:03 +0900
Subject: [PATCH] Fix handling with "xml:" prefixed namespace (#208)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

I found that parsing XHTML documents like the one below has failed since v3.3.3:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>XHTML Document</title>
  </head>
  <body>
    <h1>XHTML Document</h1>
    <p xml:lang="ja" lang="ja">この段落は日本語です。</p>
  </body>
</html>
```

The [XML namespace spec][spec] is a little ambiguous, but the document above is
valid according to an [article published by the W3C][article].

I fixed the parsing algorithm. Can you review it?

As an aside, the `xml:lang="ja" lang="ja"` style of language declaration is
often used in XHTML files included in EPUB files because the
[sample EPUB files][samples] provided by IDPF, the former EPUB spec authority,
use that style.

[spec]: https://www.w3.org/TR/REC-xml-names/#defaulting
[article]: https://www.w3.org/International/questions/qa-html-language-declarations#attributes
[samples]: https://github.com/IDPF/epub3-samples
---
 lib/rexml/parsers/baseparser.rb |  5 +++--
 test/parser/test_base_parser.rb | 35 +++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/lib/rexml/parsers/baseparser.rb b/lib/rexml/parsers/baseparser.rb
index 89a9d0b6..a567e045 100644
--- a/lib/rexml/parsers/baseparser.rb
+++ b/lib/rexml/parsers/baseparser.rb
@@ -156,6 +156,7 @@ module Private
         default_entities.each do |term|
           DEFAULT_ENTITIES_PATTERNS[term] = /&#{term};/
         end
+        XML_PREFIXED_NAMESPACE = "http://www.w3.org/XML/1998/namespace"
       end
 
       private_constant :Private
@@ -185,7 +186,7 @@ def stream=( source )
         @tags = []
         @stack = []
         @entities = []
-        @namespaces = {}
+        @namespaces = {"xml" => Private::XML_PREFIXED_NAMESPACE}
         @namespaces_restore_stack = []
       end
 
@@ -790,7 +791,7 @@ def parse_attributes(prefixes)
             @source.match(/\s*/um, true)
             if prefix == "xmlns"
               if local_part == "xml"
-                if value != "http://www.w3.org/XML/1998/namespace"
+                if value != Private::XML_PREFIXED_NAMESPACE
                   msg = "The 'xml' prefix must not be bound to any other namespace "+
                     "(http://www.w3.org/TR/REC-xml-names/#ns-decl)"
                   raise REXML::ParseException.new( msg, @source, self )
diff --git a/test/parser/test_base_parser.rb b/test/parser/test_base_parser.rb
index 17d01979..da169a25 100644
--- a/test/parser/test_base_parser.rb
+++ b/test/parser/test_base_parser.rb
@@ -23,5 +23,40 @@ def test_large_xml
         parser.position < xml.bytesize
       end
     end
+
+    def test_attribute_prefixed_by_xml
+      xml = <<-XML
+        <?xml version="1.0" encoding="UTF-8"?>
+        <!DOCTYPE html>
+        <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+          <head>
+            <title>XHTML Document</title>
+          </head>
+          <body>
+            <h1>XHTML Document</h1>
+            <p xml:lang="ja" lang="ja">この段落は日本語です。</p>
+          </body>
+        </html>
+      XML
+
+      parser = REXML::Parsers::BaseParser.new(xml)
+      5.times {parser.pull}
+
+      html = parser.pull
+      assert_equal([:start_element,
+                    "html",
+                    {"xmlns" => "http://www.w3.org/1999/xhtml",
+                     "xml:lang" => "en",
+                     "lang" => "en"}],
+                   html)
+
+      15.times {parser.pull}
+
+      p = parser.pull
+      assert_equal([:start_element,
+                    "p",
+                    {"xml:lang" => "ja", "lang" => "ja"}],
+                   p)
+    end
   end
 end
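
The core of the change is that the parser's namespace table now starts with the `xml` prefix already bound, so `xml:lang` resolves without an explicit `xmlns:xml` declaration. The idea can be sketched in isolation as below. This is a minimal illustration, not REXML's actual code: `PrefixResolver`, `UndeclaredPrefixError`, `declare`, and `resolve` are hypothetical names introduced only for this example.

```ruby
# Hypothetical stand-alone sketch of the idea behind the patch.
# None of these names exist in REXML.
XML_NAMESPACE = "http://www.w3.org/XML/1998/namespace"

class UndeclaredPrefixError < StandardError; end

class PrefixResolver
  def initialize
    # Seed the table with the reserved "xml" prefix, mirroring the
    # patch's change from {} to {"xml" => ...}.
    @namespaces = {"xml" => XML_NAMESPACE}
  end

  # Record an xmlns:prefix="uri" declaration.
  def declare(prefix, uri)
    if prefix == "xml" && uri != XML_NAMESPACE
      raise ArgumentError,
            "The 'xml' prefix must not be bound to any other namespace"
    end
    @namespaces[prefix] = uri
  end

  # Split a qualified attribute name and look up its prefix.
  # Returns [namespace_uri_or_nil, local_part].
  def resolve(qualified_name)
    prefix, local_part = qualified_name.split(":", 2)
    return [nil, qualified_name] if local_part.nil? # unprefixed attribute

    uri = @namespaces.fetch(prefix) do
      raise UndeclaredPrefixError, "Undeclared namespace prefix: #{prefix}"
    end
    [uri, local_part]
  end
end

resolver = PrefixResolver.new
p resolver.resolve("xml:lang")
p resolver.resolve("lang")
```

With an empty initial table, the `xml:lang` lookup in this sketch would raise, which is analogous to the failure described in the commit message.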