Fix performance issue caused by using repeated > characters after `<!DOCTYPE name` (#173)

A `<` is treated as a string delimiter.
In certain cases, if `<` is used in succession, read and match are
repeated, which slows down the process. Therefore, the match is given a
terminator (`term:`) so that it reads ahead to a specific part of the
string in advance.
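
The slowdown described above can be sketched outside REXML. This is a minimal illustration, not the library's implementation: `count_match_attempts` is a hypothetical helper that buffers input up to a delimiter, retries the pattern after every read, and reports how many match attempts were needed. It assumes the input is never matched (there is no `;`), which mirrors the malformed case in the test.

```ruby
require "stringio"

# Hypothetical helper (not part of REXML): read `text` chunk by chunk, where
# each chunk ends at `delimiter`, and retry `pattern` against the growing
# buffer after every read. Returns the number of match attempts made.
def count_match_attempts(text, pattern, delimiter)
  io = StringIO.new(text)
  buffer = +""
  attempts = 0
  loop do
    attempts += 1
    return attempts if buffer.match?(pattern)
    chunk = io.gets(delimiter) # reads up to and including the delimiter
    return attempts if chunk.nil? # input exhausted without a match
    buffer << chunk
  end
end

ENTITY_PATTERN = /(%.*?;)\s*/um
malformed = "%>" * 100 + "]>"

# Short delimiter: one retry per ">", each over a longer buffer.
puts count_match_attempts(malformed, ENTITY_PATTERN, ">")  # prints 102
# Reading ahead to "]>": the whole subset arrives in a single read.
puts count_match_attempts(malformed, ENTITY_PATTERN, "]>") # prints 2
```

With the short delimiter the number of attempts grows with the input, and each attempt rescans a longer buffer, so total work grows faster than linearly. Reading ahead to the terminator keeps the number of attempts constant.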
Watson1978 authored Jul 16, 2024
1 parent 9f1415a commit c33ea49
Showing 2 changed files with 16 additions and 1 deletion.
3 changes: 2 additions & 1 deletion lib/rexml/parsers/baseparser.rb
@@ -128,6 +128,7 @@ module Private
         INSTRUCTION_TERM = "?>"
         COMMENT_TERM = "-->"
         CDATA_TERM = "]]>"
+        DOCTYPE_TERM = "]>"
         TAG_PATTERN = /((?>#{QNAME_STR}))\s*/um
         CLOSE_PATTERN = /(#{QNAME_STR})\s*>/um
         ATTLISTDECL_END = /\s+#{NAME}(?:#{ATTDEF})*\s*>/um
@@ -384,7 +385,7 @@ def pull_event
                 end
                 return [ :comment, md[1] ] if md
               end
-            elsif match = @source.match(/(%.*?;)\s*/um, true)
+            elsif match = @source.match(/(%.*?;)\s*/um, true, term: Private::DOCTYPE_TERM)
               return [ :externalentity, match[1] ]
             elsif @source.match(/\]\s*>/um, true)
               @document_status = :after_doctype
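
To see what the changed pattern accepts, here is a small sketch using Ruby's standard `StringScanner`. The `scan_external_entity` helper is hypothetical and exists only for illustration. It applies the same regular expression as the diff at the start of a string and returns the first capture group, or `nil` when nothing matches.

```ruby
require "strscan"

# Same regular expression as in the changed line of the diff.
EXTERNAL_ENTITY_PATTERN = /(%.*?;)\s*/um

# Hypothetical helper (not part of REXML): try the pattern at the start of
# `text` and return the captured reference, or nil if it does not match.
def scan_external_entity(text)
  scanner = StringScanner.new(text)
  scanner.scan(EXTERNAL_ENTITY_PATTERN) ? scanner[1] : nil
end

p scan_external_entity("%name; ]>") # prints "%name;"
p scan_external_entity("%>%>;")     # prints "%>%>;"
p scan_external_entity("%>%>]>")    # prints nil
```

The second call shows that the lazy `.*?` is free to run past `>` characters while looking for `;`, so a buffer that merely ends at `>` cannot rule out a later match. The third call is the malformed shape used in the new test: no `;` ever arrives, so the pattern fails no matter how much input is buffered.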
14 changes: 14 additions & 0 deletions test/parse/test_document_type_declaration.rb
@@ -1,9 +1,13 @@
 # frozen_string_literal: false
 require "test/unit"
+require "core_assertions"
+
 require "rexml/document"

 module REXMLTests
   class TestParseDocumentTypeDeclaration < Test::Unit::TestCase
+    include Test::Unit::CoreAssertions
+
     private
     def parse(doctype)
       REXML::Document.new(<<-XML).doctype
@@ -276,6 +280,16 @@ def test_notation_attlist
                    doctype.children.collect(&:class))
     end

+    def test_gt_linear_performance_malformed_entity
+      seq = [10000, 50000, 100000, 150000, 200000]
+      assert_linear_performance(seq, rehearsal: 10) do |n|
+        begin
+          REXML::Document.new('<!DOCTYPE root [' + "%>" * n + ']><test/>')
+        rescue
+        end
+      end
+    end
+
     private
     def parse(internal_subset)
       super(<<-DOCTYPE)
