Fix char, codepoint for single digit hex escapes
Showing 6 changed files with 99 additions and 90 deletions.
108 changes: 18 additions & 90 deletions
lib/regexp_parser/expression/classes/escape_sequence.rb
@@ -0,0 +1,5 @@
Regexp::Expression::EscapeSequence::Base.class_eval do
  def char
    codepoint.chr('utf-8')
  end
end
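For readers unfamiliar with the call used in `char`, here is a small standalone sketch (not part of the commit) of `Integer#chr` with an encoding argument. The sample codepoints and variable names are chosen only for illustration.

```ruby
# Standalone sketch: Integer#chr('utf-8') turns a codepoint into a
# one-character UTF-8 string. The codepoints below are illustrative only.
newline = 0xA.chr('utf-8')     # codepoint 10, a line feed
snowman = 0x2603.chr('utf-8')  # a codepoint outside ASCII

p newline            # "\n"
p snowman.encoding   # #<Encoding:UTF-8>
```

Because every escape class only has to supply `codepoint`, the single `char` definition on `Base` can serve all of them.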
68 changes: 68 additions & 0 deletions
lib/regexp_parser/expression/methods/escape_sequence_codepoint.rb
@@ -0,0 +1,68 @@
module Regexp::Expression::EscapeSequence
  AsciiEscape.class_eval { def codepoint; 0x1B end }
  Backspace.class_eval   { def codepoint; 0x8 end }
  Bell.class_eval        { def codepoint; 0x7 end }
  FormFeed.class_eval    { def codepoint; 0xC end }
  Newline.class_eval     { def codepoint; 0xA end }
  Return.class_eval      { def codepoint; 0xD end }
  Tab.class_eval         { def codepoint; 0x9 end }
  VerticalTab.class_eval { def codepoint; 0xB end }

  Literal.class_eval     { def codepoint; text[1].ord end }

  Octal.class_eval       { def codepoint; text[/\d+/].to_i(8) end }

  Hex.class_eval         { def codepoint; text[/\h+/].hex end }
  Codepoint.class_eval   { def codepoint; text[/\h+/].hex end }

  CodepointList.class_eval do
    # Maybe this should be a unique top-level expression class?
    def char
      raise NoMethodError, 'CodepointList responds only to #chars'
    end

    def codepoint
      raise NoMethodError, 'CodepointList responds only to #codepoints'
    end

    def chars
      codepoints.map { |cp| cp.chr('utf-8') }
    end

    def codepoints
      text.scan(/\h+/).map(&:hex)
    end
  end

  AbstractMetaControlSequence.class_eval do
    private

    def control_sequence_to_s(control_sequence)
      five_lsb = control_sequence.unpack('B*').first[-5..-1]
      ["000#{five_lsb}"].pack('B*')
    end

    def meta_char_to_codepoint(meta_char)
      byte_value = meta_char.ord
      byte_value < 128 ? byte_value + 128 : byte_value
    end
  end

  Control.class_eval do
    def codepoint
      control_sequence_to_s(text).ord
    end
  end

  Meta.class_eval do
    def codepoint
      meta_char_to_codepoint(text[-1])
    end
  end

  MetaControl.class_eval do
    def codepoint
      meta_char_to_codepoint(control_sequence_to_s(text))
    end
  end
end
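The extraction logic in this file can be tried outside the library. The sketch below copies the expressions from the diff into free-standing methods. The method names (`hex_codepoint`, `octal_codepoint`, `control_to_s`, `meta_to_codepoint`) are hypothetical and exist only for this illustration, and the sample escape texts are assumptions about what the `text` of each token looks like.

```ruby
# Hypothetical stand-alone helpers mirroring the diff's logic; these names
# are not part of regexp_parser.

# Hex and codepoint escapes: take the first run of hex digits. The "x" or
# "u" prefix is not a hex digit, so a single digit such as \xA still matches.
def hex_codepoint(text)
  text[/\h+/].hex
end

# Octal escapes: take the first run of digits and read it as base 8.
def octal_codepoint(text)
  text[/\d+/].to_i(8)
end

# Control escapes: keep the five least significant bits of the last byte.
def control_to_s(text)
  five_lsb = text.unpack('B*').first[-5..-1]
  ["000#{five_lsb}"].pack('B*')
end

# Meta escapes: set the high bit if it is not already set.
def meta_to_codepoint(char)
  byte_value = char.ord
  byte_value < 128 ? byte_value + 128 : byte_value
end

single_digit_hex = hex_codepoint('\xA')                        # 10
two_digit_hex    = hex_codepoint('\x0A')                       # 10
octal            = octal_codepoint('\012')                     # 10
control          = control_to_s('\cA').ord                     # 1
meta             = meta_to_codepoint('\M-a'[-1])               # 225
meta_control     = meta_to_codepoint(control_to_s('\M-\C-a'))  # 129

p [single_digit_hex, two_digit_hex, octal, control, meta, meta_control]
```

The single-quoted literals keep the backslash as a literal character, which matches how the parser would see the escape in a pattern's source text.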