Issues while parsing single quotes '''' #125

ghyatzo opened this issue Oct 13, 2022 · 8 comments

Contributor

ghyatzo commented Oct 13, 2022

I encountered an issue where the parser would crash when trying to parse '''' and ''' in a file I was working on.

It seems that single quotes are not supported?

julia> YAML.yaml(Dict("a" => '''))
"a: '\n"
I have no idea what is happening here. Is there an escaping mishap somewhere?

Thanks.

Collaborator

kescobo commented Oct 14, 2022

Your example is behaving as I would expect? Recall that ''' in Julia is the Char ':

julia> '''
'\'': ASCII/Unicode U+0027 (category Po: Punctuation, other)

Can you say more about the fields you have, and what's parsing incorrectly?

FWIW, I just used this tool to try

a: '''

which failed. According to the spec, '''' should parse as ', since the 2nd single quote is escaping the 3rd, and the 4th closes it. The web tool I linked handles that case properly, but we're not at the moment, though the example contained in the spec works correctly.

I wrote the following to test round-tripping a few different things.

julia> function read_write(s)
           open("test.yaml", "w") do io
               print(io, "a: ", s, "\n")
           end

           try
               x = YAML.load_file("test.yaml")
               @show x
           catch e
               @show e
           finally
               rm("test.yaml")
           end
       end
read_write (generic function with 1 method)

julia> read_write(""" 'here''s to "quotes"' """)
x = Dict{Any, Any}("a" => "here's to \"quotes\"")
Dict{Any, Any} with 1 entry:
  "a" => "here's to \"quotes\""

julia> read_write(""" '''' """)
e = while parsing a block mapping at line 1, column 0: expected <block end>, but found YAML.ScalarToken at line 1, column 6
while parsing a block mapping at line 1, column 0: expected <block end>, but found YAML.ScalarToken at line 1, column 6

@kescobo kescobo added the bug label Oct 14, 2022
Contributor Author

ghyatzo commented Oct 14, 2022

I encountered issues exactly while parsing ''''.
I have to parse YAML files that contain the formatting information for a string of text, such as which characters to use to surround an element, for example <color> <pre> <bold> name <post>.
The fields <pre> and <post> can be (and frequently are) ' (single quotes).

I was not sure about the YAML spec, and the source is not entirely reliable, so I wasn't sure who was at fault here.
The mention of ''' was just me manually editing the file to no avail, knowing that in Julia ''' is parsed correctly.

Regarding the example I wrote, I now see that it does behave as expected. I misread the result as a: '\n" (missing the opening quote). Sorry, it was pretty late.
The issues I encounter are due to '''' and not '''.

Collaborator

kescobo commented Oct 14, 2022

OK, so it seems like we're not correctly handling an escaped single quote if it occurs at the beginning of a single-quoted string, but otherwise we are:

julia> read_write(""" 'a''' """)
x = Dict{Any, Any}("a" => "a'")
Dict{Any, Any} with 1 entry:
  "a" => "a'"

julia> read_write(""" '''a' """)
e = while parsing a block mapping at line 1, column 0: expected <block end>, but found YAML.ScalarToken at line 1, column 6
a:  '''a'
julia> read_write(""" '''a''' """)
e = while parsing a block mapping at line 1, column 0: expected <block end>, but found YAML.ScalarToken at line 1, column 6
a:  '''a'''
julia> read_write(""" 'a''''' """)
x = Dict{Any, Any}("a" => "a''")
Dict{Any, Any} with 1 entry:
  "a" => "a''"

Is it correct to say that there's no problem with writing single quotes that you've identified?

Contributor Author

ghyatzo commented Oct 14, 2022

Yes, I edited the original issue. It was me not understanding anything.
I have encountered no issue with writing quotes so far,
only with parsing '''' really.

@kescobo kescobo changed the title Issues while parsing (and writing) strings containing single quotes ' Issues while parsing strings containing single quotes ' Oct 14, 2022
@ghyatzo ghyatzo changed the title Issues while parsing strings containing single quotes ' Issues while parsing single quotes '''' Oct 14, 2022
Collaborator

kescobo commented Oct 14, 2022

Alright, one more example for completeness:

julia> read_write(""" ' ''' """)
x = Dict{Any, Any}("a" => " '")
Dict{Any, Any} with 1 entry:
  "a" => " '"

julia> read_write(""" '''' """)
e = while parsing a block mapping at line 1, column 0: expected <block end>, but found YAML.ScalarToken at line 1, column 6

Unfortunately, while I'm technically a maintainer of this package for historical reasons, I've never actually gotten down and dirty with the parser, so this might have to wait on a fix.

If it's any help, the error appears to come from here.

Contributor Author

ghyatzo commented Oct 14, 2022

I can also provide the whole stacktrace, maybe it gives a little bit more insight. (It is quite hefty tho).

ERROR: while parsing a block collection at line 194, column 6: expected <block end>, but found YAML.ScalarToken at line 195, column 10
Stacktrace:
  [1] parse_block_sequence_entry(stream::YAML.EventStream)
    @ YAML ~/.local/julia/packages/YAML/IXUQ4/src/parser.jl:394
  [2] peek(stream::YAML.EventStream)
    @ YAML ~/.local/julia/packages/YAML/IXUQ4/src/parser.jl:54
  [3] _compose_sequence_node(start_event::YAML.SequenceStartEvent, composer::YAML.Composer, anchor::Nothing)
    @ YAML ~/.local/julia/packages/YAML/IXUQ4/src/composer.jl:137
  [4] compose_sequence_node(composer::YAML.Composer, anchor::Nothing)
    @ YAML ~/.local/julia/packages/YAML/IXUQ4/src/composer.jl:147
  [5] handle_event(event::YAML.SequenceStartEvent, composer::YAML.Composer)
    @ YAML ~/.local/julia/packages/YAML/IXUQ4/src/composer.jl:80
  [6] compose_node(composer::YAML.Composer)
    @ YAML ~/.local/julia/packages/YAML/IXUQ4/src/composer.jl:94
  [7] __compose_sequence_node(event::YAML.SequenceStartEvent, composer::YAML.Composer, node::YAML.SequenceNode)
    @ YAML ~/.local/julia/packages/YAML/IXUQ4/src/composer.jl:120
  [8] _compose_sequence_node(start_event::YAML.SequenceStartEvent, composer::YAML.Composer, anchor::Nothing)
    @ YAML ~/.local/julia/packages/YAML/IXUQ4/src/composer.jl:138
  [9] compose_sequence_node(composer::YAML.Composer, anchor::Nothing)
    @ YAML ~/.local/julia/packages/YAML/IXUQ4/src/composer.jl:147
 [10] handle_event(event::YAML.SequenceStartEvent, composer::YAML.Composer)
    @ YAML ~/.local/julia/packages/YAML/IXUQ4/src/composer.jl:80
 [11] compose_node(composer::YAML.Composer)
    @ YAML ~/.local/julia/packages/YAML/IXUQ4/src/composer.jl:94
 [12] __compose_sequence_node(event::YAML.SequenceStartEvent, composer::YAML.Composer, node::YAML.SequenceNode)
    @ YAML ~/.local/julia/packages/YAML/IXUQ4/src/composer.jl:120
 [13] _compose_sequence_node(start_event::YAML.SequenceStartEvent, composer::YAML.Composer, anchor::Nothing)
    @ YAML ~/.local/julia/packages/YAML/IXUQ4/src/composer.jl:138
 [14] compose_sequence_node(composer::YAML.Composer, anchor::Nothing)
    @ YAML ~/.local/julia/packages/YAML/IXUQ4/src/composer.jl:147
 [15] handle_event(event::YAML.SequenceStartEvent, composer::YAML.Composer)
    @ YAML ~/.local/julia/packages/YAML/IXUQ4/src/composer.jl:80
 [16] compose_node(composer::YAML.Composer)
    @ YAML ~/.local/julia/packages/YAML/IXUQ4/src/composer.jl:94
 [17] __compose_sequence_node(event::YAML.SequenceStartEvent, composer::YAML.Composer, node::YAML.SequenceNode)
    @ YAML ~/.local/julia/packages/YAML/IXUQ4/src/composer.jl:120
 [18] _compose_sequence_node(start_event::YAML.SequenceStartEvent, composer::YAML.Composer, anchor::Nothing)
    @ YAML ~/.local/julia/packages/YAML/IXUQ4/src/composer.jl:138
 [19] compose_sequence_node(composer::YAML.Composer, anchor::Nothing)
    @ YAML ~/.local/julia/packages/YAML/IXUQ4/src/composer.jl:147
 [20] handle_event(event::YAML.SequenceStartEvent, composer::YAML.Composer)
    @ YAML ~/.local/julia/packages/YAML/IXUQ4/src/composer.jl:80
 [21] compose_node(composer::YAML.Composer)
    @ YAML ~/.local/julia/packages/YAML/IXUQ4/src/composer.jl:94
 [22] __compose_mapping_node(event::YAML.ScalarEvent, composer::YAML.Composer, node::YAML.MappingNode)
    @ YAML ~/.local/julia/packages/YAML/IXUQ4/src/composer.jl:154
 [23] _compose_mapping_node(start_event::YAML.MappingStartEvent, composer::YAML.Composer, anchor::Nothing)
    @ YAML ~/.local/julia/packages/YAML/IXUQ4/src/composer.jl:174
 [24] compose_mapping_node(composer::YAML.Composer, anchor::Nothing)
    @ YAML ~/.local/julia/packages/YAML/IXUQ4/src/composer.jl:183
 [25] handle_event(event::YAML.MappingStartEvent, composer::YAML.Composer)
    @ YAML ~/.local/julia/packages/YAML/IXUQ4/src/composer.jl:86
 [26] compose_node(composer::YAML.Composer)
    @ YAML ~/.local/julia/packages/YAML/IXUQ4/src/composer.jl:94
 [27] compose_document(composer::YAML.Composer)
    @ YAML ~/.local/julia/packages/YAML/IXUQ4/src/composer.jl:50
 [28] compose(events::YAML.EventStream)
    @ YAML ~/.local/julia/packages/YAML/IXUQ4/src/composer.jl:38
 [29] load(ts::YAML.TokenStream, constructor::YAML.Constructor)
    @ YAML ~/.local/julia/packages/YAML/IXUQ4/src/YAML.jl:38
 [30] load(ts::YAML.TokenStream, more_constructors::Nothing, multi_constructors::Dict{Any, Any}; dicttype::Type{Dict{Any, Any}}, constructorType::typeof(YAML.SafeConstructor))
    @ YAML ~/.local/julia/packages/YAML/IXUQ4/src/YAML.jl:44
 [31] load
    @ ~/.local/julia/packages/YAML/IXUQ4/src/YAML.jl:44 [inlined]
 [32] #load#10
    @ ~/.local/julia/packages/YAML/IXUQ4/src/YAML.jl:47 [inlined]
 [33] load (repeats 3 times)
    @ ~/.local/julia/packages/YAML/IXUQ4/src/YAML.jl:47 [inlined]
 [34] #16
    @ ~/.local/julia/packages/YAML/IXUQ4/src/YAML.jl:96 [inlined]
 [35] open(::YAML.var"#16#17"{Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, Tuple{}}, ::String, ::Vararg{String}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Base ./io.jl:384
 [36] open
    @ ./io.jl:381 [inlined]
 [37] #load_file#15
    @ ~/.local/julia/packages/YAML/IXUQ4/src/YAML.jl:94 [inlined]
 [38] load_file(::String)
    @ YAML ~/.local/julia/packages/YAML/IXUQ4/src/YAML.jl:94
 [39] top-level scope
    @ REPL[3]:1

I'll try to give a look at it as well.

Contributor Author

ghyatzo commented Oct 16, 2022

I believe I have identified the issue to be in this while block:

YAML.jl/src/scanner.jl

Lines 1232 to 1235 in fa25484

while peek(stream.input) != q
    append!(chunks, scan_flow_scalar_spaces(stream, double, start_mark))
    append!(chunks, scan_flow_scalar_non_spaces(stream, double, start_mark))
end

In particular, the line while peek(stream.input) != q.

We reach this line after encountering a quote character in the BufferInput.
The condition (while the next character is not a quote, do stuff) breaks when there is more than one quote right after the opening quote: for a string like '''a' we have q = '\'' and peek(stream.input) == '\'', so the loop body never runs and the scalar is closed as if it were empty.

I modified the line to

while peek(stream.input) != q || peek(stream.input, 1) == q

where the reasoning is: "If the next character is not a quote, proceed. But if the next character is a quote, proceed only if the one after it is also a quote." We basically allow the program to recognise '' as a valid character to include in the chunks.
A doubled single quote then gets eaten up by this bit:

YAML.jl/src/scanner.jl

Lines 1284 to 1286 in fa25484

if !double && c == '\'' && peek(stream.input, 1) == '\''
    push!(chunks, '\'')
    forwardchars!(stream, 2)
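
To see why the extra lookahead matters, here is a simplified stand-alone model of the two loops. It is only a sketch under assumptions: `scan_single_quoted`, `peekat`, and the `lookahead_fix` flag are hypothetical names, the input is a plain one-line string rather than a YAML.jl stream, and whitespace handling is folded into the ordinary character case.

```julia
# Hypothetical model of the scanner loop, not YAML.jl code.
# Returns the decoded scalar and whatever text is left after the closing quote.
function scan_single_quoted(text::AbstractString; lookahead_fix::Bool = true)
    chars = collect(text)
    # Stand-in for peek(stream.input, k): char k positions ahead, '\0' past the end.
    peekat(pos, k = 0) = pos + k <= length(chars) ? chars[pos + k] : '\0'

    q = '\''
    peekat(1) == q || error("expected an opening quote")
    pos = 2                                   # skip the opening quote
    chunks = Char[]

    function keep_going(pos)
        if lookahead_fix
            return peekat(pos) != q || peekat(pos, 1) == q   # patched condition
        else
            return peekat(pos) != q                          # original condition
        end
    end

    while keep_going(pos)
        peekat(pos) == '\0' && error("found unexpected end of stream")
        # Inner loop, modelled on the non-spaces scan: consume until a lone quote.
        while true
            c = peekat(pos)
            if c == q && peekat(pos, 1) == q
                push!(chunks, q)              # '' contributes one literal quote
                pos += 2
            elseif c == q || c == '\0'
                break
            else
                push!(chunks, c)
                pos += 1
            end
        end
    end
    rest = String(chars[(pos + 1):end])       # text after the closing quote
    return String(chunks), rest
end

scan_single_quoted("'''a'")                          # patched behaviour
scan_single_quoted("'''a'"; lookahead_fix = false)   # original behaviour
```

In this model the original condition returns an empty scalar for '''a' and leaves 'a' unread, which is consistent with the unexpected ScalarToken error above, while 'a''' still works because the inner loop handles the doubled quote once scanning has started.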

This is the testing I've done so far:

julia> function read_write(s)
           open("test.yaml", "w") do io
               print(io, "a: ", s, "\n")
           end
       
           try
               x = YAML.load_file("test.yaml")
               @show x
           catch e
               @show e
           finally
               rm("test.yaml")
           end
       end
read_write (generic function with 1 method)
julia> begin
       read_write(""" 'here''s to "quotes"' """)
       read_write(""" '''' """)
       read_write(""" 'a''' """)
       read_write(""" '''a' """)
       read_write(""" '''a''' """)
       read_write(""" 'a''''' """)
       read_write(""" ' ''' """)
       end
x = Dict{Any, Any}("a" => "here's to \"quotes\"")
x = Dict{Any, Any}("a" => "'")
x = Dict{Any, Any}("a" => "a'")
x = Dict{Any, Any}("a" => "'a")
x = Dict{Any, Any}("a" => "'a'")
x = Dict{Any, Any}("a" => "a''")
x = Dict{Any, Any}("a" => " '")
Dict{Any, Any} with 1 entry:
  "a" => " '"

And

pkg > test YAML
Test Summary:     | Pass  Total  Time
test = spec-02-01 |    9      9  1.5s
Test Summary:     | Pass  Total  Time
test = spec-02-02 |    9      9  0.3s
Test Summary:     | Pass  Total  Time
test = spec-02-03 |    9      9  0.0s
Test Summary:     | Pass  Total  Time
test = spec-02-04 |    9      9  0.1s
Test Summary:     | Pass  Total  Time
test = spec-02-05 |    9      9  0.1s
Test Summary:     | Pass  Total  Time
test = spec-02-06 |    9      9  0.1s
Test Summary:     | Pass  Total  Time
test = spec-02-07 |    9      9  0.0s
Test Summary:     | Pass  Total  Time
test = spec-02-08 |    9      9  0.0s
Test Summary:     | Pass  Total  Time
test = spec-02-09 |    9      9  0.0s
Test Summary:     | Pass  Total  Time
test = spec-02-10 |    9      9  0.0s
Test Summary:     | Pass  Total  Time
test = spec-02-11 |    9      9  0.2s
Test Summary:     | Pass  Total  Time
test = spec-02-12 |    9      9  0.0s
Test Summary:     | Pass  Total  Time
test = spec-02-13 |    9      9  0.0s
Test Summary:     | Pass  Total  Time
test = spec-02-14 |    9      9  0.0s
Test Summary:     | Pass  Total  Time
test = spec-02-15 |    9      9  0.0s
Test Summary:     | Pass  Total  Time
test = spec-02-16 |    9      9  0.0s
WARNING: I do not test the writing of spec-02-17
Test Summary:     | Pass  Total  Time
test = spec-02-17 |    8      8  0.0s
Test Summary:     | Pass  Total  Time
test = spec-02-18 |    9      9  0.0s
Test Summary:     | Pass  Total  Time
test = spec-02-19 |    9      9  0.0s
Test Summary:     | Pass  Total  Time
test = spec-02-20 |    9      9  0.0s
Test Summary:     | Pass  Total  Time
test = spec-02-21 |    9      9  0.1s
Test Summary:     | Pass  Total  Time
test = spec-02-22 |    9      9  0.0s
Test Summary:     | Pass  Total  Time
test = spec-02-23 |    9      9  0.1s
Test Summary:       | Pass  Total  Time
test = empty_scalar |    9      9  0.0s
Test Summary:              | Pass  Total  Time
test = no_trailing_newline |    9      9  0.0s
Test Summary:           | Pass  Total  Time
test = windows_newlines |    9      9  0.0s
WARNING: I do not test the writing of escape_sequences
Test Summary:           | Pass  Total  Time
test = escape_sequences |    8      8  0.0s
Test Summary:  | Pass  Total  Time
test = issue15 |    9      9  0.0s
Test Summary:  | Pass  Total  Time
test = issue30 |    9      9  0.0s
Test Summary:  | Pass  Total  Time
test = issue36 |    9      9  0.0s
Test Summary:  | Pass  Total  Time
test = issue39 |    9      9  0.0s
WARNING: I do not test the writing of cartesian
Test Summary:    | Pass  Total  Time
test = cartesian |    8      8  0.3s
WARNING: I do not test the writing of ar1
Test Summary: | Pass  Total  Time
test = ar1    |    8      8  0.1s
WARNING: I do not test the writing of ar1_cartesian
Test Summary:        | Pass  Total  Time
test = ar1_cartesian |    8      8  0.0s
Test Summary:   | Pass  Total  Time
test = merge-01 |    9      9  0.0s
Test Summary:        | Pass  Total  Time
test = version-colon |    9      9  0.0s
WARNING: I do not test the writing of multi-constructor
Test Summary:            | Pass  Total  Time
test = multi-constructor |    8      8  0.0s
Test Summary:    | Pass  Total  Time
test = utf-8-bom |    9      9  0.0s
Test Summary:    | Pass  Total  Time
test = utf-32-be |    9      9  0.1s
Test Summary:    | Pass  Total  Time
encoding = UTF-8 |    4      4  0.1s
Test Summary:       | Pass  Total  Time
encoding = UTF-16BE |    4      4  0.1s
Test Summary:       | Pass  Total  Time
encoding = UTF-16LE |    4      4  0.1s
Test Summary:       | Pass  Total  Time
encoding = UTF-32BE |    4      4  0.1s
Test Summary:       | Pass  Total  Time
encoding = UTF-32LE |    4      4  0.1s
Test Summary: | Pass  Total  Time
multi_doc_bom |    4      4  0.1s
Test Summary:             | Pass  Total  Time
dicttype = Dict{Any, Any} |    4      4  0.0s
Test Summary:                | Pass  Total  Time
dicttype = Dict{String, Any} |    4      4  0.1s
Test Summary:                | Pass  Total  Time
dicttype = Dict{Symbol, Any} |    4      4  0.0s
Test Summary:                                          | Pass  Total  Time
dicttype = OrderedCollections.OrderedDict{String, Any} |    5      5  0.1s
Test Summary:  | Pass  Total  Time
dicttype = #13 |    5      5  0.1s
Test Summary:            | Pass  Total  Time
error test = invalid-tag |    1      1  0.0s
Test Summary:      | Pass  Total  Time
Custom Constructor |    2      2  0.0s
Test Summary: | Pass  Total  Time
issue114      |    5      5  0.1s
     Testing YAML tests passed 

P.S.: This package is a layered maze...

kescobo pushed a commit that referenced this issue Oct 19, 2022
* Fix single quotes parsing (#125) plus test cases.

* added test_throws cases

* Fixed bad test throws syntax, forgot the error type to be expected
Collaborator

kescobo commented Oct 19, 2022

Closed by #126

@kescobo kescobo closed this as completed Oct 19, 2022