Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ruby 2.1/2.2/2.3 yaml parsing anomaly for comma separated integers #273

Closed
guidoiaquinti opened this issue Feb 25, 2016 · 7 comments
Closed

Comments

@guidoiaquinti
Copy link

A string of digits separated by comma (without quotes) is evaluated as integer. Is it intentional?

irb(main):001:0> Psych.load("key: 123,456")
=> {"key"=>123456}

irb(main):002:0> Psych.load("key: 123456,7890")
=> {"key"=>1234567890}

I understand that the first example can be seen as a number in American notation but the 2nd example is not following any notation/standard that I'm aware of. I was expecting to have both of the values evaluated as string.

With the quotes it works as expected:

irb(main):003:0> Psych.load("key: \"123,456\"")
=> {"key"=>"123,456"}

irb(main):004:0> Psych.load("key: \"123456,7890\"")
=> {"key"=>"123456,7890"}

Thanks

@tenderlove
Copy link
Member

Apparently Japan and China will separate at four digits, but to be honest Psych internals don't care, they just remove all commas and cast to a number. The only reason for this is legacy support for Syck documents. I agree that both values should evaluate to strings, but I'm not sure how to do that and support old formats.

JustinAiken added a commit to usertesting/biscuit that referenced this issue Oct 15, 2019
- Otherwise multiline values break like crazy
- Update specs that would fail otherwise, to show that you must quote
numbers containing commas
  - See ruby/psych#273 for why..
@Jell
Copy link

Jell commented Nov 4, 2020

I've hit this issue today after trying to parse some YAML output by another tool (kustomize build in my case, but not that relevant). I think as YAML becomes more and more ubiquitous it might be a good idea to provide an option for "strict" parsing according to the standard?

It seems to me that we use those regexp to decide whether or not to parse a Scalar as a Float or Integer:

# Taken from http://yaml.org/type/float.html
FLOAT = /^(?:[-+]?([0-9][0-9_,]*)?\.[0-9]*([eE][-+][0-9]+)?(?# base 10)
|[-+]?\.(inf|Inf|INF)(?# infinity)
|\.(nan|NaN|NAN)(?# not a number))$/x
# Taken from http://yaml.org/type/int.html
INTEGER = /^(?:[-+]?0b[0-1_,]+ (?# base 2)
|[-+]?0[0-7_,]+ (?# base 8)
|[-+]?(?:0|[1-9][0-9_,]*) (?# base 10)
|[-+]?0x[0-9a-fA-F_,]+ (?# base 16))$/x

I did this patch on my side as a workaround, which solved the issue I had:

Psych::ScalarScanner.send(:remove_const, "INTEGER")
Psych::ScalarScanner::INTEGER = /^(?:[-+]?0b[0-1_]+          (?# base 2)
                                    |[-+]?0[0-7_]+           (?# base 8)
                                    |[-+]?(?:0|[1-9][0-9_]*) (?# base 10)
                                    |[-+]?0x[0-9a-fA-F_]+    (?# base 16))$/x

(basically replacing the regexp with a subset of the standard one from from http://yaml.org/type/int.html)

Would it be possible to have say two constants:

INTEGER_ENRICHED=/.../
INTEGER_STANDARD=/.../

And chose one or the other based on a flag to keep backward compatibility but offer a stricter/more standard alternative? I can offer to work on a patch if that can help.

@sethboyles
Copy link
Contributor

We would love to see this fixed as well. The current parsing logic does not conform to the YAML specification, so we believe something like @Jell's proposed solution should be considered. Would the maintainers (@tenderlove) of Psych be amenable to a PR implementing the above solution or similar?

@sethboyles
Copy link
Contributor

@tenderlove Do you have a stance on whether or not you would accept a PR that adds a flag to change the behavior to something stricter? We are willing to work on this, but don't want to get started if you don't think it would be worthwhile.

@tenderlove
Copy link
Member

@tenderlove Do you have a stance on whether or not you would accept a PR that adds a flag to change the behavior to something stricter? We are willing to work on this, but don't want to get started if you don't think it would be worthwhile.

Yes, I'm definitely open to it. There was a PR submitted here for strict hash keys. I made a commit here that demonstrates how to accomplish it.

Maybe this commit should just be extended to be "strict parsing" or something (rather than just specific to hash keys).

@sethboyles
Copy link
Contributor

sethboyles commented Jan 14, 2022

@tenderlove I created a PR that implements this (#537). Let me know what you think!

@hsbt
Copy link
Member

hsbt commented Oct 20, 2022

#537 fixed this issue.

@hsbt hsbt closed this as completed Oct 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

5 participants