Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Psych.safe_load by default #487

Merged
merged 3 commits into from
May 13, 2021
Merged

Use Psych.safe_load by default #487

merged 3 commits into from
May 13, 2021

Conversation

tenderlove
Copy link
Member

@tenderlove tenderlove commented May 10, 2021

Psych.load is not safe for use with untrusted data. Too many
applications make the mistake of using Psych.load with untrusted data
and that ends up with some kind of security vulnerability.

This commit changes the default Psych.load to use safe_load. Users
that want to parse trusted data can use Psych.unsafe_load.

Psych.load or YAML.load are susceptible to a Remote Code Execution (or RCE) flaw. The problem isn't with the load functions themselves, but they can be used to abuse other objects that are in the system to escalate to an RCE. This blog post goes in to details about how YAML can be used in conjunction with RubyGems to execute arbitrary code. If someone tries to load YAML from an untrusted source, bad actors could use this issue to execute arbitrary code in the target system.

Psych provides a method called "safe_load" which can't be used to launch such an attack. The proposal here is to change the default such that "load" is now "safe_load". Of course the drawback is that "safe_load" is much more limited than "load", and making this change could break existing code.

Also it's probably worth mentioning that the YAML load functions could be used with other objects in the system besides RubyGems to escalate to an RCE. Also worth mentioning that anything that can be used to load arbitrary objects (like Marshal.load) can be abused in the same way. So neither changing RubyGems nor changing to a different serialization scheme (like Marshal) would plug this hole.

@tenderlove tenderlove requested a review from hsbt May 10, 2021 16:52
@tenderlove
Copy link
Member Author

For context, Psych.safe_load is more close to JSON loading. It doesn't support Symbols, Dates, Objects, references, etc. The only things it supports by default are:

  • TrueClass
  • FalseClass
  • NilClass
  • Numeric
  • String
  • Array
  • Hash

@stevecrozz
Copy link

I support this change, but what do you plan to do with the version number? Feels like a major change.

@tenderlove
Copy link
Member Author

I support this change, but what do you plan to do with the version number? Feels like a major change.

Yes, definitely major version change.

@SamSaffron
Copy link

SamSaffron commented May 11, 2021

I am extremely supportive of switching to safe by default, even going through a full rename dance here.

deprecate safe_load in favor of load
introduce unsafe_load

lol ... I just read the PR looks like you are very close already, only change I recommend adding is deprecating safe_load

@tenderlove
Copy link
Member Author

lol ... I just read the PR looks like you are very close already, only change I recommend adding is deprecating safe_load

I've made load slightly different from safe_load. load allows all Symbols where safe_load doesn't. I'm not sure if it's a good idea or not. It's still possible for some symbols to never be garbage collected, but they have to go through a C extension that makes them that way. However, I suspect many (most?) people use symbols in their YAML and this would trip them up. It seems like a very rare issue, so maybe allowing symbols is "safe enough"? cc @evanphx

@SamSaffron
Copy link

It seems like a very rare issue, so maybe allowing symbols is "safe enough"?

Yeah sounds like at worst this is allowing some cascade of components to cause memory bloat. Enormously better place to be than RCE friendly by default.

I guess people like this kind of stuff and tend to use it?

---
:a: 1

One concern is that the "load" then is less cross compatible with other frameworks. :a: 1 in golang/js/rust will be treated as the string :a. Maybe load(str, allow_symbols: true) and disable by default? Not sure. Would be a far bigger breaking change.

@tenderlove
Copy link
Member Author

One concern is that the "load" then is less cross compatible with other frameworks. 🅰️ 1 in golang/js/rust will be treated as the string :a.

That's true, but it's already the current behavior. For example:

YAML.load(<<-eostr)
---
:a: 1
eostr

The above code will result in the same data structure before and after this PR is merged. Allowing symbols is a compromise for compatibility / non-surprising behavior (as it's what people expect today). Also, interlanguage compatibility hasn't really been a priority for the library.

I'm leaning towards allowing symbols for YAML.load (but not YAML.safe_load) as I think that will give the best "backwards compatibility plus safety" bang for our buck.

@tenderlove
Copy link
Member Author

hashie/hashie#473

@k0kubun
Copy link
Member

k0kubun commented May 12, 2021

+1 for the interface change.

How about releasing unsafe_* variants alone first though? Being able to explicitly rewrite load to safe_load or unsafe_load (where unsafe_ is really necessary) before switching the behavior of load would be useful to minimize the impact of breaking changes by the upgrade.

lib/psych.rb Outdated Show resolved Hide resolved
@tenderlove
Copy link
Member Author

How about releasing unsafe_* variants alone first though? Being able to explicitly rewrite load to safe_load or unsafe_load (where unsafe_ is really necessary) before switching the behavior of load would be useful to minimize the impact of breaking changes by the upgrade.

Yes, that is a great idea. I will do it!

Psych.load is not safe for use with untrusted data.  Too many
applications make the mistake of using `Psych.load` with untrusted data
and that ends up with some kind of security vulnerability.

This commit changes the default `Psych.load` to use `safe_load`.  Users
that want to parse trusted data can use Psych.unsafe_load.
@tenderlove tenderlove merged commit 216f94f into master May 13, 2021
@tenderlove tenderlove deleted the default-unsafe branch May 13, 2021 18:09
@tenderlove
Copy link
Member Author

I shipped this in v4.0.0. Thanks!

@lzap
Copy link

lzap commented May 17, 2021

Friday, our CI is red. Only if there was a better week day to push big releases out, hmmm 🤔

Well, a mistake in our Gemfile, we will fix it, and the code. Thank you for taking care of safety of all of us! Cheese, I mean, cheers! :-)

koic added a commit to koic/rubocop that referenced this pull request May 17, 2021
Follow ruby/psych#487.

This PR fixes the following Ruby 3.1.0-dev build matrix error.

```console
% ruby -ryaml -ve 'p Psych::VERSION'
ruby 3.1.0dev (2021-05-17T10:51:51Z master ee611341c9) [x86_64-darwin19]
"4.0.0"

% bundle exec rspec ./spec/rubocop/cli/options_spec.rb
(snip)

Failures:

  1) RuboCop::CLI options --show-cops with no args prints the current
  configuration
     Failure/Error: printed_config = YAML.load(out.join) #
  rubocop:disable Security/YAMLLoad

     Psych::DisallowedClass:
       Tried to load unspecified class: Regexp
     Shared Example Group: "prints config" called from
       ./spec/rubocop/cli/options_spec.rb:1104
     # (eval):2:in `regexp'
     # ./spec/rubocop/cli/options_spec.rb:1027:in `block (4 levels) in
       <top (required)>'
     # ./spec/support/cli_spec_behavior.rb:26:in `block (2 levels) in
       <top (required)>'
     # ./lib/rubocop/rspec/shared_contexts.rb:30:in `block (4 levels) in
       <top (required)>'
     # ./lib/rubocop/rspec/shared_contexts.rb:30:in `chdir'
     # ./lib/rubocop/rspec/shared_contexts.rb:30:in `block (3 levels) in
       <top (required)>'
     # ./lib/rubocop/rspec/shared_contexts.rb:7:in `block (2 levels) in
       <top (required)>'

Finished in 25.97 seconds (files took 1.02 seconds to load)
117 examples, 1 failure

Failed examples:

rspec ./spec/rubocop/cli/options_spec.rb[1:13:1:4] # RuboCop::CLI
options --show-cops with no args prints the current configuration
```

https://app.circleci.com/pipelines/github/rubocop/rubocop/4637/workflows/d6722067-28c1-4740-a1b8-abea06edab0d/jobs/185620
byroot added a commit to rails/webpacker that referenced this pull request May 18, 2021
Psych 4 load in safe mode by default:
ruby/psych#487

So aliases need to be explicitly allowed
byroot added a commit to rails/webpacker that referenced this pull request May 18, 2021
Psych 4 load in safe mode by default:
ruby/psych#487

So aliases need to be explicitly allowed
kevindew added a commit to alphagov/govuk-connect that referenced this pull request Feb 22, 2022
Ruby 3.1 contains Psych version 4 as the default YAML parsing library.
This has a backwards compatibility break where YAML.load_file behaviour
has changed from an unsafe load to a safe load [1]. This causes an exception
to occur when using GOV.UK Puppet Hieradata YAML files are parsed, as
these make use of aliases - which is a feature that needs to be
enabled for YAML in safe load.

To resolve this issue I've set a conditional to check if the safe_load
method exists on the YAML object. I did consider whether it could be
resoled by pinning the psych dependency, but this seemed a bit delicate
as this tool has been authored to have no external dependencies and is
installed explicitly without them [2].

I couldn't identify an approach to apply a test for this that was
consistent with this repo's code and test conventions. Due to the
fluctuating standard library it'd have been a little contrived.

Before:

```
➜  ~ govuk-connect app-console -e integration asset-manager
Connecting to the app console for asset-manager,   in the integration environment
The relevant hosting provider is aws
/usr/local/Cellar/ruby/3.1.0/lib/ruby/3.1.0/psych/visitors/to_ruby.rb:430:in `visit_Psych_Nodes_Alias': Unknown alias: node_class (Psych::BadAlias)
	from /usr/local/Cellar/ruby/3.1.0/lib/ruby/3.1.0/psych/visitors/visitor.rb:30:in `visit'
	from /usr/local/Cellar/ruby/3.1.0/lib/ruby/3.1.0/psych/visitors/visitor.rb:6:in `accept'
	from /usr/local/Cellar/ruby/3.1.0/lib/ruby/3.1.0/psych/visitors/to_ruby.rb:35:in `accept'
	from /usr/local/Cellar/ruby/3.1.0/lib/ruby/3.1.0/psych/visitors/to_ruby.rb:345:in `block in revive_hash'
	from /usr/local/Cellar/ruby/3.1.0/lib/ruby/3.1.0/psych/visitors/to_ruby.rb:343:in `each'
	from /usr/local/Cellar/ruby/3.1.0/lib/ruby/3.1.0/psych/visitors/to_ruby.rb:343:in `each_slice'
	from /usr/local/Cellar/ruby/3.1.0/lib/ruby/3.1.0/psych/visitors/to_ruby.rb:343:in `revive_hash'
	from /usr/local/Cellar/ruby/3.1.0/lib/ruby/3.1.0/psych/visitors/to_ruby.rb:167:in `visit_Psych_Nodes_Mapping'
	from /usr/local/Cellar/ruby/3.1.0/lib/ruby/3.1.0/psych/visitors/visitor.rb:30:in `visit'
	from /usr/local/Cellar/ruby/3.1.0/lib/ruby/3.1.0/psych/visitors/visitor.rb:6:in `accept'
	from /usr/local/Cellar/ruby/3.1.0/lib/ruby/3.1.0/psych/visitors/to_ruby.rb:35:in `accept'
	from /usr/local/Cellar/ruby/3.1.0/lib/ruby/3.1.0/psych/visitors/to_ruby.rb:345:in `block in revive_hash'
	from /usr/local/Cellar/ruby/3.1.0/lib/ruby/3.1.0/psych/visitors/to_ruby.rb:343:in `each'
	from /usr/local/Cellar/ruby/3.1.0/lib/ruby/3.1.0/psych/visitors/to_ruby.rb:343:in `each_slice'
	from /usr/local/Cellar/ruby/3.1.0/lib/ruby/3.1.0/psych/visitors/to_ruby.rb:343:in `revive_hash'
	from /usr/local/Cellar/ruby/3.1.0/lib/ruby/3.1.0/psych/visitors/to_ruby.rb:167:in `visit_Psych_Nodes_Mapping'
	from /usr/local/Cellar/ruby/3.1.0/lib/ruby/3.1.0/psych/visitors/visitor.rb:30:in `visit'
	from /usr/local/Cellar/ruby/3.1.0/lib/ruby/3.1.0/psych/visitors/visitor.rb:6:in `accept'
	from /usr/local/Cellar/ruby/3.1.0/lib/ruby/3.1.0/psych/visitors/to_ruby.rb:35:in `accept'
	from /usr/local/Cellar/ruby/3.1.0/lib/ruby/3.1.0/psych/visitors/to_ruby.rb:318:in `visit_Psych_Nodes_Document'
	from /usr/local/Cellar/ruby/3.1.0/lib/ruby/3.1.0/psych/visitors/visitor.rb:30:in `visit'
	from /usr/local/Cellar/ruby/3.1.0/lib/ruby/3.1.0/psych/visitors/visitor.rb:6:in `accept'
	from /usr/local/Cellar/ruby/3.1.0/lib/ruby/3.1.0/psych/visitors/to_ruby.rb:35:in `accept'
	from /usr/local/Cellar/ruby/3.1.0/lib/ruby/3.1.0/psych.rb:335:in `safe_load'
	from /usr/local/Cellar/ruby/3.1.0/lib/ruby/3.1.0/psych.rb:370:in `load'
	from /usr/local/Cellar/ruby/3.1.0/lib/ruby/3.1.0/psych.rb:671:in `block in load_file'
	from /usr/local/Cellar/ruby/3.1.0/lib/ruby/3.1.0/psych.rb:670:in `open'
	from /usr/local/Cellar/ruby/3.1.0/lib/ruby/3.1.0/psych.rb:670:in `load_file'
```

After:

```
➜  ~ govuk-connect app-console -e integration asset-manager
Connecting to the app console for asset-manager,   in the integration environment
The relevant hosting provider is aws
The relevant node class is backend
```

[1]: ruby/psych#487
[2]: https://github.com/alphagov/homebrew-gds/blob/de1fb95fbda6219ad7158ab0c71253dcb55822d9/Formula/govuk-connect.rb#L12
alpaca-tc added a commit to alpaca-tc/figaro that referenced this pull request Jul 13, 2022
Ruby master ships with Psych 4.0.0 which makes YAML.load
defaults to safe mode (ruby/psych#487).

Keep compatibility by using unsafe_load.
charger added a commit to charger/settingslogic that referenced this pull request Jul 26, 2022
defaults to safe mode (ruby/psych#487).

Keep compatibility by using unsafe_load.

From: settingslogic#21
Also: https://stackoverflow.com/a/71192990
dgmstuart added a commit to dgmstuart/swing-out-london that referenced this pull request Aug 3, 2022
This isn't used and no longer works after upgrading to Ruby 3.1, since
that involves an upgrade to Psych, which no longer allows deserializing
from arbitrary classes:

  ruby/psych#487
KingTiger001 added a commit to KingTiger001/Rails-web-pack-project that referenced this pull request Jan 15, 2023
Psych 4 load in safe mode by default:
ruby/psych#487

So aliases need to be explicitly allowed
stanhu added a commit to stanhu/kramdown that referenced this pull request Mar 13, 2023
Passing in the permitted classes as the second argument was deprecated
in Psych 3 (ruby/psych#378) and removed in
Psych 4 (ruby/psych#487).

Ruby 3.1 ships with Psych 4, so to make the code work for Ruby 2.7 and
up, use the `permitted_classes` keyword argument.
krauselukas added a commit to krauselukas/json_refs that referenced this pull request Apr 26, 2023
With ruby 3.1.0, psych version 4.0.0 got introduced.
Starting psych version 4.0.0 the `load` method defaults
to `safe_load`.

See ruby/psych#487
See https://stdgems.org/psych/

This got introduced to prevent users from accidentally
load yaml content from untrusted user input. But
now breaks certain patterns, when using the gem.

See https://bugs.ruby-lang.org/issues/17866

To still allow users to load trusted documents,
the `unsafe_load` method got introduced.

See ruby/psych#488

I added a ruby version check for backwards
compatibility.
nobu added a commit to nobu/psych that referenced this pull request Jul 1, 2023
nobu added a commit to nobu/psych that referenced this pull request Jul 1, 2023
nobu added a commit that referenced this pull request Jul 1, 2023
Remove private methods unused since #487
smartech7 pushed a commit to smartech7/ruby-webpacker that referenced this pull request Aug 4, 2023
Psych 4 load in safe mode by default:
ruby/psych#487

So aliases need to be explicitly allowed
lavika pushed a commit to lavika/activerecord-sqlserver-adapter that referenced this pull request Sep 26, 2023
See ruby/psych#487 for more information.
tientt-holistics pushed a commit to holistics/committee that referenced this pull request Sep 26, 2024
Ruby master ships with Psych 4.0.0 which makes `YAML.load_file`
defaults to safe mode (ruby/psych#487).

Keep compatibility by using `unsafe_load_file`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

8 participants