Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bump libxml version, deprecate libxml_disable_entity_loader() #5867

Closed

Conversation

dtakken
Copy link

@dtakken dtakken commented Jul 16, 2020

In libxml < 2.9.0 external entity loading was enabled by default. Many applications, including PHP ones, have been vulnerable to XXE processing attacks because of this.

The libxml_disable_entity_loader() function is used to mitigate the problem. It allows applications to disable loading of external entities, but in a rather crude way: Not only loading of external entities is blocked, loading of XML data and validation schemas from anything but strings is blocked, crippling the functionality of XML parsing in PHP.

The behavior of libxml_disable_entity_loader() is generally perceived as being broken. There are several open bug reports about this:

https://bugs.php.net/bug.php?id=62577
https://bugs.php.net/bug.php?id=73328
https://bugs.php.net/bug.php?id=69910

Since the release of libxml 2.9.0 in 2012 external entity loading is disabled in libxml by default. This means that using libxml_disable_entity_loader() is no longer needed. Applications that wish to load external entities need to explicitly indicate this by using the LIBXML_NOENT constant when parsing XML data. Loading XML from resources like files is unaffected this way.

This PR will do two things:

  • Change the required libxml version for building PHP to >= 2.9.0, making XXE processing in PHP secure by default.
  • Deprecate libxml_disable_entity_loader(). It is no longer needed and has unwanted side effects.

The PR includes a unit test that verifies that external entity loading is indeed disabled by default.

Question: Will this require an RFC? And if so, would it need to accepted before the feature freeze in order to make it into 8.0?

@Girgias
Copy link
Member

Girgias commented Jul 17, 2020

Seems good to me as CentOS 7 comes with libxml 2.9.1, I also don't think this needs an RFC.

@beberlei
Copy link
Contributor

@dtakken can LIBXML_NOENT be passed to DOMDocument to allow external entities? This question sort of extends to everything xml, is there a way to use this new flag?

@dtakken
Copy link
Author

dtakken commented Jul 17, 2020

@dtakken can LIBXML_NOENT be passed to DOMDocument to allow external entities? This question sort of extends to everything xml, is there a way to use this new flag?

The various XML processing mechanisms in PHP have a variety of ways to enable XXE processing. Most have an $options parameter that accepts the LIBXML_NOENT constant, which includes the load() method of DOMDocument. This class also has properties like resolveExternals and substituteEntities that can be set. These unit tests demonstrate use of both the constant and the class properties:

https://github.com/php/php-src/blob/e575ebdb2975b4189f42dce6b36c80300320eb98/ext/libxml/tests/libxml_disable_entity_loader.phpt
https://github.com/php/php-src/blob/f116484cf1cd8a796f05c493ace4deca84bc8fdc/ext/dom/tests/DOMDocument_load_variation4.phpt

This PR does not change the fact that XXE processing was already disabled by default when PHP was built using an up to date libxml. I expect that any code that fails to load external entities after the change would already have failed on most OS versions that are currently in use.

I don't know if all XML processing mechanisms support enabling XXE processing (the ones I know do). However I think any issues in that area should be addressed on their own.

Copy link
Contributor

@beberlei beberlei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreeing with this change.

@dtakken dtakken force-pushed the libxml-deprecate-disable-ext-ref-loader branch from e575ebd to e23fe9c Compare July 22, 2020 19:36
@dtakken dtakken requested a review from kocsismate July 23, 2020 16:21
@beberlei
Copy link
Contributor

@cmb69 this looks good from my POV, libxml 2.9 is from 2012 so that is a long time ago. i would merge it, can you give a second opinion?

Copy link
Member

@cmb69 cmb69 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @dtakken, this is a very welcome improvement to me!

@nikic
Copy link
Member

nikic commented Jul 27, 2020

This looks reasonable to me, but I'd suggest sending a mail to internals@ about this. The libxml version bump part of this is definitely fine. Two questions for the rest:

  • Is there any other legitimate reason to disable entity loading completely? Outside XXE prevention that is?
  • What is the recommended way to conditionally call libxml_disable_entity_loader() after this change?

@cmb69
Copy link
Member

cmb69 commented Jul 27, 2020

With libxml >= 2.9.0, loading of external entities is disabled, even if libxml_disable_entity_loader(false) is called. You have to explicitly pass LIBXML_NOENT to the respective XML loading functions to load (and resolve) external entities, anyway.

<?php
if (!extension_loaded('libxml')) die('skip libxml extension not available');
if (!extension_loaded('dom')) die('skip dom extension not available');
if (defined('PHP_WINDOWS_VERSION_MAJOR')) die('skip not for Windows'); ?>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this test not for Windows? Could it be modified (or split) to test the functionality on Windows as well?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know to be honest, I just copied another libxml unit test and used it to create a new one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks! I just enabled that other test on Windows (24495ba); the same could be applied to this test case.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can rebase and apply the same changes to my test tonight.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@nikic
Copy link
Member

nikic commented Jul 27, 2020

Welp, I just saw that you did post to internals about this. However, your mail went into spam, like all your mails do. Can you please either relax your mail configuration, or use a different address (e.g. gmail) to post to the list? As is, most likely nobody saw this.

@dtakken
Copy link
Author

dtakken commented Aug 2, 2020

Welp, I just saw that you did post to internals about this. However, your mail went into spam, like all your mails do. Can you please either relax your mail configuration, or use a different address (e.g. gmail) to post to the list? As is, most likely nobody saw this.

Hi Nikita,

Multiple people have contacted me about this problem lately. I am aware of it and will try to sort this out.

Thanks for letting me know!

@dtakken
Copy link
Author

dtakken commented Aug 2, 2020

This looks reasonable to me, but I'd suggest sending a mail to internals@ about this. The libxml version bump part of this is definitely fine. Two questions for the rest:

  • Is there any other legitimate reason to disable entity loading completely? Outside XXE prevention that is?

Well, the only thing I can think of is an application trying to make sure that no part of an application can load XML data from URLs, making it harder for an attacker to exploit a vulnerability in libxml. However, libxml_disable_entity_loader() is not really suitable for that purpose, because any code can use that same method to switch it on again. So my guess is that the answer is 'no'.

  • What is the recommended way to conditionally call libxml_disable_entity_loader() after this change?

I'm not sure what the customary way of dealing with deprecated functions is. Apart from checking the PHP version, are there other means that are commonly provided to make this easier?

@nikic
Copy link
Member

nikic commented Aug 2, 2020

I'm not sure what the customary way of dealing with deprecated functions is. Apart from checking the PHP version, are there other means that are commonly provided to make this easier?

I'm mainly asking for a specific piece of recommended code. I'd expect that checking the PHP version is not the right thing in this context, but that you'd rather want to check the libxml version. Is that correct? On PHP 8, the observable libxml versions would coincide with always disabling the call.

@dtakken
Copy link
Author

dtakken commented Aug 3, 2020

I'm mainly asking for a specific piece of recommended code. I'd expect that checking the PHP version is not the right thing in this context, but that you'd rather want to check the libxml version. Is that correct? On PHP 8, the observable libxml versions would coincide with always disabling the call.

Ah, yes. I would go for this simple check:

if (LIBXML_VERSION < 20900) {
    libxml_disable_entity_loader();
}

@nikic nikic added this to the PHP 8.0 milestone Aug 3, 2020
Dik Takken added 3 commits August 3, 2020 20:33
Since libxml version 2.9.0 external entity loading is disabled by default.
Bumping the version requirement means that XML processing in PHP is no
longer vulnerable to XXE processing attacks by default.
This method was used to protect code against XXE processing attacks.
Since PHP now requires libxml >= 2.9.0 external entity loading no longer
needs to be disabled to prevent these attacks. It is disabled by default.
Also, the method has an unwanted side effect that causes a lot of
confusion: Parsing XML data from resources like files is no longer possible.
@dtakken dtakken force-pushed the libxml-deprecate-disable-ext-ref-loader branch from cd64be8 to 0930578 Compare August 3, 2020 18:36
@dtakken
Copy link
Author

dtakken commented Aug 3, 2020

Note that the version I just pushed adds one more commit that deletes a unit test that was only run on libxml < 2.9.0. I hope that's OK.

@php-pulls php-pulls closed this in e0fa48f Aug 3, 2020
ADmad added a commit to cakephp/cakephp that referenced this pull request Aug 8, 2020
With libxml >= 2.9.0, loading of external entities is disabled,
even if libxml_disable_entity_loader(false) is called.
Instead the flag LIBXML_NOENT needs to be used to enable it.

See php/php-src#5867
ADmad added a commit to cakephp/cakephp that referenced this pull request Aug 8, 2020
With libxml >= 2.9.0, loading of external entities is disabled,
even if libxml_disable_entity_loader(false) is called.
Instead the flag LIBXML_NOENT needs to be used to enable it.

See php/php-src#5867
markstory pushed a commit to cakephp/utility that referenced this pull request Aug 9, 2020
With libxml >= 2.9.0, loading of external entities is disabled,
even if libxml_disable_entity_loader(false) is called.
Instead the flag LIBXML_NOENT needs to be used to enable it.

See php/php-src#5867
fabpot added a commit to symfony/symfony that referenced this pull request Aug 10, 2020
This PR was merged into the 3.4 branch.

Discussion
----------

Fix deprecated libxml_disable_entity_loader

| Q             | A
| ------------- | ---
| Branch?       | 3.4
| Bug fix?      | yes
| New feature?  | no
| Deprecations? | no
| Tickets       | /
| License       | MIT
| Doc PR        | /

Fix deprecation `Function libxml_disable_entity_loader() is deprecated` triggered by php/php-src#5867 in PHP8

Commits
-------

1f19da3 Fix deprecated libxml_disable_entity_loader
symfony-splitter pushed a commit to symfony/translation that referenced this pull request Aug 10, 2020
This PR was merged into the 3.4 branch.

Discussion
----------

Fix deprecated libxml_disable_entity_loader

| Q             | A
| ------------- | ---
| Branch?       | 3.4
| Bug fix?      | yes
| New feature?  | no
| Deprecations? | no
| Tickets       | /
| License       | MIT
| Doc PR        | /

Fix deprecation `Function libxml_disable_entity_loader() is deprecated` triggered by php/php-src#5867 in PHP8

Commits
-------

1f19da3936 Fix deprecated libxml_disable_entity_loader
symfony-splitter pushed a commit to symfony/config that referenced this pull request Aug 10, 2020
This PR was merged into the 3.4 branch.

Discussion
----------

Fix deprecated libxml_disable_entity_loader

| Q             | A
| ------------- | ---
| Branch?       | 3.4
| Bug fix?      | yes
| New feature?  | no
| Deprecations? | no
| Tickets       | /
| License       | MIT
| Doc PR        | /

Fix deprecation `Function libxml_disable_entity_loader() is deprecated` triggered by php/php-src#5867 in PHP8

Commits
-------

1f19da3936 Fix deprecated libxml_disable_entity_loader
symfony-splitter pushed a commit to symfony/dependency-injection that referenced this pull request Aug 10, 2020
This PR was merged into the 3.4 branch.

Discussion
----------

Fix deprecated libxml_disable_entity_loader

| Q             | A
| ------------- | ---
| Branch?       | 3.4
| Bug fix?      | yes
| New feature?  | no
| Deprecations? | no
| Tickets       | /
| License       | MIT
| Doc PR        | /

Fix deprecation `Function libxml_disable_entity_loader() is deprecated` triggered by php/php-src#5867 in PHP8

Commits
-------

1f19da3936 Fix deprecated libxml_disable_entity_loader
jrfnl added a commit to PHPCompatibility/PHPCompatibility that referenced this pull request Aug 10, 2020
> LibXML:
> - libxml_disable_entity_loader() has been deprecated. As libxml 2.9.0 is now
>   required, external entity loading is guaranteed to be disabled by default,
>   and this function is no longer needed to protect against XXE attacks.

Refs:
* https://github.com/php/php-src/blob/71bfa5344ab207072f4cd25745d7023096338385/UPGRADING#L808-L811
* php/php-src#5867
* php/php-src@e0fa48f

Includes unit tests.
@ausi
Copy link

ausi commented Aug 30, 2020

Is there a replacement for safely using DOMDocument::$substituteEntities now?

AFAIK the following code is/was considered “safe”:

libxml_disable_entity_loader(true);
$dom = new DOMDocument;
$dom->substituteEntities = true;
$dom->loadXML($untrustedData);
echo $dom->saveXML();

Without the call to libxml_disable_entity_loader() external entities get loaded, example: https://3v4l.org/9Z5bd

If there is no replacement I think we should document that setting substituteEntities to true can lead to XXE vulnerabilities.

@cmb69
Copy link
Member

cmb69 commented Aug 31, 2020

AFAIK the following code is/was considered “safe”:

Yes, but that code doesn't make sense. Anyhow, documented as http://svn.php.net/viewvc?view=revision&revision=350442.

@ausi
Copy link

ausi commented Aug 31, 2020

Yes, but that code doesn't make sense.

Thanks for checking!
Recap: never enable entity substitution for untrusted XML documents.
Did I understand that correctly?

@cmb69
Copy link
Member

cmb69 commented Aug 31, 2020

Yes, that seems to be the most simple, sane solution for untrusted XML documents.

Mark-H added a commit to Mark-H/revolution that referenced this pull request Sep 18, 2020
…e libxml entity loader [modxcms#15237]

The libxml_disable_entity_loader function is deprecated in PHP8, and the entity loader is automatically enabled on v2.9.0+ of libxml which may have been used pre-PHP8 as well. PHP8 comes with at least v2.9.0+ of libxml bundled, so this conditional covers both scenarios.

Ref: php/php-src#5867
@Mark-H
Copy link

Mark-H commented Sep 18, 2020

This change does not yet seem documented in the docs for libxml_disable_entity_loader, which probably should be done: https://www.php.net/manual/en/function.libxml-disable-entity-loader.php

It looks like there's an online editor for the docs that anyone (even me!) could use, but I'm not sure about conventions or the process for adding deprecations.

@cmb69
Copy link
Member

cmb69 commented Sep 18, 2020

The documentation is not quite ready for PHP 8 related info; we want to remove PHP 5 specific stuff first. Anyhow, instead of using the online editor, IMO PRs are preferable.

wmfgerrit pushed a commit to wikimedia/mediawiki that referenced this pull request Sep 21, 2020
As per https://www.php.net/manual/en/function.libxml-disable-entity-loader.php
this is technically unnecessary.

>However, as of libxml 2.9.0 entity substitution is disabled by default,
>so there is no need to disable the loading of external entities.

See also php/php-src#5867

>Since the release of libxml 2.9.0 in 2012 external entity loading is
>disabled in libxml by default. This means that using
>libxml_disable_entity_loader() is no longer needed.

Hopefully helps prevent false positive reports from security scanning tools.

Change-Id: I8a09d62a9920fd0bf4a388baa5544a02323bb541
wmfgerrit pushed a commit to wikimedia/mediawiki that referenced this pull request Sep 21, 2020
As per https://www.php.net/manual/en/function.libxml-disable-entity-loader.php
this is technically unnecessary.

>However, as of libxml 2.9.0 entity substitution is disabled by default,
>so there is no need to disable the loading of external entities.

See also php/php-src#5867

>Since the release of libxml 2.9.0 in 2012 external entity loading is
>disabled in libxml by default. This means that using
>libxml_disable_entity_loader() is no longer needed.

Hopefully helps prevent false positive reports from security scanning tools.

Change-Id: I8a09d62a9920fd0bf4a388baa5544a02323bb541
(cherry picked from commit 64ea157)
wmfgerrit pushed a commit to wikimedia/mediawiki that referenced this pull request Sep 21, 2020
As per https://www.php.net/manual/en/function.libxml-disable-entity-loader.php
this is technically unnecessary.

>However, as of libxml 2.9.0 entity substitution is disabled by default,
>so there is no need to disable the loading of external entities.

See also php/php-src#5867

>Since the release of libxml 2.9.0 in 2012 external entity loading is
>disabled in libxml by default. This means that using
>libxml_disable_entity_loader() is no longer needed.

Hopefully helps prevent false positive reports from security scanning tools.

Change-Id: I8a09d62a9920fd0bf4a388baa5544a02323bb541
(cherry picked from commit 64ea157)
wmfgerrit pushed a commit to wikimedia/mediawiki that referenced this pull request Sep 21, 2020
As per https://www.php.net/manual/en/function.libxml-disable-entity-loader.php
this is technically unnecessary.

>However, as of libxml 2.9.0 entity substitution is disabled by default,
>so there is no need to disable the loading of external entities.

See also php/php-src#5867

>Since the release of libxml 2.9.0 in 2012 external entity loading is
>disabled in libxml by default. This means that using
>libxml_disable_entity_loader() is no longer needed.

Hopefully helps prevent false positive reports from security scanning tools.

Change-Id: I8a09d62a9920fd0bf4a388baa5544a02323bb541
(cherry picked from commit 64ea157)
@smalyshev
Copy link
Contributor

wmfgerrit pushed a commit to wikimedia/mediawiki that referenced this pull request Nov 15, 2020
As per https://www.php.net/manual/en/function.libxml-disable-entity-loader.php
this is technically unnecessary.

>However, as of libxml 2.9.0 entity substitution is disabled by default,
>so there is no need to disable the loading of external entities.

See also php/php-src#5867

>Since the release of libxml 2.9.0 in 2012 external entity loading is
>disabled in libxml by default. This means that using
>libxml_disable_entity_loader() is no longer needed.

Hopefully helps prevent false positive reports from security scanning tools.

Change-Id: I7cabc5b8d44813d709a11db2f219ae16260542c7
wmfgerrit pushed a commit to wikimedia/mediawiki that referenced this pull request Nov 15, 2020
As per https://www.php.net/manual/en/function.libxml-disable-entity-loader.php
this is technically unnecessary.

>However, as of libxml 2.9.0 entity substitution is disabled by default,
>so there is no need to disable the loading of external entities.

See also php/php-src#5867

>Since the release of libxml 2.9.0 in 2012 external entity loading is
>disabled in libxml by default. This means that using
>libxml_disable_entity_loader() is no longer needed.

Hopefully helps prevent false positive reports from security scanning tools.

Change-Id: I7cabc5b8d44813d709a11db2f219ae16260542c7
wmfgerrit pushed a commit to wikimedia/mediawiki that referenced this pull request Nov 15, 2020
As per https://www.php.net/manual/en/function.libxml-disable-entity-loader.php
this is technically unnecessary.

>However, as of libxml 2.9.0 entity substitution is disabled by default,
>so there is no need to disable the loading of external entities.

See also php/php-src#5867

>Since the release of libxml 2.9.0 in 2012 external entity loading is
>disabled in libxml by default. This means that using
>libxml_disable_entity_loader() is no longer needed.

Hopefully helps prevent false positive reports from security scanning tools.

Change-Id: I7cabc5b8d44813d709a11db2f219ae16260542c7
@andrewnicols
Copy link

Just noting that the deprecation of this is not sufficiently documented.

I've raised php/doc-en#1408 regarding this.

Essentially:

  • the various LIBXML2 flags are poorly documented to begin with (both upstream libxml2, and in php.net docs) - many descriptions are poor and do not adequately detail the behaviours
  • the LIBXML_NOENT constant has an XXE warning, but the LIBXML_DTDVALID constant does not and is also susceptible
  • use of the LIBXML_DTDLOAD or LIBXML_DTDATTR constants will cause dtds to be fetched (but are not susceptible to XXE without the NOENT or DTDVALID flags) - where the system is not available (i.e. it is loaded from w3.org) there is a 60 second timeout
  • the documentation for libxml_disable_entity_loader only mentions the use of LIBXML_NOENT to resolve internal entity references, when the libXML2 source also processes these on the DTDVALID flag

Many projects have essentially just dropped the use of the libxml_disable_entiy_loader() function if the php version is >= 800, but may now be susceptible to various attacks and/or performance issues if they do not fully understand the impact of these flags. Previously they were protected by the disabling of the entity loader.

SerhiyMytrovtsiy added a commit to SerhiyMytrovtsiy/translation that referenced this pull request Oct 19, 2022
This PR was merged into the 3.4 branch.

Discussion
----------

Fix deprecated libxml_disable_entity_loader

| Q             | A
| ------------- | ---
| Branch?       | 3.4
| Bug fix?      | yes
| New feature?  | no
| Deprecations? | no
| Tickets       | /
| License       | MIT
| Doc PR        | /

Fix deprecation `Function libxml_disable_entity_loader() is deprecated` triggered by php/php-src#5867 in PHP8

Commits
-------

1f19da3936 Fix deprecated libxml_disable_entity_loader
krohnden pushed a commit to ild-thl/mediawiki that referenced this pull request Nov 23, 2022
As per https://www.php.net/manual/en/function.libxml-disable-entity-loader.php
this is technically unnecessary.

>However, as of libxml 2.9.0 entity substitution is disabled by default,
>so there is no need to disable the loading of external entities.

See also php/php-src#5867

>Since the release of libxml 2.9.0 in 2012 external entity loading is
>disabled in libxml by default. This means that using
>libxml_disable_entity_loader() is no longer needed.

Hopefully helps prevent false positive reports from security scanning tools.

Change-Id: I8a09d62a9920fd0bf4a388baa5544a02323bb541
(cherry picked from commit 64ea157)
krohnden pushed a commit to ild-thl/mediawiki that referenced this pull request Nov 23, 2022
As per https://www.php.net/manual/en/function.libxml-disable-entity-loader.php
this is technically unnecessary.

>However, as of libxml 2.9.0 entity substitution is disabled by default,
>so there is no need to disable the loading of external entities.

See also php/php-src#5867

>Since the release of libxml 2.9.0 in 2012 external entity loading is
>disabled in libxml by default. This means that using
>libxml_disable_entity_loader() is no longer needed.

Hopefully helps prevent false positive reports from security scanning tools.

Change-Id: I7cabc5b8d44813d709a11db2f219ae16260542c7
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants