Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore libxml state when changed #423

Merged
merged 4 commits into from
Sep 13, 2019

Conversation

billtomczak
Copy link

Disabling the Entity Loader was a great idea
a7e08334

However, this wreaked havoc for me for other code on the site that tries to load xml files.

This patch may be a bit ham-fisted, but it corrects the problem on my sites and I hope you'll consider including it or a similar fix in future versions.

@bhelx
Copy link
Contributor

bhelx commented Sep 4, 2019

@billtomczak I did a little research, this does not appear to be threadsafe. I couldn't find a definitive answer. My concern is a race condition would catch the parser with the entity loader enabled. Do you know if this is the case?

@aaron-junot
Copy link

@billtomczak In addition to the thread safety issue @bhelx pointed out above, leaving external entities turned on is a known security vulnerability (see the OWASP Top 10 and these three CWEs). I understand that some users (perhaps you) are legitimately accepting entities in your code and that's up to you to determine the risk of such a decision. For our purposes, most of our users are not using entities and therefore they should be disabled.

I would be more ok with logic like this if it's behind a configuration that requires the user to explicitly enable entities and acknowledge that they're taking a risk by doing so. I'm envisioning something like an environment variable called $RECURLY_UNSAFE_ENTITIES. If they set that as true in their environment, it can enable entities for them. This way, folks such as yourself that want to enable entities can accept that risk on a case by case basis, but the library is still secure by default.

From a security perspective, I would be fine if you took out all the back and forth toggling and just do something like the following in all the places where libxml_disable_entity_loader is called:

$ent = getenv("RECURLY_UNSAFE_ENTITIES");
libxml_disable_entity_loader(!$ent);

I don't particularly like this solution with the difficult to see ! in there, but you could clean it up somehow, I was just trying to demonstrate the suggestion.

Check in with @bhelx about whether that type of solution will suffice from an engineering perspective, as my comments are only regarding security. As far as I can tell, cutting out the toggling would also avoid the race condition, but I'm not sure.

I'll check back in to review this before it gets merged, but you can discuss the finer details with @bhelx from here on out.

@billtomczak
Copy link
Author

@bhelx @aaron-suarez -- I'd have to look into the details and I appreciate your concerns. I'm really old school, and the reason I took the approach in this patch is the old rule that if you change your environment inside a method/function, you need to clean up after yourself.

Our application is a Joomla extension and when I upgraded to the latest client version, a basic call in core Joomla like this: simplexml_load_file($file); where $file is a path to a perfectly valid xml file fails. So I'm kind of caught between your reasonable caution and Joomla's standard procedures.

Your thoughts would be appreciated. I will do a bit more research to see if I can come up with a more acceptable solution.

@billtomczak
Copy link
Author

@bhelx @aaron-suarez -- I can see now the error of my ways. libxml_disable_entity_loader is definitely not thread safe. Which makes me question placing this call in the client library. As I have seen in my situation, this creates problems for any subsequent code. Any attempt to use simplexml_load_file or simplexml_load_string for example will fail on return from any call to the client library that invokes this function. In fact, you've created an interesting situation here

https://github.com/recurly/recurly-client-php/blob/master/lib/recurly/push_notification.php#L102

From what I've seen, simplexml_load_string in this code right after the libxml call invalidates everything after it since it will always fail.

    if (!@simplexml_load_string ($post_xml)) {
      return;
    }

I can adjust my own code for now to deal with what I consider a problematic change, but it seems like @aaron-suarez's suggestion for a custom environment variable should probably be implemented. I would suggest a public static function to manage this since it's no longer a simple matter of just automatically disabling entity loading. Either in Recurly_Client or Recurly_Base? Or am I getting too far off base now?

@bhelx
Copy link
Contributor

bhelx commented Sep 4, 2019

I'd have to look into the details and I appreciate your concerns. I'm really old school, and the reason I took the approach in this patch is the old rule that if you change your environment inside a method/function, you need to clean up after yourself. Which makes me question placing this call in the client library.

I don't think you are alone there. I agree with you and I'm sorry this is causing you problems. I was not aware of the global implications of the way we implemented this; however, I cannot in my research find a localized way to do it. Perhaps we are bound by the limitations of PHP here? If you know a way we can implement this and only have it affect our XML parser, I'd prefer that solution.

I can adjust my own code for now to deal with what I consider a problematic change, but it seems like @aaron-suarez's suggestion for a custom environment variable should probably be implemented. I would suggest a public static function to manage this since it's no longer a simple matter of just automatically disabling entity loading. Either in Recurly_Client or Recurly_Base? Or am I getting too far off base now?

I would also agree that an environment variable is probably the best backup plan here. As a rule of thumb, we don't make disabling security features convenient, so I do not recommend adding a static function call. If you change this PR to load depending on the presence of an environment variable I'd merge and release that for you.

What do you think?

@billtomczak
Copy link
Author

The reason I was suggesting a static function was to centralize the inspection of that environment variable rather than copying and pasting what would be the same code for each of those libxml calls. This would allow me to address the issue in my code with a single putenv and by default wouldn't change what you've done here at all.

And when someone gets nailed by this change like I did, at least they'd have a simple way to cope.

@bhelx
Copy link
Contributor

bhelx commented Sep 4, 2019

@billtomczak Sorry maybe I misunderstood, It's fine to have this in a function to share the code as long as it isn't publicly exposed.

@billtomczak
Copy link
Author

@bhelx here is my new proposal. I don't see how to manage this without a public function. However, as you'll see it's only a wrapper to libxml_disable_entity_loader that adds a check for an environment variable. So nothing new has really been exposed.

In my code all I need to do at the start of my script script is:

putenv('RECURLY_DISABLE_ENTITY_LOADING=0');

to fix the problems this creates for me.

@bhelx bhelx self-requested a review September 4, 2019 23:51
@bhelx
Copy link
Contributor

bhelx commented Sep 4, 2019

@billtomczak That's okay. I think since, as you pointed out, the feature can't actually be toggled from it. I'll review it tomorrow. In the mean time, any idea why the build is failing on HHVM ? We may need a special case for those users. https://travis-ci.org/recurly/recurly-client-php/jobs/580966659

@billtomczak
Copy link
Author

My call to getenv assumed a minimum php v5.6. see latest commit

@billtomczak
Copy link
Author

@bhelx -- just fyi - the reason I used the second argument for getenv was this warning:

If PHP is running in a SAPI such as Fast CGI, this function will always return the value of an environment variable set by the SAPI, even if putenv() has been used to set a local environment variable of the same name. Use the local_only parameter to return the value of locally-set environment variables.

But if you guys have to support hhvm 3.27 and 3.30 there isn't much to be done I suppose. For my purposes I may have to deal with this in my own code after all just to be safe.

Copy link
Contributor

@bhelx bhelx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@bhelx bhelx merged commit 4c594f7 into recurly:master Sep 13, 2019
@bhelx
Copy link
Contributor

bhelx commented Sep 13, 2019

Thanks again for contributing @billtomczak
A release is going out now

bhelx added a commit that referenced this pull request Sep 13, 2019
* Support billing_info on Subscription, Invoice & Gift card [PR](#424)
* PSD2 billing info changes [PR](#426)
* Restore libxml state when changed [PR](#423)
@bhelx bhelx mentioned this pull request Sep 13, 2019
@billtomczak billtomczak deleted the fix-libxml-disable-loader branch September 13, 2019 21:01
@bhelx bhelx added the V2 V2 Client label Mar 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
V2 V2 Client
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants