Arbitrary code execution in LLMMathChain #8363

jan-kubena · 2023-07-27T16:00:56Z

hinthornw · 2023-07-27T16:09:15Z

Thanks for raising this! We are working to move LLMMathChain out of the LangChain core library, for reasons such as this.

dosubot · 2023-07-27T16:18:58Z

Answer generated by a 🤖

Answer

I understand your concern about the potential security vulnerability in the LLMMathChain class of LangChain. You've correctly identified that the numexpr.evaluate function, which is used in the _evaluate_expression method, could execute arbitrary code if a malicious expression is passed to it. This is because numexpr.evaluate uses Python's eval function internally, and eval can execute arbitrary Python code.

Here's the code snippet in question:

import math
import re

import numexpr


# Excerpt: this is a method of LLMMathChain, shown here with the imports it needs.
def _evaluate_expression(self, expression: str) -> str:
    try:
        local_dict = {"pi": math.pi, "e": math.e}
        output = str(
            numexpr.evaluate(
                expression.strip(),
                global_dict={},  # restrict access to globals
                local_dict=local_dict,  # expose the constants pi and e
            )
        )
    except Exception as e:
        raise ValueError(
            f'LLMMathChain._evaluate("{expression}") raised error: {e}.'
            " Please try again with a valid numerical expression"
        ) from e

    # Remove any leading and trailing brackets from the output
    return re.sub(r"^\[|\]$", "", output)

In this code, numexpr.evaluate is called with the expression argument, which could contain malicious code. Although global_dict is set to an empty dictionary to restrict access to global variables, and local_dict only exposes the constants pi and e, neither setting prevents the execution of arbitrary code.

This means that a malicious user could potentially execute arbitrary code on the server where LangChain is running by passing a specially crafted expression to the LLMMathChain's run method. This is indeed a serious security issue that needs to be addressed.
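The point that an empty globals dictionary is not a sandbox can be illustrated with plain Python, assuming (as the answer above states) that the expression ultimately reaches the built-in eval. This is a minimal sketch, not numexpr's actual code path: numexpr's global_dict parameter is not necessarily the same thing as eval's globals argument, and the helper name eval_with_empty_globals is hypothetical. When eval receives a globals dict without a `__builtins__` key, Python inserts the builtins module under that key before evaluating, so builtin functions stay callable:

```python
# Illustration only: an empty globals dict does NOT remove Python's builtins.
def eval_with_empty_globals(expression: str):
    """Evaluate `expression` with an empty globals dict, as a naive 'sandbox'."""
    scope: dict = {}
    result = eval(expression, scope)  # deliberately unsafe demo; never do this with untrusted input
    # Before evaluating, Python inserted the builtins module under "__builtins__".
    return result, "__builtins__" in scope


if __name__ == "__main__":
    value, builtins_injected = eval_with_empty_globals("len('abc') + abs(-2)")
    print(value, builtins_injected)  # 5 True
```

Passing `{"__builtins__": {}}` removes direct access to builtin names, but it is widely documented that this still does not make eval safe on untrusted input, which is why sanitization and isolation come up later in this thread.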

However, please note that this is just one potential interpretation of the issue. If you believe this interpretation is correct and that this is a bug that could impact other users, I encourage you to make a pull request to address this issue. If your experience contradicts this interpretation, I would suggest upgrading to the latest version of LangChain to see if the issue persists.

Sources

libs/langchain/langchain/chains/llm_math/base.py

This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.


zoeqevans · 2023-07-31T19:32:54Z

@hinthornw As an alternative, could we use sympy.sympify? It's a common alternative to eval/numexpr (see this thread on vulnerabilities in numexpr). As well as avoiding that vulnerability, SymPy supports a much larger range of mathematical operations (e.g. rounding, factorials) and is about 10x as popular as numexpr (as measured by GitHub stars), so I'd guess it's easier for an LLM to generate syntactically correct input for it. In fact, I spent about 10 minutes trying to get GPT-3 to generate a mathematical input that SymPy couldn't process, and failed.

I was testing some multi-step QA workflows inspired by a LangChain multi-step agent QA workflow, using the llm-math tool in an agent chain with prompts like: "Who is Zoe Saldana's partner? What is their current age raised to the 0.43 power, rounded to 3sf?". This caused numexpr to throw an error, because the chain would try, and fail, to pass it some kind of rounding instruction. The same happens with the factorial function in #3071. Both of these use cases are fixed by backing LLMMathChain with a sympify call.

Happy to attach some examples / create a PR to scope out the replacement if people agree. It's what I've been using locally and it works great!

hinthornw · 2023-07-31T23:49:43Z

That makes sense to me. I'd be happy to review a PR! Thank you for being proactive about this!

jan-kubena · 2023-08-01T07:09:40Z

I'd just like to point out that some kind of validation/protection is still needed, because sympy.sympify uses eval and, according to the official docs, shouldn't be used on unsanitized input. It could also be worth looking at this LangChain PR, which implemented SymPy in RestrictedPython, for inspiration.
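One way to add the kind of validation described above, without relying on eval at all, is to parse the expression with Python's ast module and evaluate only an allow-list of node types. This is a different technique from the numexpr sanitization, SymPy, and RestrictedPython approaches discussed in this thread; safe_eval, _eval_node, and the allow-list tables are hypothetical names. It happens to cover the rounding and factorial cases mentioned earlier:

```python
import ast
import math
import operator

# Hypothetical allow-lists: only these operators, names, and functions are accepted.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_NAMES = {"pi": math.pi, "e": math.e}
_FUNCS = {"sqrt": math.sqrt, "factorial": math.factorial, "round": round, "abs": abs}


def safe_eval(expression: str):
    """Evaluate an arithmetic expression by walking its AST; eval() is never called."""
    tree = ast.parse(expression.strip(), mode="eval")
    return _eval_node(tree.body)


def _eval_node(node: ast.AST):
    # Numeric literals only (bool is excluded because type(True) is bool, not int).
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    if isinstance(node, ast.Name) and node.id in _NAMES:
        return _NAMES[node.id]
    # Plain calls to allow-listed functions, positional arguments only.
    if (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id in _FUNCS
        and not node.keywords
    ):
        args = [_eval_node(arg) for arg in node.args]
        return _FUNCS[node.func.id](*args)
    # Anything else (attributes, subscripts, strings, lambdas, ...) is rejected.
    raise ValueError(f"Disallowed expression element: {ast.dump(node)}")
```

The allow-list rejects attribute access, subscripts, strings, and any call not listed in _FUNCS. It does not bound resource usage, though: inputs such as `9 ** 9 ** 9` or `factorial(10**6)` can still consume a great deal of CPU and memory, so running evaluation in an isolated environment remains worthwhile.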

zoeqevans · 2023-08-01T09:02:12Z

Ah damn, yeah, I missed that. I was foolishly going off the StackOverflow thread and a memory of a still-unresolved effort within SymPy to remove eval from sympify / add a safe mode.

sympify still offers a lot more functionality than numexpr, so I'd still be in favour of swapping, but I agree there aren't any extra security benefits over numexpr on the eval front.

zoeqevans · 2023-08-02T11:48:47Z

@hinthornw PR here

jan-kubena · 2023-09-13T14:26:23Z

Hi all, input sanitization has been added to numexpr's evaluate, enabled by default, in version 2.8.6 (see the issue I mentioned previously). As far as I can tell, it looks pretty secure. It would be nice for expression execution to happen in a secure container/environment, but for now, the sanitization numexpr does is surely better than nothing.

Given that, I'd like to ask @hinthornw to advise on the preferred next step: just bump the numexpr version, replace numexpr completely with SymPy plus a secure container/environment (since SymPy by itself would be vulnerable), or some other solution. Thank you!
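The general idea of sanitizing input before evaluation can be sketched as a deny-list pre-check. The pattern below is hypothetical and is not numexpr 2.8.6's actual rule set; check_expression and _FORBIDDEN are illustrative names. It rejects dunder names, statement separators, colons, subscripts, `lambda`, `import`, and attribute access, while still allowing decimal literals such as 2.5:

```python
import re

# Hypothetical deny-list in the spirit of "sanitize before evaluate".
# NOT numexpr's real rules; it only illustrates the idea.
#   __          dunder names such as __class__ or __import__
#   ; :         statement separators, slices, lambdas, walrus
#   [ ]         subscripts
#   \.[A-Za-z_] attribute access (a digit after "." is still allowed, e.g. 2.5)
_FORBIDDEN = re.compile(r"__|;|:|\[|\]|lambda|import|\.[A-Za-z_]")


def check_expression(expression: str) -> str:
    """Return the stripped expression, or raise ValueError if it looks unsafe."""
    stripped = expression.strip()
    match = _FORBIDDEN.search(stripped)
    if match:
        raise ValueError(f"Rejected token {match.group(0)!r} in {stripped!r}")
    return stripped
```

Deny-lists are inherently fragile, because they must anticipate every dangerous construct; an allow-list, or running evaluation in a secure container as suggested above, gives stronger guarantees. This particular pattern also rejects some legitimate inputs, such as `1.e5`.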

tabdunabi · 2023-09-26T17:59:14Z

Hi @hinthornw, any ETA on resolving this issue?
Thanks

elmegatan26 · 2023-09-28T11:10:38Z

This vulnerability is getting flagged by InfoSec teams. Any idea when the update will be released?

eyurtsev · 2023-10-02T19:08:04Z

@elmegatan26, @tabdunabi, @jan-kubena: numexpr is now (on master) an optional dependency, and we've also added a constraint specifying that the code only works with numexpr >=2.8.6 (which has input sanitization).

Is this enough to address concerns flagged by InfoSec teams?
Also, do you mind sharing whether you're using LLMMathChain yourselves and, if so, which mathematical operations you rely on?
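A guard for an optional numexpr dependency with a >=2.8.6 floor might look like the following. This is a hypothetical sketch: LangChain's actual import check is not shown in this thread and may differ, and require_numexpr, meets_minimum, and parse_release are illustrative names. It uses the standard-library importlib.metadata, and the version parser deliberately ignores pre-release suffixes (packaging.version.Version implements full PEP 440 ordering):

```python
import re
from importlib import metadata

MIN_NUMEXPR = (2, 8, 6)  # the lower bound mentioned in this thread


def parse_release(version: str) -> tuple:
    """Extract the leading numeric release segment, e.g. '2.8.6rc1' -> (2, 8, 6).

    Simplification: pre-release suffixes are ignored, although under PEP 440
    '2.8.6rc1' actually sorts before '2.8.6'.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version: str, minimum: tuple = MIN_NUMEXPR) -> bool:
    """True if `version` is at least `minimum` (tuple comparison, so '2.8' < '2.8.6')."""
    return parse_release(version) >= minimum


def require_numexpr() -> str:
    """Return the installed numexpr version, or raise ImportError if missing or too old."""
    try:
        installed = metadata.version("numexpr")
    except metadata.PackageNotFoundError:
        raise ImportError(
            "numexpr is an optional dependency; install it with `pip install numexpr`"
        ) from None
    if not meets_minimum(installed):
        raise ImportError(f"numexpr {installed} is older than the required 2.8.6")
    return installed
```

Because the check runs only when the math chain is actually used, projects that never call it do not need numexpr installed at all.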

tabdunabi · 2023-10-02T20:25:03Z

@eyurtsev, for our use case we do not use LLMMathChain, but our security scans detect the numexpr vulnerability. This is a critical issue for us and a blocker to publishing our solutions, which are used by our customers. If numexpr is made optional (not installed by default with LangChain), that is enough for us.

elmegatan26 · 2023-10-02T20:36:44Z

@eyurtsev We are not using LLMMathChain either, but, as with @tabdunabi, the vulnerability is automatically detected and flagged. Examples: https://security.snyk.io/package/pip/langchain and https://security.snyk.io/package/pip/langchain/0.0.306

tabdunabi · 2023-10-04T17:37:43Z

@eyurtsev I just installed v0.0.308 (with optional numexpr), but python-pipaudit still reports the vulnerability. I've verified that numexpr was not installed. It seems the CVE database still lists it as a dependency. Can anything be done on your side to update the database?

eyurtsev · 2023-10-04T20:14:08Z

@tabdunabi we're taking a look.

@elmegatan26 @tabdunabi Are the other CVEs a blocker for you right now?

tabdunabi · 2023-10-04T20:50:33Z

Yes @eyurtsev they are blockers. Interestingly, for v0.0.308, python-pipaudit did not report https://security.snyk.io/vuln/SNYK-PYTHON-LANGCHAIN-5850009 and https://security.snyk.io/vuln/SNYK-PYTHON-LANGCHAIN-5843727. safety only reported https://security.snyk.io/vuln/SNYK-PYTHON-LANGCHAIN-5843727.

elmegatan26 · 2023-10-04T21:30:40Z

@eyurtsev Yes, any CVE ranked High or Critical is a blocker. Any vulnerabilities found by most infosec teams are flagged and Devs are required to patch or prove the issue is not exploitable.

dvirginz · 2023-10-05T07:31:55Z

Hi everyone,

Thank you for opening the issue.

Just to clarify the status:
Is there any version or installation of Langchain that doesn't currently contain high or critical CVEs?

From the Snyk website, it seems no version of Langchain is completely free from critical issues.
https://security.snyk.io/package/pip/langchain

eyurtsev · 2023-10-05T16:22:10Z

@tabdunabi Information about LLMMathChain should now be reflected.

@dvirginz Not at the moment. We're working on addressing all the CVEs.

https://security.snyk.io/vuln/SNYK-PYTHON-LANGCHAIN-5850009 -- will be resolved next week (Getting rid of more CVEs #11352)
https://security.snyk.io/vuln/SNYK-PYTHON-LANGCHAIN-5843727 -- will be resolved over the next few weeks. This CVE stems from a Python AST REPL tool used by some agents. You're at risk if your code uses this tool directly, or indirectly via an agent that relies on it.

dvirginz · 2023-10-05T17:10:38Z

Hi! Thank you for the update and detailed response. We look forward to a solution, as it will help us a lot.

tabdunabi · 2023-10-10T22:14:12Z

Hi @eyurtsev, first, apologies for pinging you again. We are delaying our release until the CVEs are patched.

I see https://security.snyk.io/vuln/SNYK-PYTHON-LANGCHAIN-5850009 has been patched in v0.0.312 (PR #10252).

Any chance you can accelerate PR #5640, referenced in issue #7700 as providing a fix for https://security.snyk.io/vuln/SNYK-PYTHON-LANGCHAIN-5843727?

eyurtsev · 2023-10-11T01:47:10Z

Hi @tabdunabi, a realistic timeline is 1-3 weeks. Are you relying on any of the agents, i.e., the pandas agent, xorbits agent or spark agent (which depend on the Python AST tool)?

dvirginz · 2023-10-11T10:23:19Z

If we aren't using agents or LLMMathChain, is there a clear version we can already use?

tabdunabi · 2023-10-11T17:25:16Z

Thank you @eyurtsev for the update.

Currently, we are not using LangChain agents. However, our build pipeline fails because of any security vulnerabilities in the libraries shipped with our solutions. So, even though we are not using LangChain agents, we need to go through a formal approval process with our security team, and prove the vulnerable code is not actually used by our solutions, to be able to get an exception to release. Additionally, once we publish our code on GitHub, GitHub Dependabot will flag these security vulnerabilities, and we need to address auto-cut tickets for our team.

So, it would be much easier, and safer, to ship code with zero security vulnerabilities in downstream dependencies.

eyurtsev · 2023-10-11T17:54:57Z

Targeting end of month, 10/28, to resolve the CVE in a way that allows existing users to migrate (will announce the expected changes shortly). In the meantime, are you able to fork and remove the affected code?

tabdunabi · 2023-10-11T18:04:52Z

Thank you @eyurtsev!
Creating a fork is not an option for us. We want our customers to be able to build our code by themselves using published libraries, available on Pypi/NPM. We do not want to maintain forks of external libraries.

eyurtsev · 2023-10-11T21:13:24Z

#11680 -- announcement

eyurtsev · 2023-10-27T19:15:18Z

Python AST tool CVE was resolved here: #12427 cc @tabdunabi / @dvirginz

(The original CVE for LLMMathChain was resolved a while back, so I'm closing this issue.)

As of release: https://github.com/langchain-ai/langchain/releases/tag/v0.0.325

tabdunabi · 2023-10-27T19:18:03Z

Thank you @eyurtsev! We will immediately upgrade the LangChain version used by our code to v0.0.325.

dvirginz · 2023-10-27T19:20:40Z

Thank you. It seems that Snyk still identifies 1 high-risk CVE in LangChain. Any thoughts? Thanks. https://security.snyk.io/package/pip/langchain

eyurtsev · 2023-10-28T01:45:45Z

@dvirginz that's the CVE that got patched with this release. It can take up to several days for the CVE information to be updated in the relevant databases.

mschirmer84 · 2023-11-14T19:48:13Z

Can CVE-2023-39631 (http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2023-39631) be closed with the above fix?

eyurtsev · 2024-02-12T16:13:26Z

That CVE was resolved as well a while back: GHSA-f73w-4m7g-ch9x

Locking conversation

dosubot bot added the 🤖:bug label (Related to a bug, vulnerability, unexpected error with an existing feature) on Jul 27, 2023

zoeqevans mentioned this issue Aug 2, 2023

Swap in Sympy instead of numexpr for more expressive LLMMath tool #8627

Closed

eyurtsev self-assigned this Oct 2, 2023

eyurtsev mentioned this issue Oct 4, 2023

Bump min version of numexpr #11302

Merged

eyurtsev closed this as completed Oct 27, 2023

langchain-ai locked as resolved and limited conversation to collaborators Feb 12, 2024

eyurtsev added the 🤖:security label (Related to security issues, CVEs) on Mar 13, 2024
