[GHSA-45pg-36p6-83v9] Langchain SQL Injection vulnerability #5025
Hey there. It looks like the CVSS 3.1 score we have comes from the huntr.dev report, which I'm inclined to believe over the NVD score. Reading the huntr thread, it seems LangChain does document that access control is the responsibility of the API user. Can you expand on why you disagree? Ref: https://huntr.com/bounties/8f4ad910-7fdc-4089-8f0a-b5df5f32e7c5 |
As noted in the Huntr thread, I reported the issue with the same severity
that NVD ultimately assigned. I had an extensive discussion there about it,
and they acknowledged that the vulnerability has a significant impact.
LangChain includes an opt-in for dangerous functionality called
"allow_dangerous_requests". It is implemented in numerous chains, but it
was not applied consistently across all text2query chains/tools. After my
disclosure, they mitigated these vulnerabilities and added
"allow_dangerous_requests" to the affected chains/tools.
For example, one of the chains mitigated because of my report is referenced
here:
LangChain GitHub
<https://github.com/langchain-ai/langchain/blame/abaea28417adb63dae0cfad3c60dff5297e3ce0d/libs/community/langchain_community/chains/graph_qa/neptune_sparql.py#L116>
.
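As a rough illustration of the opt-in pattern described above, here is a minimal sketch. It is not LangChain's actual code: the class `GuardedQueryChain`, its fields, and the `build` helper are hypothetical names invented for this example, and only the flag name `allow_dangerous_requests` comes from the discussion.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GuardedQueryChain:
    """Hypothetical chain that turns text into a query and runs it."""

    generate_query: Callable[[str], str]    # stand-in for the LLM step
    run_query: Callable[[str], str]         # stand-in for the database call
    allow_dangerous_requests: bool = False  # secure by default

    def __post_init__(self) -> None:
        # Refuse to construct the chain unless the caller explicitly opts in.
        if not self.allow_dangerous_requests:
            raise ValueError(
                "This chain executes generated queries against a database. "
                "Set allow_dangerous_requests=True to acknowledge the risk."
            )

    def invoke(self, question: str) -> str:
        return self.run_query(self.generate_query(question))


def build(opt_in: bool) -> str:
    """Return 'built' or the refusal message, for demonstration only."""
    try:
        GuardedQueryChain(lambda q: "MATCH (n) RETURN n", lambda c: "rows", opt_in)
        return "built"
    except ValueError as exc:
        return f"refused: {exc}"
```

With this shape, a chain that lacks the flag behaves like `build(True)` for every caller, which is the inconsistency the report is about.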
Following the "Secure By Design" and "Secure By Default" principles
outlined in CISA’s guidelines, all code should be secure by default. As I
further highlighted in the Huntr thread, neither Neo4j nor LangChain's
official blogs or documentation reference the security implications of this
vulnerability. The only mentions are in general guidelines or as a source
code comment.
From a technical perspective, the CVSS score assigned to this vulnerability
is incorrect. This is not a local vulnerability but a *network
vulnerability*. An attacker does not need internal access to exploit it,
and the impact on integrity and availability is *High*, not Low, as they
indicated. *The vulnerability enables full database compromise and could
even impact multi-tenant setups. Moreover, the attack complexity is
extremely low—just a single line of text is enough to exploit it.*
Any SQL injection can be mitigated by using a least-privileged database
user, but that does not fully remediate this vulnerability, because the
attacker can still READ unintended data directly from the database. After
all, I think this is a classic showcase of how a vendor "decides" a severity
score based on business objectives rather than a purely technical
perspective: the CVSS score they reported is wrong whether or not there was
a security note. Apache could also have claimed with Log4Shell that users
should not log "user input" and that doing so is the user's responsibility,
but this is not how application security and vulnerabilities work. This is
a joint effort, and everyone should take full "ownership", as threat actors
are just waiting to exploit the weakness.
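The least-privilege point above can be sketched with SQLite from the Python standard library. This is only an analogy under stated assumptions: the `notes` table, the tenant names, and the `try_query` helper are hypothetical, and `PRAGMA query_only` stands in for a read-only database user.

```python
import sqlite3

# In-memory database holding rows for two hypothetical tenants.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE notes (tenant TEXT, body TEXT);
    INSERT INTO notes VALUES ('alice', 'alice secret'), ('bob', 'bob secret');
    """
)
conn.commit()

# Approximate a read-only (least-privileged) account for this connection.
conn.execute("PRAGMA query_only = ON")


def try_query(sql):
    """Run a statement and return its rows, or a message if it is blocked."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        return f"blocked: {exc}"


# A destructive injected statement is rejected...
write_result = try_query("DELETE FROM notes")
# ...but an injected read still returns every tenant's data.
read_result = try_query("SELECT tenant, body FROM notes ORDER BY tenant")
```

The write is blocked while the read still exposes both tenants' rows, so read-only access limits the integrity and availability impact but not the confidentiality impact.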
CISA REF of NVD score acceptance:
https://www.cisa.gov/news-events/bulletins/sb24-309
Anyways, I would like to have your thoughts about it.
Liad
(Sent by email in reply to Jon ***@***.***, Wednesday, 20 November 2024, 22:39.)
|
To me, that reads like it should be considered a vuln in the chain that uses the function. It seems somewhat analogous to a database exposing full SQL capability. One would not say that MySQL or PostgreSQL or whatever is vulnerable to SQL injection, but rather that some application using it could be, depending on the context of the application.
Is that the case? Does LangChain accept prompts from the network by default? |
I’m not sure I follow your analogy. Databases that expose SQL queries are
explicitly designed to run SQL, whereas LangChain is not intended to be a
SQL engine that runs queries, the way packages such as Alembic or
SQLAlchemy do. Instead, it takes user text input and generates SQL (or
Cypher) queries under the hood, since LangChain describes itself as a way
to "Build context-aware reasoning applications".
If you compare this to the alternative LlamaIndex, you’ll notice they avoid
this vulnerability by separating the logic for generating a Cypher query
from executing it.
Here’s the difference:
- *LlamaIndex*: User input (text) → Cypher query generation → Explicit
action to execute the query on the database (2-step process) → System
response (text).
- *LangChain*: User input (text) → Cypher query automatically executed
on the database (single-step process) → System response (text).
This distinction is critical because LangChain’s approach combines query
generation and execution into a single action. That increases the risk of
vulnerabilities, and it leaves no way to validate a query before it runs
on the database.
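The difference between the two flows can be sketched as follows. This is an illustration under stated assumptions, not code from LangChain or LlamaIndex: `generate_cypher` is a stub standing in for the LLM, `execute` is any callable that runs a query, and the keyword deny-list is a deliberately crude, hypothetical check rather than a robust defense.

```python
import re
from typing import Callable, List

# Hypothetical deny-list of Cypher clauses that modify data or call procedures.
WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|LOAD\s+CSV|CALL)\b",
    re.IGNORECASE,
)


def generate_cypher(question: str) -> str:
    """Stub for the LLM: an injected instruction yields a destructive query."""
    if "delete" in question.lower():
        return "MATCH (n) DETACH DELETE n"
    return "MATCH (m:Movie) RETURN m.title LIMIT 5"


def is_read_only(cypher: str) -> bool:
    """Return True when no write clause appears in the generated query."""
    return WRITE_CLAUSES.search(cypher) is None


def single_step(question: str, execute: Callable[[str], object]) -> object:
    """Generate and execute in one action; nothing can inspect the query."""
    return execute(generate_cypher(question))


def two_step(question: str, execute: Callable[[str], object]) -> object:
    """Generate first, validate, and only then execute explicitly."""
    cypher = generate_cypher(question)
    if not is_read_only(cypher):
        raise PermissionError(f"refusing to run: {cypher}")
    return execute(cypher)


# Fake database that only records which queries reached it.
executed: List[str] = []
single_step("Delete all my nodes in database", executed.append)
try:
    two_step("Delete all my nodes in database", executed.append)
    blocked = False
except PermissionError:
    blocked = True
```

After this runs, `executed` holds the destructive query once (from the single-step flow), while the two-step flow stopped it before execution.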
*Is this a network vulnerability?*
LangChain doesn’t accept prompts directly from the network by default, but
it is almost always integrated into APIs or applications that do (this is
how RAG applications work; it would be useless otherwise). This makes it
effectively a network vulnerability, because the system becomes exposed
through typical usage. A "Local" vulnerability means the attacker already
has access to the machine or component, as is usually seen in privilege
escalation (PE) or LFI vulnerabilities. For example, Log4j itself does not
expose a network interface but became a network vulnerability due to its
handling of untrusted input, such as logging user-provided data that could
trigger malicious behavior like JNDI lookups.
Furthermore, the integrity, availability and complexity values are
reported incorrectly as well. How complex is it to write "Delete all my
nodes in database"? After that action the database cannot serve anything,
because the attacker deleted everything (availability).
I hope my detailed answer gives a better sense of the vulnerability and its
impact.
*Liad*
(Sent by email in reply to Jon ***@***.***, Thursday, 21 November 2024, 0:34.)
|
The analogy is that an API from one project is created to be used in an open-ended way by another project.
Would that not be a problem with the use of LangChain as opposed to LangChain itself? |
I am not sure why you keep focusing only on a specific sentence without the
context of the paragraph.
1. LangChain, and GraphCypherQAChain specifically, is designed to take user
input. This is also how it is shown in any blog linked from the official
LangChain or Neo4j docs: they explain exactly how to use this
functionality, and in most or all cases the input passed is user input.
*Is Log4Shell a network vulnerability or a local one?*
*2. Again, please address all other params in the CVSS score.*
3. The problem lies within LangChain itself, as they fall short in
implementing security measures at the API level, unlike other Text2Action
chains they provide. LangChain is not intended to execute raw SQL queries
over a database, and this is not its purpose. Instead, it takes user input
as text and returns a response for "context-aware reasoning applications,"
as stated in the first line of the GitHub project. It is their
responsibility to provide a secure way to use this functionality.
A comparable example is the Log4Shell vulnerability. The purpose of Log4j
is to serve as a logging framework, not a JNDI and LDAP tool. However,
Log4j used these functionalities to enhance its logging system,
inadvertently creating a vulnerability, and had these features enabled by
default. Similarly, LangChain's purpose is to return "smarter" answers to
questions using "Cypher" as a tool to improve the quality of the responses.
Unfortunately, this functionality is also enabled by default, posing
significant risks.
Thanks,
*Liad*
(Sent by email in reply to Jon ***@***.***, Thursday, 21 November 2024, 1:41.)
|
I'm not seeing evidence that convinces me to change the CVSS, so I am not accepting this PR. We believe huntr did an appropriate job scoring this vulnerability, and we would advise you to reach out to applications using LangChain inappropriately in order to help secure them. |
I'm really not sure what evidence you are trying to find. It is what it
is.
LangChain has a business objective to have the lowest possible CVSS score.
You claim you “trust” huntr to score correctly, but the availability,
integrity and complexity values are not debatable.
Is this complex to exploit? No, so how is the complexity High?
Can this vulnerability delete or modify the data in the database? Yes,
so how is it Low?
Furthermore, you did not address any of my questions, and you have not
explained why you think differently.
I did not assign the NVD score; NVD scored it as 9.8, not me.
Anyways, I may release this thread to the public to hear their thoughts
about it.
Liad
(Sent by email in reply to Jon ***@***.***, Thursday, 21 November 2024, 20:16.)
|
Updates
Comments
I am the security researcher who found the vulnerability. The NVD CVSS score is 9.8 (CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H).
This vulnerability is easy to exploit, exists in the default configuration, and results in full database exposure. The affected database is commonly used in scenarios where this functionality's interface exposes user input to the package. + credit would be nice ;) Thanks.
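To make the scoring argument easier to check, here is a minimal sketch of the CVSS 3.1 base score calculation, written from the public CVSS 3.1 specification. The function and table names are my own, and the second vector in the usage note is a hypothetical variant for comparison only; it is not the vector from the huntr report.

```python
import math

# Metric weights from the CVSS 3.1 specification.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required depends on whether Scope is changed.
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the CVSS 3.1 spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a 'CVSS:3.1/AV:.../...' vector string."""
    parts = dict(item.split(":") for item in vector.split("/")[1:])
    changed = parts["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[parts["PR"]]
    c, i, a = (WEIGHTS[m][parts[m]] for m in ("C", "I", "A"))
    iss = 1 - (1 - c) * (1 - i) * (1 - a)
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = (
        8.22 * WEIGHTS["AV"][parts["AV"]] * WEIGHTS["AC"][parts["AC"]]
        * pr * WEIGHTS["UI"][parts["UI"]]
    )
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


nvd = base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H")
local_variant = base_score("CVSS:3.1/AV:L/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H")
```

Here `nvd` evaluates to 9.8, matching the NVD score above, and changing only the attack vector from Network to Local (the hypothetical `local_variant`) gives 8.4.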