Add Spark SQL support #4602
Conversation
        template=QUERY_CHECKER, input_variables=["query"]
    ),
)
If we force users to use the default prompt, I think it makes sense here, and then I would change the error message below to be more specific.
@skcoirz Thanks for helping update this one!
Yeah, sure thing! I tested this. The new query checker is really powerful! It solved the previous concern about AnalysisException. Thank you so much for adding this! During testing, I noticed a few more opportunities. I have added them to our spreadsheet. Happy to chat more when you have time! Have a good weekend! :D
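For reference, a minimal sketch of how the default query-checker chain in the diff fragment above could be wired up. The import path for `QUERY_CHECKER` and the helper name are assumptions based on the snippet, not a verbatim copy of the implementation:

```python
from langchain.chains.llm import LLMChain
from langchain.prompts import PromptTemplate

# Assumed location of the Spark SQL checker prompt; adjust if it lives elsewhere.
from langchain.tools.spark_sql.prompt import QUERY_CHECKER


def build_default_query_checker_chain(llm) -> LLMChain:
    """Build the default chain that double-checks a generated Spark SQL query."""
    return LLMChain(
        llm=llm,
        prompt=PromptTemplate(
            template=QUERY_CHECKER, input_variables=["query"]
        ),
    )
```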
from langchain.spark_sql import SparkSQL


def create_spark_analytics_agent_verified(
@skcoirz Having two methods for creating agents looks confusing.
    which is verified during development.
    """
    spark_sql = SparkSQL(schema=schema)
    llm = ChatOpenAI(temperature=0, model_name="gpt-4")
Let's not bind gpt-4 here. LangChain is supposed to be general.
@skcoirz I am reverting the related changes. You can still find the code in the git history.
Sure, I'm good with it. There is a tradeoff. We can chat more later.
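For illustration, a model-agnostic version would take the LLM as a parameter instead of constructing `ChatOpenAI(model_name="gpt-4")` internally. This is a hypothetical sketch of the idea being discussed (the function name is made up; it uses the toolkit API this PR introduces), not the reverted code:

```python
from langchain.agents import create_spark_sql_agent
from langchain.agents.agent_toolkits import SparkSQLToolkit
from langchain.base_language import BaseLanguageModel
from langchain.utilities.spark_sql import SparkSQL


def create_spark_analytics_agent(llm: BaseLanguageModel, schema: str, **kwargs):
    """Hypothetical model-agnostic factory: the caller injects whichever LLM they want."""
    spark_sql = SparkSQL(schema=schema)
    toolkit = SparkSQLToolkit(db=spark_sql, llm=llm)
    return create_spark_sql_agent(llm=llm, toolkit=toolkit, **kwargs)
```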
Moved the rest of the new features to a new PR on top of this branch (#4672).
cc @vowelparrot @hwchase17 could you review this one? The new agent is helpful for the Apache Spark community.
…angchain-ai#4926)

# Load specific file types from Google Drive (issue langchain-ai#4878)

Add the possibility to define which file types you want to load from Google Drive.

```
loader = GoogleDriveLoader(
    folder_id="1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5",
    file_types=["document", "pdf"],
    recursive=False,
)
```

Fixes langchain-ai#4878

## Who can review?

Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested: DataLoaders - @eyurtsev

Twitter: [@UmerHAdil](https://twitter.com/@UmerHAdil) | Discord: RicChilligerDude#7589

Co-authored-by: UmerHA <40663591+UmerHA@users.noreply.github.com>
# API update: Engines -> Models

See: https://community.openai.com/t/api-update-engines-models/18597

Co-authored-by: assert <zhangchengming@kkguan.com>
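In practice this means calls that used the deprecated `engine` argument switch to `model`. A minimal sketch with the pre-1.0 `openai` Python client (model name chosen for illustration):

```python
import openai

# Old style (deprecated): openai.Completion.create(engine="text-davinci-003", ...)
# New style: pass the model name via `model` instead of `engine`.
response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Say hello",
    max_tokens=5,
)
print(response["choices"][0]["text"])
```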
…exceptions (langchain-ai#4927)

# TextLoader auto detect encoding and enhanced exception handling

- Add an option to enable encoding detection on `TextLoader`.
- The detection is done using `chardet`.
- The loading is done by trying all detected encodings in order of confidence, or raising an exception otherwise.

### New Dependencies:
- `chardet`

Fixes langchain-ai#4479

## Who can review?

Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested:
- @eyurtsev

Co-authored-by: blob42 <spike@w530>
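A short usage sketch of the described option; the `autodetect_encoding` flag name and file path are assumptions based on the PR description, so double-check against the released API:

```python
from langchain.document_loaders import TextLoader

# Assumed flag: when enabled, the loader detects the file encoding with chardet
# and tries the candidate encodings in order of confidence before giving up.
loader = TextLoader("data/report.txt", autodetect_encoding=True)
docs = loader.load()
print(docs[0].metadata, len(docs[0].page_content))
```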
# Fix bilibili api import error

The bilibili-api package is deprecated and there is no sync module.

Fixes langchain-ai#2673 langchain-ai#2724

## Who can review?

Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested: @vowelparrot @liaokongVFX
# docs: updated `Supabase` notebook

- The title of the notebook was inconsistent (included a redundant "Vectorstore"). Removed this "Vectorstore".
- Added `Postgres` to the title. It is important: the `Postgres` name is much more popular than `Supabase`.
- Added a description for `Postgres`.
- Added more info to the `Supabase` description.
…angchain-ai#4938) Correct typo in APIChain example notebook (Farenheit -> Fahrenheit)
Updated the docs from "An agent consists of three parts:" to "An agent consists of two parts:" since there are only two parts in the documentation.
# docs: added `ecosystem/dependents` page

Added `ecosystem/dependents` page. Can we propose a better page name?
# docs: vectorstores, different updates and fixes

Multiple updates:
- added/improved descriptions
- fixed header levels
- added headers
- fixed headers
This reverts commit 1f3e54f.
Force-pushed from 7445cd7 to 9c90161.
The output parser from the chat conversational agent now raises `OutputParserException` like the rest. The `raise OutputParserException(...) from e` form also carries through the original error details on what went wrong. I added `ValueError` as a base class to `OutputParserException` to avoid breaking code that was relying on `ValueError` as a way to catch exceptions from the agent, so catching `ValueError` still works. Not sure if this is a good idea though?
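A small sketch of what the backwards-compatibility claim means for callers; the parser instance and malformed output are made up for illustration:

```python
from langchain.schema import OutputParserException


def parse_safely(parser, raw_output: str):
    try:
        return parser.parse(raw_output)
    except ValueError as e:
        # OutputParserException subclasses ValueError, so pre-existing handlers
        # like this one still catch it; e.__cause__ holds the original error
        # when the parser used `raise OutputParserException(...) from e`.
        print(f"Parsing failed: {e}")
        return None
```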
I just did a final check before merging.
# Zep Retriever - Vector Search Over Chat History with the Zep Long-term Memory Service

More on Zep: https://github.com/getzep/zep

Note: This PR is related to and relies on langchain-ai#4834. I did not want to modify the `pyproject.toml` file to add the `zep-python` dependency a second time.

Co-authored-by: Daniel Chalef <daniel.chalef@private.org>
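A brief usage sketch of the retriever described above; the constructor arguments (session id, server URL, `top_k`) are assumptions based on the PR description, not a confirmed signature:

```python
from langchain.retrievers import ZepRetriever

# Assumed arguments: a running Zep server URL and the chat session whose
# history should be vector-searched.
retriever = ZepRetriever(
    session_id="user-123-session",
    url="http://localhost:8000",
    top_k=5,
)
docs = retriever.get_relevant_documents("What did we decide about the launch date?")
```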
The Anthropic classes used `BaseLanguageModel.get_num_tokens` because of an issue with multiple inheritance. Fixed by moving the method from `_AnthropicCommon` to both its subclasses. This change will significantly speed up token counting for Anthropic users.
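A schematic sketch of the multiple-inheritance issue being described; the class bodies are stubbed stand-ins, not the real LangChain implementations:

```python
# Schematic only: shows how a method on a mixin can lose to a generic default
# that sits earlier in the method resolution order (MRO).
class BaseLanguageModel:
    def get_num_tokens(self, text: str) -> int:
        return len(text.split())  # generic, slower fallback


class _AnthropicCommon:
    def get_num_tokens(self, text: str) -> int:
        return len(text) // 4  # provider-specific counting, stubbed for the demo


class ChatAnthropic(BaseLanguageModel, _AnthropicCommon):
    # MRO is ChatAnthropic -> BaseLanguageModel -> _AnthropicCommon, so the
    # generic fallback wins unless the concrete class overrides it explicitly:
    def get_num_tokens(self, text: str) -> int:
        return _AnthropicCommon.get_num_tokens(self, text)
```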
… above. (langchain-ai#2675)

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
Co-authored-by: Davis Chase <130488702+dev2049@users.noreply.github.com>
@@ -0,0 +1,174 @@
from __future__ import annotations
nit: let's move this to utilities
(as in the whole file should go to langchain/utilities/spark_sql, not that specific line)
yeah that whole file
Makes sense. This is done.
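After the move described above, the wrapper is imported from the utilities package rather than the top-level module, along these lines:

```python
# Old location (as originally proposed in this PR):
# from langchain.spark_sql import SparkSQL

# New location after moving the file to langchain/utilities/spark_sql.py:
from langchain.utilities.spark_sql import SparkSQL
```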
…n-ai#4761)

- Simplify the validation check a little bit.
- Re-tested in Jupyter notebook.

Reviewer: @hwchase17
# Add Spark SQL support

* Add Spark SQL support. It can connect to Spark via building a local/remote SparkSession.
* Include a notebook example.

I tried some complicated queries (window function, table joins), and the tool works well. Compared to the [Spark Dataframe agent](https://python.langchain.com/en/latest/modules/agents/toolkits/examples/spark.html), this tool is able to generate queries across multiple tables.

Co-authored-by: Gengliang Wang <gengliang@apache.org>
Co-authored-by: Mike W <62768671+skcoirz@users.noreply.github.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: UmerHA <40663591+UmerHA@users.noreply.github.com>
Co-authored-by: 张城铭 <z@hyperf.io>
Co-authored-by: assert <zhangchengming@kkguan.com>
Co-authored-by: blob42 <spike@w530>
Co-authored-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Co-authored-by: Richard He <he.yucheng@outlook.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
Co-authored-by: Leonid Ganeline <leo.gan.57@gmail.com>
Co-authored-by: Alexey Nominas <60900649+Chae4ek@users.noreply.github.com>
Co-authored-by: elBarkey <elbarkey@gmail.com>
Co-authored-by: Davis Chase <130488702+dev2049@users.noreply.github.com>
Co-authored-by: Jeffrey D <1289344+verygoodsoftwarenotvirus@users.noreply.github.com>
Co-authored-by: so2liu <yangliu35@outlook.com>
Co-authored-by: Viswanadh Rayavarapu <44315599+vishwa-rn@users.noreply.github.com>
Co-authored-by: Chakib Ben Ziane <contact@blob42.xyz>
Co-authored-by: Daniel Chalef <131175+danielchalef@users.noreply.github.com>
Co-authored-by: Daniel Chalef <daniel.chalef@private.org>
Co-authored-by: Jari Bakken <jari.bakken@gmail.com>
Co-authored-by: escafati <scafatieugenio@gmail.com>
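For readers landing here from the Spark community, a condensed usage sketch adapted from the notebook added in this PR; the schema name and question are illustrative:

```python
from langchain.agents import create_spark_sql_agent
from langchain.agents.agent_toolkits import SparkSQLToolkit
from langchain.chat_models import ChatOpenAI
from langchain.utilities.spark_sql import SparkSQL

# Connects via a local/remote SparkSession; "langchain_example" is an illustrative schema.
spark_sql = SparkSQL(schema="langchain_example")
llm = ChatOpenAI(temperature=0)
toolkit = SparkSQLToolkit(db=spark_sql, llm=llm)
agent_executor = create_spark_sql_agent(llm=llm, toolkit=toolkit, verbose=True)

# The agent can plan queries across multiple tables, e.g. joins and window functions.
agent_executor.run("Describe the titanic table")
```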
@hwchase17 @dev2049 @skcoirz Thanks for reviewing this!
Add Spark SQL support
I tried some complicated queries (window function, table joins), and the tool works well.
Compared to the Spark Dataframe agent, this tool is able to generate queries across multiple tables.