Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Hugging Face chat wrapper #14040

Closed
wants to merge 208 commits into from

Conversation

andrewrreed
Copy link
Contributor

@andrewrreed andrewrreed commented Nov 29, 2023

Issue

There previously has been no easy way to make use of models hosted on Hugging Face (via Inference API or Inference Endpoints) in combination with LangChains ChatModel abstraction.

Description

This PR introduces a new chat_model integration that creates a wrapper around the BaseChatModel class that interfaces between LangChain's and the hosted LLM by leveraging Hugging Face's Chat Templates.

To do

  • Add wrapper class aroundBaseChatModel to interface with HF LLM integrations
  • Create a notebook docs/integrations/chat that demonstrates its use
  • Add unit test

Tag maintainer

@hwchase17

Twitter handle

@andrewrreed

Copy link

vercel bot commented Nov 29, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
langchain ✅ Ready (Inspect) Visit Preview 💬 Add feedback Dec 14, 2023 5:29pm

@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. Ɑ: models Related to LLMs or chat model modules 🤖:enhancement A large net-new component, integration, or chain. Use sparingly. The largest features labels Nov 29, 2023
@andrewrreed andrewrreed changed the title Add Hugging Face chat wrapper [WIP] Add Hugging Face chat wrapper Nov 29, 2023
@andrewrreed
Copy link
Contributor Author

@hwchase17 When running the unit tests locally, there are 22 tests that are failing, all of which are inside these two files:

  • tests/unit_tests/callbacks/tracers/test_base_tracer.py
  • tests/unit_tests/callbacks/tracers/test_langchain_v1.py

However, when I run pytest on each of those files individually with poetry run pytest --disable-socket --allow-unix-socket <test-file> they all pass.... Not sure why this is happening?

Copy link
Collaborator

@baskaryan baskaryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the unit test error/failure you're seeing?

@andrewrreed
Copy link
Contributor Author

andrewrreed commented Nov 30, 2023

@baskaryan The unit tests were failing due to that global import of an optional dependency. Fixed that so everything is passing locally now.

Let me know if there's anything else I need to add here (like integration tests)? Thanks!

@andrewrreed andrewrreed changed the title [WIP] Add Hugging Face chat wrapper Add Hugging Face chat wrapper Dec 2, 2023
@andrewrreed
Copy link
Contributor Author

@baskaryan Looks like huggingface_chat_wrapper is failing some of the linting checks from mypy. I'm not sure what is wrong here, but the failures seem to be stemming from these three issues:

  • langchain/chat_models/huggingface_chat_wrapper.py:43: error: Variable "langchain.llms.HuggingFaceTextGenInference" is not valid as a type [valid-type]
  • langchain/chat_models/huggingface_chat_wrapper.py:43: error: Variable "langchain.llms.HuggingFaceEndpoint" is not valid as a type [valid-type]
  • langchain/chat_models/huggingface_chat_wrapper.py:43: error: Variable "langchain.llms.HuggingFaceHub" is not valid as a type [valid-type]

When you get the chance, could you help advise on what might be wrong? Thanks!

- **Description:** Added a tool called RedditSearchRun and an
accompanying API wrapper, which searches Reddit for posts with support
for time filtering, post sorting, query string and subreddit filtering.
  - **Issue:** langchain-ai#13891 
  - **Dependencies:** `praw` module is used to search Reddit
- **Tag maintainer:** @baskaryan , and any of the other maintainers if
needed
  - **Twitter handle:** None.

  Hello,

This is our first PR and we hope that our changes will be helpful to the
community. We have run `make format`, `make lint` and `make test`
locally before submitting the PR. To our knowledge, our changes do not
introduce any new errors.

Our PR integrates the `praw` package which is already used by
RedditPostsLoader in LangChain. Nonetheless, we have added integration
tests and edited unit tests to test our changes. An example notebook is
also provided. These changes were put together by me, @Anika2000,
@CharlesXu123, and @Jeremy-Cheng-stack

Thank you in advance to the maintainers for their time.

---------

Co-authored-by: What-Is-A-Username <49571870+What-Is-A-Username@users.noreply.github.com>
Co-authored-by: Anika2000 <anika.sultana@mail.utoronto.ca>
Co-authored-by: Jeremy Cheng <81793294+Jeremy-Cheng-stack@users.noreply.github.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
baskaryan and others added 13 commits December 11, 2023 16:20
This reverts commit 38813d7. This is a
temporary fix, as I don't see a clear way on how to use multiple keys
with `Qdrant.from_texts`.

Context: langchain-ai#14378
The namespaces like `langchain.agents.format_scratchpad` clogging the
API Reference sidebar.
This change removes those 3-level namespaces from sidebar (this issue
was discussed with @efriis )

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Many jupyter notebooks didn't pass linting. List of these files are
presented in the [tool.ruff.lint.per-file-ignores] section of the
pyproject.toml . Addressed these bugs:
- fixed bugs; added missed imports; updated pyproject.toml
 Only the `document_loaders/tensorflow_datasets.ipyn`,
`cookbook/gymnasium_agent_simulation.ipynb` are not completely fixed.
I'm not sure about imports.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Updated provider page by adding LLM and ChatLLM references; removed a
content that is duplicate text from the LLM referenced page.
Updated the collback page
@andrewrreed
Copy link
Contributor Author

Thanks for your help on this @A-Roucher!

@baskaryan This PR is now passing all tests and linting checks. Please let us know if anything else is needed to get this merged!

@aymeric-roucher
Copy link
Contributor

aymeric-roucher commented Dec 12, 2023

No problem @andrewrreed, looking forward to start using this integration!

baskaryan pushed a commit that referenced this pull request Dec 21, 2023
Builds on #14040 with community refactor merged and notebook updated.

Note that with this refactor, models will be imported from
`langchain_community.chat_models.huggingface` rather than the main
`langchain` repo.

---------

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Signed-off-by: ugm2 <unaigaraymaestre@gmail.com>
Signed-off-by: Yuchen Liang <yuchenl3@andrew.cmu.edu>
Co-authored-by: Andrew Reed <andrew.reed.r@gmail.com>
Co-authored-by: Andrew Reed <areed1242@gmail.com>
Co-authored-by: A-Roucher <aymeric.roucher@gmail.com>
Co-authored-by: Aymeric Roucher <69208727+A-Roucher@users.noreply.github.com>
@baskaryan
Copy link
Collaborator

landed in #14736

@baskaryan baskaryan closed this Dec 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤖:enhancement A large net-new component, integration, or chain. Use sparingly. The largest features Ɑ: models Related to LLMs or chat model modules size:XL This PR changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.