New Trello document loader #4767
Merged: dev2049 merged 10 commits into langchain-ai:master from GMartin-dev:trello_document_loader on May 30, 2023.
Commits (10):
- 94eff6c Added TrelloLoader class and updated init file. (GMartin-dev)
- d7f6966 Added documentation in trello.ipynb file. (GMartin-dev)
- 7fbe260 Moving configuration into initializer, updating docs. (GMartin-dev)
- a361e75 Merge remote-tracking branch 'lg/master' into trello_document_loader (GMartin-dev)
- 2184eb1 Tweaks. Removing redundant code. (GMartin-dev)
- 31bdbb6 Add dummy test data, py-trello fixture and unit test cases. (GMartin-dev)
- 5507063 cr (dev2049)
- d98f6c1 poetry (dev2049)
- d2a05fb merge (dev2049)
- 7fe7720 dep (dev2049)
docs/modules/indexes/document_loaders/examples/trello.ipynb (184 changes: 184 additions, 0 deletions)
@@ -0,0 +1,184 @@

```json
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Trello\n",
    "\n",
    ">[Trello](https://www.atlassian.com/software/trello) is a web-based project management and collaboration tool that allows individuals and teams to organize and track their tasks and projects. It provides a visual interface known as a \"board\" where users can create lists and cards to represent their tasks and activities.\n",
    "\n",
    "The TrelloLoader allows you to load cards from a Trello board and is implemented on top of [py-trello](https://pypi.org/project/py-trello/).\n",
    "\n",
    "This currently supports `api_key/token` only.\n",
    "\n",
    "1. Credentials generation: https://trello.com/power-ups/admin/\n",
    "\n",
    "2. Click the manual token generation link to get the token.\n",
    "\n",
    "To specify the API key and token you can either set the environment variables `TRELLO_API_KEY` and `TRELLO_TOKEN` or pass `api_key` and `token` directly into the `from_credentials` convenience constructor method.\n",
    "\n",
    "This loader takes the name of a board and loads its cards as Document objects.\n",
    "\n",
    "Note that the board \"name\" is referred to as \"title\" in the official documentation:\n",
    "\n",
    "https://support.atlassian.com/trello/docs/changing-a-boards-title-and-description/\n",
    "\n",
    "You can also specify several loader parameters to include or remove different fields from both the document `page_content` and its metadata.\n",
    "\n",
    "## Features\n",
    "- Load cards from a Trello board.\n",
    "- Filter cards based on their status (open or closed).\n",
    "- Include card names, comments, and checklists in the loaded documents.\n",
    "- Customize the additional metadata fields to include in the document.\n",
    "\n",
    "By default, all card fields are included in the full-text `page_content` and in the metadata.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "#!pip install py-trello beautifulsoup4 lxml"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "········\n",
      "········\n"
     ]
    }
   ],
   "source": [
    "# If you have already set the API key and token using environment variables,\n",
    "# you can skip this cell and comment out the `api_key` and `token` named arguments\n",
    "# in the initialization steps below.\n",
    "from getpass import getpass\n",
    "\n",
    "API_KEY = getpass()\n",
    "TOKEN = getpass()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Review Tech partner pages\n",
      "Comments:\n",
      "{'title': 'Review Tech partner pages', 'id': '6475357890dc8d17f73f2dcc', 'url': 'https://trello.com/c/b0OTZwkZ/1-review-tech-partner-pages', 'labels': ['Demand Marketing'], 'list': 'Done', 'closed': False, 'due_date': ''}\n"
     ]
    }
   ],
   "source": [
    "from langchain.document_loaders import TrelloLoader\n",
    "\n",
    "# Get the open cards from \"Awesome Board\"\n",
    "loader = TrelloLoader.from_credentials(\n",
    "    \"Awesome Board\",\n",
    "    api_key=API_KEY,\n",
    "    token=TOKEN,\n",
    "    card_filter=\"open\",\n",
    ")\n",
    "documents = loader.load()\n",
    "\n",
    "print(documents[0].page_content)\n",
    "print(documents[0].metadata)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Review Tech partner pages\n",
      "Comments:\n",
      "{'title': 'Review Tech partner pages', 'id': '6475357890dc8d17f73f2dcc', 'url': 'https://trello.com/c/b0OTZwkZ/1-review-tech-partner-pages', 'list': 'Done'}\n"
     ]
    }
   ],
   "source": [
    "# Get all the cards from \"Awesome Board\" but only include the\n",
    "# card list (column) as extra metadata.\n",
    "loader = TrelloLoader.from_credentials(\n",
    "    \"Awesome Board\",\n",
    "    api_key=API_KEY,\n",
    "    token=TOKEN,\n",
    "    extra_metadata=(\"list\",),\n",
    ")\n",
    "documents = loader.load()\n",
    "\n",
    "print(documents[0].page_content)\n",
    "print(documents[0].metadata)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get the cards from \"Another Board\" and exclude the card name,\n",
    "# checklist and comments from the Document page_content text.\n",
    "loader = TrelloLoader.from_credentials(\n",
    "    \"Another Board\",\n",
    "    api_key=API_KEY,\n",
    "    token=TOKEN,\n",
    "    include_card_name=False,\n",
    "    include_checklist=False,\n",
    "    include_comments=False,\n",
    ")\n",
    "documents = loader.load()\n",
    "\n",
    "print(\"Document: \" + documents[0].page_content)\n",
    "print(documents[0].metadata)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.3"
  },
  "vscode": {
   "interpreter": {
    "hash": "cc99336516f23363341912c6723b01ace86f02e26b4290be1efc0677e2e2ec24"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
```
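`extra_metadata` is declared as `Tuple[str, ...]`, and the loader checks each field with `in` (for example `"list" in self.extra_metadata`). The quick plain-Python check below, which needs no Trello access, shows why a single field name should be written with a trailing comma, `("list",)`, rather than `("list")`:

```python
# ("list") is only a parenthesized string; ("list",) is a one-element tuple.
as_string = ("list")
as_tuple = ("list",)

print(type(as_string).__name__)  # str
print(type(as_tuple).__name__)   # tuple

# With a tuple, `in` is an exact element match ...
print("list" in as_tuple)  # True
print("is" in as_tuple)    # False
# ... but with a string, `in` is a substring test.
print("list" in as_string)  # True
print("is" in as_string)    # True
```

For a single supported field name the two forms happen to select the same metadata, but only the tuple matches the declared type and avoids accidental substring matches.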
@@ -0,0 +1,168 @@

```python
"""Loader that loads cards from Trello"""
from __future__ import annotations

from typing import TYPE_CHECKING, Any, List, Literal, Optional, Tuple

from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader
from langchain.utils import get_from_env

if TYPE_CHECKING:
    from trello import Board, Card, TrelloClient


class TrelloLoader(BaseLoader):
    """Trello loader. Reads all cards from a Trello board."""

    def __init__(
        self,
        client: TrelloClient,
        board_name: str,
        *,
        include_card_name: bool = True,
        include_comments: bool = True,
        include_checklist: bool = True,
        card_filter: Literal["closed", "open", "all"] = "all",
        extra_metadata: Tuple[str, ...] = ("due_date", "labels", "list", "closed"),
    ):
        """Initialize Trello loader.

        Args:
            client: Trello API client.
            board_name: The name of the Trello board.
            include_card_name: Whether to include the name of the card in the document.
            include_comments: Whether to include the comments on the card in the
                document.
            include_checklist: Whether to include the checklist on the card in the
                document.
            card_filter: Filter on card status. Valid values are "closed", "open",
                "all".
            extra_metadata: Tuple of additional metadata fields to include as
                document metadata. Valid values are "due_date", "labels", "list",
                "closed".

        """
        self.client = client
        self.board_name = board_name
        self.include_card_name = include_card_name
        self.include_comments = include_comments
        self.include_checklist = include_checklist
        self.extra_metadata = extra_metadata
        self.card_filter = card_filter

    @classmethod
    def from_credentials(
        cls,
        board_name: str,
        *,
        api_key: Optional[str] = None,
        token: Optional[str] = None,
        **kwargs: Any,
    ) -> TrelloLoader:
        """Convenience constructor that builds the TrelloClient for you.

        Args:
            board_name: The name of the Trello board.
            api_key: Trello API key. Can also be specified as environment variable
                TRELLO_API_KEY.
            token: Trello token. Can also be specified as environment variable
                TRELLO_TOKEN.
            include_card_name: Whether to include the name of the card in the document.
            include_comments: Whether to include the comments on the card in the
                document.
            include_checklist: Whether to include the checklist on the card in the
                document.
            card_filter: Filter on card status. Valid values are "closed", "open",
                "all".
            extra_metadata: Tuple of additional metadata fields to include as
                document metadata. Valid values are "due_date", "labels", "list",
                "closed".
        """

        try:
            from trello import TrelloClient  # type: ignore
        except ImportError as ex:
            raise ImportError(
                "Could not import trello python package. "
                "Please install it with `pip install py-trello`."
            ) from ex
        api_key = api_key or get_from_env("api_key", "TRELLO_API_KEY")
        token = token or get_from_env("token", "TRELLO_TOKEN")
        client = TrelloClient(api_key=api_key, token=token)
        return cls(client, board_name, **kwargs)

    def load(self) -> List[Document]:
        """Loads the cards from the specified Trello board.

        Which cards, metadata fields and text are included is controlled by the
        options passed to the initializer.

        Returns:
            A list of documents, one for each card in the board.
        """
        try:
            from bs4 import BeautifulSoup  # noqa: F401
        except ImportError as ex:
            raise ImportError(
                "`beautifulsoup4` package not found, please run"
                " `pip install beautifulsoup4`"
            ) from ex

        board = self._get_board()
        # Create a dictionary with the list IDs as keys and the list names as values
        list_dict = {list_item.id: list_item.name for list_item in board.list_lists()}
        # Get the cards on the board, filtered by status (card_filter)
        cards = board.get_cards(card_filter=self.card_filter)
        return [self._card_to_doc(card, list_dict) for card in cards]

    def _get_board(self) -> Board:
        # Find the first board with a matching name
        board = next(
            (b for b in self.client.list_boards() if b.name == self.board_name), None
        )
        if not board:
            raise ValueError(f"Board `{self.board_name}` not found.")
        return board

    def _card_to_doc(self, card: Card, list_dict: dict) -> Document:
        from bs4 import BeautifulSoup  # type: ignore

        text_content = ""
        if self.include_card_name:
            text_content = card.name + "\n"
        if card.description.strip():
            text_content += BeautifulSoup(card.description, "lxml").get_text()
        if self.include_checklist:
            # Get all the checklist items on the card
            for checklist in card.checklists:
                if checklist.items:
                    items = [
                        f"{item['name']}:{item['state']}" for item in checklist.items
                    ]
                    text_content += f"\n{checklist.name}\n" + "\n".join(items)

        if self.include_comments:
            # Get all the comments on the card
            comments = [
                BeautifulSoup(comment["data"]["text"], "lxml").get_text()
                for comment in card.comments
            ]
            text_content += "Comments:" + "\n".join(comments)

        # Default metadata fields
        metadata = {
            "title": card.name,
            "id": card.id,
            "url": card.url,
        }

        # Extra metadata fields. Card object is not subscriptable.
        if "labels" in self.extra_metadata:
            metadata["labels"] = [label.name for label in card.labels]
        if "list" in self.extra_metadata:
            if card.list_id in list_dict:
                metadata["list"] = list_dict[card.list_id]
        if "closed" in self.extra_metadata:
            metadata["closed"] = card.closed
        if "due_date" in self.extra_metadata:
            metadata["due_date"] = card.due_date

        return Document(page_content=text_content, metadata=metadata)
```

Review comment (on `TrelloLoader.__init__`): @GDrupal Thanks for the contribution! Document loaders require an argumentless load method. Could you push the configuration into the initializer? I've commented on the types of some of the parameters in
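`load()` first resolves the board by name with `next(..., None)` and then builds a dictionary that maps each list ID to its list (column) name, which `_card_to_doc` uses for the `"list"` metadata field. A minimal sketch of those two steps, using `types.SimpleNamespace` objects as hypothetical stand-ins for py-trello's board and list objects; `find_board` and `list_names` are illustrative names, not part of LangChain:

```python
from types import SimpleNamespace
from typing import Dict, Iterable


def find_board(boards: Iterable[SimpleNamespace], board_name: str) -> SimpleNamespace:
    """Return the first board whose name matches, like TrelloLoader._get_board."""
    board = next((b for b in boards if b.name == board_name), None)
    if board is None:
        raise ValueError(f"Board `{board_name}` not found.")
    return board


def list_names(lists: Iterable[SimpleNamespace]) -> Dict[str, str]:
    """Map list IDs to list (column) names, as load() does with board.list_lists()."""
    return {item.id: item.name for item in lists}


# Hypothetical stand-in data; real objects would come from py-trello.
boards = [SimpleNamespace(name="Awesome Board"), SimpleNamespace(name="Another Board")]
columns = [SimpleNamespace(id="l1", name="To Do"), SimpleNamespace(id="l2", name="Done")]

board = find_board(boards, "Awesome Board")
mapping = list_names(columns)
print(board.name)     # Awesome Board
print(mapping["l2"])  # Done
```

The sketch checks `is None` instead of the loader's `if not board:`, so it does not depend on how truthiness is defined for the board object; both behave the same for ordinary objects.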
Do you have any sample data (can be fake data) to help test that the massaging code inside the loader is correct?
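One way to exercise the text-massaging logic in isolation is to mirror it over plain dictionaries, so it can be checked without py-trello or network access. The sketch below follows the ordering in `_card_to_doc` (name, description, checklist items as `name:state`, then comments) but, as a simplification, skips the BeautifulSoup HTML stripping; `card_to_text` and the sample card are hypothetical:

```python
from typing import Any, Dict


def card_to_text(
    card: Dict[str, Any],
    include_card_name: bool = True,
    include_checklist: bool = True,
    include_comments: bool = True,
) -> str:
    """Build page_content like TrelloLoader._card_to_doc (without HTML parsing)."""
    text = ""
    if include_card_name:
        text = card["name"] + "\n"
    if card["description"].strip():
        text += card["description"]
    if include_checklist:
        for checklist in card["checklists"]:
            if checklist["items"]:
                items = [f"{i['name']}:{i['state']}" for i in checklist["items"]]
                text += f"\n{checklist['name']}\n" + "\n".join(items)
    if include_comments:
        comments = [c["data"]["text"] for c in card["comments"]]
        text += "Comments:" + "\n".join(comments)
    return text


# Hypothetical sample card, shaped after the fields the loader reads.
card = {
    "name": "Review Tech partner pages",
    "description": "",
    "checklists": [
        {"name": "Steps", "items": [{"name": "Draft", "state": "complete"}]}
    ],
    "comments": [{"data": {"text": "Looks good"}}],
}
print(repr(card_to_text(card)))
# 'Review Tech partner pages\n\nSteps\nDraft:completeComments:Looks good'
```

As in the loader, `"Comments:"` is appended without a leading newline, so it directly follows the last checklist item; a fixture-based test makes this kind of formatting detail visible.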
Thanks for your feedback!
I updated most of it already. Now, about the fake data: what do you need exactly?
Are you testing loaders that make external API requests with mock endpoints? Asking because I could not find examples in the codebase.
In this case we have two abstraction layers: the actual Trello API and the objects returned by py-trello. The latter is what we interact with directly from LangChain.
We could create a Trello "Lang Chain Test" board from a dummy email, since the free plan gives you API access too.
But I guess you would have to keep those credentials safe on the LangChain maintainers' side.
Not sure if that is even possible with your current test setup.
Ideally we could test everything that follows usage of the Trello client.
The client can be patched to have `board.get_cards` return a fixture. A fixture would allow testing all the code that follows that statement, without having to rely on an internet connection or configuring Trello accounts, etc.
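The patching approach can be sketched with `unittest.mock`: a `MagicMock` stands in for `TrelloClient`, `list_boards()` returns a fake board, and that board's `get_cards()` returns a fixture. To stay self-contained, the sketch drives a hypothetical `load_card_names` function that makes the same client calls as `TrelloLoader.load()` rather than the real loader; the same mock setup could in principle be passed as the loader's `client` argument, provided the fake cards define every attribute `_card_to_doc` reads.

```python
from types import SimpleNamespace
from typing import Any, List
from unittest.mock import MagicMock


def load_card_names(client: Any, board_name: str, card_filter: str = "all") -> List[str]:
    """Hypothetical stand-in that follows the same client calls as TrelloLoader.load()."""
    board = next((b for b in client.list_boards() if b.name == board_name), None)
    if board is None:
        raise ValueError(f"Board `{board_name}` not found.")
    cards = board.get_cards(card_filter=card_filter)
    return [card.name for card in cards]


# Fixture: fake cards returned instead of calling the Trello API.
fixture_cards = [SimpleNamespace(name="Card 1"), SimpleNamespace(name="Card 2")]

fake_board = MagicMock()
fake_board.name = "Awesome Board"  # set after creation so it is a plain attribute
fake_board.get_cards.return_value = fixture_cards

fake_client = MagicMock()
fake_client.list_boards.return_value = [fake_board]

names = load_card_names(fake_client, "Awesome Board", card_filter="open")
print(names)  # ['Card 1', 'Card 2']
# The card filter is forwarded to get_cards unchanged.
fake_board.get_cards.assert_called_once_with(card_filter="open")
```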
Just attached a comment to the main methods that retrieve data. Each comment has a JSON example of the data structure that the method returns.
I hope that's enough. The objects from py-trello are quite obscure, so I simplified them for this use case.
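Because `_card_to_doc` mixes attribute access (`card.name`, `label.name`, `checklist.items`) with dictionary access (`item['name']`, `comment['data']['text']`), a fixture built from a JSON example has to reproduce that mix. A minimal sketch, assuming a simplified, hypothetical JSON shape (not py-trello's own serialization) whose field names follow what the loader reads:

```python
import json
from types import SimpleNamespace

# Hypothetical simplified card JSON; field names follow what _card_to_doc reads.
CARD_JSON = """
{
  "id": "c1",
  "name": "Review Tech partner pages",
  "url": "https://trello.com/c/example",
  "description": "",
  "list_id": "l2",
  "closed": false,
  "due_date": "",
  "labels": [{"name": "Demand Marketing"}],
  "checklists": [{"name": "Steps", "items": [{"name": "Draft", "state": "complete"}]}],
  "comments": [{"data": {"text": "Looks good"}}]
}
"""


def card_from_json(raw: str) -> SimpleNamespace:
    """Build a fake card: attributes where the loader uses dots, dicts where it subscripts."""
    data = json.loads(raw)
    # Labels are read as label.name, so turn them into objects.
    data["labels"] = [SimpleNamespace(**label) for label in data["labels"]]
    # Checklists are objects, but their items stay plain dicts (item['name']).
    data["checklists"] = [SimpleNamespace(**cl) for cl in data["checklists"]]
    # Comments stay plain dicts (comment['data']['text']).
    return SimpleNamespace(**data)


card = card_from_json(CARD_JSON)
print(card.labels[0].name)                   # Demand Marketing
print(card.checklists[0].items[0]["state"])  # complete
print(card.comments[0]["data"]["text"])      # Looks good
```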