New Trello document loader #4767
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,117 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Trello\n", | ||
"\n", | ||
">[Trello](https://www.atlassian.com/software/trello) is a web-based project management and collaboration tool that allows individuals and teams to organize and track their tasks and projects. It provides a visual interface known as a \"board\" where users can create lists and cards to represent their tasks and activities.\n", | ||
"\n", | ||
"The TrelloLoader allows you to load cards from a Trello board and is implemented on top of [py-trello](https://pypi.org/project/py-trello/).\n", | ||
"\n", | ||
"The loader currently supports `api_key/token` authentication only.\n", | ||
"\n", | ||
"1. Credentials generation: https://trello.com/power-ups/admin/\n", | ||
"\n", | ||
"2. Click on the manual token generation link to get the token.\n", | ||
"\n", | ||
"This loader allows you to provide the board name to pull the corresponding cards into Document objects.\n", | ||
"\n", | ||
"Note that the board \"name\" is also called \"title\" in the official documentation:\n", | ||
"\n", | ||
"https://support.atlassian.com/trello/docs/changing-a-boards-title-and-description/\n", | ||
"\n", | ||
"You can also specify several load parameters to include or exclude fields from both the document page_content and the metadata.\n", | ||
"\n", | ||
"## Features\n", | ||
"- Load cards from a Trello board.\n", | ||
"- Filter cards based on their status (open or closed).\n", | ||
"- Include card names, comments, and checklists in the loaded documents.\n", | ||
"- Customize the additional metadata fields to include in the document.\n", | ||
"\n", | ||
"By default, all card fields are included in the full-text page_content, and the metadata is populated accordingly.\n", | ||
"\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"#!pip install py-trello beautifulsoup4 lxml" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from langchain.document_loaders import TrelloLoader\n", | ||
"\n", | ||
"# Get the open cards from \"Awesome Board\"\n", | ||
"loader = TrelloLoader(\n", | ||
"    api_key,\n", | ||
"    api_token,\n", | ||
"    board_name=\"Awesome Board\",\n", | ||
"    card_filter=\"open\",\n", | ||
")\n", | ||
"documents = loader.load()\n", | ||
"\n", | ||
"# Get all the cards from \"Another Board\" but only include the\n", | ||
"# card list (column) as extra metadata.\n", | ||
"loader = TrelloLoader(\n", | ||
"    api_key,\n", | ||
"    api_token,\n", | ||
"    board_name=\"Another Board\",\n", | ||
"    extra_metadata=(\"list\",),  # trailing comma makes this a one-element tuple\n", | ||
")\n", | ||
"documents = loader.load()\n", | ||
"\n", | ||
"# Get the closed cards from \"Another Board\" and exclude the card name,\n", | ||
"# checklist and comments from the Document page_content text.\n", | ||
"loader = TrelloLoader(\n", | ||
"    api_key,\n", | ||
"    api_token,\n", | ||
"    board_name=\"Another Board\",\n", | ||
"    card_filter=\"closed\",\n", | ||
"    include_card_name=False,\n", | ||
"    include_checklist=False,\n", | ||
"    include_comments=False,\n", | ||
")\n", | ||
"documents = loader.load()" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.10.6" | ||
}, | ||
"vscode": { | ||
"interpreter": { | ||
"hash": "cc99336516f23363341912c6723b01ace86f02e26b4290be1efc0677e2e2ec24" | ||
} | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 4 | ||
} |
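A note on the `extra_metadata` argument used in the notebook: the loader annotates it as a tuple of strings, and in Python parentheses alone do not create a tuple; the trailing comma does. The sketch below illustrates the difference. `normalize_extra_metadata` is a hypothetical helper written for this note and is not part of the PR or of LangChain.

```python
from typing import Iterable, Tuple, Union


def normalize_extra_metadata(value: Union[str, Iterable[str]]) -> Tuple[str, ...]:
    """Hypothetical helper: always return a tuple of field names."""
    if isinstance(value, str):
        # A bare string such as ("list") is one field name, not a tuple.
        return (value,)
    return tuple(value)


as_string = ("list")  # parentheses alone do not make a tuple
as_tuple = ("list",)  # the trailing comma does

print(type(as_string).__name__, type(as_tuple).__name__)

# Membership on a string is a substring test, so unrelated names can match.
print("is" in as_string, "is" in as_tuple)

print(normalize_extra_metadata(as_string) == as_tuple)
```

Passing a real tuple keeps checks such as `"labels" in extra_metadata` as exact membership tests rather than substring matches.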
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,131 @@ | ||
"""Loader that loads cards from Trello""" | ||
from typing import List, Optional | ||
|
||
from langchain.docstore.document import Document | ||
from langchain.document_loaders.base import BaseLoader | ||
|
||
|
||
class TrelloLoader(BaseLoader): | ||
"""Trello loader. Reads all cards from a Trello board. | ||
|
||
Args: | ||
api_key (str): Trello API key. | ||
api_token (str): Trello API token. | ||
board_name (str): The name of the Trello board. | ||
include_card_name (bool): Whether to include the name of the card in the document. Defaults to True. | ||
include_comments (bool): Whether to include the comments on the card in the document. Defaults to True. | ||
include_checklist (bool): Whether to include the checklist on the card in the document. Defaults to True. | ||
card_filter (str, optional): Use "closed" / "open". Defaults to "all". | ||
extra_metadata (tuple[str, ...]): Tuple of additional metadata fields to include as document metadata. Defaults to ("due_date", "labels", "list", "is_closed"). | ||
|
||
""" | ||
|
||
def __init__( | ||
self, | ||
api_key: str, | ||
api_token: str, | ||
board_name: str, | ||
include_card_name: bool = True, | ||
include_comments: bool = True, | ||
include_checklist: bool = True, | ||
card_filter: str = "all", | ||
extra_metadata: tuple[str, ...] = ("due_date", "labels", "list", "is_closed"), | ||
): | ||
"""Initialize Trello loader.""" | ||
self.api_key = api_key | ||
self.api_token = api_token | ||
self.board_name = board_name | ||
self.include_card_name = include_card_name | ||
self.include_comments = include_comments | ||
self.include_checklist = include_checklist | ||
self.extra_metadata = extra_metadata | ||
self.card_filter = card_filter | ||
|
||
def load(self) -> List[Document]: | ||
"""Loads all cards from the specified Trello board. | ||
You can filter the cards, metadata and text included by using the optional parameters. | ||
|
||
Returns: | ||
A list of documents, one for each card in the board. | ||
""" | ||
|
||
try: | ||
from trello import TrelloClient # type: ignore | ||
except ImportError as ex: | ||
raise ImportError( | ||
"Could not import trello python package. " | ||
"Please install it with `pip install py-trello`." | ||
) from ex | ||
try: | ||
from bs4 import BeautifulSoup # type: ignore | ||
except ImportError as ex: | ||
raise ImportError( | ||
"`beautifulsoup4` package not found, please run" | ||
" `pip install beautifulsoup4`" | ||
) from ex | ||
|
||
docs: List[Document] = [] | ||
client = TrelloClient(api_key=self.api_key, token=self.api_token) | ||
|
||
# Find the board with the matching name | ||
board = next( | ||
(b for b in client.list_boards() if b.name == self.board_name), None | ||
) | ||
if not board: | ||
raise ValueError(f"Board `{self.board_name}` not found.") | ||
|
||
# Create a dictionary with the list IDs as keys and the list names as values | ||
list_dict = {list_item.id: list_item.name for list_item in board.list_lists()} | ||
|
||
# Get Cards on the board | ||
cards = board.get_cards(card_filter=self.card_filter) | ||
for card in cards: | ||
text_content = "" | ||
if self.include_card_name: | ||
text_content = card.name + "\n" | ||
description = card.description.strip() | ||
if description: | ||
text_content += BeautifulSoup(card.description, "lxml").get_text() | ||
|
||
if self.include_checklist: | ||
# Get all the checklist items on the card | ||
for checklist in card.checklists: | ||
if checklist.items: | ||
# Build the item list per checklist so that items from | ||
# earlier checklists are not repeated in the join. | ||
items = [ | ||
f"{item['name']}:{item['state']}" | ||
for item in checklist.items | ||
] | ||
text_content += f"\n{checklist.name}\n" + "\n".join(items) | ||
I'm guessing only the new items should be part of the join? Otherwise you'll be re-reading all the items added from previous checklists.
Good catch! |
||
|
||
if self.include_comments: | ||
# Get all the comments on the card | ||
comments = [ | ||
BeautifulSoup(comment["data"]["text"], "lxml").get_text() | ||
for comment in card.comments | ||
] | ||
text_content += "\nComments:\n" + "\n".join(comments) | ||
|
||
# Default metadata fields | ||
metadata = { | ||
"title": card.name, | ||
"id": card.id, | ||
"url": card.url, | ||
} | ||
|
||
# Extra metadata fields. Card object is not subscriptable. | ||
if "labels" in self.extra_metadata: | ||
metadata["labels"] = [label.name for label in card.labels] | ||
if "list" in self.extra_metadata: | ||
if card.list_id in list_dict: | ||
metadata["list"] = list_dict[card.list_id] | ||
if "is_closed" in self.extra_metadata: | ||
metadata["is_closed"] = card.is_closed | ||
if "due_date" in self.extra_metadata: | ||
metadata["due_date"] = card.due_date | ||
|
||
doc = Document(page_content=text_content, metadata=metadata) | ||
docs.append(doc) | ||
return docs |
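The inline review comment on the checklist join can be illustrated with a small standalone sketch. It assumes, as the diff does, that each checklist exposes `name` and `items`, where every item is a dict with `name` and `state` keys. `format_checklists` is a hypothetical helper for illustration only, and `SimpleNamespace` objects stand in for py-trello checklists.

```python
from types import SimpleNamespace
from typing import Iterable


def format_checklists(checklists: Iterable) -> str:
    """Hypothetical helper: render each checklist with only its own items."""
    parts = []
    for checklist in checklists:
        # A fresh list per checklist, so nothing leaks from earlier ones.
        items = [
            f"{item['name']}:{item['state']}" for item in (checklist.items or [])
        ]
        parts.append(f"\n{checklist.name}\n" + "\n".join(items))
    return "".join(parts)


# Stand-ins for py-trello checklist objects (assumed shape, not real API output).
checklists = [
    SimpleNamespace(name="Setup", items=[{"name": "Install", "state": "complete"}]),
    SimpleNamespace(name="Release", items=[{"name": "Tag", "state": "incomplete"}]),
]

text = format_checklists(checklists)
print(text)
```

With a shared `items` list declared outside the loop, the "Release" section would also list "Install:complete", which is the duplication the reviewer pointed out.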
Do you have any sample data (can be fake data) to help test that the massaging code inside the loader is correct?

Thanks for your feedback!
I've already updated most of it. Now, about the fake data:
what do you need exactly?
Are you testing loaders (with external API requests) against mock endpoints?
I'm asking because I could not find examples in the codebase.
In this case we have two abstraction layers: the actual Trello API and the objects returned by py-trello. The latter is what we interact with directly from LangChain.
We could create a Trello "Lang Chain Test" board from a dummy email, since the free plan gives you API access too.
But I guess the credentials would have to be kept safe on the LangChain maintainers' side.
Not sure if that is even possible with your current test setup.

Ideally we could test everything that follows usage of the Trello client.
The client can be patched to have `board.get_cards` return a fixture. A fixture would allow testing all the code that follows that statement, without having to rely on an internet connection or configure Trello accounts, etc.
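A minimal sketch of that idea, using only the standard library. The names `make_fake_card` and `cards_to_texts` are hypothetical: `cards_to_texts` is a simplified stand-in for the loader code that runs after `board.get_cards`, not the loader itself, and the fake cards carry only the attributes this sketch reads.

```python
from types import SimpleNamespace
from unittest.mock import Mock


def make_fake_card(name, description, list_id):
    """Build a stand-in for a py-trello card (assumed attribute names)."""
    return SimpleNamespace(
        name=name,
        description=description,
        list_id=list_id,
        checklists=[],
        comments=[],
    )


def cards_to_texts(board, card_filter="all"):
    """Hypothetical stand-in for the processing that follows get_cards."""
    texts = []
    for card in board.get_cards(card_filter=card_filter):
        text = card.name + "\n"
        if card.description.strip():
            text += card.description
        texts.append(text)
    return texts


# The fake board returns the fixture instead of calling the Trello API.
fake_board = Mock()
fake_board.get_cards.return_value = [
    make_fake_card("Write tests", "Cover the loader", "list-1"),
    make_fake_card("Empty card", "   ", "list-1"),
]

texts = cards_to_texts(fake_board, card_filter="open")
print(texts)

# The mock also records how it was called, so the filter can be checked.
fake_board.get_cards.assert_called_once_with(card_filter="open")
```

In a real test, the same fixture could be returned from a patched `TrelloClient`, so that `TrelloLoader.load()` runs end to end without network access.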

I just attached a comment to each of the main methods that retrieve data. Each comment has a JSON example of the data structure that the method returns.
I hope that's enough. The objects from py-trello are quite obscure, so I simplified them for this use case.