New Trello document loader #4767

Merged
merged 10 commits into from
May 30, 2023
117 changes: 117 additions & 0 deletions docs/modules/indexes/document_loaders/examples/trello.ipynb
@@ -0,0 +1,117 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Trello\n",
"\n",
">[Trello](https://www.atlassian.com/software/trello) is a web-based project management and collaboration tool that allows individuals and teams to organize and track their tasks and projects. It provides a visual interface known as a \"board\" where users can create lists and cards to represent their tasks and activities.\n",
"\n",
"The TrelloLoader allows you to load cards from a Trello board and is implemented on top of [py-trello](https://pypi.org/project/py-trello/)\n",
"\n",
"This currently supports `api_key/token` only.\n",
"\n",
"1. Credentials generation: https://trello.com/power-ups/admin/\n",
"\n",
"2. Click in the manual token generation link to get the token.\n",
"\n",
"This loader allows you to provide the board name to pull in the corresponding cards into Document objects.\n",
"\n",
"Notice that the board \"name\" is also called \"title\" in oficial documentation:\n",
"\n",
"https://support.atlassian.com/trello/docs/changing-a-boards-title-and-description/\n",
"\n",
"You can also specify several load parameters to include / remove different fields both from the document page_content properties and metadata.\n",
"\n",
"## Features\n",
"- Load cards from a Trello board.\n",
"- Filter cards based on their status (open or closed).\n",
"- Include card names, comments, and checklists in the loaded documents.\n",
"- Customize the additional metadata fields to include in the document.\n",
"\n",
"By default all card fields are included for the full text page_content and metadata accordinly.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"#!pip install py-trello beautifulsoup4"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import TrelloLoader\n",
"\n",
"# Get the open cards from \"Awesome Board\"\n",
"loader = TrelloLoader(\n",
" api_key, \n",
" api_token,\n",
" board_name=\"Awesome Board\",\n",
" card_filter = \"open\",\n",
" )\n",
"documents = loader.load()\n",
"\n",
"# Get all the cards from \"Awesome Board\" but only include the\n",
"# card list(column) as extra metadata.\n",
"loader = TrelloLoader(\n",
" api_key, \n",
" api_token,\n",
" board_name=\"Another Board\",\n",
" extra_metadata=(\"list\"),\n",
" )\n",
"documents = loader.load()\n",
"\n",
"# Get the closed cards from \"Awesome Board\" and exclude the card name,\n",
"# checklist and comments from the Document page_content text.\n",
"loader = TrelloLoader(\n",
" api_key, \n",
" api_token,\n",
" board_name=\"Another Board\",\n",
" card_filter = \"closed\",\n",
" include_card_name= False,\n",
" include_checklist= False,\n",
" include_comments= False,\n",
" )\n",
"documents = loader.load()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
},
"vscode": {
"interpreter": {
"hash": "cc99336516f23363341912c6723b01ace86f02e26b4290be1efc0677e2e2ec24"
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}
2 changes: 2 additions & 0 deletions langchain/document_loaders/__init__.py
@@ -90,6 +90,7 @@
from langchain.document_loaders.text import TextLoader
from langchain.document_loaders.tomarkdown import ToMarkdownLoader
from langchain.document_loaders.toml import TomlLoader
from langchain.document_loaders.trello import TrelloLoader
from langchain.document_loaders.twitter import TwitterTweetLoader
from langchain.document_loaders.unstructured import (
UnstructuredAPIFileIOLoader,
@@ -197,6 +198,7 @@
"StripeLoader",
"TextLoader",
"TomlLoader",
"TrelloLoader",
"TwitterTweetLoader",
"UnstructuredAPIFileIOLoader",
"UnstructuredAPIFileLoader",
131 changes: 131 additions & 0 deletions langchain/document_loaders/trello.py
@@ -0,0 +1,131 @@
"""Loader that loads cards from Trello"""
Collaborator

Do you have any sample data (can be fake data) to help test that the massaging code inside the loader is correct?

Contributor Author

Thanks for your feedback!
I've updated most of it already. Now, about the fake data:
what do you need exactly?
Are you testing loaders that make external API requests against mock endpoints?
I'm asking because I could not find examples in the codebase.
In this case we have two abstraction layers: the actual Trello API and the objects returned by py-trello. The latter is what we interact with directly from langchain.
We could create a Trello "Lang Chain Test" board from a dummy email, since the free plan gives you API access too.
But I guess the credentials would have to be kept safe on the langchain maintainers' side.
Not sure if that is even possible with your current test setup.

Collaborator

Ideally we could test everything that follows usage of the trello client.

The client can be patched, to have board.get_cards return a fixture

cards = board.get_cards(card_filter=self.card_filter)  # <-- from a fixture instead of the web

A fixture would allow testing all the code that follows that statement, without having to rely on an internet connection or configuring trello accounts etc.
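
A minimal sketch of that idea, using only the standard library. The helper `make_fake_client`, the `FAKE_CARDS` fixture and all of its values are hypothetical; the fixture models only the attributes the loader reads (`name`, `description`, `checklists`, `comments`, `labels`, `list_id` and so on), not the full py-trello objects.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock

# Hypothetical fixture data: only the attributes the loader touches are modelled.
FAKE_CARDS = [
    SimpleNamespace(
        id="card-1",
        name="Write docs",
        description="<p>Describe the loader</p>",
        url="https://example.invalid/card-1",
        list_id="list-1",
        is_closed=False,
        due_date="",
        labels=[SimpleNamespace(name="docs")],
        checklists=[],
        comments=[{"data": {"text": "Looks good"}}],
    )
]


def make_fake_client(board_name, cards):
    """Build a stand-in client whose only board returns fixture cards."""
    board = SimpleNamespace(
        name=board_name,
        get_cards=MagicMock(return_value=cards),
        list_lists=MagicMock(
            return_value=[SimpleNamespace(id="list-1", name="To Do")]
        ),
    )
    client = MagicMock()
    client.list_boards.return_value = [board]
    return client, board


client, board = make_fake_client("Awesome Board", FAKE_CARDS)

# The same lookups the loader performs, now served from the fixture
# instead of the web.
found = next((b for b in client.list_boards() if b.name == "Awesome Board"), None)
cards = found.get_cards(card_filter="open")
list_names = {item.id: item.name for item in found.list_lists()}
```

In an actual test, something like `unittest.mock.patch("trello.TrelloClient", return_value=client)` could presumably hand this object to the loader, assuming py-trello is installed so that the `trello` module can be imported and patched.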

Contributor Author

I just attached a comment to each of the main methods that retrieve data. Each comment has a JSON example of the data structure that method returns.
I hope that's enough. The objects from py-trello are quite obscure, so I simplified them for this use case.

from typing import List, Tuple

from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader


class TrelloLoader(BaseLoader):
    """Trello loader. Reads all cards from a Trello board.

    Args:
        api_key (str): Trello API key.
        api_token (str): Trello API token.
        board_name (str): The name of the Trello board.
        include_card_name (bool): Whether to include the name of the card in the document. Defaults to True.
        include_comments (bool): Whether to include the comments on the card in the document. Defaults to True.
        include_checklist (bool): Whether to include the checklist on the card in the document. Defaults to True.
        card_filter (str, optional): Use "closed" / "open". Defaults to "all".
        extra_metadata (tuple[str]): Tuple of additional metadata fields to include as document metadata. Defaults to ("due_date", "labels", "list", "is_closed").

    """

    def __init__(
        self,
        api_key: str,
        api_token: str,
        board_name: str,
        include_card_name: bool = True,
        include_comments: bool = True,
        include_checklist: bool = True,
        card_filter: str = "all",
        extra_metadata: Tuple[str, ...] = ("due_date", "labels", "list", "is_closed"),
    ):
        """Initialize Trello loader."""
        self.api_key = api_key
        self.api_token = api_token
        self.board_name = board_name
        self.include_card_name = include_card_name
        self.include_comments = include_comments
        self.include_checklist = include_checklist
        self.extra_metadata = extra_metadata
        self.card_filter = card_filter

    def load(self) -> List[Document]:
        """Loads all cards from the specified Trello board.
        You can filter the cards, metadata and text included by using the optional parameters.

        Returns:
            A list of documents, one for each card in the board.
        """

        try:
            from trello import TrelloClient  # type: ignore
        except ImportError as ex:
            raise ImportError(
                "Could not import trello python package. "
                "Please install it with `pip install py-trello`."
            ) from ex
        try:
            from bs4 import BeautifulSoup  # type: ignore
        except ImportError as ex:
            raise ImportError(
                "`beautifulsoup4` package not found, please run"
                " `pip install beautifulsoup4`"
            ) from ex

        docs: List[Document] = []
        client = TrelloClient(api_key=self.api_key, token=self.api_token)

        # Find the board with the matching name
        board = next(
            (b for b in client.list_boards() if b.name == self.board_name), None
        )
        if not board:
            raise ValueError(f"Board `{self.board_name}` not found.")

        # Create a dictionary with the list IDs as keys and the list names as values
        list_dict = {list_item.id: list_item.name for list_item in board.list_lists()}

        # Get Cards on the board
        cards = board.get_cards(card_filter=self.card_filter)
        for card in cards:
            text_content = ""
            if self.include_card_name:
                text_content = card.name + "\n"
            description = card.description.strip()
            if description:
                text_content += BeautifulSoup(card.description, "lxml").get_text()

            if self.include_checklist:
                # Get all the checklist items on the card
                items = []
                for checklist in card.checklists:
                    if checklist.items:
                        items.extend(
                            [
                                f"{item['name']}:{item['state']}"
                                for item in checklist.items
                            ]
                        )
                    text_content += f"\n{checklist.name}\n" + "\n".join(items)
Contributor

Guessing only the new items should be part of the join? Otherwise you'll be re-reading all the items added from previous checklists.

Contributor Author

Good catch!
I saw that you did a little refactor, thanks!
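
For reference, a minimal sketch of the behaviour the comment above asks for: the item list is rebuilt for every checklist, so each checklist's heading is followed only by its own items. The helper name `checklists_to_text` and the sample data are hypothetical, and this is not necessarily the refactor that was actually merged.

```python
from types import SimpleNamespace


def checklists_to_text(checklists):
    """Render each checklist as its name followed by only its own items."""
    text = ""
    for checklist in checklists:
        # Build a fresh list per checklist instead of extending a shared one.
        items = [f"{item['name']}:{item['state']}" for item in checklist.items]
        text += f"\n{checklist.name}\n" + "\n".join(items)
    return text


# Illustrative stand-ins for py-trello checklist objects.
checklists = [
    SimpleNamespace(name="Setup", items=[{"name": "install", "state": "complete"}]),
    SimpleNamespace(name="Release", items=[{"name": "tag", "state": "incomplete"}]),
]
text = checklists_to_text(checklists)
print(text)
```

With a shared `items` list, the "Release" section would also repeat `install:complete`; here it appears once, under "Setup".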


            if self.include_comments:
                # Get all the comments on the card
                comments = [
                    BeautifulSoup(comment["data"]["text"], "lxml").get_text()
                    for comment in card.comments
                ]
                text_content += "Comments:" + "\n".join(comments)

            # Default metadata fields
            metadata = {
                "title": card.name,
                "id": card.id,
                "url": card.url,
            }

            # Extra metadata fields. Card object is not subscriptable.
            if "labels" in self.extra_metadata:
                metadata["labels"] = [label.name for label in card.labels]
            if "list" in self.extra_metadata:
                if card.list_id in list_dict:
                    metadata["list"] = list_dict[card.list_id]
            if "is_closed" in self.extra_metadata:
                metadata["is_closed"] = card.is_closed
            if "due_date" in self.extra_metadata:
                metadata["due_date"] = card.due_date

            doc = Document(page_content=text_content, metadata=metadata)
            docs.append(doc)
        return docs
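
The `"labels" in self.extra_metadata` style checks above rely on `extra_metadata` being a tuple. A small sketch of why that matters: a parenthesized string such as `("list")` is still a `str`, so `in` becomes a substring test. The helper `selected_fields` is hypothetical and only mirrors the membership checks; it is not part of the loader.

```python
KNOWN_FIELDS = ("due_date", "labels", "list", "is_closed")


def selected_fields(extra_metadata):
    """Return the known fields that an `in` check would select."""
    return [field for field in KNOWN_FIELDS if field in extra_metadata]


as_string = ("list")   # parentheses alone do not make a tuple: this is a str
as_tuple = ("list",)   # the trailing comma makes a one-element tuple

print(type(as_string).__name__, selected_fields(as_string))
print(type(as_tuple).__name__, selected_fields(as_tuple))

# With a string, any value that merely contains "list" also selects the field.
print(selected_fields("checklist"))
print(selected_fields(("checklist",)))
```

Passing a one-element tuple with a trailing comma keeps the check an exact match on field names.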