Implements support for Personal Access Token Authentication in the ConfluenceLoader (langchain-ai#5385)

# Implements support for Personal Access Token Authentication in the
ConfluenceLoader

Fixes langchain-ai#5191

Implements a new optional parameter for the ConfluenceLoader: `token`.
This allows the use of personal access authentication when using the
on-prem server version of Confluence.
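
The rule this change adds to argument validation (a `token` cannot be combined with `api_key`, `username` or `oauth2`) can be sketched on its own. This is a minimal illustration, not the loader's actual method: `check_token_exclusive` is a hypothetical helper name, and it mirrors only the new check from the diff, not the other validations.

```python
from typing import List, Optional


def check_token_exclusive(
    api_key: Optional[str] = None,
    username: Optional[str] = None,
    oauth2: Optional[dict] = None,
    token: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: collect errors for invalid credential mixes."""
    errors: List[str] = []
    # A personal access token must be the only credential supplied.
    if token and (api_key or username or oauth2):
        errors.append(
            "Cannot provide a value for `token` and a value for `api_key`, "
            "`username` or `oauth2`"
        )
    return errors


# A token on its own passes; a token plus a username is rejected.
assert check_token_exclusive(token="my-pat") == []
assert len(check_token_exclusive(token="my-pat", username="me")) == 1
```

Returning a list of messages rather than raising immediately matches the pattern in the diff, where all errors are gathered first and reported together in a single `ValueError`.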

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
@eyurtsev @Jflick58 

Twitter Handle: felipe_yyc

---------

Co-authored-by: Felipe <feferreira@ea.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
3 people authored and Undertone0809 committed Jun 19, 2023
1 parent 5ca0644 commit 6003ea2
Showing 2 changed files with 86 additions and 7 deletions.
72 changes: 67 additions & 5 deletions docs/modules/indexes/document_loaders/examples/confluence.ipynb
@@ -1,15 +1,18 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"source": [
"# Confluence\n",
"\n",
">[Confluence](https://www.atlassian.com/software/confluence) is a wiki collaboration platform that saves and organizes all of the project-related material. `Confluence` is a knowledge base that primarily handles content management activities. \n",
"\n",
"A loader for `Confluence` pages currently supports both `username/api_key` and `Oauth2 login`.\n",
"See [instructions](https://support.atlassian.com/atlassian-account/docs/manage-api-tokens-for-your-atlassian-account/).\n",
"A loader for `Confluence` pages.\n",
"\n",
"\n",
"This currently supports `username/api_key`, `Oauth2 login`. Additionally, on-prem installations also support `token` authentication. \n",
"\n",
"\n",
"Specify a list `page_id`-s and/or `space_key` to load in the corresponding pages into Document objects, if both are specified the union of both sets will be returned.\n",
@@ -20,9 +23,17 @@
"Hint: `space_key` and `page_id` can both be found in the URL of a page in Confluence - https://yoursite.atlassian.com/wiki/spaces/<space_key>/pages/<page_id>\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Before using ConfluenceLoader make sure you have the latest version of the atlassian-python-api package installed:"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 1,
"metadata": {
"tags": []
},
@@ -31,6 +42,29 @@
"#!pip install atlassian-python-api"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Examples"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Username and Password or Username and API Token (Atlassian Cloud only)\n",
"\n",
"This example authenticates using either a username and password or, if you're connecting to an Atlassian Cloud hosted version of Confluence, a username and an API Token.\n",
"You can generate an API token at: https://id.atlassian.com/manage-profile/security/api-tokens.\n",
"\n",
"The `limit` parameter specifies how many documents will be retrieved in a single call, not how many documents will be retrieved in total.\n",
"By default the code will return up to 1000 documents in 50 documents batches. To control the total number of documents use the `max_pages` parameter. \n",
"Plese note the maximum value for the `limit` parameter in the atlassian-python-api package is currently 100. "
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -46,6 +80,34 @@
")\n",
"documents = loader.load(space_key=\"SPACE\", include_attachments=True, limit=50)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Personal Access Token (Server/On-Prem only)\n",
"\n",
"This method is valid for the Data Center/Server on-prem edition only.\n",
"For more information on how to generate a Personal Access Token (PAT) check the official Confluence documentation at: https://confluence.atlassian.com/enterprise/using-personal-access-tokens-1026032365.html.\n",
"When using a PAT you provide only the token value, you cannot provide a username. \n",
"Please note that ConfluenceLoader will run under the permissions of the user that generated the PAT and will only be able to load documents for which said user has access to. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import ConfluenceLoader\n",
"\n",
"loader = ConfluenceLoader(\n",
" url=\"https://yoursite.atlassian.com/wiki\",\n",
" token=\"12345\"\n",
")\n",
"documents = loader.load(space_key=\"SPACE\", include_attachments=True, limit=50, max_pages=50)"
]
}
],
"metadata": {
@@ -64,7 +126,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.9.13"
},
"vscode": {
"interpreter": {
21 changes: 19 additions & 2 deletions langchain/document_loaders/confluence.py
@@ -19,7 +19,8 @@
class ConfluenceLoader(BaseLoader):
"""
Load Confluence pages. Port of https://llamahub.ai/l/confluence
This currently supports both username/api_key and Oauth2 login.
This currently supports username/api_key, Oauth2 login or personal access token
authentication.
Specify a list of page_ids and/or a space_key to load the corresponding pages into
Document objects; if both are specified, the union of both sets will be returned.
@@ -53,6 +54,8 @@ class ConfluenceLoader(BaseLoader):
:type username: str, optional
:param oauth2: _description_, defaults to {}
:type oauth2: dict, optional
:param token: Personal access token, for on-prem (Server/Data Center) installations only, defaults to None
:type token: str, optional
:param cloud: _description_, defaults to True
:type cloud: bool, optional
:param number_of_retries: How many times to retry, defaults to 3
@@ -73,14 +76,17 @@ def __init__(
api_key: Optional[str] = None,
username: Optional[str] = None,
oauth2: Optional[dict] = None,
token: Optional[str] = None,
cloud: Optional[bool] = True,
number_of_retries: Optional[int] = 3,
min_retry_seconds: Optional[int] = 2,
max_retry_seconds: Optional[int] = 10,
confluence_kwargs: Optional[dict] = None,
):
confluence_kwargs = confluence_kwargs or {}
errors = ConfluenceLoader.validate_init_args(url, api_key, username, oauth2)
errors = ConfluenceLoader.validate_init_args(
url, api_key, username, oauth2, token
)
if errors:
raise ValueError(f"Error(s) while validating input: {errors}")

@@ -101,6 +107,10 @@ def __init__(
self.confluence = Confluence(
url=url, oauth2=oauth2, cloud=cloud, **confluence_kwargs
)
elif token:
self.confluence = Confluence(
url=url, token=token, cloud=cloud, **confluence_kwargs
)
else:
self.confluence = Confluence(
url=url,
@@ -116,6 +126,7 @@ def validate_init_args(
api_key: Optional[str] = None,
username: Optional[str] = None,
oauth2: Optional[dict] = None,
token: Optional[str] = None,
) -> Union[List, None]:
"""Validates proper combinations of init arguments"""

@@ -147,6 +158,12 @@ def validate_init_args(
"`['access_token', 'access_token_secret', 'consumer_key', 'key_cert']`"
)

if token and (api_key or username or oauth2):
errors.append(
"Cannot provide a value for `token` and a value for `api_key`, "
"`username` or `oauth2`"
)

if errors:
return errors
return None
