feat: add content_format param to ConfluenceLoader.load() #5922

Merged

@haoqixu haoqixu commented Jun 9, 2023

The Confluence API supports different formats for page content. The storage format is the raw XML representation used for storage. The view format is the HTML representation for viewing, with macros rendered as they would appear to users.

Add the content_format parameter to ConfluenceLoader.load() to specify the content format. It defaults to ContentFormat.STORAGE.
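
To illustrate the idea, here is a minimal, standard-library-only sketch of how an enum like ContentFormat could map each format to the matching part of a Confluence REST page response. It is not the merged implementation: the enum values, the `get_content` helper, and the sample `page` dict are assumptions made for illustration, based on the REST API returning content under `body.storage.value` and `body.view.value` when those fields are expanded.

```python
from enum import Enum


class ContentFormat(str, Enum):
    """Illustrative stand-in for the enum named in this PR (assumed values)."""

    STORAGE = "body.storage"  # raw XML as stored by Confluence
    VIEW = "body.view"        # HTML with macros rendered

    def get_content(self, page: dict) -> str:
        # Hypothetical helper: pick the body that matches this format.
        if self is ContentFormat.STORAGE:
            return page["body"]["storage"]["value"]
        if self is ContentFormat.VIEW:
            return page["body"]["view"]["value"]
        raise ValueError(f"unknown content format: {self!r}")


# Made-up page payload shaped like a response with both bodies expanded.
page = {
    "body": {
        "storage": {"value": "<p>raw storage markup</p>"},
        "view": {"value": "<p>rendered view markup</p>"},
    }
}

raw = ContentFormat.STORAGE.get_content(page)
rendered = ContentFormat.VIEW.get_content(page)
```

With the change in this PR, a caller would presumably select the format with something like `loader.load(space_key="SPACE", content_format=ContentFormat.VIEW)`; the space key here is a placeholder.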

Who can review?

Tag maintainers/contributors who might be interested: @eyurtsev

@hwchase17 hwchase17 left a comment

lgtm - thanks!

@hwchase17 hwchase17 added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Jun 11, 2023
@haoqixu haoqixu requested a review from hwchase17 June 13, 2023 03:29
@hwchase17 hwchase17 merged commit 7ad13cd into langchain-ai:master Jun 14, 2023
Undertone0809 pushed a commit to Undertone0809/langchain that referenced this pull request Jun 19, 2023
…ai#5922)

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Natanhel commented Jun 20, 2023

BeautifulSoup's get_text() has the strip=True flag set and will strip every XML tag, so why even have the STORAGE option?
This does not work, and we need the XML in order to get links from the page.
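
The concern above can be sketched with the standard library alone: once markup is flattened to text, link targets are gone, so they have to be collected from the tags first. This sketch uses `html.parser` rather than BeautifulSoup, and the sample markup and the `LinkCollector` class are made up for illustration. The `ri:page` / `ri:content-title` names follow the Confluence storage format for internal page links.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: gather link targets and plain text separately."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "a" and attr_map.get("href"):
            # External link: keep the URL.
            self.links.append(attr_map["href"])
        elif tag == "ri:page" and attr_map.get("ri:content-title"):
            # Internal page link: keep the target page title.
            self.links.append(attr_map["ri:content-title"])

    def handle_data(self, data):
        # Text nodes only; every tag and attribute is dropped here.
        if data.strip():
            self.text_parts.append(data.strip())


# Made-up storage-format snippet with one external and one internal link.
storage_xml = (
    '<p>See <a href="https://example.com/docs">the docs</a> and '
    '<ac:link><ri:page ri:content-title="Release Notes" /></ac:link>.</p>'
)

collector = LinkCollector()
collector.feed(storage_xml)
collector.close()

plain_text = " ".join(collector.text_parts)  # no tags or link targets remain
links = collector.links                      # recovered from the markup itself
```

The flattened text keeps only the visible words, so link extraction needs access to the unflattened storage markup.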

This was referenced Jun 25, 2023