update web_base.py to have verify option #6107

Merged
merged 4 commits into from
Jun 17, 2023

@jackfrost1411 jackfrost1411 commented Jun 13, 2023

We propose an enhancement to the web-based loader initialize method by introducing a "verify" option. This enhancement addresses the issue of SSL verification errors encountered on certain web pages. By providing users with the option to set the verify parameter to False, we offer greater flexibility and control.

Fixes #6079

Who can review?

@eyurtsev @hwchase17
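
As a rough illustration of the proposal, the sketch below shows how a `verify` flag chosen at construction time could be threaded through to the HTTP call. `MiniWebLoader`, `Fetch`, and `fake_fetch` are hypothetical names used only for illustration; this is not the actual `web_base.py` code, and the fetch function is injected so the example runs without network access.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical signature: (url, headers, verify) -> page text.
Fetch = Callable[[str, Dict[str, str], bool], str]


@dataclass
class MiniWebLoader:
    """Illustrative stand-in for a web loader; not LangChain's WebBaseLoader."""

    web_path: str
    fetch: Fetch
    header_template: Optional[Dict[str, str]] = None
    verify: bool = True  # certificate verification stays on unless the caller opts out

    def load(self) -> str:
        headers = dict(self.header_template or {})
        # The flag chosen at construction time is forwarded unchanged.
        return self.fetch(self.web_path, headers, self.verify)


def fake_fetch(url: str, headers: Dict[str, str], verify: bool) -> str:
    # Stand-in for a real HTTP call; it only reports what it was asked to do.
    return f"GET {url} verify={verify}"


default_result = MiniWebLoader("https://example.com", fake_fetch).load()
insecure_result = MiniWebLoader("https://example.com", fake_fetch, verify=False).load()
```

Defaulting `verify` to `True` in the sketch keeps existing callers on the safe path, so skipping verification has to be an explicit choice.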

Change the web base loader's initialize method to accept a "verify" option. Previously, it raised SSL verification errors on some web pages. This lets users set verify to False if they want.
@jackfrost1411 jackfrost1411 changed the title Jackfrost1411 patch 2 update web_base.py to have verify option Jun 13, 2023
jackfrost1411 commented Jun 13, 2023

With the verify option added, you can pass verify=False to bypass SSL verification while still sending custom headers such as a browser User-Agent:

from langchain.document_loaders import WebBaseLoader

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
# verify=False skips SSL certificate verification for this loader
loader = WebBaseLoader(web_path="https://SO_AND_SO.com", header_template=headers, verify=False)
data = loader.load()

This solves a lot of issues that I faced in the recent past.
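
For context on what turning verification off means at the TLS level, here is a minimal standard-library sketch that is independent of LangChain. `make_ssl_context` is an illustrative helper name, not something from this PR. A context built with `verify=False` accepts any certificate, so it is best limited to hosts you already trust.

```python
import ssl


def make_ssl_context(verify: bool = True) -> ssl.SSLContext:
    """Build a client TLS context, optionally with certificate checks disabled."""
    context = ssl.create_default_context()
    if not verify:
        # check_hostname must be turned off before verify_mode can be CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


strict = make_ssl_context()
relaxed = make_ssl_context(verify=False)
```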

jackfrost1411 commented Jun 13, 2023

The older version of web_base.py gives errors:
[image]

The newer version of web_base.py works fine:
[image: Screen Shot 2023-06-13 at 1 27 12 PM]

@hwchase17 hwchase17 left a comment

thanks - seems great

@hwchase17 hwchase17 added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Jun 17, 2023
@hwchase17 hwchase17 merged commit 2eec687 into langchain-ai:master Jun 17, 2023
@zomchak-code zomchak-code mentioned this pull request Jun 21, 2023
This was referenced Jun 25, 2023
rlancemartin pushed a commit that referenced this pull request Jul 5, 2023
Fix for bug in SitemapLoader

`aiohttp`'s `get` does not accept a `verify` argument and currently throws an error, so SitemapLoader is not working.

This PR fixes it by removing the `verify` param from the `get` function call.

Fixes #6107

#### Who can review?

Tag maintainers/contributors who might be interested:

@eyurtsev

---------

Co-authored-by: techcenary <127699216+techcenary@users.noreply.github.com>
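
The commit above fixes the error by dropping `verify` from the `aiohttp` call. As an alternative sketch only (not what that commit does), a requests-style `verify` flag could be translated into the `ssl` argument that `aiohttp` request methods accept. `aiohttp_ssl_kwargs` and `fetch_text` are hypothetical helpers, not part of LangChain.

```python
from typing import Any, Dict


def aiohttp_ssl_kwargs(verify: bool) -> Dict[str, Any]:
    """Translate a requests-style `verify` flag into aiohttp request kwargs."""
    # aiohttp request methods take `ssl`, not `verify`; ssl=False skips
    # certificate checks, and omitting it keeps aiohttp's default behaviour.
    return {} if verify else {"ssl": False}


async def fetch_text(url: str, verify: bool = True) -> str:
    # Third-party dependency, imported lazily so the helper above needs nothing extra.
    import aiohttp

    async with aiohttp.ClientSession() as session:
        async with session.get(url, **aiohttp_ssl_kwargs(verify)) as response:
            return await response.text()
```

Returning an empty dict when `verify` is true avoids depending on how a given `aiohttp` version spells its default.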
Successfully merging this pull request may close these issues.

Issue: Can't load a public webpage