Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SSL: CERTIFICATE_VERIFY_FAILED] while load from SitemapLoader #5483

Closed
2 of 14 tasks
sevendark opened this issue May 31, 2023 · 0 comments · Fixed by #6256
Closed
2 of 14 tasks

[SSL: CERTIFICATE_VERIFY_FAILED] while load from SitemapLoader #5483

sevendark opened this issue May 31, 2023 · 0 comments · Fixed by #6256

Comments

@sevendark
Copy link
Contributor

sevendark commented May 31, 2023

System Info

langchain: 0.0.181
platform: windows
python: 3.11.3

Who can help?

@eyurtsev

Information

  • The official example notebooks/scripts
  • My own modified scripts

Related Components

  • LLMs/Chat Models
  • Embedding Models
  • Prompts / Prompt Templates / Prompt Selectors
  • Output Parsers
  • Document Loaders
  • Vector Stores / Retrievers
  • Memory
  • Agents / Agent Executors
  • Tools / Toolkits
  • Chains
  • Callbacks/Tracing
  • Async

Reproduction

site_loader = SitemapLoader(web_path="https://help.glueup.com/sitemap_index.xml")

docs = site_loader.load()
print(docs[0])

# ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1002)

Expected behavior

print the frist doc

hwchase17 pushed a commit that referenced this issue May 31, 2023
# Add param `requests_kwargs` for WebBaseLoader

Fixes # (issue)

#5483 

## Who can review?

@eyurtsev
hwchase17 added a commit that referenced this issue Jun 19, 2023
A must-include for SiteMap Loader to avoid the SSL verification error.
Setting the 'verify' to False by ``` sitemap_loader.requests_kwargs =
{"verify": False}``` does not bypass the SSL verification in some
websites.

There are websites (https:// researchadmin.asu.edu/ sitemap.xml) where
setting "verify" to False as shown below would not work:
sitemap_loader.requests_kwargs = {"verify": False} 

We need this merge to tell the Session to use a connector with a
specific argument about SSL:
 \# For SiteMap SSL verification
if not self.request_kwargs['verify']:
    connector = aiohttp.TCPConnector(ssl=False)
else:
    connector = None
 
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

Fixes #5483 

#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?

Tag maintainers/contributors who might be interested:

@hwchase17 
@eyurtsev

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Undertone0809 pushed a commit to Undertone0809/langchain that referenced this issue Jun 19, 2023
# Add param `requests_kwargs` for WebBaseLoader

Fixes # (issue)

langchain-ai#5483 

## Who can review?

@eyurtsev
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant