[Feature] Enable crawling of websites that require credentials via SSO or 2FA #1040

Open
1 of 2 tasks
touma-I opened this issue Feb 11, 2025 · 0 comments
Labels
enhancement New feature or request

Comments

Collaborator

touma-I commented Feb 11, 2025

Search before asking

  • I searched the issues and found no similar issues.

Component

Transforms/Other, Other

Feature

Add credential support to the web2parquet transform. Currently, the web2parquet transform fails if the site being crawled requires any kind of credentials. This matters for RAG, fine-tuning, and search-and-retrieval use cases, where customers want to crawl their own internal websites and use the retrieved internal documents in their LLM applications.
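To make the request concrete, here is a rough sketch of what attaching credentials to a crawl request could look like. It uses only the Python standard library and does not reflect the actual web2parquet or data-prep-connector API. `CrawlCredentials` and `build_authenticated_request` are hypothetical names, and the sketch assumes the SSO or 2FA login has already happened out of band, so the crawler only has to reuse the resulting bearer token or session cookies.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional
from urllib.request import Request


@dataclass
class CrawlCredentials:
    """Hypothetical container; not part of web2parquet or data-prep-connector."""

    # Token obtained after an interactive SSO/2FA login (assumption).
    bearer_token: Optional[str] = None
    # Session cookies exported from an already authenticated browser session (assumption).
    cookies: Dict[str, str] = field(default_factory=dict)


def build_authenticated_request(url: str, creds: CrawlCredentials) -> Request:
    """Return a urllib Request carrying whichever credentials were supplied."""
    request = Request(url)
    if creds.bearer_token:
        request.add_header("Authorization", f"Bearer {creds.bearer_token}")
    if creds.cookies:
        # Cookie header format: name=value pairs separated by "; ".
        cookie_header = "; ".join(
            f"{name}={value}" for name, value in creds.cookies.items()
        )
        request.add_header("Cookie", cookie_header)
    return request
```

The returned object can be passed to `urllib.request.urlopen`. A real implementation would also need to decide how credentials are supplied to the transform (for example through configuration or environment variables) and how expired sessions are refreshed, neither of which is covered here.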

cc: @hmtbr Do you know whether data-prep-connector supports credentials? If so, we would need to extend the web2parquet transform. If not, could data-prep-connector be extended to support credentials?

cc: @Qiragg

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
touma-I added the enhancement (New feature or request) label on Feb 11, 2025
2 participants