I searched existing ideas and did not find a similar one
I added a very descriptive title
I've clearly described the feature request and motivation for it
Feature request
Currently, O365BaseLoader (and consequently both derived loaders) is limited to pdf, doc, and docx files. In this PR I'm introducing a handlers argument that allows processing any file type for which a suitable parser is implemented. Users can also implement a custom parser and pass it in. Please like the PR if that's something you'd like to see merged.
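To illustrate the idea, here is a minimal, self-contained sketch of how a handlers mapping could dispatch files to parsers. It does not use the actual LangChain classes: `Document`, `Parser`, `TextParser`, `CsvRowParser`, and `load_files` are hypothetical stand-ins, and keying the mapping by file extension is an assumption made for this example, not necessarily how the PR does it.

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import Dict, Iterator, Protocol


@dataclass
class Document:
    # Stand-in for a document object; only the fields needed for the sketch.
    page_content: str
    metadata: dict


class Parser(Protocol):
    # Hypothetical parser interface: raw bytes in, documents out.
    def parse(self, data: bytes, source: str) -> Iterator[Document]: ...


class TextParser:
    """Hypothetical custom parser: one document per plain-text file."""

    def parse(self, data: bytes, source: str) -> Iterator[Document]:
        yield Document(data.decode("utf-8"), {"source": source})


class CsvRowParser:
    """Hypothetical custom parser: one document per line of a CSV file."""

    def parse(self, data: bytes, source: str) -> Iterator[Document]:
        for i, line in enumerate(data.decode("utf-8").splitlines()):
            yield Document(line, {"source": source, "row": i})


def load_files(
    files: Dict[str, bytes], handlers: Dict[str, Parser]
) -> Iterator[Document]:
    """Route each file to the parser registered for its extension."""
    for name, data in files.items():
        ext = PurePath(name).suffix.lstrip(".").lower()
        parser = handlers.get(ext)
        if parser is None:
            continue  # no handler registered for this file type: skip it
        yield from parser.parse(data, name)
```

With this shape, supporting a new file type means adding one entry to the mapping, for example `handlers = {"txt": TextParser(), "csv": CsvRowParser()}`, and swapping the parser for an existing type means replacing that entry.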
In a follow-up PR, I also want to introduce a wrapper that converts DocumentLoaders into parsers, because many Document Loaders are implemented without a corresponding parser. Such a wrapper would open up many additional file types for use here.
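One possible shape for such a wrapper is sketched below. This is a guess at the approach, not the follow-up PR itself: it assumes the wrapped loader is constructed from a file path and exposes `load()`, so the wrapper writes the downloaded bytes to a temporary file first. `LineLoader` and `LoaderAsParser` are hypothetical names, and plain strings stand in for documents.

```python
import os
import tempfile
from typing import Callable, List


class LineLoader:
    """Hypothetical path-based loader: returns one string per line."""

    def __init__(self, file_path: str) -> None:
        self.file_path = file_path

    def load(self) -> List[str]:
        with open(self.file_path, encoding="utf-8") as f:
            return [line.rstrip("\n") for line in f]


class LoaderAsParser:
    """Hypothetical adapter exposing a path-based loader as a bytes parser."""

    def __init__(self, loader_factory: Callable[[str], LineLoader], suffix: str = "") -> None:
        self.loader_factory = loader_factory
        self.suffix = suffix  # some loaders infer the format from the extension

    def parse(self, data: bytes) -> List[str]:
        # Write the in-memory bytes to a temporary file the loader can open.
        fd, path = tempfile.mkstemp(suffix=self.suffix)
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
            return self.loader_factory(path).load()
        finally:
            os.remove(path)  # clean up even if the loader raises
```

For example, `LoaderAsParser(LineLoader, ".txt").parse(b"a\nb\n")` runs the loader against a temporary copy of the bytes and removes the file afterwards.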
Motivation
SharePoint / OneDrive libraries can contain any type of document, so being able to process only pdf, doc, and docx documents is inadequate. Additionally, these loaders currently hard-code the parser to be used. It is crucial to let users decide which parser they want to use (the pdf file type alone currently has 7 different parser implementations). Allowing users to customize the parser is also a very useful feature in its own right.
Proposal (If applicable)
No response