It is quite limiting not to be able to at least tag PDFs when importing them with connectors.
What about changing this function:
`def pdf_to_text(file: IO[Any], pdf_pass: str | None = None) -> str:`
so that it returns the metadata the same way as:
https://github.com/danswer-ai/danswer/blob/1bc899cc6787d948db2d207b3332f38ada19557d/backend/danswer/file_processing/extract_file_text.py#L148-L153
pypdf has a feature to extract them: https://pypdf.readthedocs.io/en/stable/user/metadata.html
That would allow us to pre-process the PDFs and add the metadata we need.
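Here is a minimal sketch of what the proposed change could look like, assuming pypdf's `PdfReader` is used for extraction. `pdf_to_text_and_metadata` and `normalize_pdf_metadata` are hypothetical names, not existing danswer functions. Returning a `(text, metadata)` tuple is also an assumption about the interface. It does not reproduce the linked code.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import IO, Any


def normalize_pdf_metadata(raw: Mapping[str, Any] | None) -> dict[str, str]:
    """Turn a pypdf document-info mapping into plain string key/value pairs.

    pypdf exposes document-info keys with a leading slash (e.g. "/Title");
    strip it and drop empty values so the result can be used as tags.
    """
    if not raw:
        return {}
    cleaned: dict[str, str] = {}
    for key in raw:
        value = raw[key]  # indexing resolves indirect objects in pypdf
        if value is None:
            continue
        text = str(value).strip()
        if text:
            cleaned[str(key).lstrip("/")] = text
    return cleaned


def pdf_to_text_and_metadata(
    file: IO[Any], pdf_pass: str | None = None
) -> tuple[str, dict[str, str]]:
    """Hypothetical variant of pdf_to_text that also returns PDF metadata."""
    # Imported lazily so normalize_pdf_metadata has no pypdf dependency.
    from pypdf import PdfReader

    reader = PdfReader(file)
    if reader.is_encrypted and pdf_pass is not None:
        reader.decrypt(pdf_pass)
    # extract_text() may return an empty string for image-only pages.
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    return text, normalize_pdf_metadata(reader.metadata)
```

Usage would be along the lines of `with open("report.pdf", "rb") as f: text, meta = pdf_to_text_and_metadata(f)`. A connector could then map fields such as `meta.get("Keywords")` onto document tags.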
Great idea! Added in #2278