[Feature Request] Please make it possible to ingest PDFs #69
Comments
Don't mind me, I'll just leave this here. It was in my tabs somewhere, so I thought it would be helpful. Edit: added another link. Both use mPLUG-Owl: https://github.com/X-PLUG/mPLUG-Owl
@dm3h @MrXandbadas just added support for PDFs in the latest commit! Let us know if you run into any issues!
Example usage:
python3 main.py --archival_storage_files_compute_embeddings="memgpt_arxiv.pdf" --persona=memgpt_doc
Example output: [screenshot not preserved]
the speed of these things is TERRIFYING!
Wow... I cannot believe this was added so fast. Thank you so much! I will try this out tomorrow and let you know if there's anything valuable to share from the results.
Can we ingest PDFs in real-time?
Saw this request in the Discord, but it hadn't been added here yet, so I thought I would.
kevin | weaksauce.eth — Today at 08:04
Is it possible to have it ingest PDF docs
cpacker — Today at 08:09
at the moment .pdf isn't officially supported so we'd recommend converting from pdf to txt first with some OCR software (eg https://github.com/tesseract-ocr/tesseract#installing-tesseract), then follow the README examples that use .txt input
but def open an issue about this and we'll add support for it! shouldn't be too hard to automate this for you
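The manual workaround described above (run the PDF through OCR to get a .txt file, then follow the README examples for .txt input) could be scripted roughly like this. It is only a sketch and not how the project's PDF support works: it assumes Poppler's `pdftoppm` and the `tesseract` command-line tool are installed and on the PATH, and the function names (`rasterize_cmd`, `ocr_cmd`, `page_number`, `pdf_to_txt`) are made up for illustration.

```python
import subprocess
import tempfile
from pathlib import Path


def rasterize_cmd(pdf_path, prefix, dpi=300):
    # pdftoppm (Poppler) writes one PNG per page, named <prefix>-<page>.png
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(prefix)]


def ocr_cmd(image_path, lang="eng"):
    # Using "stdout" as the output base makes tesseract print the text
    return ["tesseract", str(image_path), "stdout", "-l", lang]


def page_number(image_path):
    # "page-07.png" -> 7, so pages sort numerically rather than as strings
    return int(Path(image_path).stem.rsplit("-", 1)[1])


def pdf_to_txt(pdf_path, txt_path, lang="eng"):
    # Rasterize every page into a temp dir, OCR each page, join the text
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "page"
        subprocess.run(rasterize_cmd(pdf_path, prefix), check=True)
        pages = sorted(Path(tmp).glob("page-*.png"), key=page_number)
        texts = [
            subprocess.run(
                ocr_cmd(page, lang), check=True, capture_output=True, text=True
            ).stdout
            for page in pages
        ]
    Path(txt_path).write_text("\n".join(texts), encoding="utf-8")
    return txt_path
```

Calling `pdf_to_txt("memgpt_arxiv.pdf", "memgpt_arxiv.txt")` would then produce a text file to use with the .txt examples in the README. For PDFs that already contain a text layer, OCR is unnecessary and a plain text extractor would likely be faster and more accurate.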