[ENH] - Add support for .ppt / .pptx #253

Closed
nenb opened this issue Dec 17, 2023 · 4 comments
Labels
good first issue (Good for newcomers), help wanted (Extra attention is needed), type: enhancement 💅 (New feature or request)

Comments

@nenb
Contributor

nenb commented Dec 17, 2023

Feature description

(Largely a copy/paste from #225)

PowerPoint documents are everywhere in the corporate world, so we should support them out of the box. There are multiple Python packages that provide this functionality, and we should do a light comparison of them to get a clearer picture. Ideally, we find a package that can give us page information, although I'm not sure that is possible in these data formats.

Value and/or benefit

Users with corporate documents will have a lower barrier of entry.

Anything else?

No response

nenb added the "type: enhancement 💅" label on Dec 17, 2023
@pmeier
Member

pmeier commented Dec 17, 2023

Let's go for it!

pmeier added the "help wanted" and "good first issue" labels on Dec 17, 2023
@davidedigrande
Contributor

Hi!

I'd like to take this challenge as my first issue!

I have looked into issues #225 and #221, and I think the feature can be achieved with the python-pptx library, which has an API similar to that of python-docx, already used for #221.

I've installed Ragna and looked at the codebase. Currently I can only work with RagnaDemo, as I don't have a vector DB or an API key for an LLM. Still, I think as long as we correctly define a DocumentHandler implementing

  • requirements(cls) -> list[Requirement]
  • supported_suffixes(cls) -> list[str]
  • extract_pages(self, document: Document) -> Iterator[Page]

we should be fine, right? Of course I'll add testing and documentation.
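As a rough illustration of the three methods listed above, here is a minimal sketch of what such a handler might look like. It is not the code merged into Ragna: `Page` is a stand-in dataclass, the handler does not subclass the real `DocumentHandler`, `requirements` returns plain strings instead of `Requirement` objects, and the helper names `slide_text` and `pages_from_slides` are hypothetical. It also assumes that `document.read()` returns the raw file bytes. Each slide is treated as one page, which would give the page information asked for in the issue description.

```python
from __future__ import annotations

import io
from dataclasses import dataclass
from typing import Any, Iterable, Iterator


@dataclass
class Page:
    """Stand-in for Ragna's page type (hypothetical; the real class may differ)."""

    text: str
    number: int


def slide_text(slide: Any) -> str:
    """Join the text of every shape on a slide that has a non-empty text frame."""
    return "\n\n".join(
        shape.text_frame.text
        for shape in slide.shapes
        if shape.has_text_frame and shape.text_frame.text
    )


def pages_from_slides(slides: Iterable[Any]) -> Iterator[Page]:
    """Yield one Page per slide, numbered from 1."""
    for number, slide in enumerate(slides, 1):
        yield Page(text=slide_text(slide), number=number)


class PptxDocumentHandler:
    """Sketch only; a real handler would subclass Ragna's DocumentHandler."""

    @classmethod
    def requirements(cls) -> list[str]:
        # Assumption: Ragna expects Requirement objects; strings stand in here.
        return ["python-pptx"]

    @classmethod
    def supported_suffixes(cls) -> list[str]:
        return [".pptx"]

    def extract_pages(self, document: Any) -> Iterator[Page]:
        # Imported lazily so this module loads without python-pptx installed.
        import pptx

        # Assumption: document.read() returns the raw bytes of the file.
        presentation = pptx.Presentation(io.BytesIO(document.read()))
        yield from pages_from_slides(presentation.slides)
```

Keeping the text extraction in small functions that only rely on `shapes`, `has_text_frame`, and `text_frame.text` means it can be tested with simple fake objects, without a real presentation file. Note that python-pptx reads the XML-based .pptx format only, so the legacy binary .ppt format would need a different approach.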

Let me know if I can be assigned to this issue. Thanks!

@pmeier
Member

pmeier commented Jan 22, 2024

> I'd like to take this challenge as my first issue!

Hey Davide, and welcome to Ragna 👋 Sure, go ahead!

> I've installed Ragna and looked at the codebase, but currently I can only work with RagnaDemo, as I don't have a vector DB or an API key for an LLM

You can install local vector DBs, i.e. Chroma and LanceDB. Either do it manually with pip install, using the requirements from

ragna/pyproject.toml

Lines 58 to 59 in 61e8d5f

"chromadb>=0.4.13",
"lancedb>=0.2",

or simply use the catch-all extra: pip install -e '.[all]'.
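To see which of these optional packages are already present in the current environment, a small check like the following may help. It is only a sketch: the helper name `installed` is hypothetical, and it assumes the import names match the requirement names `chromadb` and `lancedb` shown above.

```python
from importlib.util import find_spec


def installed(modules):
    """Map each top-level module name to whether it can be found for import."""
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    # Assumption: the import names equal the requirement names in pyproject.toml.
    for name, ok in installed(["chromadb", "lancedb"]).items():
        print(f"{name}: {'installed' if ok else 'missing'}")
```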

But yes, without an LLM API key, there is only the demo assistant for now.

> we should be fine, right? Of course I'll add testing and documentation.

Yes, that is sufficient. You can implement and test the document handler without having access to everything else.

@pmeier
Member

pmeier commented Jan 30, 2024

Closed in #296.

pmeier closed this as completed on Jan 30, 2024
3 participants