[ENH] - Add support for .ppt / .pptx #253

nenb · 2023-12-17T17:30:18Z

Feature description

(Largely a copy/paste from #225)

PowerPoint documents are everywhere in the corporate world, so we should support them out of the box. Several Python packages provide this functionality, and we should do a light comparison of them to get a clearer picture. Ideally, we find a package that can give us page information, although I am not sure that is possible with these formats.
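For .pptx at least, slides look like a natural stand-in for pages. The format is a ZIP archive of Office Open XML parts with one part per slide (`ppt/slides/slideN.xml`), and the visible text sits in DrawingML `<a:t>` runs grouped into `<a:p>` paragraphs. The following is a stdlib-only feasibility sketch, not a proposed handler. The function name `slide_texts` is hypothetical, and it assumes the `slideN.xml` numbering matches presentation order, which is common but not guaranteed (the authoritative order lives in `ppt/presentation.xml`). Legacy .ppt files are a binary format and are not covered by this approach.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace holding paragraphs (<a:p>) and text runs (<a:t>).
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
# Matches slide parts such as ppt/slides/slide3.xml (but not their _rels).
SLIDE_PART = re.compile(r"^ppt/slides/slide(\d+)\.xml$")


def slide_texts(file):
    """Yield (slide_number, text) pairs from a .pptx path or file object.

    Assumption: the N in slideN.xml reflects presentation order. Libraries
    such as python-pptx resolve the real order via presentation.xml.
    """
    with zipfile.ZipFile(file) as archive:
        slides = []
        for name in archive.namelist():
            match = SLIDE_PART.match(name)
            if match:
                slides.append((int(match.group(1)), name))
        # Sort numerically so slide10 comes after slide9.
        for number, name in sorted(slides):
            root = ET.fromstring(archive.read(name))
            paragraphs = []
            for para in root.iter(f"{A_NS}p"):
                # Runs inside one paragraph are concatenated without spaces,
                # since a single word can be split across several runs.
                text = "".join(t.text or "" for t in para.iter(f"{A_NS}t"))
                if text:
                    paragraphs.append(text)
            yield number, "\n".join(paragraphs)
```

A real handler would more likely delegate this to a maintained library, but the sketch suggests that per-slide page numbers are recoverable from .pptx.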

Value and/or benefit

Users with corporate documents will have a lower barrier of entry.

Anything else?

No response

pmeier · 2023-12-17T21:39:29Z

Let's go for it!

davidedigrande · 2024-01-21T17:32:58Z

Hi!

I'd like to take this challenge as my first issue!

I have looked into issues #225 and #221, and I think the feature can be implemented with the python-pptx library, which has an API similar to python-docx (already used for #221).

I've installed Ragna and looked at the codebase, but currently I can only work with RagnaDemo, as I don't have a vector DB or an API key for an LLM. Still, I think as long as we correctly define a DocumentHandler that implements

requirements(cls) -> list[Requirement]
supported_suffixes(cls) -> list[str]
extract_pages(self, document: Document) -> Iterator[Page]

we should be fine, right? Of course I'll add testing and documentation.
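A minimal sketch of such a handler, built on python-pptx as proposed above. To keep it runnable on its own, `PackageRequirement`, `Document`, and `Page` below are hypothetical stand-ins for Ragna's types (their exact fields in Ragna may differ), and the class does not subclass Ragna's `DocumentHandler`. The python-pptx calls used (`Presentation`, `slides`, `shapes`, `has_text_frame`, `text_frame.text`) are part of that library's public API.

```python
import io
from dataclasses import dataclass
from typing import Iterator, Optional


# Hypothetical stand-ins for Ragna's types, defined only so the sketch runs.
@dataclass
class PackageRequirement:
    name: str


@dataclass
class Page:
    text: str
    number: Optional[int] = None


@dataclass
class Document:
    name: str
    content: bytes

    def read(self) -> bytes:
        return self.content


class PptxDocumentHandler:
    @classmethod
    def requirements(cls) -> list[PackageRequirement]:
        # python-pptx is only needed when this handler is actually used.
        return [PackageRequirement("python-pptx")]

    @classmethod
    def supported_suffixes(cls) -> list[str]:
        # Only the XML-based .pptx format; legacy binary .ppt is not covered.
        return [".pptx"]

    def extract_pages(self, document: Document) -> Iterator[Page]:
        import pptx  # lazy import, matching the optional requirement above

        presentation = pptx.Presentation(io.BytesIO(document.read()))
        # Slides are iterated in presentation order, so the 1-based slide
        # index doubles as a page number.
        for number, slide in enumerate(presentation.slides, start=1):
            texts = [
                shape.text_frame.text
                for shape in slide.shapes
                if shape.has_text_frame
            ]
            yield Page(text="\n\n".join(texts), number=number)
```

Using the slide index as the page number is one possible answer to the page-information question from the issue description; shapes without a text frame (pictures, charts) are simply skipped in this sketch.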

Let me know if I can be assigned to this issue. Thanks!

pmeier · 2024-01-22T07:50:33Z

> I'd like to take this challenge as my first issue!

Hey Davide, and welcome to Ragna 👋 Sure, go ahead!

> I've installed Ragna and looked at the codebase, but currently I can only work with RagnaDemo, as I don't have a vector DB or an API key for an LLM

You can install local vector DBs, i.e. Chroma and LanceDB. Either install them manually with pip, using the pins from ragna/pyproject.toml (lines 58 to 59 in 61e8d5f):

    "chromadb>=0.4.13",
    "lancedb>=0.2",

or simply use the catch-all extra: pip install -e '.[all]'.

But yes, without an LLM API key, there is only the demo assistant for now.

> we should be fine, right? Of course I'll add testing and documentation.

Yes, that is sufficient. You can implement and test the document handler without having access to everything else.

pmeier · 2024-01-30T08:39:59Z

Closed in #296.

nenb added the type: enhancement 💅 (New feature or request) label Dec 17, 2023

pmeier added the help wanted (Extra attention is needed) and good first issue (Good for newcomers) labels Dec 17, 2023

pmeier assigned davidedigrande Jan 22, 2024

davidedigrande mentioned this issue in Pptx support #296 (merged) Jan 23, 2024

pmeier closed this as completed Jan 30, 2024
