Support using .docx files #281

Merged
merged 11 commits into from
Jan 22, 2024
Contributor

@paskett paskett commented Jan 18, 2024

@pmeier If you could check out the test I added, let me know if you think that's sufficient.

Resolves #225

Contributor Author

paskett commented Jan 18, 2024

Looks like type checks failed because of the import of python-docx (imported as just docx) dependency. I could add a # type: ignore comment.

@pmeier pmeier linked an issue Jan 18, 2024 that may be closed by this pull request
Member

@pmeier pmeier left a comment

Thanks @paskett for the PR and for the discipline to add a test. We are very much aware that our test suite is, let's call it, "minimal" at the moment. Good thing you are not following our example. I promise, we'll get better at that. 😇

I've left a bunch of minor comments, but overall LGTM!

ragna/core/_document.py (resolved)
ragna/core/_document.py (outdated, resolved)
pyproject.toml (outdated, resolved)
tests/core/test_document.py (outdated, resolved)
tests/core/test_document.py (outdated, resolved)
tests/core/test_document.py (outdated, resolved)
tests/core/test_document.py (outdated, resolved)
@pmeier pmeier changed the title Support using .docx files (#225) Support using .docx files Jan 18, 2024
Member

pmeier commented Jan 18, 2024

> Looks like type checks failed because of the import of python-docx (imported as just docx) dependency. I could add a # type: ignore comment.

This is quite common, and the best solution here is to ignore this package globally. You can add it here:

ragna/pyproject.toml

Lines 138 to 146 in c08a223

[[tool.mypy.overrides]]
module = [
  "fitz",
  "lancedb",
  "param",
  "pyarrow",
  "sentence_transformers",
]
ignore_missing_imports = true
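
For illustration, following this suggestion might leave the override looking roughly like the sketch below. It assumes the package is listed under its import name `docx` rather than its distribution name `python-docx`, since mypy matches overrides against module names. The PR's actual diff is not shown in this conversation, so the entry's position in the list is a guess.

```toml
[[tool.mypy.overrides]]
module = [
  "docx",  # assumption: python-docx, listed by its import name
  "fitz",
  "lancedb",
  "param",
  "pyarrow",
  "sentence_transformers",
]
ignore_missing_imports = true
```

With an entry like this, a per-import `# type: ignore` comment should not be needed for the missing stubs.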

Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Contributor Author

paskett commented Jan 18, 2024

I'll get some of those suggestions committed when I have a chance later tonight (~20:00 MST)

Contributor Author

paskett commented Jan 19, 2024

@pmeier Got all the suggestions put in and pipeline passing, let's ship it! 😇

Member

@pmeier pmeier left a comment

Thanks @paskett. The PR is almost ready with one question left below.

tests/core/test_document.py Outdated Show resolved Hide resolved
Member

@pmeier pmeier left a comment

Sorry @paskett, somehow my actual comment got lost.

tests/core/test_document.py Show resolved Hide resolved
Member

@pmeier pmeier left a comment

Thanks @paskett!

@pmeier pmeier merged commit 008c458 into Quansight:main Jan 22, 2024
10 checks passed
@davidedigrande davidedigrande mentioned this pull request Jan 23, 2024
Development

Successfully merging this pull request may close these issues.

[ENH] - Add support for .doc / .docx
2 participants