Fix PDF rotation bug (Issue #2792) #2816

Merged
merged 16 commits into from
Jan 30, 2024

songpb (Contributor) commented Jan 28, 2024

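Problem description
For PDFs with rotated pages (page.rotation != 0), OCR is run directly on the rotated images. Both the extracted text and its layout come out wrong, which leads to poor question-answering results.

Solution
Modify the RapidOCRPDFLoader class: when page.rotation != 0 is detected, rotate each image in img_list by the corresponding angle before passing it to OCR.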
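
The idea described above can be illustrated with a minimal, dependency-free sketch. This is not the code from this PR: the helper names `rotate_clockwise` and `prepare_for_ocr` are hypothetical, images are modelled as plain nested lists rather than real image arrays, and it assumes the page rotation is a multiple of 90 degrees and that the correction is a clockwise turn by that angle. The actual loader may use an image library and a different sign convention.

```python
from typing import List, Sequence

Grid = List[List[int]]


def rotate_clockwise(pixels: Sequence[Sequence[int]], degrees: int) -> Grid:
    """Rotate a row-major pixel grid clockwise by a multiple of 90 degrees."""
    if degrees % 90 != 0:
        raise ValueError("only multiples of 90 degrees are supported")
    grid = [list(row) for row in pixels]
    for _ in range((degrees % 360) // 90):
        # One clockwise quarter turn: reverse the row order, then transpose.
        grid = [list(col) for col in zip(*grid[::-1])]
    return grid


def prepare_for_ocr(img_list: Sequence[Sequence[Sequence[int]]],
                    page_rotation: int) -> List[Grid]:
    """Return copies of the page images, rotated when the page is rotated.

    `page_rotation` stands in for the page.rotation value mentioned above;
    the clockwise direction is an assumption made for this sketch.
    """
    if page_rotation % 360 == 0:
        # Nothing to correct: hand the images to OCR unchanged.
        return [[list(row) for row in img] for img in img_list]
    return [rotate_clockwise(img, page_rotation) for img in img_list]


if __name__ == "__main__":
    image = [[1, 2],
             [3, 4]]
    print(prepare_for_ocr([image], 90))
```

Only quarter turns are handled here, so the grid is never resampled and no pixels are lost. A loader working on real image arrays would need to resize the output canvas when the angle swaps width and height.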

hzg0601 and others added 13 commits January 12, 2024 16:58
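New features:
- Optimize OCR for PDF files by filtering out meaningless small images by @liunux4odoo chatchat-space#2525
- Support the Gemini online model by @yhfgyyf chatchat-space#2630
- Support the GLM4 online model by @zRzRzRzRzRzRzR
- Update Elasticsearch to use HTTPS connections by @xldistance chatchat-space#2390
- Improve OCR recognition for PPT and DOC knowledge base files by @596192804 chatchat-space#2013
- Update the Agent chat feature by @zRzRzRzRzRzRzR
- Obtain a connection from the connection pool when an object is created, instead of opening a new connection on every method call by @Lijia0 chatchat-space#2480
- Implement a check in ChatOpenAI for whether the token count exceeds the model's context length by @glide-the
- Update the database runtime error notes and project milestones by @zRzRzRzRzRzRzR chatchat-space#2659
- Update config files, documentation, and dependencies by @imClumsyPanda @zRzRzRzRzRzRzR
- Add a Japanese README by @eltociear chatchat-space#2787

Fixes:
- PGVector vector store connection error after the langchain update by @HALIndex chatchat-space#2591
- Minimax model worker error by @xyhshen
- ES store could not perform vector search; add mappings to create the vector index by MSZheng20 chatchat-space#2688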
@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Jan 28, 2024
zRzRzRzRzRzRzR (Collaborator) commented Jan 29, 2024

Please target the dev branch; the next release should be 0.3.0.

@songpb songpb changed the base branch from master to dev January 30, 2024 00:48
songpb (Contributor Author) commented Jan 30, 2024

> Please target the dev branch; the next release should be 0.3.0.

Changed.

zRzRzRzRzRzRzR (Collaborator)

Uh, there are merge conflicts now. Could you resolve them?

songpb (Contributor Author) commented Jan 30, 2024

> Uh, there are merge conflicts now. Could you resolve them?

Done.

@zRzRzRzRzRzRzR zRzRzRzRzRzRzR merged commit 22ee1a0 into chatchat-space:dev Jan 30, 2024
7 participants