Can the document parsing module support doc and xls? #3063

Open
SDAIer opened this issue Nov 4, 2024 · 1 comment
Labels
bug Something isn't working


SDAIer commented Nov 4, 2024

Could the document parsing module be improved so that it supports doc and xls files?

SDAIer added the bug (Something isn't working) label Nov 4, 2024

SDAIer commented Nov 4, 2024

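I used Python to convert doc, xls, and ppt files to docx, xlsx, and pptx, and I call the AI through the API.
For the doc and xls files, calling the AI through the API and then invoking document parsing works: both are parsed correctly.
However, after converting ppt to pptx (the converted pptx opens normally), the FastGPT document parsing module reports the following error: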

[xmldom error]  invalid doc source
@#[line:0,col:undefined]
TypeError: Cannot read properties of undefined (reading 'getElementsByTagName')
    at /app/projects/app/.next/server/worker/readFile.js:34:1635
    at Array.forEach (<anonymous>)
    at g (/app/projects/app/.next/server/worker/readFile.js:34:1613)
    at async m (/app/projects/app/.next/server/worker/readFile.js:34:2176)
    at async Object.s (/app/projects/app/.next/server/worker/readFile.js:32:387)
    at async MessagePort.<anonymous> (/app/projects/app/.next/server/worker/readFile.js:34:808)
error => Cannot read properties of undefined (reading 'getElementsByTagName')
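
A possible way to narrow this down, offered as a sketch rather than a confirmed diagnosis: as far as I can tell, xmldom emits `invalid doc source` when the parser is handed an empty or non-string source, so it may be worth checking whether the converted pptx contains non-empty, well-formed XML in the parts a parser would look for. The part names below (`ppt/presentation.xml`, `ppt/slides/slideN.xml`) come from the standard pptx package layout; which parts FastGPT actually reads is an assumption, and `inspect_pptx` is a hypothetical helper name.

```python
import zipfile
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def inspect_pptx(path):
    """Return a list of problems found in the XML parts of a pptx file.

    `path` may be a filesystem path or a binary file-like object.
    """
    problems = []
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()

        # The main presentation part is expected in every pptx package.
        if "ppt/presentation.xml" not in names:
            problems.append("missing ppt/presentation.xml")

        # Assumption: slide text lives in ppt/slides/slideN.xml parts.
        slides = [n for n in names
                  if n.startswith("ppt/slides/slide") and n.endswith(".xml")]
        if not slides:
            problems.append("no ppt/slides/slideN.xml parts found")

        for name in slides:
            data = archive.read(name)
            if not data.strip():
                # An empty part would reach an XML parser as an empty source.
                problems.append(f"{name} is empty")
                continue
            try:
                minidom.parseString(data)
            except ExpatError as exc:
                problems.append(f"{name} is not well-formed XML: {exc}")
    return problems
```

Calling `inspect_pptx("/usr/share/nginx/test/123.pptx")` on the converted file and getting an empty list would suggest the package itself is structurally sound and that the failure lies elsewhere, for example in a part name or path the parser expects but the converter lays out differently.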

The Python log is below. The doc, xls, and ppt files were successfully converted to docx, xlsx, and pptx, and all of the converted files open normally.

172.19.0.5 - - [04/Nov/2024 18:49:11] "POST /process_data1 HTTP/1.1" 200 -
convert /usr/share/nginx/test/测试3.xls -> /usr/share/nginx/test//测试3.xlsx using filter : Calc Office Open XML
172.19.0.5 - - [04/Nov/2024 18:49:16] "POST /process_data1 HTTP/1.1" 200 -
convert /usr/share/nginx/test/123.ppt -> /usr/share/nginx/test//123.pptx using filter : Impress MS PowerPoint 2007 XML
172.19.0.5 - - [04/Nov/2024 18:50:21] "POST /process_data1 HTTP/1.1" 200 -


172.19.0.5 - - [04/Nov/2024 18:54:11] "POST /process_data1 HTTP/1.1" 200 -
convert /usr/share/nginx/test/22.doc -> /usr/share/nginx/test//22.docx using filter : MS Word 2007 XML
172.19.0.5 - - [04/Nov/2024 18:54:59] "POST /process_data1 HTTP/1.1" 200 -
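
For reference, a minimal sketch of a conversion step that would produce `convert ... using filter : ...` lines like the ones above, assuming LibreOffice is run in headless mode. The actual script behind `/process_data1` is not shown in this issue, so `build_convert_command`, `convert`, and the `TARGETS` mapping are hypothetical names used only for illustration.

```python
import subprocess
from pathlib import Path

# Legacy extension -> Office Open XML target format.
TARGETS = {".doc": "docx", ".xls": "xlsx", ".ppt": "pptx"}


def build_convert_command(src, outdir):
    """Build the soffice command line that converts src into outdir."""
    suffix = Path(src).suffix.lower()
    if suffix not in TARGETS:
        raise ValueError(f"unsupported extension: {suffix}")
    return ["soffice", "--headless", "--convert-to", TARGETS[suffix],
            "--outdir", str(outdir), str(src)]


def convert(src, outdir):
    """Run the conversion (requires LibreOffice) and return the output path."""
    subprocess.run(build_convert_command(src, outdir), check=True)
    target = TARGETS[Path(src).suffix.lower()]
    return Path(outdir) / f"{Path(src).stem}.{target}"
```

With this shape, `convert("/usr/share/nginx/test/123.ppt", "/usr/share/nginx/test")` would return the path of the converted pptx, which can then be passed to the document parsing step.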
