
Added support for extracting info from image in the docs #120

Merged
chensuyue merged 10 commits into main from xinyuye/dataprep
Jun 26, 2024


XinyuYe-Intel (Collaborator) commented May 31, 2024

Description

Added support for extracting info from image in the docs.
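The description does not show how images are pulled out of documents. As a minimal sketch only, assuming the input is an Office Open XML package (.docx or .pptx, the formats handled by the python-docx and python-pptx dependencies listed for this PR), one plausible first step is to locate the embedded image files, which such packages store as ZIP members under folders like `word/media/` or `ppt/media/`. The helper `list_embedded_images`, the folder prefixes, and the extension allow-list are hypothetical and use only the standard library; this is not the code in `comps/dataprep/utils.py`, and whatever OCR or image-understanding step consumes the bytes is out of scope here.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Assumption: images in OOXML packages live in these ZIP folders
# (.docx -> word/media/, .pptx -> ppt/media/, .xlsx -> xl/media/).
MEDIA_PREFIXES = ("word/media/", "ppt/media/", "xl/media/")
# Hypothetical allow-list of image extensions worth passing downstream.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tif", ".tiff", ".emf", ".wmf", ".svg"}


def list_embedded_images(source):
    """Return {zip member name: raw bytes} for each image embedded in a .docx/.pptx.

    `source` may be a filesystem path or a binary file-like object.
    """
    images = {}
    with zipfile.ZipFile(source) as package:
        for name in package.namelist():
            suffix = PurePosixPath(name).suffix.lower()
            # Keep only members inside a media folder with an image extension.
            if name.startswith(MEDIA_PREFIXES) and suffix in IMAGE_SUFFIXES:
                images[name] = package.read(name)
    return images


if __name__ == "__main__":
    # Build a tiny fake .docx in memory so the sketch runs without real files.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as fake_docx:
        fake_docx.writestr("word/document.xml", "<w:document/>")
        fake_docx.writestr("word/media/image1.png", b"not-a-real-png")
    buffer.seek(0)
    for member, data in list_embedded_images(buffer).items():
        print(member, len(data))  # these bytes would go to an OCR/captioning step
```

Reading the package directly keeps the sketch dependency-free, but it does not say where in the text each image appears; a real pipeline would likely need to recover that from the document XML.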

Issues

n/a.

Type of change

  • New feature (non-breaking change which adds new functionality)

Dependencies

  • cairosvg
  • docx2txt
  • markdown
  • python-docx
  • python-pptx
  • unstructured

Tests

Tested each changed function.

Conflicts:
	comps/dataprep/redis/README.md
	comps/dataprep/redis/langchain/requirements.txt
	comps/dataprep/redis/requirements.txt
	comps/dataprep/requirements.txt
	comps/dataprep/utils.py
@kevinintel kevinintel linked an issue Jun 17, 2024 that may be closed by this pull request
@kevinintel kevinintel requested a review from lvliang-intel June 18, 2024 08:25
@chensuyue chensuyue added this to the v0.7 milestone Jun 24, 2024
@kevinintel kevinintel requested a review from XuhuiRen June 25, 2024 08:20
@chensuyue chensuyue merged commit e237454 into main Jun 26, 2024
10 checks passed
@chensuyue chensuyue deleted the xinyuye/dataprep branch June 26, 2024 16:18
jinjunzh pushed a commit to jinjunzh/GenAIComps that referenced this pull request Jun 28, 2024
…t#120)

Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
Signed-off-by: jinjunzh <jasper.zhu@intel.com>
ftian1 pushed a commit that referenced this pull request Jul 4, 2024
* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* DataPrep extract info from table in the docs (#146)

* Add microservice for table extraction

Signed-off-by: Liangyx2 <yuxiang.liang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

Signed-off-by: Liangyx2 <yuxiang.liang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update license copyright

Signed-off-by: Liangyx2 <yuxiang.liang@intel.com>

* DataPrep extract info from table in the docs

Signed-off-by: Liangyx2 <yuxiang.liang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refine

Signed-off-by: Liangyx2 <yuxiang.liang@intel.com>

* refine

Signed-off-by: Liangyx2 <yuxiang.liang@intel.com>

* Update prepare_doc_redis.py

* Update prepare_doc_qdrant.py

* Update prepare_doc_milvus.py

---------

Signed-off-by: Liangyx2 <yuxiang.liang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
Co-authored-by: XuhuiRen <44249229+XuhuiRen@users.noreply.github.com>
Signed-off-by: jinjunzh <jasper.zhu@intel.com>

* Remove sensitive info logs (#251)

Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Signed-off-by: jinjunzh <jasper.zhu@intel.com>

* Added support for extracting info from image in the docs (#120)

Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
Signed-off-by: jinjunzh <jasper.zhu@intel.com>

* enhance statistics ut coverage (#252)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jinjunzh <jasper.zhu@intel.com>

* bump version (#253)

Signed-off-by: chensuyue <suyue.chen@intel.com>
Signed-off-by: jinjunzh <jasper.zhu@intel.com>

* support file upload feature for milvus service

Signed-off-by: jinjunzh <jasper.zhu@intel.com>

* update embedding with MOSEC_EMBEDDING_ENDPOINT

Signed-off-by: jinjunzh <jasper.zhu@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: jinjunzh <jasper.zhu@intel.com>

* support file upload feature for milvus service

Signed-off-by: jinjunzh <jasper.zhu@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: jinjunzh <jasper.zhu@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: jinjunzh <jasper.zhu@intel.com>

* fix duplicate ci test (#256)

* fix duplicate test

Signed-off-by: chensuyue <suyue.chen@intel.com>

* for test only

Signed-off-by: chensuyue <suyue.chen@intel.com>

* Revert "for test only"

This reverts commit a7718aa.

---------

Signed-off-by: chensuyue <suyue.chen@intel.com>
Signed-off-by: jinjunzh <jasper.zhu@intel.com>

* DataPrep extract info from table in the docs (#146)

* Add microservice for table extraction

Signed-off-by: Liangyx2 <yuxiang.liang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

Signed-off-by: Liangyx2 <yuxiang.liang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update license copyright

Signed-off-by: Liangyx2 <yuxiang.liang@intel.com>

* DataPrep extract info from table in the docs

Signed-off-by: Liangyx2 <yuxiang.liang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refine

Signed-off-by: Liangyx2 <yuxiang.liang@intel.com>

* refine

Signed-off-by: Liangyx2 <yuxiang.liang@intel.com>

* Update prepare_doc_redis.py

* Update prepare_doc_qdrant.py

* Update prepare_doc_milvus.py

---------

Signed-off-by: Liangyx2 <yuxiang.liang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
Co-authored-by: XuhuiRen <44249229+XuhuiRen@users.noreply.github.com>

---------

Signed-off-by: Liangyx2 <yuxiang.liang@intel.com>
Signed-off-by: jinjunzh <jasper.zhu@intel.com>
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
Signed-off-by: chensuyue <suyue.chen@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Liangyx2 <yuxiang.liang@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
Co-authored-by: XuhuiRen <44249229+XuhuiRen@users.noreply.github.com>
Co-authored-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: XinyuYe-Intel <xinyu.ye@intel.com>
Co-authored-by: Sihan Chen <39623753+Spycsh@users.noreply.github.com>
sharanshirodkar7 pushed a commit to sharanshirodkar7/GenAIComps that referenced this pull request Jul 9, 2024
…t#120)

Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
Signed-off-by: sharanshirodkar7 <ssharanshirodkar7@gmail.com>
yogeshmpandey pushed a commit to yogeshmpandey/GenAIComps that referenced this pull request Jul 10, 2024
…t#120)

Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
Signed-off-by: Yogesh Pandey <yogesh.pandey@intel.com>
yogeshmpandey pushed a commit to yogeshmpandey/GenAIComps that referenced this pull request Jul 10, 2024
dwhitena pushed a commit to predictionguard/GenAIComps that referenced this pull request Jul 24, 2024
…t#120)

Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
Signed-off-by: Daniel Whitenack <whitenack.daniel@gmail.com>
dwhitena pushed a commit to predictionguard/GenAIComps that referenced this pull request Jul 24, 2024
lkk12014402 pushed a commit that referenced this pull request Aug 8, 2024
Signed-off-by: chensuyue <suyue.chen@intel.com>
Signed-off-by: letonghan <letong.han@intel.com>
Labels: None yet
Projects: None yet
Development

Successfully merging this pull request may close these issues.

DataPrep extract info from image in the docs
3 participants