extract-chinese

Extract Chinese and English sentences from two documents and match them by meaning.

Getting Started

This Python project extracts Chinese and English sentences from two PDFs, then matches the sentences by the cosine similarity of their embeddings.
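As a rough illustration of the matching step, the sketch below pairs each Chinese sentence with its highest-scoring English sentence by cosine similarity. It is not taken from the project's code: the names `cosine` and `match_sentences`, the `threshold` value, and the toy 3-dimensional vectors are all assumptions made for the example. In practice the vectors would presumably come from a sentence_transformers model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_sentences(zh_vecs, en_vecs, threshold=0.5):
    """Hypothetical helper: for each Chinese vector, keep the best English
    match as (zh_index, en_index, score) if its score reaches the threshold."""
    pairs = []
    for i, zv in enumerate(zh_vecs):
        best_j, best_score = -1, -1.0
        for j, ev in enumerate(en_vecs):
            score = cosine(zv, ev)
            if score > best_score:
                best_j, best_score = j, score
        if best_j >= 0 and best_score >= threshold:
            pairs.append((i, best_j, best_score))
    return pairs


# Toy vectors standing in for real sentence embeddings (assumed data).
zh = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
en = [[0.0, 2.0, 2.0], [3.0, 0.1, 0.0]]
matches = match_sentences(zh, en)
```

Here the first Chinese vector pairs with the second English vector and vice versa. The threshold drops pairs whose best score is still low, which helps when one document contains sentences with no counterpart in the other.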

    pip install pdfplumber
    pip install nltk
    pip install jieba
    pip install sentence_transformers
    ...

Open a Python console and run:

    import nltk
    nltk.download('punkt')

Then set some environment variables.