
📁 Project Structure (develop branch -> OCR)

final-project-level3-cv-16
├─ api_folder
│   ├─ .streamlit
│   │   └─ config.toml
│   ├─ backend
│   │   ├─ epillid_benchmark (cloned from Link)
│   │   ├─ Dockerfile
│   │   ├─ Backend.py
│   │   └─ requirements.txt
│   ├─ frontend
│   │   ├─ Dockerfile
│   │   ├─ frontend.py
│   │   └─ requirements.txt
│   └─ Docker
│       └─ docker-compose.yml
├─ image_classification
│   ├─ data_preprocessing
│   │   ├─ download_pill_data.py
│   │   └─ normalize_pill_data.py
│   ├─ image_concatenation
│   │   └─ concatenation_images.py
│   ├─ kaggle_pill_data_preprocessing
│   │   ├─ 1_annotation_file_name_to_txt.py
│   │   ├─ 2_edit_xml_path.py
│   │   └─ 3_xml_to_json.py
│   ├─ pill_excel_data
│   │   └─ README.md
│   ├─ .gitignore
│   ├─ data.py
│   ├─ dataset.py
│   ├─ log.py
│   └─ train.py
└─ ocr
    ├─ CRAFT-pytorch
    │   ├─ file_utils.py
    │   └─ test.py
    ├─ deep-text-recognition-benchmark
    │   ├─ train.py
    │   ├─ demo.py
    │   ├─ train_KOR_ENG_data.py
    │   └─ create_lmdb_dataset.py
    ├─ text_classificaiton_exp
    │   └─ efficientnet_mark_text_nan.ipynb
    ├─ text_classificaiton_evaluation
    │   └─ text_recog_analysis.ipynb
    └─ ocr_text_crop.py

🤔 How to start?

  1. Clone CRAFT-pytorch (© ClovaAI) and install its requirements from inside the cloned folder.

    git clone https://github.com/clovaai/CRAFT-pytorch.git
    pip install -r requirements.txt
  2. Clone deep-text-recognition-benchmark (© ClovaAI) and install its dependencies.

    git clone https://github.com/clovaai/deep-text-recognition-benchmark.git
    pip install lmdb pillow torchvision nltk natsort
  3. Download the .pth weight file from https://drive.google.com/file/d/1Jk4eGD7crsqCCg9C9VjCLkMN3ze8kutZ/view.

  4. Save the downloaded .pth file inside the cloned CRAFT-pytorch folder.

  5. Run CRAFT-pytorch with the command below.

    python test.py --trained_model=[weightfile] --test_folder=[folder path to test images]
  6. Crop out only the text regions with the command below.

    python ocr_text_crop.py
  7. Download TPS-ResNet-BiLSTM-Attn-case-sensitive.pth from https://www.dropbox.com/sh/j3xmli4di1zuv3s/AAArdcPgz7UFxIHUuKNOeKv_a?dl=0.

  8. Create a saved_model folder inside the cloned deep-text-recognition-benchmark folder and move the downloaded .pth file into it.

  9. Run deep-text-recognition-benchmark with the command below.

    python demo.py --Transformation TPS --FeatureExtraction ResNet --SequenceModeling BiLSTM --Prediction Attn --image_folder demo_image/ --saved_model saved_model/TPS-ResNet-BiLSTM-Attn-case-sensitive.pth --sensitive
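Step 6 runs the project's ocr_text_crop.py, whose code is not reproduced in this README. As a rough illustration of what that step involves, the sketch below assumes that CRAFT's test.py has written one result file per image (by default, as far as we know, res_<image name>.txt under ./result/) containing one detected polygon per line as comma-separated x,y integer pairs. It crops the bounding rectangle of each polygon with Pillow. The function names and the output file naming are hypothetical; this is not the project's actual implementation.

```python
from pathlib import Path

from PIL import Image


def read_craft_boxes(result_txt):
    """Parse a CRAFT result file into axis-aligned (left, top, right, bottom) boxes.

    Assumes one polygon per line, written as comma-separated x,y pairs.
    """
    boxes = []
    for line in Path(result_txt).read_text().splitlines():
        if not line.strip():
            continue
        coords = [int(float(v)) for v in line.split(",")]
        xs, ys = coords[0::2], coords[1::2]
        # Bounding rectangle of the (possibly rotated) polygon
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


def crop_text_regions(image_path, result_txt, out_dir):
    """Save one PNG crop per detected box and return the paths that were written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with Image.open(image_path) as img:
        image = img.convert("RGB")
    for i, (left, top, right, bottom) in enumerate(read_craft_boxes(result_txt)):
        # Clamp to the image bounds so boxes that spill over the edge are not padded
        box = (max(left, 0), max(top, 0), min(right, image.width), min(bottom, image.height))
        if box[2] <= box[0] or box[3] <= box[1]:
            continue  # skip empty boxes
        out_path = out_dir / f"{Path(image_path).stem}_crop_{i}.png"
        image.crop(box).save(out_path)
        written.append(out_path)
    return written
```

For example, crop_text_regions("demo.jpg", "result/res_demo.txt", "crops/") would write one PNG per detected word into crops/, a folder that could then be passed to demo.py through --image_folder. Cropping to the bounding rectangle keeps the sketch simple; for strongly rotated text, a perspective warp of the four-point polygon would likely give the recognizer cleaner inputs.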

🤔 Training on your own dataset or on a non-English dataset

  1. Create your own LMDB dataset:

    pip install fire
    # Adjust the paths and file names to match your data.
    python create_lmdb_dataset.py --inputPath data/ --gtFile data/gt.txt --outputPath result/
  2. Structure the data folder as follows:

    data
    ├── gt.txt
    └── test
        ├── word_1.png
        ├── word_2.png
        ├── word_3.png
        └── ...

  3. Write each line of gt.txt as {imagepath}\t{label}\n: the image path relative to --inputPath, a tab, then the label. For example:

    test/word_1.png	Tiredness
    test/word_2.png	kills
    test/word_3.png	A
    ...

  4. Modify --select_data, --batch_ratio, and opt.character (set through the --character argument) in train.py, following the Link or train_KOR_ENG_data.py. For a Korean and English character set:

parser.add_argument('--character', type=str, default='0123456789abcdefghijklmnopqrstuvwxyz๊ฐ€๊ฐ๊ฐ„๊ฐ‡๊ฐˆ๊ฐ๊ฐ‘๊ฐ’๊ฐ“๊ฐ•๊ฐ–๊ฐ™๊ฐš๊ฐ›๊ฐœ๊ฐ๊ฑ€๊ฑ”๊ฑฐ๊ฑฑ๊ฑด๊ฑท๊ฑธ๊ฒ€๊ฒ๊ฒƒ๊ฒ‰๊ฒŒ๊ฒจ๊ฒฉ๊ฒช๊ฒฌ๊ฒฐ๊ฒน๊ฒฝ๊ณ๊ณ„๊ณ ๊ณก๊ณค๊ณง๊ณจ๊ณฐ๊ณฑ๊ณณ๊ณต๊ณผ๊ด€๊ด‘๊ดœ๊ดด๊ต‰๊ต๊ตฌ๊ตญ๊ตฐ๊ตณ๊ตด๊ตต๊ตถ๊ตฝ๊ถ๊ถŒ๊ท€๊ท“๊ทœ๊ท ๊ทค๊ทธ๊ทน๊ทผ๊ธ€๊ธ๊ธˆ๊ธ‰๊ธ‹๊ธ๊ธฐ๊ธด๊ธธ๊น€๊น…๊นŠ๊นŒ๊น๊นŽ๊น๊น”๊นœ๊น๊นก๊นฅ๊นจ๊บผ๊บพ๊ปŒ๊ป๊ป๊ป‘๊ป˜๊ปด๊ผฌ๊ผญ๊ผด๊ผผ๊ผฝ๊ฝ‚๊ฝƒ๊ฝ‰๊ฝค๊พธ๊พผ๊ฟ€๊ฟˆ๋€Œ๋„๋ˆ๋Š๋Œ๋“๋”๋—๋๋ผ๋‚Œ๋‚˜๋‚™๋‚š๋‚œ๋‚ ๋‚ก๋‚จ๋‚ฉ๋‚ซ๋‚ญ๋‚ฎ๋‚ฏ๋‚ฑ๋‚ณ๋‚ด๋ƒ„๋ƒ‡๋ƒ‰๋ƒ๋ƒฅ๋„ˆ๋„‰๋„Œ๋„๋„“๋„˜๋„ฃ๋„ค๋„ฅ๋„ท๋…€๋…๋…„๋…๋…•๋…ธ๋…น๋…ผ๋†€๋†ˆ๋†๋†’๋†“๋†”๋‡Œ๋‡จ๋ˆ„๋ˆˆ๋ˆ•๋‰˜๋‰ด๋Š„๋Š๋Š‘๋Š”๋Š˜๋Š™๋Šฅ๋Šฆ๋Šฌ๋‹ˆ๋‹๋‹˜๋‹ค๋‹ฅ๋‹ฆ๋‹จ๋‹ซ๋‹ฌ๋‹ญ๋‹ฎ๋‹ด๋‹ต๋‹ท๋‹น๋‹ฟ๋Œ€๋Œ๋Œ๋Œ“๋”๋•๋˜๋œ๋Ÿ๋ค๋ฅ๋ง๋ฉ๋ฎ๋ฐ๋ธ๋„๋…๋ˆ๋Œ๋•๋—๋™๋ผ๋˜๋œ๋‘๋‘‘๋‘˜๋‘ ๋‘ก๋‘ฅ๋’ค๋’ท๋“œ๋“๋“ ๋“ฃ๋“ค๋“ฌ๋“ญ๋“ฏ๋“ฑ๋””๋”ฉ๋”ช๋”ฐ๋”ฑ๋”ด๋”ธ๋•€๋•…๋•Œ๋•œ๋– ๋–ก๋–ค๋–จ๋–ป๋–ผ๋˜๋˜‘๋šœ๋šซ๋šฑ๋›ฐ๋œจ๋œฉ๋œฏ๋œฐ๋œป๋„๋ผ๋ฝ๋ž€๋žŒ๋ž๋ž‘๋ž—๋ž˜๋žœ๋žจ๋žซ๋žต๋Ÿ‰๋Ÿฌ๋Ÿญ๋Ÿฐ๋Ÿด๋Ÿผ๋Ÿฝ๋Ÿฟ๋ ๋ ‡๋ ˆ๋ ‰๋ Œ๋ ค๋ ฅ๋ จ๋ ฌ๋ ต๋ น๋ก€๋กœ๋ก๋ก ๋กฌ๋กญ๋กฏ๋ฃŒ๋ฃจ๋ฃฉ๋ฃน๋ฃป๋ค„๋ฅ˜๋ฅ™๋ฅ ๋ฅญ๋ฅด๋ฅธ๋ฆ„๋ฆ‡๋ฆŽ๋ฆฌ๋ฆญ๋ฆฐ๋ฆผ๋ฆฝ๋ฆฟ๋ง๋งˆ๋ง‰๋งŒ๋งŽ๋ง๋ง‘๋ง˜๋ง™๋ง›๋ง๋งž๋งก๋งฃ๋งค๋งฅ๋งจ๋งต๋งบ๋จธ๋จน๋จผ๋ฉ€๋ฉˆ๋ฉ‹๋ฉ๋ฉŽ๋ฉ”๋ฉ˜๋ฉฉ๋ฉฐ๋ฉด๋ฉธ๋ช…๋ช‡๋ชจ๋ชฉ๋ชฌ๋ชฐ๋ชธ๋ชน๋ชป๋ชฝ๋ฌ˜๋ฌด๋ฌต๋ฌถ๋ฌธ๋ฌป๋ฌผ๋ญ„๋ญ‡๋ญ๋ญ˜๋ญฃ๋ฏ€๋ฏธ๋ฏผ๋ฏฟ๋ฐ€๋ฐ‰๋ฐŒ๋ฐ๋ฐ‘๋ฐ”๋ฐ•๋ฐ–๋ฐ˜๋ฐ›๋ฐœ๋ฐ๋ฐŸ๋ฐค๋ฐฅ๋ฐฉ๋ฐญ๋ฐฐ๋ฐฑ๋ฑ€๋ฑƒ๋ฑ‰๋ฒ„๋ฒˆ๋ฒŒ๋ฒ”๋ฒ•๋ฒ—๋ฒ ๋ฒค๋ฒจ๋ฒผ๋ฒฝ๋ณ€๋ณ„๋ณ๋ณ‘๋ณ•๋ณด๋ณต๋ณถ๋ณธ๋ณผ๋ด„๋ด‡๋ด‰๋ตˆ๋ต™๋ถ€๋ถ๋ถ„๋ถˆ๋ถ‰๋ถ๋ถ“๋ถ•๋ถ™๋ทฐ๋ธŒ๋ธ๋ธ”๋น„๋นŒ๋น”๋น—๋นš๋น›๋น ๋นก๋นจ๋นต๋นผ๋บ๋บจ๋ป๋ป”๋ป—๋ผˆ๋ผ‰๋ฝ‘๋ฟŒ๋ฟ์˜์จ์‚ฌ์‚ญ์‚ฐ์‚ด์‚ถ์‚ผ์‚ฟ์ƒ์ƒˆ์ƒ‰์ƒŒ์ƒ์ƒค์„œ์„์„ž์„ ์„ค์„ฌ์„ญ์„ฏ์„ฑ์„ธ์„น์„ผ์…ˆ์…‹์…”์…˜์†Œ์†์†์†”์†œ์†Ÿ์†ก์†ฅ์‡„์‡ ์‡ผ์ˆ˜์ˆ™์ˆœ์ˆŸ์ˆ ์ˆจ์ˆซ์ˆญ์ˆฒ์‰ฌ์‰ฐ์‰ฝ์Šˆ์Šค์Šจ์Šฌ์Šด์Šต์Šท์Šน์‹œ์‹์‹ ์‹ฃ์‹ค์‹ซ์‹ฌ์‹ญ์‹ฏ์‹ฑ์‹ถ์‹ธ์‹น์‹ผ์Œ€์Œ์Œ“์จ์ฉ์ฐ์น์Ž„์˜์Ÿ์‘ค์“ฐ์“ด์“ธ์”€์”Œ์”จ์”ฉ์”ฌ์”น์”ป์•„์•…์•ˆ์•‰์•Š์•Œ์•“์•”์••์•—์•™์•ž์• ์•ก์•จ์•ผ์•ฝ์–€์–„์–‡์–‘์–•์–—์–˜์–ด์–ต์–ธ์–น์–ป์–ผ์—„์—…์—†์—‡์—‰์—Š์—Œ์—Ž์—์—”์—˜์—ฌ์—ญ์—ฐ์—ด์—ท์—ผ์—ฝ์—ฟ์˜์˜†์˜ˆ์˜›์˜ค์˜ฅ์˜จ์˜ฌ์˜ฎ์˜ณ์˜ท์˜น์™€์™„์™•์™œ์™ 
์™ธ์™ผ์š”์š•์šฉ์šฐ์šฑ์šด์šธ์›€์›ƒ์›…์›Œ์›์›”์›จ์›ฌ์œ„์œ—์œ ์œก์œจ์œผ์œฝ์€์„์Œ์‘์˜์ด์ต์ธ์ผ์ฝ์žƒ์ž„์ž…์ž‡์žˆ์žŠ์žŽ์ž์ž‘์ž”์ž–์ž˜์ž ์žก์žฃ์žฅ์žฆ์žฌ์Ÿ์Ÿค์ €์ ์ „์ ˆ์ Š์ ์ ‘์ “์ •์ –์ œ์  ์ ฏ์ ธ์กฐ์กฑ์กด์กธ์ข€์ข์ข…์ข‹์ขŒ์ฃ„์ฃผ์ฃฝ์ค€์ค„์คŒ์ค์ค‘์ฅ์ฆˆ์ฆ‰์ฆŒ์ฆ์ฆ˜์ฆ์ง€์ง์ง„์งˆ์ง์ง‘์ง“์ง•์ง™์งš์งœ์ง์งง์งธ์จŒ์ฉŒ์ฉ์ฉ์ฉ”์ฉœ์ชฝ์ซ“์ญˆ์ญ‰์ฐŒ์ฐ์ฐข์ฐจ์ฐฉ์ฐฌ์ฐฎ์ฐฐ์ฐธ์ฐป์ฐฝ์ฐพ์ฑ„์ฑ…์ฑ”์ฑ™์ฒ˜์ฒ™์ฒœ์ฒ ์ฒฉ์ฒซ์ฒญ์ฒด์ณ์ดˆ์ด‰์ดŒ์ด›์ด์ดฌ์ตœ์ถ”์ถ•์ถ˜์ถœ์ถค์ถฅ์ถง์ถฉ์ทจ์ธ ์ธก์ธฐ์ธต์น˜์น™์นœ์น ์นจ์นซ์นญ์นด์นธ์นผ์บ„์บ์บ ์ปค์ปจ์ปฌ์ปด์ปต์ปท์ผ€์ผ“์ผœ์ฝ”์ฝ˜์ฝœ์ฝค์ฝฉ์พŒ์ฟ„์ฟ ํ€ดํฌํฐํดํผํ‚คํ‚ฌํƒ€ํƒํƒ„ํƒˆํƒ‘ํƒ“ํƒ•ํƒœํƒํƒคํ„ฐํ„ฑํ„ดํ„ธํ……ํ…Œํ…ํ…”ํ…œํ† ํ†คํ†จํ†ฑํ†ตํ‡ดํˆฌํˆดํˆผํ‰ํŠ€ํŠœํŠธํŠนํŠผํŠฟํ‹€ํ‹ˆํ‹ฐํ‹ฑํŒ€ํŒ…ํŒŒํŒŽํŒํŒ”ํŒํŒจํŒฉํŒฌํผํฝํŽ˜ํŽœํŽดํŽธํŽผํ‰ํํฌํญํฐํ‘œํ‘ธํ‘นํ’€ํ’ˆํ’ํ“จํ”„ํ”Œํ””ํ”ผํ”ฝํ•„ํ•ํ•‘ํ•˜ํ•™ํ•œํ• ํ•จํ•ฉํ•ญํ•ดํ•ตํ•ธํ–„ํ–‡ํ–‰ํ–ฅํ—ˆํ—Œํ—˜ํ—คํ—ฌํ˜€ํ˜„ํ˜ˆํ˜‘ํ˜•ํ˜œํ˜ธํ˜นํ˜ผํ™€ํ™ˆํ™‰ํ™ํ™”ํ™•ํ™˜ํ™œํ™ฉํšŒํšํšŸํšกํšจํ›„ํ›ˆํ›Œํ›”ํ›จํœ˜ํœดํ‰ํํ‘ํ”ํ˜ํ™ํกํฅํฉํฌํฐํžˆํž˜-+/.&', help='character label')

✍️ Evaluating the text-recognition model on 의약품안전나라 (Korea's drug safety information portal) data

The full procedure is documented in an .ipynb notebook.

Check out this link!
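The notebook's exact metrics are not described in this README. As an assumption, the sketch below scores recognized strings against ground-truth labels with word accuracy and a normalized edit-distance similarity, 1 - ED(pred, gt) / max(len(pred), len(gt)), similar in spirit to what deep-text-recognition-benchmark's test.py reports. The function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance (unit cost for insertion, deletion, substitution)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = curr
    return prev[-1]


def evaluate(predictions, labels):
    """Return (word accuracy, mean normalized edit similarity) for paired strings."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and the same length")
    correct = 0
    similarity = 0.0
    for pred, gt in zip(predictions, labels):
        correct += pred == gt
        longest = max(len(pred), len(gt))
        # Two empty strings count as a perfect match
        similarity += 1.0 if longest == 0 else 1 - edit_distance(pred, gt) / longest
    return correct / len(labels), similarity / len(labels)
```

For example, evaluate(["타이레놀", "ab"], ["타이레놀", "abc"]) returns an accuracy of 0.5 and a similarity of about 0.833, since the second prediction is one edit away from a three-character label.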