UTF-8 letters support #4351

lavrenkov-sketch · 2021-08-09T06:23:09Z

How can I add support for UTF-8 fonts? I want to display predictions in Russian, and the OpenCV library itself seems to be able to do this.

github-actions · 2021-08-09T06:23:46Z

👋 Hello @lavrenkov-sketch, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python>=3.6.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

$ git clone https://github.com/ultralytics/yolov5
$ cd yolov5
$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Google Colab and Kaggle notebooks with free GPU
Google Cloud Deep Learning VM. See GCP Quickstart Guide
Amazon Deep Learning AMI. See AWS Quickstart Guide
Docker Image. See Docker Quickstart Guide

Status

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

kinoute · 2021-08-10T11:06:02Z

I'm interested too. The plot_one_box function uses OpenCV to draw rectangles and put text. By default OpenCV seems limited when it comes to fonts (no Unicode support), but in version 3 they added a FreeType module that can load custom TrueType fonts:

See: http://fireant.github.io/misc/2017/01/28/ttf-opencv.html

There is also plot_one_box_PIL in utils/plots.py, which uses PIL instead. PIL can load custom TrueType fonts, but I couldn't make it work with detect.py and "Arial Unicode" or another custom font. The rectangles are not displayed either.

kinoute · 2021-08-10T14:07:53Z

By the way, labels containing non-ASCII characters such as accents not only fail to appear on images; the results (class names) seem to lose the accents as well. The problem is not limited to the OpenCV calls; it seems general to the project.

Example: "Société France" would become "Socit France"

glenn-jocher · 2021-08-11T15:12:13Z

@kinoute @lavrenkov-sketch we have a PIL-based plotter that might be better for UTF-8:

yolov5/utils/plots.py

Lines 82 to 94 in e96c74b

    
def plot_one_box_PIL(box, im, color=(128, 128, 128), label=None, line_thickness=None):
    # Plots one bounding box on image 'im' using PIL
    im = Image.fromarray(im)
    draw = ImageDraw.Draw(im)
    line_thickness = line_thickness or max(int(min(im.size) / 200), 2)
    draw.rectangle(box, width=line_thickness, outline=color)  # plot
    if label:
        font = ImageFont.truetype("Arial.ttf", size=max(round(max(im.size) / 40), 12))
        txt_width, txt_height = font.getsize(label)
        draw.rectangle([box[0], box[1] - txt_height + 4, box[0] + txt_width, box[1]], fill=color)
        draw.text((box[0], box[1] - txt_height + 1), label, fill=(255, 255, 255), font=font)
    return np.asarray(im)

lavrenkov-sketch · 2021-08-13T05:43:07Z

@glenn-jocher how can I use plot_one_box_PIL? It returns an array and doesn't draw any labels on the image.

lavrenkov-sketch · 2021-08-13T08:24:07Z

im0 = plot_one_box_PIL(xyxy, im0, label=label, color=colors(c, True), line_thickness=line_thickness)

glenn-jocher · 2021-08-13T15:10:00Z

@lavrenkov-sketch I've opened up PR #4407 to assign a default line width with PIL plotting, not sure why it was missing.

Can you supply an example class name that is failing with the existing cv2 plotting function? We should update this to switch to PIL automatically on detection of UTF-8 characters.

kinoute · 2021-08-13T15:22:31Z

"Société Générale", image from W&B (same in inference, running the model through Docker Hub):

glenn-jocher · 2021-08-13T16:24:57Z

@kinoute perfect thanks! I'll see if we can merge a more comprehensive fix this weekend.

But wait, in this case the cv2 label function fails silently and simply does not display the é characters? Is this master code, or did you modify it to remove these characters?

kinoute · 2021-08-13T17:05:01Z

I didn't modify anything; it comes from a training run I did 3 days ago. I train with the official Docker image. To avoid this problem, I temporarily removed the accents from my yaml file. Do note that accents don't show up in "results" at inference time either. It's not only a cv2 problem.
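The workaround mentioned above (removing the accents from the class names by hand) can also be scripted. This is only a minimal sketch using the standard library; `strip_accents` is a hypothetical helper, not something from the YOLOv5 code base, and it only helps for Latin-script names. It does not transliterate Cyrillic or other non-Latin scripts.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Return text with combining accent marks removed (for example é -> e)."""
    # NFKD splits an accented letter into its base letter plus combining marks
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep everything else unchanged
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical class names, as they might appear in a dataset yaml file
names = ["Société Générale", "person"]
ascii_safe_names = [strip_accents(name) for name in names]
print(ascii_safe_names)
```

Unlike the silent character loss seen in the results, this keeps the base letters, so the labels stay readable until proper UTF-8 support is in place.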

glenn-jocher · 2021-08-13T21:14:37Z

@kinoute I see correct console printout but incorrect cv2 plotting in Colab.

Can you run this code to reproduce in your environment:

# PyTorch Hub
import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
model.names[0] = 'Société Générale'

# Images
dir = 'https://ultralytics.com/images/'
imgs = [dir + f for f in ('zidane.jpg', 'bus.jpg')]  # batch of images

# Inference
results = model(imgs)
results.print()
results.save()
results.pandas().xyxy[0]

kinoute · 2021-08-13T22:20:34Z

@glenn-jocher It works like this because you overwrite the class name right before inference, but I think the problem arises when the classes are loaded from the dataset yaml file. I was able to reproduce the issue like this:

# download yolov5 master
git clone https://github.com/ultralytics/yolov5
cd yolov5

# download coco128 dataset
chmod +x data/scripts/get_coco128.sh && ./data/scripts/get_coco128.sh

# change first class name in data/coco128.yaml from "person" to "person Société Générale"
vi data/coco128.yaml

# train
python train.py --img 640 --batch 16 --epochs 3 --data coco128.yaml --weights yolov5s.pt --cache

After the training, we can confirm some images have accents removed, such as runs/train/exp/confusion_matrix.png and runs/train/exp/val_batch0_labels.jpg:

Now, on the detection/inference, running:

python detect.py --weights runs/train/exp/weights/best.pt --img 640 --conf 0.25 --source data/images/ --save-txt

shows the same problem too:

And finally, using PyTorch Hub for inference:

/sites/yolov5 $ python3
Python 3.8.9 (default, Apr  3 2021, 01:49:17)
[Clang 11.0.0 (clang-1100.0.33.17)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> model = torch.hub.load('ultralytics/yolov5', 'custom', path="runs/train/exp/weights/best.pt")
Using cache found in /Users/kinoute/.cache/torch/hub/ultralytics_yolov5_master
YOLOv5 🚀 2021-8-14 torch 1.7.0 CPU

Fusing layers...
Model Summary: 224 layers, 7266973 parameters, 0 gradients, 17.0 GFLOPs
Adding AutoShape...
>>> dir = 'https://ultralytics.com/images/'
>>> imgs = [dir + f for f in ('zidane.jpg', 'bus.jpg')]  # batch of images
>>> results = model(imgs)
>>> results.print()
image 1/2: 720x1280 2 person Socit Gnrales, 2 ties
image 2/2: 1080x810 4 person Socit Gnrales, 1 bus
Speed: 536.4ms pre-process, 403.4ms inference, 2.1ms NMS per image at shape (2, 3, 640, 640)
>>> results.save()
Saved 2 images to 'runs/detect/exp3'
>>> results.pandas().xyxy[0]
         xmin        ymin         xmax        ymax  confidence  class  \
0  748.832520   38.970703  1150.665039  711.768921    0.866193      0
1  113.337891  193.695984  1127.441406  712.344666    0.676592      0
2  433.699280  433.693848   517.761475  715.654663    0.659638     27
3  986.345520  303.328003  1027.942627  420.201050    0.278434     27

                  name
0  person Socit Gnrale
1  person Socit Gnrale
2                  tie
3                  tie
>>>

We can see that the accents are missing from the "textual" results as well.

glenn-jocher · 2021-08-14T11:47:26Z

@kinoute good news 😃! Your original issue may now be partially fixed ✅ in PR #4413. This PR removes 'ascii' encoding that had been in place when reading yaml files, so now the UTF-8 characters are saved correctly in the model names list.
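For anyone wondering why the accents vanished without any error, here is a minimal sketch of the effect. It assumes the yaml file was opened with `encoding='ascii'` together with `errors='ignore'`; the `errors='ignore'` part is an assumption made for illustration and is not stated in this thread. The file name and contents are made up.

```python
import os
import tempfile

# Made-up dataset yaml content with accented class names
yaml_text = "names: ['person Société Générale', 'tie']\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.yaml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(yaml_text)

    # Assumed old behaviour: ASCII decoding that silently drops undecodable bytes
    with open(path, encoding="ascii", errors="ignore") as f:
        lossy = f.read()

    # Reading the same file as UTF-8 keeps every character
    with open(path, encoding="utf-8") as f:
        exact = f.read()

print(lossy)  # accented letters are dropped
print(exact)  # accented letters are preserved
```

Each é is stored as two non-ASCII bytes in UTF-8, so both bytes are discarded and the letter disappears entirely, which matches the "Socit Gnrale" output reported above.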

Unfortunately the cv2 plotting issue remains though, which will require separate additional fixes. In this example I trained a new model with an updated coco128.yaml as in your above example. The model best.pt correctly saves the UTF-8 characters:

To receive this update:

Git – git pull from within your yolov5/ directory or git clone https://github.com/ultralytics/yolov5 again
PyTorch Hub – Force-reload with model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
Notebooks – View updated notebooks
Docker – sudo docker pull ultralytics/yolov5:latest to update your image

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!

glenn-jocher · 2021-08-14T11:49:02Z

TODO: UTF-8 image plotting fix

kinoute · 2021-08-14T11:58:18Z

I will have a look this afternoon and report back to you if the fix works, thanks!

Regarding PIL and UTF-8, I think the Nvidia Docker image is based on Ubuntu 20.04. We could switch from Arial.ttf to another built-in font such as the Ubuntu font, which supports Unicode by default. But we also have to find a way to deal with people who don't run YOLOv5 through the Docker image.
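One possible way to handle both Docker and non-Docker users is to probe a list of candidate font files and take the first one that exists. The sketch below is only an illustration of that idea: `find_font` and `CANDIDATE_FONTS` are hypothetical names, and the paths are guesses at common install locations, not paths taken from the YOLOv5 code or verified against the Docker image.

```python
from pathlib import Path

# Hypothetical candidates; these paths are illustrative guesses and may not exist
CANDIDATE_FONTS = [
    "Arial.ttf",
    "/usr/share/fonts/truetype/ubuntu/Ubuntu-R.ttf",
    "/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf",
    "/Library/Fonts/Arial Unicode.ttf",
]


def find_font(candidates=CANDIDATE_FONTS):
    """Return the first candidate path that exists as a file, or None."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_file():
            return path
    return None


font_path = find_font()
print(font_path if font_path else "no candidate font found; a bundled font would be needed")
```

The returned path could then be passed to ImageFont.truetype in place of the hard-coded "Arial.ttf". Whether the chosen font actually covers the required characters still has to be checked separately.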

I think there is a way to fix CV2 too, but it will need way more work and changes.

glenn-jocher · 2021-08-14T12:10:34Z

@kinoute I think we need to merge the PIL and cv2 plotting functions into one that accepts a use_pil boolean, which is set to True automatically if non-ASCII (UTF-8) characters are found. See https://stackoverflow.com/questions/54189911/cv2-imwrite-and-german-letters-%C3%A4-%C3%BC-%C3%B6
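The automatic switch described here could look roughly like the following. This is a sketch of the decision logic only, under the assumption that "UTF characters" means any character outside 7-bit ASCII; `is_ascii` and `choose_backend` are hypothetical names, and no actual drawing is done.

```python
def is_ascii(text) -> bool:
    """Return True if every character in text is a 7-bit ASCII character."""
    return all(ord(ch) < 128 for ch in str(text))


def choose_backend(label, use_pil=False) -> str:
    """Return 'pil' if requested or if the label has non-ASCII characters, else 'cv2'."""
    if use_pil or (label and not is_ascii(label)):
        return "pil"
    return "cv2"


for label in ("person 0.87", "Société Générale 0.91", "человек 0.75", None):
    print(repr(label), "->", choose_backend(label))
```

Keeping cv2 as the default for plain ASCII labels avoids the numpy-to-PIL round trip in the common case, and PIL is only used when the label needs it.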

lavrenkov-sketch closed this as completed Aug 13, 2021

glenn-jocher linked a pull request Aug 13, 2021 that will close this issue

Update plots.py PIL box plotting #4407

Merged

glenn-jocher reopened this Aug 14, 2021

glenn-jocher linked a pull request Aug 14, 2021 that will close this issue

Remove encoding='ascii' #4413

Merged

glenn-jocher closed this as completed in #4413 Aug 14, 2021

glenn-jocher reopened this Aug 14, 2021

glenn-jocher added the TODO High priority items label Aug 14, 2021

glenn-jocher linked a pull request Aug 14, 2021 that will close this issue

Merge PIL and OpenCV in plot_one_box(use_pil=False) #4416

Merged

glenn-jocher mentioned this issue Aug 14, 2021

Merge PIL and OpenCV in plot_one_box(use_pil=False) #4416

Merged

glenn-jocher closed this as completed in #4416 Aug 14, 2021

glenn-jocher removed the TODO High priority items label Aug 14, 2021
