Computer Vision with EOL v3 Images

Testing different computer vision methods (object detection, image classification) for customized, large-scale image processing of Encyclopedia of Life (EOL) v3 database images (square, centered crops; image content tags; etc.). Runs in TensorFlow 2 and Python 3.
Last updated 3 December 2024

Images a-c are hosted by Encyclopedia of Life (a. Choeronycteris mexicana, licensed under CC BY 2.0, b. Hippotion celerio, licensed under CC BY-NC-SA 3.0, c. Cuculus solitarius (left) and Cossypha caffra (right), licensed under CC BY-SA 2.0).

The Encyclopedia of Life (EOL) is an online biodiversity resource that seeks to provide information about all ~1.9 million species known to science. A goal for the latest version of EOL (v3) is to better leverage its older, less structured image content. To improve the discoverability and display of EOL images, automated image processing pipelines that use computer vision are being developed and tested, with the goal of scaling to millions of diverse images.

Project Structure

Object detection for image cropping

Three object detection frameworks (Faster R-CNN Resnet 50 and SSD, R-FCN, or Faster R-CNN Inception v2 [1] via the TensorFlow Object Detection API, and YOLO v3 [2] via Darkflow) were used to generate square crops for EOL images of different groups of animals (birds, bats, butterflies & moths, beetles, frogs, carnivores, snakes & lizards) using transfer learning and/or fine-tuning.

Frameworks differ in speed and accuracy: YOLO is the fastest but least accurate, while Faster R-CNN is the slowest but most accurate, with MobileNet SSD and R-FCN falling somewhere in between [2][3][4]. The model with the best trade-off between speed and accuracy for each group was selected to generate final cropping data for EOL images.

After detection, bounding boxes of detected animals are converted to square, centered cropping coordinates in order to standardize heterogeneous image gallery displays.
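For illustration, a minimal sketch of this conversion step (the function and variable names are hypothetical, not taken from the repository's notebooks): a detection box is expanded to a square centered on the detected animal, then shifted so it stays within the image.

```python
def to_square_crop(xmin, ymin, xmax, ymax, img_w, img_h):
    """Expand a detection box (in pixels) to a square, centered crop.

    The square side is the longer box dimension, capped at the image size;
    the square is centered on the box, then shifted back inside the image.
    """
    side = min(max(xmax - xmin, ymax - ymin), img_w, img_h)
    cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2  # box center
    left = min(max(cx - side / 2, 0), img_w - side)
    top = min(max(cy - side / 2, 0), img_h - side)
    return int(left), int(top), int(left + side), int(top + side)

# Example: a tall bird box inside a 1024x768 image
print(to_square_crop(400, 100, 560, 620, 1024, 768))  # (220, 100, 740, 620)
```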

  • For birds, pre-trained object detection models were used as-is, since birds are among the classes these models already detect.
  • For bats and butterflies & moths, object detection models were custom-trained to detect one class (either bats or butterflies & moths) using EOL user-generated cropping data (square coordinates around animal(s) of interest within each photo).
  • For beetles, frogs, carnivores and snakes & lizards, object detection models were custom-trained to detect all classes simultaneously using EOL user-generated cropping data.

➡️ 🌱 Click here to get started.

Demo video: Run your own images through the pre-trained EOL object detector in under 2 minutes.

Object detection results using the trained multitaxa detector model, displayed in a Google Colab Notebook. Image is hosted by Encyclopedia of Life (Lampropeltis californiae, licensed under CC BY-NC 4.0).

Classification for image tagging

Two classification frameworks (MobileNetSSD v2 [11], Inception v3 [6]) were used to perform image tagging for different classes of EOL images (flowers, maps/labels/illustrations, image ratings) using transfer learning and/or fine-tuning (see the transfer-learning sketch after the classifier list below).

Frameworks differ in speed and accuracy: MobileNetSSD v2 is faster, smaller, and less accurate, while Inception v3 is slower, larger, and more accurate [5][6]. The model with the best trade-off between speed and accuracy for each group was selected to generate final tagging data for EOL images.

While object detection involves both classification and localization of the object of interest, image classification involves only the former step [7]. Classification is used to identify images with flowers present, to flag images of maps/collection labels/illustrations, and to generate image quality ratings. These tags will allow users to search for features not already included in image metadata.
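As a rough sketch of how such tags can be generated with a trained Keras classifier (the model path, class names, and input size here are placeholder assumptions for illustration; the actual models are defined in the repository's Colab notebooks):

```python
import numpy as np
import tensorflow as tf

CLASSES = ["flower", "fruit", "entire", "branch", "stem", "leaf"]  # hypothetical label set
model = tf.keras.models.load_model("flower_classifier")  # path is a placeholder

def tag_image(path, input_size=(224, 224)):
    """Return the predicted tag and its confidence for one image file."""
    img = tf.keras.utils.load_img(path, target_size=input_size)
    x = tf.keras.utils.img_to_array(img)[np.newaxis] / 255.0  # batch of 1, scaled to [0, 1]
    probs = model.predict(x)[0]
    return CLASSES[int(np.argmax(probs))], float(np.max(probs))

print(tag_image("example.jpg"))  # e.g. ('flower', 0.93)
```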

  • For the flower classifier, models were trained to classify images into flower, fruit, entire, branch, stem, or leaf using the PlantCLEF 2016 image dataset as training data [8].
  • For the flower/fruit classifier, models were trained to classify images as flower/fruit or not flower/fruit using manually sorted EOL images as training data.
  • For the image type classifier, models were trained to classify images as map, herbarium sheet, phylogeny, illustration, or none using Wikimedia Commons, Flickr BHL, and EOL images as training data.
  • For the image rating classifier, models were trained to classify images into quality rating classes 1-5 (worst to best) using EOL user-generated training data.
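A minimal transfer-learning sketch along the lines described above (assuming TensorFlow 2 with Keras; the class count, input size, and dataset pipeline are placeholders, not the repository's exact training configuration):

```python
import tensorflow as tf

NUM_CLASSES = 6  # e.g. flower, fruit, entire, branch, stem, leaf

# Freeze an ImageNet-pretrained backbone and train only a new classification head.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # transfer learning: keep pretrained features fixed

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)  # train_ds/val_ds: your tf.data pipelines
```

For fine-tuning, the top layers of the backbone can later be unfrozen and retrained at a low learning rate.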

➡️ 🌱 Click here to get started.

Image classification results using the trained flower/fruit classification model, displayed in a Google Colab Notebook. Image is hosted by Encyclopedia of Life (Leucopogon tenuicaulis, licensed under CC BY 3.0).

Object detection for image tagging

Three object detection frameworks (YOLO v3 in Darknet [9], MobileNetSSD v2 [11], and YOLO v4 [10]) were used to perform image tagging for different classes of EOL images (flowers, insects, mammals/amphibians/reptiles/birds).

Frameworks differ in speed and accuracy: YOLO v4 is the fastest with intermediate accuracy, MobileNetSSD v2 has intermediate speed and accuracy, and YOLO v3 falls somewhere in between [6][10]. The model with the best trade-off between speed and accuracy for each group was selected to generate final tagging data for EOL images.

For tagging, only the classes of detected objects are kept and their locations are discarded. Object detection is used to identify plant-pollinator co-occurrence, insect life stage, the presence of mammal, amphibian, reptile, or bird scat and/or footprints, and when a human (or body part, like a 'hand') is present (see the post-processing sketch after the list below). These tags will allow users to search for features not already included in image metadata.

  • For plant-pollinator co-occurrence, a model pre-trained on Google Open Images [12] was used. EOL images were run through the model, and predictions for 'Butterfly', 'Insect', 'Beetle', 'Ant', 'Bat (Animal)', 'Bird', 'Bee', or 'Invertebrate' were kept and converted to "pollinator present" during post-processing.
  • For insect life stages, a model pre-trained on Google Open Images [12] was used. EOL images were run through the model, and predictions for 'Ant', 'Bee', 'Beetle', 'Butterfly', 'Dragonfly', 'Insect', 'Invertebrate', or 'Moths and butterflies' were kept and converted to "adult" during post-processing. Predictions for 'Caterpillar', 'Centipede', or 'Worm' were converted to "juvenile".
  • For scat/footprint present, models were custom-trained to detect scat or footprints in EOL images, but they never learned despite adjustments to augmentation and model hyperparameters over many training sessions. These pipelines and datasets should be revisited in the future with different approaches.
  • For human present, a model pre-trained on Google Open Images [12] was used. EOL images were run through the model, and predictions for 'Person' or any class containing 'Human' ('Body', 'Eye', 'Head', 'Hand', 'Foot', 'Face', 'Arm', 'Leg', 'Ear', 'Nose', 'Beard') were kept and converted to "human present" during post-processing.
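The post-processing step described above reduces to a simple class-name-to-tag mapping. A condensed sketch (the input format and function name are assumptions for illustration; the class lists follow the bullets above):

```python
# Map detected class names to image-level tags, discarding box locations.
POLLINATORS = {"Butterfly", "Insect", "Beetle", "Ant", "Bat (Animal)",
               "Bird", "Bee", "Invertebrate"}
JUVENILE = {"Caterpillar", "Centipede", "Worm"}

def detections_to_tags(detected_classes):
    """Convert a list of detected class names into a sorted list of tags."""
    tags = set()
    for name in detected_classes:
        if name in POLLINATORS:
            tags.add("pollinator present")
        if name in JUVENILE:
            tags.add("juvenile")
        if name == "Person" or "Human" in name:
            tags.add("human present")
    return sorted(tags)

print(detections_to_tags(["Bee", "Human hand"]))  # ['human present', 'pollinator present']
```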

➡️ 🌱 Click here to get started.

Object detection for image tagging results using the pre-trained plant-pollinator co-occurrence model, displayed in a Google Colab Notebook. Image is hosted by Flickr (another flower - insect photo! by thart2009, licensed under CC BY 2.0).

Utils
This folder contains Colab Notebooks and Google Chrome developer console scripts with useful functions for building on existing EOL computer vision pipelines or for developing your own from scratch.

Getting Started

All files in this repository are run in Google Colab*. The repository is set up so that each notebook can be run as a standalone script; it is not necessary to clone the entire repository. Instead, you can navigate to whichever project sections (i.e., GitHub folders) interest you and try the notebooks directly. All needed files and directories are set up within each notebook.

For additional details on steps below, see the project wiki.

New to Google Colab?
Google Colaboratory is "a free cloud service, based on Jupyter Notebooks for machine-learning education and research." Notebooks run entirely on VMs in the cloud and link to your Google Drive for accessing files, so no local software or library installs are required. If running locally with a GPU, several pieces of software must be installed first (taking up ~10 GB), and a few workarounds are needed on Windows. Working in the cloud eliminates these problems and makes collaboration easier when users are on different operating systems. If you prefer to use your local machine for object detection, refer to the TensorFlow Object Detection API Tutorial.*
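As a quick sanity check when opening any of the notebooks, a cell like the following (standard Colab and TensorFlow calls, not code from this repository) confirms the runtime and links your Drive:

```python
import tensorflow as tf
from google.colab import drive  # available inside Colab runtimes

print(tf.__version__)                          # should print 2.x
print(tf.config.list_physical_devices("GPU"))  # non-empty if a GPU runtime is enabled

drive.mount("/content/drive")                  # your files then appear under /content/drive
```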

Data and model availability

EOL image tags and square cropping coordinates produced using these pipelines are available on Zenodo. EOL trained models are currently set to download directly within the Colab Notebooks. We are in the process of adding all of our trained models to Kaggle; check EOL's Kaggle Model Zoo here. Currently, the Image Type and Image Quality Rating models are available. If there is a specific model file you would like, open a feature request and we will push it to the top of our upload list.

References

[1] Ren et al. 2017. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence.
[2] Hui 2018. Object detection: speed and accuracy comparison (Faster R-CNN, R-FCN, SSD, FPN, RetinaNet and YOLOv3). Medium. 27 March 2018.
[3] Redmon and Farhadi 2018. YOLOv3: An Incremental Improvement. arXiv.
[4] Lin et al. 2015. Microsoft COCO: Common Objects in Context. arXiv.
[5] Sandler et al. 2018. MobileNetV2: Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation. arXiv.
[6] Szegedy et al. 2015. Rethinking the Inception Architecture for Computer Vision. arXiv.
[7] Sharma 2019. Image Classification vs. Object Detection vs. Image Segmentation. Medium. 23 Feb 2020.
[8] Goëau et al. 2016. Plant identification in an open-world (LifeCLEF 2016). CEUR Workshop Proceedings.
[9] AlexeyAB 2020. Darknet. GitHub.
[10] Bochkovskiy et al. 2020. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv.
[11] Liu et al. 2016. SSD: Single Shot MultiBox Detector. Lecture Notes in Computer Science.
[12] Krasin et al. 2017. Open Images: A public dataset for large-scale multi-label and multi-class image classification. GitHub.

License

Code
Code in this repository is released under the MIT license. More information is available at the Open Source Initiative.
Images
All images used in this repository and notebooks contained therein are licensed under Creative Commons. EOL content is freely available to the public. More information about re-use of content hosted by EOL is available at EOL Terms of Use and EOL API Terms of Use. Specific attribution information for EOL images used for training and testing models is available in bundle URLs containing "breakdown_download" found within notebooks.
