Computer Vision

mirceatlx edited this page Mar 30, 2023 · 2 revisions

Computer vision is a field of artificial intelligence (AI) that focuses on enabling machines to interpret and analyse visual data from the world around them. In precision agriculture, computer vision techniques are used to extract information from data such as images or videos captured by a UAV.

Terrafarm uses computer vision to extract insights for the farmer about the crops and fields. Currently, we use computer vision for health index calculation, disease detection, and semantic segmentation to identify weed clusters, waterways, and other issues.

Semantic segmentation

Semantic segmentation is a computer vision technique that involves assigning a label or category to each pixel in an image with the goal of identifying and classifying different objects or regions within an image.

In practice, a model is trained on annotated images to predict a label for every pixel. Given an input image, it outputs a mask of the same size in which each pixel carries the category of the object or region it belongs to, so the image is effectively divided into labelled regions.
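To make the per-pixel labelling concrete, here is a minimal sketch in plain Python. It assumes the model produces one score per class for every pixel and that the label is simply the highest-scoring class; the class names and scores are made up for illustration and are not taken from our model.

```python
# Hypothetical class list for illustration; index 0 is an assumed "background" class.
CLASS_NAMES = ["background", "water", "weed cluster"]


def scores_to_mask(scores):
    """Convert per-pixel class scores, shaped [height][width][num_classes],
    into a label mask, shaped [height][width], by taking the arg-max."""
    mask = []
    for row in scores:
        mask_row = []
        for pixel_scores in row:
            # Index of the class with the highest score for this pixel.
            best = max(range(len(pixel_scores)), key=lambda c: pixel_scores[c])
            mask_row.append(best)
        mask.append(mask_row)
    return mask


# A tiny 2x2 "image" with made-up scores for the three classes above.
scores = [
    [[0.9, 0.05, 0.05], [0.2, 0.1, 0.7]],
    [[0.1, 0.8, 0.1], [0.3, 0.3, 0.4]],
]
mask = scores_to_mask(scores)
labels = [[CLASS_NAMES[c] for c in row] for row in mask]
```

Each entry of `mask` is a class index, and `labels` holds the same mask with readable names.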

For our project, semantic segmentation is used to identify areas of interest to the farmer for different types of annotations. More specifically, we are able to recognise 9 possible problems by looking at top-down images captured by the UAV:

  • double plant
  • drydown
  • endrow
  • nutrient deficiency
  • planter skip
  • storm damage
  • water
  • waterway
  • weed cluster

Example of weed cluster semantic segmentation

[Image: weedcluster]

With such a mask, the farmer can easily pinpoint the area affected by a specific problem and act on it. The mask also makes it possible to send the UAV back for a targeted, lower-altitude analysis of that area.
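As an illustration of how a mask can be turned into something actionable, the sketch below computes the bounding box and the affected fraction of a binary mask. This is an assumed post-processing step written for this page, not a description of what Terrafarm does; `mask_bounding_box` and `affected_fraction` are hypothetical helper names.

```python
def mask_bounding_box(mask):
    """Return (top, left, bottom, right), inclusive pixel coordinates of the
    region marked with 1 in a binary mask, or None if nothing is marked."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]


def affected_fraction(mask):
    """Fraction of pixels in the mask that are marked with 1."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total if total else 0.0


# Toy 4x5 mask: 1 marks pixels predicted as, for example, weed cluster.
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
box = mask_bounding_box(mask)
fraction = affected_fraction(mask)
```

Converting the pixel box into field coordinates would additionally require the georeferencing of the image, which is outside the scope of this sketch.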

DeepLabv3+

In order to perform this segmentation we use the DeepLabv3+ model.

DeepLabv3+ is a state-of-the-art neural network architecture for semantic segmentation tasks in computer vision. It is an extension of the DeepLabv3 architecture, which was introduced in 2017 by Google Research.

Compared to its predecessor, the new version employs an encoder-decoder architecture with skip connections, which helps recover spatial resolution and reduces the loss of fine detail that is common in other segmentation models.

[Image: architecture]

Encoder

The encoder consists of a pre-trained backbone network such as ResNet or Xception. Our current implementation uses the ResNet50 architecture pre-trained on the ImageNet dataset. The backbone extracts feature maps from the raw input images: low-level features from its early layers and increasingly semantic features from its deeper layers.

Decoder

The role of the decoder is to take the features created by the encoder and upsample them to generate high-resolution segmentation images.

One of the main features of DeepLabv3+ is its use of atrous (dilated) convolutions, in particular the atrous spatial pyramid pooling (ASPP) module, which enables the model to capture multi-scale contextual information about the image. In the original architecture, ASPP is applied to the encoder output before the decoder upsamples the result.

[Image]

As seen in the architecture image, ASPP applies atrous convolutions at different rates in parallel to capture features at different scales, without further reducing the spatial resolution of the feature maps.
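The effect of the rate can be shown with a one-dimensional toy example. The sketch below is plain Python, not the TensorFlow layer used in the model: the kernel taps are spaced `rate` samples apart, and zero padding keeps the output the same length as the input. The rates 6, 12 and 18 at the end are example values only.

```python
def effective_kernel_size(kernel_size, rate):
    """Number of input samples spanned by a kernel whose taps are `rate` apart."""
    return (kernel_size - 1) * rate + 1


def atrous_conv1d(signal, kernel, rate):
    """1-D atrous (dilated) convolution with zero 'same' padding.

    Assumes an odd kernel length. rate=1 is an ordinary convolution
    (cross-correlation, as in deep-learning frameworks).
    """
    span = effective_kernel_size(len(kernel), rate)
    pad = (span - 1) // 2
    padded = [0] * pad + list(signal) + [0] * pad
    return [
        sum(w * padded[start + i * rate] for i, w in enumerate(kernel))
        for start in range(len(signal))
    ]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 1, 1]
out_rate1 = atrous_conv1d(signal, kernel, rate=1)  # each output sees 3 neighbours
out_rate2 = atrous_conv1d(signal, kernel, rate=2)  # each output spans 5 samples

# A 3-tap kernel covers a much wider context as the rate grows.
spans = [effective_kernel_size(3, r) for r in (6, 12, 18)]
```

Both outputs have the same length as the input, while the higher rate lets each output value draw on a wider neighbourhood with the same number of weights.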

Training data

In order to train the model, we used a publicly available dataset known as Agriculture Vision. Link to the dataset: https://www.agriculture-vision.com/agriculture-vision-2021/dataset-2021

The pipeline includes a dedicated module for data cleaning and preprocessing. Once the images are preprocessed, they are fed to the model in a standard training setup built with TensorFlow (https://www.tensorflow.org/).

Each image is resized to 512x512 pixels. At the moment, we employ two different networks, one that accepts images with 3 channels and one that accepts images with 4 channels, depending on how the data was collected.

Inference

Inference follows the same preprocessing as the training phase, so that the input images have the same distribution and spatial properties as the training data.

Because the image of a whole field is very large, we split it into 512x512 patches and run each patch through the trained DeepLabv3+ model. Once all the masks have been inferred, we stitch them back together into a mask covering the orthomosaic image of the field.
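The split, infer and stitch steps can be sketched as follows, in plain Python on nested lists. Several things here are assumptions made for illustration: the border patches are zero-padded, the patches do not overlap, and `fake_model` is a stand-in threshold rather than DeepLabv3+. A patch size of 2 is used so the example stays readable; the real pipeline uses 512.

```python
def split_into_patches(image, patch):
    """Cut a 2-D image (list of rows) into patch x patch tiles keyed by their
    top-left (row, col). Tiles that cross the border are zero-padded."""
    height, width = len(image), len(image[0])
    tiles = {}
    for r in range(0, height, patch):
        for c in range(0, width, patch):
            tiles[(r, c)] = [
                [image[y][x] if y < height and x < width else 0
                 for x in range(c, c + patch)]
                for y in range(r, r + patch)
            ]
    return tiles


def stitch_masks(masks, height, width):
    """Place per-patch masks back at their origins and crop away the padding."""
    full = [[0] * width for _ in range(height)]
    for (r, c), tile in masks.items():
        for dy, row in enumerate(tile):
            for dx, value in enumerate(row):
                y, x = r + dy, c + dx
                if y < height and x < width:
                    full[y][x] = value
    return full


def fake_model(tile):
    """Stand-in for the segmentation model: marks pixels brighter than 5."""
    return [[1 if value > 5 else 0 for value in row] for row in tile]


image = [[x + y for x in range(5)] for y in range(3)]  # toy 3x5 "orthomosaic"
tiles = split_into_patches(image, patch=2)
masks = {origin: fake_model(tile) for origin, tile in tiles.items()}
full_mask = stitch_masks(masks, height=3, width=5)
```

Because the stand-in model works pixel by pixel, the stitched result equals the result of running it on the whole image. A real network sees less context near patch borders, so its stitched mask may differ slightly there.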

Disease detection

Reference paper: https://arxiv.org/pdf/2009.04365.pdf

Disease detection is an important aspect of precision agriculture as it allows farmers to identify and treat crop diseases in a timely and efficient manner.

We perform disease detection by applying transfer learning, that is, fine-tuning existing pre-trained models, using TensorFlow.

At the moment we support tomato disease detection, which enables the farmer to identify up to 9 different diseases and act on them early.

Training

As with semantic segmentation, we use the ResNet50 architecture with transfer learning to decide whether the plant is healthy. If the plant is diseased, the model also predicts which disease it has, so that the farmer can decide which treatment to use.
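The decision step can be sketched as below. The sketch assumes a single classifier whose outputs include a "healthy" class alongside the disease classes; whether our model is structured this way or as two separate stages is not stated here. The class list is an illustrative subset, the logits are made up, and `diagnose` is a hypothetical helper.

```python
import math

# Illustrative subset of classes; the real model distinguishes more diseases.
CLASSES = ["healthy", "early blight", "late blight"]


def softmax(logits):
    """Numerically stable softmax over a list of raw model outputs."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def diagnose(logits, classes=CLASSES):
    """Turn raw outputs into a healthy/diseased decision plus disease name."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    label = classes[best]
    return {
        "healthy": label == "healthy",
        "disease": None if label == "healthy" else label,
        "confidence": probs[best],
    }


result = diagnose([0.2, 2.5, 0.1])  # made-up logits for one leaf image
```

The `confidence` value could be used to flag uncertain predictions for manual inspection, though any threshold would need to be validated on real data.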

Again, we use publicly available datasets. For tomato diseases, the model is trained on the PlantVillage dataset. Link to the dataset: https://www.tensorflow.org/datasets/catalog/plant_village

Inference

The inference follows the same structure as the training procedure and is encapsulated in an Insight module.