FrancescoSaverioZuppichini/glasses

Glasses 😎

Compact, concise and customizable deep learning computer vision library

Models are now stored on the Hugging Face Hub!

The documentation is here

TL;DR

This library has

  • human readable code, no research code
  • common components are shared across models
  • same APIs for all models (you learn them once and they are always the same)
  • clear and easy-to-use model customization (see here)
  • classification and segmentation
  • emoji in the name ;)

Stuff implemented so far:

Installation

You can install glasses using pip by running

pip install git+https://github.com/FrancescoSaverioZuppichini/glasses

Motivations

Almost all existing implementations of the most famous models are written with very bad coding practices, what today is called research code. I struggled to understand some of the implementations, even though in the end they were just a few lines of code.

Most of them lack a global structure, use tons of repeated code, are not easily customizable and are not tested. Since I do computer vision for a living, I needed a way to make my life easier.

Getting started

The APIs are shared across all models!

import torch
from glasses.models import AutoModel, AutoTransform
# load one model
model = AutoModel.from_pretrained('resnet18').eval()
# and its correct input transformation
tr = AutoTransform.from_name('resnet18')
model.summary(device='cpu')  # thanks to torchinfo
# at any time, see all the models
AutoModel.models_table()
            Models
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Name                   ┃ Pretrained ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ resnet18               │ true       │
│ resnet26               │ true       │
│ resnet26d              │ true       │
│ resnet34               │ true       │
│ resnet34d              │ true       │
│ resnet50               │ true       │
...

Interpretability

import requests
import torch
from PIL import Image
from io import BytesIO
from glasses.interpretability import GradCam, SaliencyMap
from torchvision.transforms import Normalize
# get a cute dog 🐶
r = requests.get('https://i.insider.com/5df126b679d7570ad2044f3e?width=700&format=jpeg&auto=webp')
im = Image.open(BytesIO(r.content))
# un-normalize when done: Normalize(-mean / std, 1 / std) inverts Normalize(mean, std)
# (as_tensor so the arithmetic works whether mean/std are stored as tuples or tensors)
mean = torch.as_tensor(tr.transforms[-1].mean)
std = torch.as_tensor(tr.transforms[-1].std)
postprocessing = Normalize(-mean / std, 1.0 / std)
# apply preprocessing
x = tr(im).unsqueeze(0)
_ = model.interpret(x, using=GradCam(), postprocessing=postprocessing).show()
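The postprocessing step relies on a small identity: Normalize computes (x - mean) / std per channel, so Normalize(-mean / std, 1 / std) maps a normalized value y back to y * std + mean. The sketch below checks this in plain Python for a single RGB pixel, using the ImageNet statistics (0.485, 0.456, 0.406) / (0.229, 0.224, 0.225); it only illustrates the arithmetic and is not glasses code.

```python
from typing import List, Sequence

def normalize(x: Sequence[float], mean: Sequence[float], std: Sequence[float]) -> List[float]:
    """Per-channel (x - mean) / std, the formula torchvision's Normalize applies."""
    return [(xi - m) / s for xi, m, s in zip(x, mean, std)]

mean = (0.485, 0.456, 0.406)
std = (0.229, 0.224, 0.225)

# parameters of the inverse transform used as postprocessing
inv_mean = [-m / s for m, s in zip(mean, std)]
inv_std = [1.0 / s for s in std]

pixel = [0.8, 0.5, 0.2]            # one RGB pixel with values in [0, 1]
y = normalize(pixel, mean, std)    # what the model sees
restored = normalize(y, inv_mean, inv_std)
print([round(v, 6) for v in restored])  # [0.8, 0.5, 0.2]
```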


Classification

from glasses.models import AutoModel, ResNet
from torch import nn
# change activation
model = AutoModel.from_pretrained('resnet18', activation=nn.SELU).eval()
# or directly from the model class
ResNet.resnet18(activation=nn.SELU)
# change number of classes
ResNet.resnet18(n_classes=100)
# freeze only the encoder (convolutional) weights
model = AutoModel.from_pretrained('resnet18')
model.freeze(who=model.encoder)

Get the inner features

# model.encoder has special hooks ready to be activated
# call the .features to trigger them
model.encoder.features
x = torch.randn((1, 3, 224, 224))
model(x)
[f.shape for f in model.encoder.features]

Change inner block

# what about resnet with inverted residuals?
from glasses.models.classification.efficientnet import InvertedResidualBlock
ResNet.resnet18(block=InvertedResidualBlock)

Segmentation

from functools import partial
from glasses.models.segmentation.unet import UNet, UNetDecoder
# vanilla Unet
unet = UNet()
# let's change the encoder
unet = UNet.from_encoder(partial(AutoModel.from_name, 'efficientnet_b1'))
# mmm I want more layers in the decoder!
unet = UNet(decoder=partial(UNetDecoder, widths=[256, 128, 64, 32, 16]))
# maybe resnet was better
unet = UNet(encoder=lambda **kwargs: ResNet.resnet26(**kwargs).encoder)
# same API
# unet.summary(input_shape=(1,224,224))


More examples

# change the head (the classification part)
model = AutoModel.from_pretrained('resnet18')
my_head = nn.Sequential(
    nn.AdaptiveAvgPool2d((1,1)),
    nn.Flatten(),
    nn.Linear(model.encoder.widths[-1], 512),
    nn.Dropout(0.2),
    nn.ReLU(),
    nn.Linear(512, 1000))

model.head = my_head

x = torch.rand((1,3,224,224))
model(x).shape  # torch.Size([1, 1000])

Pretrained Models

I am currently working on the pretrained models and on the best way to make them available.

This is a list of all the pretrained models available so far. They were all trained on ImageNet.

I used batch_size=64 and a GTX 1080 Ti to evaluate the models.

name top1 top5 time batch_size
vit_base_patch16_384 0.842 0.9722 1130.81 64
vit_large_patch16_224 0.82836 0.96406 893.486 64
eca_resnet50t 0.82234 0.96172 241.754 64
eca_resnet101d 0.82166 0.96052 213.632 64
efficientnet_b3 0.82034 0.9603 199.599 64
regnety_032 0.81958 0.95964 136.518 64
vit_base_patch32_384 0.8166 0.9613 243.234 64
vit_base_patch16_224 0.815 0.96018 306.686 64
deit_small_patch16_224 0.81082 0.95316 132.868 64
eca_resnet50d 0.80604 0.95322 135.567 64
resnet50d 0.80492 0.95128 97.5827 64
cse_resnet50 0.80292 0.95048 108.765 64
efficientnet_b2 0.80126 0.95124 127.177 64
eca_resnet26t 0.79862 0.95084 155.396 64
regnety_064 0.79712 0.94774 183.065 64
regnety_040 0.79222 0.94656 124.881 64
resnext101_32x8d 0.7921 0.94556 290.38 64
regnetx_064 0.79066 0.94456 176.3 64
wide_resnet101_2 0.7891 0.94344 277.755 64
regnetx_040 0.78486 0.94242 122.619 64
wide_resnet50_2 0.78464 0.94064 201.634 64
efficientnet_b1 0.7831 0.94096 98.7143 64
resnet152 0.7825 0.93982 186.191 64
regnetx_032 0.7792 0.93996 319.558 64
resnext50_32x4d 0.77628 0.9368 114.325 64
regnety_016 0.77604 0.93702 96.547 64
efficientnet_b0 0.77332 0.93566 67.2147 64
resnet101 0.77314 0.93556 134.148 64
densenet161 0.77146 0.93602 239.388 64
resnet34d 0.77118 0.93418 59.9938 64
densenet201 0.76932 0.9339 158.514 64
regnetx_016 0.76684 0.9328 91.7536 64
resnet26d 0.766 0.93188 70.6453 64
regnety_008 0.76238 0.93026 54.1286 64
resnet50 0.76012 0.92934 89.7976 64
densenet169 0.75628 0.9281 127.077 64
resnet26 0.75394 0.92584 65.5801 64
resnet34 0.75096 0.92246 56.8985 64
regnety_006 0.75068 0.92474 55.5611 64
regnetx_008 0.74788 0.92194 57.9559 64
densenet121 0.74472 0.91974 104.13 64
deit_tiny_patch16_224 0.7437 0.91898 66.662 64
vgg19_bn 0.74216 0.91848 169.357 64
regnety_004 0.73766 0.91638 68.4893 64
regnetx_006 0.73682 0.91568 81.4703 64
vgg16_bn 0.73476 0.91536 150.317 64
vgg19 0.7236 0.9085 155.851 64
regnetx_004 0.72298 0.90644 58.0049 64
vgg16 0.71628 0.90368 135.398 64
vgg13_bn 0.71618 0.9036 129.077 64
efficientnet_lite0 0.7041 0.89894 62.4211 64
vgg11_bn 0.70408 0.89724 86.9459 64
vgg13 0.69984 0.89306 116.052 64
regnety_002 0.6998 0.89422 46.804 64
resnet18 0.69644 0.88982 46.2029 64
vgg11 0.68872 0.88658 79.4136 64
regnetx_002 0.68658 0.88244 45.9211 64
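The top1 and top5 columns are top-k accuracies: the fraction of images whose true label is among the k highest-scoring classes. The evaluation script behind the table is not shown here, so the following is only a minimal plain-Python sketch of the standard definition, with made-up scores:

```python
from typing import Sequence

def topk_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # indices of the k largest scores in this row
        top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top:
            hits += 1
    return hits / len(labels)

scores = [
    [0.1, 0.7, 0.2],  # best guess: class 1
    [0.5, 0.3, 0.2],  # best guess: class 0, class 1 is second
    [0.2, 0.3, 0.5],  # best guess: class 2, class 0 is last
]
labels = [1, 1, 0]
print(topk_accuracy(scores, labels, k=1))  # 1/3
print(topk_accuracy(scores, labels, k=2))  # 2/3
```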

Assuming you want to load efficientnet_b1:

from glasses.models import EfficientNet, AutoModel, AutoTransform

# load it using AutoModel
model = AutoModel.from_pretrained('efficientnet_b1').eval()
# or from its own class
model = EfficientNet.efficientnet_b1(pretrained=True)
# you may also need to get the correct transformation that must be applied on the input
tr = AutoTransform.from_name('efficientnet_b1')

In this case, tr is

Compose(
    Resize(size=240, interpolation=PIL.Image.BICUBIC)
    CenterCrop(size=(240, 240))
    ToTensor()
    Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
)
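With an int size, torchvision's Resize scales the shorter side to that size while keeping the aspect ratio, and CenterCrop then cuts out the middle window. The sketch below approximately mirrors that arithmetic in plain Python for a hypothetical 700x467 image, so you can see which region of the picture efficientnet_b1 actually receives:

```python
from typing import Tuple

def resize_shorter_side(w: int, h: int, size: int) -> Tuple[int, int]:
    """New (w, h) after scaling the shorter side to `size`, keeping the aspect ratio."""
    if w <= h:
        return size, int(size * h / w)
    return int(size * w / h), size

def center_crop_box(w: int, h: int, crop: int) -> Tuple[int, int, int, int]:
    """(left, top, right, bottom) of a centered crop x crop window."""
    left = int(round((w - crop) / 2.0))
    top = int(round((h - crop) / 2.0))
    return left, top, left + crop, top + crop

w, h = resize_shorter_side(700, 467, 240)  # hypothetical 700x467 photo
print((w, h))                              # (359, 240)
print(center_crop_box(w, h, 240))          # (60, 0, 300, 240)
```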

Deep Customization

All models are composed of shareable parts:

  • Block
  • Layer
  • Encoder
  • Head
  • Decoder

Block

Each model has its own building block, named *Block. In each block, all the weights are in the .block field, which makes it very easy to customize a specific model.

from glasses.models.classification.vgg import VGGBasicBlock
from glasses.models.classification.resnet import ResNetBasicBlock, ResNetBottleneckBlock, ResNetBasicPreActBlock, ResNetBottleneckPreActBlock
from glasses.models.classification.senet import SENetBasicBlock, SENetBottleneckBlock
from glasses.models.classification.resnetxt import ResNetXtBottleNeckBlock
from glasses.models.classification.densenet import DenseBottleNeckBlock
from glasses.models.classification.wide_resnet import WideResNetBottleNeckBlock
from glasses.models.classification.efficientnet import EfficientNetBasicBlock

For example, if we want to add Squeeze and Excitation to the ResNet bottleneck block, we can just write:

from glasses.nn.att import SpatialSE
from glasses.models.classification.resnet import ResNetBottleneckBlock

class SEResNetBottleneckBlock(ResNetBottleneckBlock):
    def __init__(self, in_features: int, out_features: int, squeeze: int = 16, *args, **kwargs):
        super().__init__(in_features, out_features, *args, **kwargs)
        # all the weights are in block, we want to apply se after the weights
        self.block.add_module('se', SpatialSE(out_features, reduction=squeeze))
        
SEResNetBottleneckBlock(32, 64)

Then, we can use the class methods to create new models that follow an existing architecture blueprint. For example, to create se_resnet50:

from glasses.models import ResNet
ResNet.resnet50(block=SEResNetBottleneckBlock)

The cool thing is that every model has the same API: if I want to create a vgg13 with the SEResNetBottleneckBlock, I can just write:

from glasses.models import VGG
model = VGG.vgg13(block=SEResNetBottleneckBlock)
model.summary()
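The pattern above is a class method that fixes the architecture blueprint (widths and depths) while letting the caller swap the block. Here is a toy, plain-Python sketch of that idea; ToyResNet, BottleneckBlock and SEBottleneckBlock are hypothetical stand-ins rather than glasses classes, and the ResNet-50 widths [256, 512, 1024, 2048] and depths [3, 4, 6, 3] are the standard values from the ResNet paper:

```python
from dataclasses import dataclass
from typing import List, Type

class BottleneckBlock:  # hypothetical stand-in for ResNetBottleneckBlock
    pass

class SEBottleneckBlock(BottleneckBlock):  # hypothetical stand-in for SEResNetBottleneckBlock
    pass

@dataclass
class ToyResNet:
    widths: List[int]
    depths: List[int]
    block: Type = BottleneckBlock

    @classmethod
    def resnet50(cls, block: Type = BottleneckBlock) -> "ToyResNet":
        # the blueprint fixes widths/depths; the block stays swappable
        return cls(widths=[256, 512, 1024, 2048], depths=[3, 4, 6, 3], block=block)

    def blocks(self) -> List[Type]:
        # one entry per block the real model would instantiate
        return [self.block for depth in self.depths for _ in range(depth)]

se = ToyResNet.resnet50(block=SEBottleneckBlock)
print(len(se.blocks()))         # 16
print(se.blocks()[0].__name__)  # SEBottleneckBlock
```

Because the block is just a constructor argument, the same blueprint can host any block whose constructor accepts the arguments the blueprint passes, which is why a block needing extra parameters does not fit every model.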

Some models require additional parameters for their block. For example, MobileNetV2 also requires an expansion parameter, so our SEResNetBottleneckBlock won't work there.

Layer

A Layer is a collection of blocks, used to stack multiple blocks together following some logic. For example, ResNetLayer:

from glasses.models.classification.resnet import ResNetLayer

ResNetLayer(64, 128, depth=2)
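Assuming ResNetLayer follows the usual ResNet convention, only its first block changes the number of features, while the remaining depth - 1 blocks keep it. The hypothetical helper below just lists the (in, out) features each block would receive; it ignores any bottleneck expansion factor and is not glasses code:

```python
from typing import List, Tuple

def layer_plan(in_features: int, out_features: int, depth: int) -> List[Tuple[int, int]]:
    """(in, out) features of each block in a stack of `depth` blocks:
    the first block changes the width, the others keep it."""
    return [(in_features, out_features)] + [(out_features, out_features)] * (depth - 1)

print(layer_plan(64, 128, depth=2))  # [(64, 128), (128, 128)]
```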

Encoder

The encoder is the part that encodes the input into features, i.e. the convolutional layers. It always has two very important parameters:

  • widths
  • depths

widths is the width of each layer, i.e. how many features it has; depths is the depth of each layer, i.e. how many blocks it has.

For example, ResNetEncoder creates multiple ResNetLayers based on the length of widths and depths. Let's see an example.

from glasses.models.classification.resnet import ResNetEncoder
# 3 layers, with 32, 64, 128 features and 1, 2, 3 blocks respectively
ResNetEncoder(
    widths=[32,64,128],
    depths=[1,2,3])
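One way to picture how widths and depths become layers: pair them up, and feed each layer the previous layer's width. The plain-Python sketch below lists the resulting (in_features, out_features, depth) triples; encoder_plan is a hypothetical helper, and start_features=32 is an assumed stem width chosen only for this example:

```python
from typing import List, Tuple

def encoder_plan(start_features: int, widths: List[int], depths: List[int]) -> List[Tuple[int, int, int]]:
    """(in_features, out_features, depth) of every layer: one layer per
    (width, depth) pair, each layer consuming the previous layer's width."""
    if len(widths) != len(depths):
        raise ValueError("widths and depths must have the same length")
    in_widths = [start_features] + widths[:-1]
    return list(zip(in_widths, widths, depths))

# start_features=32 is an assumed stem width for this example
for layer in encoder_plan(32, widths=[32, 64, 128], depths=[1, 2, 3]):
    print(layer)
# (32, 32, 1)
# (32, 64, 2)
# (64, 128, 3)
```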

All encoders are subclasses of Encoder, which allows us to hook into specific stages to get the features. All you have to do is first access .features to notify the model that you want to receive the features, and then pass an input.

import torch
enc = ResNetEncoder()
enc.features
enc(torch.randn((1,3,224,224)))
print([f.shape for f in enc.features])
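To make the "access .features first, then run the model" protocol concrete, here is a toy plain-Python sketch of the idea. LazyFeatureEncoder is hypothetical and is not how glasses implements its hooks; each "stage" is just a function on a number:

```python
from typing import Callable, List

class LazyFeatureEncoder:
    """Toy encoder that records each stage's output only after
    `.features` has been accessed once (hypothetical, not glasses code)."""

    def __init__(self, stages: List[Callable[[float], float]]):
        self.stages = stages
        self._record = False
        self._features: List[float] = []

    @property
    def features(self) -> List[float]:
        # the first access switches recording on; later accesses return stored outputs
        self._record = True
        return self._features

    def __call__(self, x: float) -> float:
        if self._record:
            self._features = []  # start fresh for every forward pass
        for stage in self.stages:
            x = stage(x)
            if self._record:
                self._features.append(x)
        return x

enc = LazyFeatureEncoder([lambda v: v * 2, lambda v: v + 1, lambda v: v * 10])
enc.features           # notify the encoder that we want the features
out = enc(1.0)
print(out)             # 30.0
print(enc.features)    # [2.0, 3.0, 30.0]
```

The upside of this opt-in design is that a plain forward pass keeps no intermediate outputs around unless someone asked for them.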

Remember that each model always has an .encoder field.

from glasses.models import ResNet

model = ResNet.resnet18()
model.encoder.widths[-1]

The encoder knows its number of output features: model.encoder.widths[-1] returns the width of its last stage.

Features

Each encoder can return a list of features, accessible through the .features field. You need to access it once beforehand to notify the encoder that we also want to store the features.

import torch
from glasses.models.classification.resnet import ResNetEncoder

x = torch.randn(1, 3, 224, 224)
enc = ResNetEncoder()
enc.features  # access it once to activate the hooks
enc(x)
features = enc.features  # now we have all the features from each layer (stage)
for f in features:
    print(f.shape)
# torch.Size([1, 64, 112, 112])
# torch.Size([1, 64, 56, 56])
# torch.Size([1, 128, 28, 28])
# torch.Size([1, 256, 14, 14])
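The spatial sizes above follow from the usual convolution output formula, floor((size + 2 * padding - kernel) / stride) + 1. Assuming the standard ResNet stem (a 7x7 stride-2 convolution followed by a 3x3 stride-2 max-pool) and a stride-2 3x3 convolution at the start of each later stage, this plain-Python sketch reproduces 112, 56, 28 and 14 for a 224x224 input:

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

sizes = []
s = conv_out(224, kernel=7, stride=2, padding=3)  # stem convolution
sizes.append(s)                                   # 112
s = conv_out(s, kernel=3, stride=2, padding=1)    # stem max-pool
sizes.append(s)                                   # 56
for _ in range(2):                                # stride-2 3x3 conv opening each later stage
    s = conv_out(s, kernel=3, stride=2, padding=1)
    sizes.append(s)                               # 28, then 14
print(sizes)  # [112, 56, 28, 14]
```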

Head

The head is the last part of the model; it usually performs the classification.

from glasses.models.classification.resnet import ResNetHead


ResNetHead(512, n_classes=1000)
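Assuming ResNetHead follows the usual pool-then-linear design (the same shape as my_head in the earlier example: global average pooling, flatten, then a linear layer), this plain-Python sketch shows what such a head computes on a tiny feature map, plus the parameter count of a 512 to 1000 linear layer. It is an illustration, not the glasses implementation:

```python
from typing import List

def global_avg_pool(fmap: List[List[List[float]]]) -> List[float]:
    """Average each channel's H x W map down to a single number."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]

def linear(x: List[float], weight: List[List[float]], bias: List[float]) -> List[float]:
    """y = W x + b, with one row of W per output class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]

# 2 channels of 2x2 activations -> 2 pooled features -> 3 class scores
fmap = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 2.0], [2.0, 0.0]]]
pooled = global_avg_pool(fmap)
print(pooled)   # [4.0, 1.0]
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
logits = linear(pooled, weight, [0.0, 0.0, 0.5])
print(logits)   # [4.0, 1.0, 5.5]

# a 512 -> 1000 linear layer has 512 * 1000 weights plus 1000 biases
print(512 * 1000 + 1000)  # 513000
```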

Decoder

The decoder takes the last feature from the .encoder and decodes it. This is usually done in segmentation models, such as UNet.

import torch
from glasses.models.classification.resnet import ResNetEncoder
from glasses.models.segmentation.unet import UNetDecoder
x = torch.randn(1,3,224,224)
enc = ResNetEncoder()
enc.features # call it once
x = enc(x)
features = enc.features
# we need to tell the decoder the first feature size and the size of the lateral features
dec = UNetDecoder(start_features=enc.widths[-1],
                  lateral_widths=enc.features_widths[::-1])
out = dec(x, features[::-1])
out.shape
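The features are reversed (features[::-1]) because the decoder walks from the deepest, smallest feature map back to the shallowest one, merging one lateral feature per stage. The sketch below checks that alignment in plain Python; it assumes each decoder stage upsamples by 2 and that the encoder output is 7x7 for a 224x224 input, neither of which is stated above, and decoder_sizes is a hypothetical helper:

```python
from typing import List

def decoder_sizes(bottleneck_size: int, lateral_sizes: List[int]) -> List[int]:
    """Spatial size after each decoder stage, assuming every stage upsamples
    by 2 and is merged with one lateral (skip) feature of the same size."""
    size = bottleneck_size
    sizes = []
    for lateral in lateral_sizes:
        size *= 2  # assumed x2 upsampling per stage
        if size != lateral:
            raise ValueError(f"upsampled size {size} != lateral size {lateral}")
        sizes.append(size)
    return sizes

# spatial sizes of enc.features listed in the Features section
feature_sizes = [112, 56, 28, 14]
# reversed, like features[::-1]: deepest first
print(decoder_sizes(7, feature_sizes[::-1]))  # [14, 28, 56, 112]
```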

This object-oriented structure allows most of the code to be reused across models. The following table lists the number of parameters and the size of each model:

name Parameters Size (MB)
cse_resnet101 49,326,872 188.17
cse_resnet152 66,821,848 254.91
cse_resnet18 11,778,592 44.93
cse_resnet34 21,958,868 83.77
cse_resnet50 28,088,024 107.15
deit_base_patch16_224 87,184,592 332.58
deit_base_patch16_384 87,186,128 357.63
deit_small_patch16_224 22,359,632 85.3
deit_tiny_patch16_224 5,872,400 22.4
densenet121 7,978,856 30.44
densenet161 28,681,000 109.41
densenet169 14,149,480 53.98
densenet201 20,013,928 76.35
eca_resnet101d 44,568,563 212.62
eca_resnet101t 44,566,027 228.65
eca_resnet18d 16,014,452 98.41
eca_resnet18t 1,415,684 37.91
eca_resnet26d 16,014,452 98.41
eca_resnet26t 16,011,916 114.44
eca_resnet50d 25,576,350 136.65
eca_resnet50t 25,573,814 152.68
efficientnet_b0 5,288,548 20.17
efficientnet_b1 7,794,184 29.73
efficientnet_b2 9,109,994 34.75
efficientnet_b3 12,233,232 46.67
efficientnet_b4 19,341,616 73.78
efficientnet_b5 30,389,784 115.93
efficientnet_b6 43,040,704 164.19
efficientnet_b7 66,347,960 253.1
efficientnet_b8 87,413,142 505.01
efficientnet_l2 480,309,308 2332.13
efficientnet_lite0 4,652,008 17.75
efficientnet_lite1 5,416,680 20.66
efficientnet_lite2 6,092,072 23.24
efficientnet_lite3 8,197,096 31.27
efficientnet_lite4 13,006,568 49.62
fishnet150 24,960,808 95.22
fishnet99 16,630,312 63.44
mobilenet_v2 3,504,872 24.51
mobilenetv2 3,504,872 13.37
regnetx_002 2,684,792 10.24
regnetx_004 5,157,512 19.67
regnetx_006 6,196,040 23.64
regnetx_008 7,259,656 27.69
regnetx_016 9,190,136 35.06
regnetx_032 15,296,552 58.35
regnetx_040 22,118,248 97.66
regnetx_064 26,209,256 114.02
regnetx_080 34,561,448 147.43
regnety_002 3,162,996 12.07
regnety_004 4,344,144 16.57
regnety_006 6,055,160 23.1
regnety_008 6,263,168 23.89
regnety_016 11,202,430 42.73
regnety_032 19,436,338 74.14
regnety_040 20,646,656 91.77
regnety_064 30,583,252 131.52
regnety_080 39,180,068 165.9
resnest101e 48,275,016 184.15
resnest14d 10,611,688 40.48
resnest200e 70,201,544 267.8
resnest269e 7,551,112 28.81
resnest26d 17,069,448 65.11
resnest50d 27,483,240 104.84
resnest50d_1s4x24d 25,677,000 97.95
resnest50d_4s2x40d 30,417,592 116.03
resnet101 44,549,160 169.94
resnet152 60,192,808 229.62
resnet18 11,689,512 44.59
resnet200 64,673,832 246.71
resnet26 15,995,176 61.02
resnet26d 16,014,408 61.09
resnet34 21,797,672 83.15
resnet34d 21,816,904 83.22
resnet50 25,557,032 97.49
resnet50d 25,576,264 97.57
resnext101_32x16d 194,026,792 740.15
resnext101_32x32d 468,530,472 1787.3
resnext101_32x48d 828,411,176 3160.14
resnext101_32x8d 88,791,336 338.71
resnext50_32x4d 25,028,904 95.48
se_resnet101 49,292,328 188.04
se_resnet152 66,770,984 254.71
se_resnet18 11,776,552 44.92
se_resnet34 21,954,856 83.75
se_resnet50 28,071,976 107.09
unet 23,202,530 88.51
vgg11 132,863,336 506.83
vgg11_bn 132,868,840 506.85
vgg13 133,047,848 507.54
vgg13_bn 133,053,736 507.56
vgg16 138,357,544 527.79
vgg16_bn 138,365,992 527.82
vgg19 143,667,240 548.05
vgg19_bn 143,678,248 548.09
vit_base_patch16_224 86,415,592 329.65
vit_base_patch16_384 86,415,592 329.65
vit_base_patch32_384 88,185,064 336.4
vit_huge_patch16_224 631,823,080 2410.21
vit_huge_patch32_384 634,772,200 2421.46
vit_large_patch16_224 304,123,880 1160.14
vit_large_patch16_384 304,123,880 1160.14
vit_large_patch32_384 306,483,176 1169.14
vit_small_patch16_224 48,602,344 185.4
wide_resnet101_2 126,886,696 484.03
wide_resnet50_2 68,883,240 262.77
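For most rows, Size (MB) matches the parameter count stored as 32-bit floats (4 bytes each) divided by 2^20 bytes. A short sketch to reproduce it, assuming float32 weights:

```python
def float32_size_mib(n_params: int) -> float:
    """Memory of n float32 parameters (4 bytes each) in units of 2**20 bytes."""
    return n_params * 4 / 2**20

for name, n in [("resnet18", 11_689_512), ("resnet50", 25_557_032), ("vgg16", 138_357_544)]:
    print(name, round(float32_size_mib(n), 2))
# resnet18 44.59
# resnet50 97.49
# vgg16 527.79
```

A few rows (for example mobilenet_v2 and efficientnet_b8) do not match this formula; the table does not say what their size includes beyond the parameters.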

Credits

Most of the weights were trained by other people and adapted to glasses. It is worth citing