added readme file for mlpmixer, perceiver io and vit (#94)
* added readme file for mlpmixer, perceiver io and vit

* updated readme files

* updated readme files
mosesdaudu001 authored Sep 4, 2023
1 parent 08c8828 commit a6b9b06
Showing 3 changed files with 258 additions and 0 deletions.
78 changes: 78 additions & 0 deletions ivy_models/mlpmixer/README.rst
.. image:: https://github.com/unifyai/unifyai.github.io/blob/main/img/externally_linked/logo.png?raw=true#gh-light-mode-only
:width: 100%
:class: only-light

.. image:: https://github.com/unifyai/unifyai.github.io/blob/main/img/externally_linked/logo_dark.png?raw=true#gh-dark-mode-only
:width: 100%
:class: only-dark


.. raw:: html

<br/>
<a href="https://pypi.org/project/ivy-models">
<img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://badge.fury.io/py/ivy-models.svg">
</a>
<a href="https://github.com/unifyai/models/actions?query=workflow%3Adocs">
<img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://github.com/unifyai/models/actions/workflows/docs.yml/badge.svg">
</a>
<a href="https://github.com/unifyai/models/actions?query=workflow%3Anightly-tests">
<img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://github.com/unifyai/models/actions/workflows/nightly-tests.yml/badge.svg">
</a>
<a href="https://discord.gg/G4aR9Q7DTN">
<img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://img.shields.io/discord/799879767196958751?color=blue&label=%20&logo=discord&logoColor=white">
</a>
<br clear="all" />

MLP-Mixer
===========

`MLP-Mixer <https://arxiv.org/abs/2105.01601>`_ is based entirely on multi-layer perceptrons (MLPs), which are a type of neural network that consists of a stack of linear layers and
non-linear activation functions.

The main idea behind MLP-Mixer is that MLPs alone can learn the spatial and channel mixing functions needed to extract features from images.
MLP-Mixer achieves this by stacking two types of layers: patch (token) mixing layers and channel mixing layers.
The patch mixing layers apply an MLP across the patches, separately for each channel, which lets MLP-Mixer learn spatial mixing functions that
capture the relationships between different patches in the image.
The channel mixing layers, on the other hand, apply an MLP across the channels, separately for each patch, which lets MLP-Mixer learn channel
mixing functions that capture the relationships between the different channels of the image. A minimal sketch of one such block follows.
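
To make the two mixing steps concrete, here is a minimal NumPy sketch of a single Mixer block. This is an illustration rather than the library's implementation: the dimensions, the 0.02-scaled random weights, and the tanh-based GELU approximation are all illustrative choices.

.. code-block:: python

    import numpy as np

    def layer_norm(x, eps=1e-6):
        # normalise each patch vector over its channel (last) axis
        mu = x.mean(axis=-1, keepdims=True)
        var = x.var(axis=-1, keepdims=True)
        return (x - mu) / np.sqrt(var + eps)

    def mlp(x, w1, w2):
        # two linear layers with a tanh-approximated GELU in between
        h = x @ w1
        h = 0.5 * h * (1.0 + np.tanh(0.7978845608 * (h + 0.044715 * h ** 3)))
        return h @ w2

    def mixer_block(x, w_tok1, w_tok2, w_ch1, w_ch2):
        # x: (num_patches, channels)
        # patch (token) mixing: transpose so the MLP runs across patches,
        # separately for each channel
        x = x + mlp(layer_norm(x).T, w_tok1, w_tok2).T
        # channel mixing: the MLP runs across channels, separately for each patch
        return x + mlp(layer_norm(x), w_ch1, w_ch2)

    rng = np.random.default_rng(0)
    P, C, D_TOK, D_CH = 196, 512, 256, 2048  # 14x14 patch grid, 512 channels
    x = rng.normal(size=(P, C))
    out = mixer_block(
        x,
        0.02 * rng.normal(size=(P, D_TOK)), 0.02 * rng.normal(size=(D_TOK, P)),
        0.02 * rng.normal(size=(C, D_CH)), 0.02 * rng.normal(size=(D_CH, C)),
    )
    print(out.shape)  # (196, 512): shape is preserved, so blocks can be stacked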


Getting started
-----------------

.. code-block:: python

    !pip install huggingface_hub

    import ivy
    from ivy_models.mlpmixer import mlpmixer

    ivy.set_backend("torch")

    # Instantiate the mlpmixer model
    ivy_mlpmixer = mlpmixer(pretrained=True)

The pretrained mlpmixer model is now ready to be used, and is compatible with any other PyTorch code.

Citation
--------

::

    @article{tolstikhin2021mlpmixer,
        title={MLP-Mixer: An all-MLP Architecture for Vision},
        author={Tolstikhin, Ilya and Houlsby, Neil and Kolesnikov, Alexander and Beyer, Lucas and Zhai, Xiaohua and Unterthiner, Thomas and Yung, Jessica and Steiner, Andreas and Keysers, Daniel and Uszkoreit, Jakob and Lucic, Mario and Dosovitskiy, Alexey},
        journal={arXiv preprint arXiv:2105.01601},
        year={2021}
    }


    @article{lenton2021ivy,
        title={Ivy: Templated deep learning for inter-framework portability},
        author={Lenton, Daniel and Pardo, Fabio and Falck, Fabian and James, Stephen and Clark, Ronald},
        journal={arXiv preprint arXiv:2102.02886},
        year={2021}
    }
104 changes: 104 additions & 0 deletions ivy_models/transformers/README.rst
.. image:: https://github.com/unifyai/unifyai.github.io/blob/main/img/externally_linked/logo.png?raw=true#gh-light-mode-only
:width: 100%
:class: only-light

.. image:: https://github.com/unifyai/unifyai.github.io/blob/main/img/externally_linked/logo_dark.png?raw=true#gh-dark-mode-only
:width: 100%
:class: only-dark


.. raw:: html

<br/>
<a href="https://pypi.org/project/ivy-models">
<img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://badge.fury.io/py/ivy-models.svg">
</a>
<a href="https://github.com/unifyai/models/actions?query=workflow%3Adocs">
<img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://github.com/unifyai/models/actions/workflows/docs.yml/badge.svg">
</a>
<a href="https://github.com/unifyai/models/actions?query=workflow%3Anightly-tests">
<img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://github.com/unifyai/models/actions/workflows/nightly-tests.yml/badge.svg">
</a>
<a href="https://discord.gg/G4aR9Q7DTN">
<img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://img.shields.io/discord/799879767196958751?color=blue&label=%20&logo=discord&logoColor=white">
</a>
<br clear="all" />

Perceiver IO
============

`Perceiver IO <https://arxiv.org/abs/2107.14795>`_ is based on the Perceiver architecture, which was originally proposed by DeepMind in 2021. Perceiver IO extends the Perceiver architecture
by adding a new querying module, which allows Perceiver IO to produce outputs of arbitrary size and semantics,
making it a more general-purpose architecture than the original Perceiver.

The Perceiver IO architecture consists of three main modules: a reading module, which takes the input data and encodes it into a latent space;
a processing module, which refines the latent representation learned by the reading module; and a querying module, which takes the refined
latent representation and produces outputs of arbitrary size and semantics.

The querying module is the key innovation of Perceiver IO. It works by first constructing a query vector for each output element;
the query vector is a representation of the desired output element, built from output-specific features.
The querying module then uses a cross-attention mechanism in which the query vectors attend to the latent representation,
and each output element is produced from the attended latent values, as the sketch below illustrates.
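
As a rough illustration of the querying step, here is a hand-rolled NumPy sketch of cross-attention decoding. It is not the library's implementation; all names, dimensions, and weights are illustrative. The point is that the number of output elements is set entirely by the number of query vectors.

.. code-block:: python

    import numpy as np

    def softmax(z, axis=-1):
        z = z - z.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def query_latents(queries, latents, w_q, w_k, w_v):
        # queries: (num_outputs, queries_dim), latents: (num_latents, latent_dim)
        q = queries @ w_q                               # (num_outputs, d)
        k = latents @ w_k                               # (num_latents, d)
        v = latents @ w_v                               # (num_latents, d)
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (num_outputs, num_latents)
        return attn @ v                                 # one row per requested output

    rng = np.random.default_rng(0)
    latents = rng.normal(size=(256, 1024))  # latent array from the processing module
    queries = rng.normal(size=(10, 1024))   # one query vector per desired output
    d = 64
    w_q, w_k, w_v = (0.02 * rng.normal(size=(1024, d)) for _ in range(3))
    out = query_latents(queries, latents, w_q, w_k, w_v)
    print(out.shape)  # (10, 64): output size is set by the queries, not the input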

Getting started
-----------------

.. code-block:: python

    import ivy
    from ivy_models.transformers.perceiver_io import (
        PerceiverIOSpec,
        perceiver_io_img_classification,
    )

    ivy.set_backend("torch")

    # params
    load_weights = True  # build the full-size network used by the pretrained weights
    input_dim = 3
    num_input_axes = 2
    output_dim = 1000
    batch_shape = [1]
    queries_dim = 1024
    learn_query = True
    network_depth = 8 if load_weights else 1
    num_lat_att_per_layer = 6 if load_weights else 1

    spec = PerceiverIOSpec(
        input_dim=input_dim,
        num_input_axes=num_input_axes,
        output_dim=output_dim,
        queries_dim=queries_dim,
        network_depth=network_depth,
        learn_query=learn_query,
        query_shape=[1],
        num_fourier_freq_bands=64,
        num_lat_att_per_layer=num_lat_att_per_layer,
        device='cuda',
    )
    model = perceiver_io_img_classification(spec)

The pretrained perceiver_io_img_classification model is now ready to be used.

Citation
--------

::

    @article{jaegle2021perceiverio,
        title={Perceiver IO: A General Architecture for Structured Inputs & Outputs},
        author={Jaegle, Andrew and Borgeaud, Sebastian and Alayrac, Jean-Baptiste and Doersch, Carl and Ionescu, Catalin and Ding, David and Koppula, Skanda and Zoran, Daniel and Brock, Andrew and Shelhamer, Evan and Hénaff, Olivier and Botvinick, Matthew M. and Zisserman, Andrew and Vinyals, Oriol and Carreira, João},
        journal={arXiv preprint arXiv:2107.14795},
        year={2021}
    }


    @article{lenton2021ivy,
        title={Ivy: Templated deep learning for inter-framework portability},
        author={Lenton, Daniel and Pardo, Fabio and Falck, Fabian and James, Stephen and Clark, Ronald},
        journal={arXiv preprint arXiv:2102.02886},
        year={2021}
    }
76 changes: 76 additions & 0 deletions ivy_models/vit/README.rst
.. image:: https://github.com/unifyai/unifyai.github.io/blob/main/img/externally_linked/logo.png?raw=true#gh-light-mode-only
:width: 100%
:class: only-light

.. image:: https://github.com/unifyai/unifyai.github.io/blob/main/img/externally_linked/logo_dark.png?raw=true#gh-dark-mode-only
:width: 100%
:class: only-dark


.. raw:: html

<br/>
<a href="https://pypi.org/project/ivy-models">
<img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://badge.fury.io/py/ivy-models.svg">
</a>
<a href="https://github.com/unifyai/models/actions?query=workflow%3Adocs">
<img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://github.com/unifyai/models/actions/workflows/docs.yml/badge.svg">
</a>
<a href="https://github.com/unifyai/models/actions?query=workflow%3Anightly-tests">
<img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://github.com/unifyai/models/actions/workflows/nightly-tests.yml/badge.svg">
</a>
<a href="https://discord.gg/G4aR9Q7DTN">
<img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://img.shields.io/discord/799879767196958751?color=blue&label=%20&logo=discord&logoColor=white">
</a>
<br clear="all" />

ViT
===========

Vision Transformer `(ViT) <https://arxiv.org/abs/2010.11929>`_ is a neural network architecture for image classification based on the Transformer architecture,
which was originally developed for natural language processing tasks. In place of the convolution layers used by a convolutional
neural network (CNN), ViT relies on self-attention layers.

The main idea behind ViT is that an image can be represented as a sequence of image patches, and that these patches can be processed by a Transformer
in the same way that words are processed by a Transformer in a natural language processing task.
To do this, ViT first divides the image into a grid of image patches. Each patch is then flattened into a vector,
and these vectors are then stacked together to form a sequence. This sequence is then passed to a Transformer,
which learns to attend to different patches in the image in order to classify the image.
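
The patching step is easy to sketch in plain NumPy. This is an illustration only, not the library's implementation: with 14x14 patches (the "14" in vit_h_14), a 224x224 RGB image becomes a sequence of 256 vectors of 14*14*3 = 588 values each, before the learned linear projection and position embeddings are applied.

.. code-block:: python

    import numpy as np

    def image_to_patch_sequence(img, patch):
        # img: (H, W, C); carve it into a grid of patch x patch tiles
        h, w, c = img.shape
        gh, gw = h // patch, w // patch
        x = img[: gh * patch, : gw * patch]
        x = x.reshape(gh, patch, gw, patch, c)
        x = x.transpose(0, 2, 1, 3, 4)                # (gh, gw, patch, patch, c)
        return x.reshape(gh * gw, patch * patch * c)  # one flat vector per patch

    img = np.zeros((224, 224, 3))                 # dummy RGB image
    seq = image_to_patch_sequence(img, patch=14)  # the "14" in vit_h_14
    print(seq.shape)  # (256, 588): a 16x16 grid of patches, each 14*14*3 values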


Getting started
-----------------

.. code-block:: python

    import ivy
    from ivy_models.vit import vit_h_14

    ivy.set_backend("torch")

    # Instantiate the vit_h_14 model
    ivy_vit_h_14 = vit_h_14(pretrained=True)

The pretrained vit_h_14 model is now ready to be used, and is compatible with any other PyTorch code.

Citation
--------

::

    @article{dosovitskiy2020vit,
        title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
        author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
        journal={arXiv preprint arXiv:2010.11929},
        year={2020}
    }


    @article{lenton2021ivy,
        title={Ivy: Templated deep learning for inter-framework portability},
        author={Lenton, Daniel and Pardo, Fabio and Falck, Fabian and James, Stephen and Clark, Ronald},
        journal={arXiv preprint arXiv:2102.02886},
        year={2021}
    }
