[Feature] Implementation of RAM with a Gradio interface #1802

Merged 12 commits on Oct 25, 2023
9 changes: 3 additions & 6 deletions README.md
@@ -86,13 +86,10 @@ https://github.com/open-mmlab/mmpretrain/assets/26739999/e4dcd3a2-f895-4d1b-a351

## What's new

-🌟 v1.0.2 was released in 15/08/2023
+🌟 v1.1.0 was released in 12/10/2023

Support [MFF](./configs/mff/) self-supervised algorithm and enhance the codebase. More details can be found in the [changelog](https://mmpretrain.readthedocs.io/en/latest/notes/changelog.html).

-🌟 v1.0.1 was released in 28/07/2023
-
-Fix some bugs and enhance the codebase. Please refer to [changelog](https://mmpretrain.readthedocs.io/en/latest/notes/changelog.html) for more details.
+- Support Mini-GPT4 training and provide a Chinese model (based on Baichuan-7B)
+- Support zero-shot classification based on CLIP.

🌟 v1.0.0 was released in 04/07/2023

9 changes: 3 additions & 6 deletions README_zh-CN.md
@@ -84,13 +84,10 @@ https://github.com/open-mmlab/mmpretrain/assets/26739999/e4dcd3a2-f895-4d1b-a351

## Changelog

-🌟 v1.0.2 was released on 2023/8/15
+🌟 v1.1.0 was released on 2023/10/12

Support the [MFF](./configs/mff/) self-supervised algorithm and enhance the codebase. See the [changelog](https://mmpretrain.readthedocs.io/zh_CN/latest/notes/changelog.html) for details.

-🌟 v1.0.1 was released on 2023/7/28
-
-Fix some bugs and enhance the codebase. See the [changelog](https://mmpretrain.readthedocs.io/zh_CN/latest/notes/changelog.html) for details.
+- Support Mini-GPT4 training and provide a Chinese model based on Baichuan-7B
+- Support zero-shot classification based on CLIP.

🌟 v1.0.0 was released on 2023/7/4

68 changes: 68 additions & 0 deletions configs/clip/clip_vit-base-p16_zeroshot-cls_cifar100.py
@@ -0,0 +1,68 @@
_base_ = '../_base_/default_runtime.py'

# data settings
data_preprocessor = dict(
type='MultiModalDataPreprocessor',
mean=[0.48145466 * 255, 0.4578275 * 255, 0.40821073 * 255],
std=[0.26862954 * 255, 0.26130258 * 255, 0.27577711 * 255],
to_rgb=False,
)

test_pipeline = [
dict(type='Resize', scale=(224, 224), interpolation='bicubic'),
dict(
type='PackInputs',
algorithm_keys=['text'],
meta_keys=['image_id', 'scale_factor'],
),
]

train_dataloader = None
test_dataloader = dict(
batch_size=32,
num_workers=8,
dataset=dict(
type='CIFAR100',
data_root='data/cifar100',
split='test',
pipeline=test_pipeline),
sampler=dict(type='DefaultSampler', shuffle=False),
)
test_evaluator = dict(type='Accuracy', topk=(1, 5))

# schedule settings
train_cfg = None
val_cfg = None
test_cfg = dict()

# model settings
model = dict(
type='CLIPZeroShot',
vision_backbone=dict(
type='VisionTransformer',
arch='base',
img_size=224,
patch_size=16,
drop_rate=0.,
layer_cfgs=dict(act_cfg=dict(type='QuickGELU')),
pre_norm=True,
),
projection=dict(type='CLIPProjection', in_channels=768, out_channels=512),
text_backbone=dict(
type='CLIPTransformer',
width=512,
layers=12,
heads=8,
attn_mask=True,
),
tokenizer=dict(
type='AutoTokenizer',
name_or_path='openai/clip-vit-base-patch16',
use_fast=False),
vocab_size=49408,
transformer_width=512,
proj_dim=512,
text_prototype='cifar100',
text_prompt='openai_cifar100',
context_length=77,
)
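For reference, a minimal sketch of how a config like this might be exercised outside the standard `python tools/test.py <config> <checkpoint>` flow, assuming mmpretrain is installed; the checkpoint path is a placeholder, not an artifact of this PR:

```python
# Sketch: build the zero-shot CLIP classifier from this config and load
# pretrained weights. 'path/to/clip-vit-base-p16.pth' is a placeholder.
from mmengine.config import Config
from mmengine.runner import load_checkpoint

import mmpretrain.models  # noqa: F401  # registers CLIPZeroShot and friends
from mmpretrain.registry import MODELS

cfg = Config.fromfile('configs/clip/clip_vit-base-p16_zeroshot-cls_cifar100.py')
model = MODELS.build(cfg.model)
load_checkpoint(model, 'path/to/clip-vit-base-p16.pth', map_location='cpu')
model.eval()
```

Note that the tokenizer entry pulls `openai/clip-vit-base-patch16` from the HuggingFace Hub, so building the model needs network access or a local cache.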
69 changes: 69 additions & 0 deletions configs/clip/clip_vit-base-p16_zeroshot-cls_in1k.py
@@ -0,0 +1,69 @@
_base_ = '../_base_/default_runtime.py'

# data settings
data_preprocessor = dict(
type='MultiModalDataPreprocessor',
mean=[0.48145466 * 255, 0.4578275 * 255, 0.40821073 * 255],
std=[0.26862954 * 255, 0.26130258 * 255, 0.27577711 * 255],
to_rgb=True,
)

test_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='Resize', scale=(224, 224), interpolation='bicubic'),
dict(
type='PackInputs',
algorithm_keys=['text'],
meta_keys=['image_id', 'scale_factor'],
),
]

train_dataloader = None
test_dataloader = dict(
batch_size=32,
num_workers=8,
dataset=dict(
type='ImageNet',
data_root='data/imagenet',
split='val',
pipeline=test_pipeline),
sampler=dict(type='DefaultSampler', shuffle=False),
)
test_evaluator = dict(type='Accuracy', topk=(1, 5))

# schedule settings
train_cfg = None
val_cfg = None
test_cfg = dict()

# model settings
model = dict(
type='CLIPZeroShot',
vision_backbone=dict(
type='VisionTransformer',
arch='base',
img_size=224,
patch_size=16,
drop_rate=0.,
layer_cfgs=dict(act_cfg=dict(type='QuickGELU')),
pre_norm=True,
),
projection=dict(type='CLIPProjection', in_channels=768, out_channels=512),
text_backbone=dict(
type='CLIPTransformer',
width=512,
layers=12,
heads=8,
attn_mask=True,
),
tokenizer=dict(
type='AutoTokenizer',
name_or_path='openai/clip-vit-base-patch16',
use_fast=False),
vocab_size=49408,
transformer_width=512,
proj_dim=512,
text_prototype='imagenet',
text_prompt='openai_imagenet_sub', # openai_imagenet, openai_imagenet_sub
context_length=77,
)
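As the inline comment above indicates, `text_prompt` selects which set of OpenAI prompt templates builds the class-name prototypes: `openai_imagenet_sub` is a subset of the full `openai_imagenet` ensemble. A hedged sketch of switching variants without editing the file, using standard mmengine config overriding:

```python
# Sketch: load this config and switch to the full ImageNet prompt ensemble.
from mmengine.config import Config

cfg = Config.fromfile('configs/clip/clip_vit-base-p16_zeroshot-cls_in1k.py')
cfg.model.text_prompt = 'openai_imagenet'  # full template set instead of the subset
```

The usual OpenMMLab command-line override, `--cfg-options model.text_prompt=openai_imagenet` on `tools/test.py`, should achieve the same without any code.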
68 changes: 68 additions & 0 deletions configs/clip/clip_vit-large-p14_zeroshot-cls_cifar100.py
@@ -0,0 +1,68 @@
_base_ = '../_base_/default_runtime.py'

# data settings
data_preprocessor = dict(
type='MultiModalDataPreprocessor',
mean=[0.48145466 * 255, 0.4578275 * 255, 0.40821073 * 255],
std=[0.26862954 * 255, 0.26130258 * 255, 0.27577711 * 255],
to_rgb=False,
)

test_pipeline = [
dict(type='Resize', scale=(224, 224), interpolation='bicubic'),
dict(
type='PackInputs',
algorithm_keys=['text'],
meta_keys=['image_id', 'scale_factor'],
),
]

train_dataloader = None
test_dataloader = dict(
batch_size=32,
num_workers=8,
dataset=dict(
type='CIFAR100',
data_root='data/cifar100',
split='test',
pipeline=test_pipeline),
sampler=dict(type='DefaultSampler', shuffle=False),
)
test_evaluator = dict(type='Accuracy', topk=(1, 5))

# schedule settings
train_cfg = None
val_cfg = None
test_cfg = dict()

# model settings
model = dict(
type='CLIPZeroShot',
vision_backbone=dict(
type='VisionTransformer',
arch='large',
img_size=224,
patch_size=14,
drop_rate=0.,
layer_cfgs=dict(act_cfg=dict(type='QuickGELU')),
pre_norm=True,
),
projection=dict(type='CLIPProjection', in_channels=1024, out_channels=768),
text_backbone=dict(
type='CLIPTransformer',
width=768,
layers=12,
heads=12,
attn_mask=True,
),
tokenizer=dict(
type='AutoTokenizer',
name_or_path='openai/clip-vit-large-patch14',
use_fast=False),
vocab_size=49408,
transformer_width=768,
proj_dim=768,
text_prototype='cifar100',
text_prompt='openai_cifar100',
context_length=77,
)
69 changes: 69 additions & 0 deletions configs/clip/clip_vit-large-p14_zeroshot-cls_in1k.py
@@ -0,0 +1,69 @@
_base_ = '../_base_/default_runtime.py'

# data settings
data_preprocessor = dict(
type='MultiModalDataPreprocessor',
mean=[0.48145466 * 255, 0.4578275 * 255, 0.40821073 * 255],
std=[0.26862954 * 255, 0.26130258 * 255, 0.27577711 * 255],
to_rgb=True,
)

test_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='Resize', scale=(224, 224), interpolation='bicubic'),
dict(
type='PackInputs',
algorithm_keys=['text'],
meta_keys=['image_id', 'scale_factor'],
),
]

train_dataloader = None
test_dataloader = dict(
batch_size=32,
num_workers=8,
dataset=dict(
type='ImageNet',
data_root='data/imagenet',
split='val',
pipeline=test_pipeline),
sampler=dict(type='DefaultSampler', shuffle=False),
)
test_evaluator = dict(type='Accuracy', topk=(1, 5))

# schedule settings
train_cfg = None
val_cfg = None
test_cfg = dict()

# model settings
model = dict(
type='CLIPZeroShot',
vision_backbone=dict(
type='VisionTransformer',
arch='large',
img_size=224,
patch_size=14,
drop_rate=0.,
layer_cfgs=dict(act_cfg=dict(type='QuickGELU')),
pre_norm=True,
),
projection=dict(type='CLIPProjection', in_channels=1024, out_channels=768),
text_backbone=dict(
type='CLIPTransformer',
width=768,
layers=12,
heads=12,
attn_mask=True,
),
tokenizer=dict(
type='AutoTokenizer',
name_or_path='openai/clip-vit-large-patch14',
use_fast=False),
vocab_size=49408,
transformer_width=768,
proj_dim=768,
text_prototype='imagenet',
text_prompt='openai_imagenet_sub', # openai_imagenet, openai_imagenet_sub
context_length=77,
)
6 changes: 3 additions & 3 deletions docker/serve/Dockerfile
@@ -1,9 +1,9 @@
-ARG PYTORCH="1.12.1"
-ARG CUDA="11.3"
+ARG PYTORCH="2.0.1"
+ARG CUDA="11.7"
ARG CUDNN="8"
FROM pytorch/torchserve:latest-gpu

-ARG MMPRE="1.0.2"
+ARG MMPRE="1.1.0"

ENV PYTHONUNBUFFERED TRUE

22 changes: 22 additions & 0 deletions docs/en/notes/changelog.md
@@ -1,5 +1,27 @@
# Changelog (MMPreTrain)

## v1.1.0 (12/10/2023)

### New Features

- [Feature] Implement Zero-Shot CLIP Classifier ([#1737](https://github.com/open-mmlab/mmpretrain/pull/1737))
- [Feature] Add MiniGPT-4 Gradio demo and training script ([#1758](https://github.com/open-mmlab/mmpretrain/pull/1758))

### Improvements

- [Config] New version of config adapting the MobileNet algorithm ([#1774](https://github.com/open-mmlab/mmpretrain/pull/1774))
- [Config] Support DINO self-supervised learning in projects ([#1756](https://github.com/open-mmlab/mmpretrain/pull/1756))
- [Config] New version of config adapting the Swin Transformer algorithm ([#1780](https://github.com/open-mmlab/mmpretrain/pull/1780))
- [Enhance] Add iTPN support for non-three-channel images ([#1735](https://github.com/open-mmlab/mmpretrain/pull/1735))
- [Docs] Update dataset download scripts from OpenDataLab to OpenXLab ([#1765](https://github.com/open-mmlab/mmpretrain/pull/1765))
- [Docs] Update COCO-Retrieval dataset docs ([#1806](https://github.com/open-mmlab/mmpretrain/pull/1806))

### Bug Fixes

- Update `train.py` to be compatible with the new config format.
- Update the OFA module to be compatible with the latest HuggingFace version.
- Fix a pipeline bug in `ImageRetrievalInferencer`.

## v1.0.2 (15/08/2023)

### New Features
2 changes: 1 addition & 1 deletion docs/en/notes/faq.md
@@ -16,7 +16,7 @@ and make sure you fill in all required information in the template.

| MMPretrain version | MMEngine version  | MMCV version     |
| :----------------: | :---------------: | :--------------: |
-| 1.0.2 (main)       | mmengine >= 0.8.3 | mmcv >= 2.0.0    |
+| 1.1.0 (main)       | mmengine >= 0.8.3 | mmcv >= 2.0.0    |
| 1.0.0              | mmengine >= 0.8.0 | mmcv >= 2.0.0    |
| 1.0.0rc8           | mmengine >= 0.7.1 | mmcv >= 2.0.0rc4 |
| 1.0.0rc7           | mmengine >= 0.5.0 | mmcv >= 2.0.0rc4 |
2 changes: 1 addition & 1 deletion docs/zh_CN/notes/faq.md
@@ -13,7 +13,7 @@

| MMPretrain version | MMEngine version  | MMCV version     |
| :----------------: | :---------------: | :--------------: |
-| 1.0.2 (main)       | mmengine >= 0.8.3 | mmcv >= 2.0.0    |
+| 1.1.0 (main)       | mmengine >= 0.8.3 | mmcv >= 2.0.0    |
| 1.0.0              | mmengine >= 0.8.0 | mmcv >= 2.0.0    |
| 1.0.0rc8           | mmengine >= 0.7.1 | mmcv >= 2.0.0rc4 |
| 1.0.0rc7           | mmengine >= 0.5.0 | mmcv >= 2.0.0rc4 |
2 changes: 1 addition & 1 deletion mmpretrain/__init__.py
@@ -7,7 +7,7 @@
from .version import __version__

mmcv_minimum_version = '2.0.0'
-mmcv_maximum_version = '2.1.0'
+mmcv_maximum_version = '2.2.0'
mmcv_version = digit_version(mmcv.__version__)

mmengine_minimum_version = '0.8.3'
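For context, the bumped `mmcv_maximum_version` feeds the usual OpenMMLab half-open version gate, so any mmcv from 2.0.0 up to (but excluding) 2.2.0 is now accepted. A sketch of that pattern, with an illustrative assert message rather than the file's exact wording:

```python
# Half-open version gate: 2.0.0 <= mmcv < 2.2.0 passes after this change.
import mmcv
from mmengine.utils import digit_version

mmcv_minimum_version = '2.0.0'
mmcv_maximum_version = '2.2.0'
mmcv_version = digit_version(mmcv.__version__)

assert (digit_version(mmcv_minimum_version) <= mmcv_version
        < digit_version(mmcv_maximum_version)), (
    f'mmcv=={mmcv.__version__} is incompatible with this mmpretrain; '
    f'install mmcv>={mmcv_minimum_version},<{mmcv_maximum_version}.')
```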
1 change: 1 addition & 0 deletions mmpretrain/apis/image_retrieval.py
@@ -108,6 +108,7 @@ def build_dataloader(dataset):
# A config of dataset
from mmpretrain.registry import DATASETS
test_pipeline = [dict(type='LoadImageFromFile'), self.pipeline]
+prototype.setdefault('pipeline', test_pipeline)
dataset = DATASETS.build(prototype)
dataloader = build_dataloader(dataset)
elif isinstance(prototype, DataLoader):
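The added `setdefault` call means the inferencer's default test pipeline is applied only when the user-supplied prototype config does not already define one. A minimal, self-contained illustration of that semantics, with hypothetical dataset values:

```python
# dict.setdefault fills the key only when it is absent, so an explicit
# user pipeline is never overwritten by the inferencer's default.
test_pipeline = [dict(type='LoadImageFromFile')]

prototype = dict(type='CustomDataset', data_root='data/retrieval')
prototype.setdefault('pipeline', test_pipeline)
assert prototype['pipeline'] is test_pipeline  # default was applied

prototype = dict(type='CustomDataset', pipeline=[dict(type='MyTransform')])
prototype.setdefault('pipeline', test_pipeline)
assert prototype['pipeline'][0]['type'] == 'MyTransform'  # user value kept
```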