
Support libai DETR project #260

Open · wants to merge 146 commits into main

Conversation

@HiHippie commented Apr 12, 2022

TODO LIST:

  • coco_dataset preprocessing
  • modeling
  • trainer
  • torch weight loading test (aligned)
  • eager global tensor parallel evaluation results alignment
  • a more libai-style transformer implementation; the current version borrows heavily from torch.nn.MultiHeadAttention
  • push training forward

Record of OneFlow bugs and unsupported operators

  1. oneflow min/max ops cannot operate across different data types
  2. flow.cumsum is supported, tensor.cumsum is not
  3. nn.MultiHeadAttention
  4. flow.cdist
  5. flow.as_tensor cannot explicitly specify the data type when converting from a numpy array (workaround sketch after this list)
  6. flow.full_like
  7. for m in tensor: m[0]=False does not change the tensor's values (workaround sketch after this list)
  8. tensor.copy_() does not work
  9. F.interpolate behavior is inconsistent with torch
  10. tensor.split has a bug when split_size_or_sections=[x,0]
  11. flow.ByteStorage
  12. tensor.unbind on a global tensor raises NotImplementedError
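
Possible interim workarounds for items 5 and 7 (sketch only, assuming numpy-side casting and slice assignment on the parent tensor behave as in torch; not taken from this PR):

import numpy as np
import oneflow as flow

# Item 5: flow.as_tensor cannot take an explicit dtype, so cast on the NumPy side first.
arr = np.arange(6).reshape(2, 3)
t = flow.as_tensor(arr.astype(np.float32))

# Item 7: writes to the per-row views produced by iteration are lost,
# so index the original tensor directly instead.
mask = flow.ones(2, 3)
mask[:, 0] = 0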


@HiHippie

flowvision has no counterpart for from torchvision.models._utils import IntermediateLayerGetter;
it is not supported

Is this really not supported in flowvision? If so, I'll go update flowvision and cut a tagged release.

Yeah, it's not supported~
Haha, sure~ I was just about to work around it for now.

Looks like it is actually supported here: https://github.com/Oneflow-Inc/vision/blob/main/flowvision/models/layer_getter.py, it's probably just that the file name doesn't match, lol

Got it~
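
For reference, a minimal usage sketch, assuming flowvision's IntermediateLayerGetter mirrors the torchvision API; the import path follows the linked layer_getter.py, and the resnet50 backbone plus return_layers mapping here are illustrative only:

import oneflow as flow
import flowvision
from flowvision.models.layer_getter import IntermediateLayerGetter

# Wrap a backbone so that it returns the C5 feature map under the key "0",
# as DETR-style backbones typically do.
backbone = flowvision.models.resnet50(pretrained=False)
body = IntermediateLayerGetter(backbone, return_layers={"layer4": "0"})

feats = body(flow.randn(1, 3, 224, 224))
print(feats["0"].shape)  # expected: (1, 2048, 7, 7)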

@HiHippie commented Apr 14, 2022

oneflow min/max ops cannot operate across different data types

>>> flow.__version__
'0.8.0.dev20220411+cu102'
>>> torch.__version__
'1.11.0+cu102'

Minimal reproduction code
Taking float64 and float32 as an example; other mismatched dtype pairs behave the same

torch

>>> import torch
>>> x = torch.randn(5, dtype=torch.float32)
>>> y = torch.randn(5, dtype=torch.float64)
>>> torch.max(x,y)
tensor([ 1.1421,  1.2252,  0.3676,  1.0047, -0.0242], dtype=torch.float64)
>>> torch.min(x,y)
tensor([-0.4623, -0.1920, -0.8689, -0.4471, -0.2798], dtype=torch.float64)

oneflow

>>> import oneflow as flow
>>> x = flow.randn(5, dtype=flow.float32)
>>> y = flow.randn(5, dtype=flow.float64)
>>> flow.max(x,y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
oneflow._oneflow_internal.exception.Exception: 
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/op_interpreter_util.cpp", line 139, in Dispatch<oneflow::one::Tensor>
    Dispatch<TensorTuple>(op_expr, inputs, ctx)
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/op_interpreter_util.cpp", line 131, in Dispatch<oneflow::one::TensorTuple>
    Dispatch(op_expr, inputs, outputs.get(), ctx)
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/op_interpreter.cpp", line 96, in Apply
    internal_->Apply(op_expr, inputs, outputs, ctx)
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/eager_mirrored_op_interpreter.cpp", line 139, in NaiveInterpret
    user_op_expr.InferPhysicalShapeAndDType( attrs, device_tag ... TensorMeta* { return output_tensor_metas->at(i); })
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_expr.cpp", line 445, in InferPhysicalShapeAndDType
    dtype_infer_fn_(&infer_ctx)
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/infer_util.cpp", line 54, in UnchangedDataType
    Check failed: (tensor_desc.data_type()) == (first_tensor_desc->data_type()) (3 vs 2)

>>> flow.min(x,y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
oneflow._oneflow_internal.exception.Exception: 
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/op_interpreter_util.cpp", line 139, in Dispatch<oneflow::one::Tensor>
    Dispatch<TensorTuple>(op_expr, inputs, ctx)
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/op_interpreter_util.cpp", line 131, in Dispatch<oneflow::one::TensorTuple>
    Dispatch(op_expr, inputs, outputs.get(), ctx)
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/op_interpreter.cpp", line 96, in Apply
    internal_->Apply(op_expr, inputs, outputs, ctx)
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/eager_mirrored_op_interpreter.cpp", line 139, in NaiveInterpret
    user_op_expr.InferPhysicalShapeAndDType( attrs, device_tag ... TensorMeta* { return output_tensor_metas->at(i); })
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_expr.cpp", line 445, in InferPhysicalShapeAndDType
    dtype_infer_fn_(&infer_ctx)
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/infer_util.cpp", line 54, in UnchangedDataType
    Check failed: (tensor_desc.data_type()) == (first_tensor_desc->data_type()) (3 vs 2)

Synced to https://github.com/Oneflow-Inc/OneTeam/issues/1207
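
A possible interim workaround (sketch only, assuming Tensor.to supports dtype casting as in torch): promote both operands to a common dtype before calling flow.max / flow.min.

import oneflow as flow

x = flow.randn(5, dtype=flow.float32)
y = flow.randn(5, dtype=flow.float64)

# Promote the float32 operand explicitly, then compare elementwise.
z_max = flow.max(x.to(flow.float64), y)
z_min = flow.min(x.to(flow.float64), y)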

@HiHippie commented Apr 20, 2022

flow.cumsum is supported, but tensor.cumsum is not

>>> flow.__version__
'0.8.0.dev20220411+cu102'
>>> torch.__version__
'1.11.0+cu102'
>>> x = flow.randn(10,10,10)
>>> y = flow.cumsum(x,1)
>>> y = x.cumsum(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'oneflow._oneflow_internal.Tensor' object has no attribute 'cumsum'
>>> x = torch.randn(10,10,10)
>>> y = torch.cumsum(x,1)
>>> y = x.cumsum(1)

The dtype cannot be specified

>>> x = flow.randn(5,5)
>>> flow.cumsum(x,dim=0)
tensor([[ 0.0508,  1.0346, -0.7175, -0.2991,  0.7678],
        [ 0.4012,  2.2157, -1.1069,  0.7856,  2.3732],
        [-0.6691,  1.7376, -0.2673,  0.8270,  2.3241],
        [ 0.6488,  2.2601, -1.5217,  1.0009,  2.4177],
        [ 1.0917,  1.9483, -1.0218, -0.4837,  3.5062]], dtype=oneflow.float32)
>>> flow.cumsum(x,dim=0,dtype=flow.float32)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
oneflow._oneflow_internal.exception.Exception: 
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/api/python/functional/py_function.cpp", line 40, in ReportKwargsError
    TypeError: cumsum(): got multiple values for argument 'dim'
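
A possible interim workaround for both issues (sketch only, assuming Tensor.to supports dtype casting): use the functional flow.cumsum in place of the missing tensor.cumsum method, and cast the result afterwards instead of passing dtype=.

import oneflow as flow

x = flow.randn(5, 5)

# Functional form instead of the missing tensor.cumsum method.
y = flow.cumsum(x, dim=0)

# Cast afterwards instead of passing dtype= to flow.cumsum.
y64 = flow.cumsum(x, dim=0).to(flow.float64)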

@yuanms2 commented Aug 24, 2022

子秋, please follow up on whether the issues you reported from the DETR work have been fixed.

@HiHippie

子秋, please follow up on whether the issues you reported from the DETR work have been fixed.

OK, Prof. Yuan
