Modify sbp.split()'s karg: axis to dim #8411
Merged
Conversation
lixiang007666 requested review from BBuf, daquexian and jackalcooper as code owners, June 13, 2022 07:26
strint reviewed Jun 13, 2022
strint approved these changes Jun 14, 2022
clackhan approved these changes Jun 14, 2022
hjchen2 reviewed Jun 14, 2022
View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8411/
    "split_sbp() got an unexpected keyword argument '%s'." % key
)

assert type(dim) is int
Better to use isinstance here.
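The reviewer's point can be sketched as follows. `split_sbp` here is a minimal, hypothetical stand-in, not OneFlow's actual implementation: `isinstance(dim, int)` accepts `int` subclasses and raises a proper `TypeError`, whereas `assert type(dim) is int` rejects subclasses and is silently stripped when Python runs with `-O`.

```python
def split_sbp(dim, **kwargs):
    """Minimal stand-in for a keyword-checked API (hypothetical, not OneFlow's code)."""
    # Reject any unexpected keyword argument, mirroring the error message
    # in the diff above (e.g. the old `axis` keyword).
    for key in kwargs:
        raise TypeError(
            "split_sbp() got an unexpected keyword argument '%s'." % key
        )
    # isinstance is preferred over `type(dim) is int`: it accepts int
    # subclasses (bool, IntEnum members) and cannot be disabled by `-O`.
    if not isinstance(dim, int):
        raise TypeError(
            "split_sbp(): dim must be an int, got %s" % type(dim).__name__
        )
    return ("split", dim)
```

Calling it with the removed keyword, e.g. `split_sbp(0, axis=1)`, raises the `TypeError` shown in the diff instead of silently accepting the old name.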
hjchen2 approved these changes Jun 14, 2022
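A keyword rename like axis → dim is often shipped together with a short-lived compatibility shim. The sketch below is hypothetical (this PR simply renames the keyword); it shows one common pattern for keeping the old spelling working for a deprecation period:

```python
import warnings


def split(dim=None, *, axis=None):
    # Hypothetical shim: accept the old `axis` keyword for one release,
    # emit a DeprecationWarning, and forward the value to `dim`.
    if axis is not None:
        if dim is not None:
            raise TypeError("split() got both 'dim' and its deprecated alias 'axis'")
        warnings.warn(
            "split(axis=...) is deprecated; use split(dim=...) instead",
            DeprecationWarning,
            stacklevel=2,
        )
        dim = axis
    if not isinstance(dim, int):
        raise TypeError("split(): 'dim' must be an int")
    return dim
```

With such a shim, `split(dim=0)` and the legacy `split(axis=0)` both work during the transition, and callers see a warning pointing at the new spelling; passing both spellings at once is an error.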
…Inc/oneflow into sbp_split_axis_to_sbp_split_dim
lixiang007666 requested review from oneflow-ci-bot and removed the request for oneflow-ci-bot, June 14, 2022 at 12:03, 14:20 and 15:13
Yipeng1994 added a commit that referenced this pull request, Jun 23, 2022
Modify sbp.split()'s karg: axis to dim (#8411)
* Modify sbp.split()'s axis karg to dim
* Refine
* export Env into python * more unittests * export unittest.TestCase in framework/unittest.py * SwitchToShuttingDownPhase * optional is_normal_exit * VirtualMachine::CloseVMThreads * Delete env_api.h env_api.h is deleted by master * reshape_only_one_dim_infered * address pr comments * rollback flow.env.all_device_placement * no distributed running test_shutting_down.py * auto format by CI * expand lifetime of module oneflow in test_shutting_down.py * refine del depend on of * fix oneflow.placement.__str__ * revert GlobalSync * init_producer_stream in oneflow.from_numpy * debug code for vm * init disable_vm_threads_ in VirtualMachine::VirtualMachine * Update oneflow/core/vm/virtual_machine.h Co-authored-by: daquexian <daquexian566@gmail.com> * create stream in forked subprocesses. * refactor StreamRoleSwitch to StreamRoleVisistor * ThreadLocalGuard * auto format by CI * fix compiler complaints * fix static analyzer complaints * VirtualMachine::GetVmStream * fix static analyzer complaints * reimplement AddAndReadVector by std::deque * reimplement AddAndReadVector * merge master * increase atol for test_consistent_rnn_cell.py * StreamRole::AsyncLaunchedCommNet is bound to EventRecordedCudaStreamType * auto format by CI * remove StreamRoleVisitor<T>::VisitInvalid * no copy in AddAndReadVector * fix bug of AddAndReadVector::size_ * disable terminfo to fix missing terminfo symbols Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * fix AddAndReadVector::GetGranularity * remove bad unittest * auto format by CI * rename CallInstructionType to OpCallInstructionType * static variable GlobalSingletonPtr is a unique_ptr * replace ++atomic_cnt with atomic_cnt.fetch_add(1, std::memory_order_relaxed) * AddAndReadVector::operator[] * change comments 'lock free' to 'thread safe' * rename StatefulLocalOpKernel to StatefulOpKernel * rename VirtualMachine::vm_ to VirtualMachine::engine_ * mark VirtualMachine::NoMoreErasedInstructions private * mark 
VirtualMachine::FindOrCreateScheduleLocalDepObject private * remove unused version of VirtualMachineEngine::Receive * rename argname for VirtualMachineEngine::Receive * rename unused PendingInstructionList * rename AddAndReadVector to SteadyVector * optimize SteadyVector::operator[] by __builtin_clzll * refactor SteadyVector::granularity2vector_ to SteadyVector::granularity2data_ * reduce usage of steady_vector::size_ * rename unused anounymous namespace * greater atol for test_consistent_tensordot.py * fix BarrierInstructionType::ComputeInFuseMode * revert container_util.h * run AccessBlobByCallback in default stream of tensor->device * reslove static check * reslove static check * SteadyVector::MutableOrAdd Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: chengtbf <472491134@qq.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com> Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: binbinHan <han_binbin@163.com> * fix_tensor_numpy_to_avoid_gpu_mem_increase (#8449) * fix_tensor_numpy_to_avoid_gpu_mem_increase * Update tensor.py * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Rename user op tensor shape to shape view (#8433) * ThreadLocalGuard * rename user_op::Tensor::shape to user_op::Tensor::shape_view * auto format by CI * fix static analyzer complaints * more verbose code for HobDataType * larger timeout * larger timeout Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: jackalcooper <jackalcooper@gmail.com> Co-authored-by: binbinHan <han_binbin@163.com> * speedup global test (#8468) * speedup global test * Test refine slice ops test (#8471) * refine 
consistent_slice test from 112s -> 30s in 4 device * test(SliceUpdate): refine test from 119s -> 28s in 4 device * delete useless code * auto format by CI Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> Co-authored-by: wyg1997 <wangyinggang@foxmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Set the minimum mtu value for IB communication connection (#8451) * Set the minimum mtu value for IB communication connection * refine * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Merge branch 'master' into feat-general_basic_communication Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: liufengwei0103 <2472937968@qq.com> Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com> Co-authored-by: ZZK <359521840@qq.com> Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com> Co-authored-by: Juncheng <liujuncheng1022@gmail.com> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: binbinHan <han_binbin@163.com> Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> Co-authored-by: Yao Zihang <1162526220@qq.com> Co-authored-by: yuhao <72971170+howin98@users.noreply.github.com> Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com> Co-authored-by: cheng cheng <472491134@qq.com> Co-authored-by: Cijie Xia <cijie.xia@mail.utoronto.ca> Co-authored-by: guo ran <360112263@qq.com> Co-authored-by: Peihong Liu <mosout@qq.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: Shijie <821898965@qq.com> Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: leaves-zwx <kunta0932@gmail.com> Co-authored-by: Li Xiang <54010254+lixiang007666@users.noreply.github.com> Co-authored-by: Cijie Xia 
<xiacijie1998@163.com> Co-authored-by: Jia <basicv8vc@gmail.com> Co-authored-by: Shanshan Zhong <62104945+zhongshsh@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Co-authored-by: wyg1997 <wangyinggang@foxmail.com> Co-authored-by: Yu OuYang <xuanjiuye@gmail.com>
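The change this PR actually makes — renaming `sbp.split()`'s keyword from `axis` to `dim` and rejecting the old name — can be sketched in plain Python. This is a hedged illustration only: the function body, return value, and error text here follow the snippet visible in the review diff, but the real `oneflow.sbp.split` implementation may differ. Per the review feedback, `isinstance` is used instead of `type(dim) is int`.

```python
def split_sbp(dim=0, **kwargs):
    """Illustrative sketch of the renamed keyword argument.

    Accepts `dim` (the new name); any other keyword, including the
    old `axis`, is rejected with an explicit TypeError.
    """
    # Any stray keyword (e.g. the removed `axis`) fails loudly.
    for key in kwargs:
        raise TypeError(
            "split_sbp() got an unexpected keyword argument '%s'." % key
        )
    # Reviewer suggestion: prefer isinstance over `type(dim) is int`.
    if not isinstance(dim, int):
        raise TypeError("split_sbp() expects an int dim, got %s" % type(dim).__name__)
    return ("split", dim)
```

Callers migrating from the old spelling would change `flow.sbp.split(axis=0)` to `flow.sbp.split(dim=0)` (or simply `flow.sbp.split(0)`); the old keyword now raises rather than silently aliasing.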
mergify bot added a commit that referenced this pull request on Jul 25, 2022
* Add a slight cost for B->S and B->P in 2d sbp * Add penalty for P in consumer * Fix a slight bug * Add at most 1 middle node for general basic communication * Add the cost for general basic communication * Add the slight penalty for eager * Skip initialization of boxing collector if not needed * Fix a bug * Dev nd nccl send recv boxing (#8467) * nd nccl_send_recv_boxing * rm print * support num_axes > 2 * Add distributed optional run (#8372) * Add * change deps * add install * add skip * autoprof supports bandwidth (#8367) * autoprof supports bandwidth Signed-off-by: daquexian <daquexian566@gmail.com> * print bandwidth Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * remove tmp buffer of cumprod cpu backward kernel (#8369) * remove tmp buffer of cumprod cpu backward kernel * refine * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Move tensor api to cpython part3 (#8342) * add tensor_functions * concat py methods * add hash, restore tensor.py * check replacement * refine code, remove commented tensor.py * refine code * move some api * add cpu and cuda api * add triu tril norm and etc. 
* remove tensor_functions.h * move more api * move more api, refine size * fix typo * format code, remove useless include * refine code * refine code, fix typo * align .cuda to python * refine code * split some api to part3 for review * remove positional only arguments of argmax and argmin * remove arguments parse * modify arguments name in matmul and floor_divide * rename BINARY_FUNC to DIRECT_PASS_FUNC, modify some functions * refine code, format code * add inplace /=, add comments * remove name in macros * remove python api * remove redundant include * remove cout * format code * refactor tensor.size by directly call shape.at, refactor tensor.sub_ by calling nb_sub_ * remove redundant code * auto format by CI * fix typo, fix wrong call * modify idx datatype from int32 to int64 in tensor.size * add some DIRECT_PASS_FUNC * add cpu cuda var pow and etc. * add masked_fill any all * make REDUCE_FUNC macro, add reduce_* functions * add 0dim check in ReduceSumWhole, refine yaml * fix bug * restore add add_ sub sub_ * add unittest for tensor.half tensor.add tensor.add_ * refine code * refine code * fix typo * fix bug of tensor.std() * refactor var std and cuda, using c++ functional api * add beta and threshold in softplus * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add nn_functor Check (#7910) * add bias_add_check * add bias_add error test * fix conv2d nhwc bias_add error * add nhwc conv test * add bias_add_error test * Add bias add error check * Rename * add batch matmul error check * add matmul check error msg * remove annotation * add fused mlp error msg check * Add pixel shuffle check test * add more test until normalization add relu functor * refine error message * finish all nnfunctor check msg * handle type error * remove useless symbol * modify back to TypeError * fix all comment * Remove redundant code * Remove pad ndim check * fix bias add space * 
fix check logic cause ci gpu not always gpu:0 Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add FusedMatmulBiasAddReluDropout [OneEmbedding] (#8222) * previous version for fused_matmul_bias_add_relu_dropout * add op infer * fix detail * finish forward * support dropout rate list * add forward test * fix bug for output buffer * Configurable alpha params * try to add bit mask logic * Add bitmask first version! * Add row col bitmask logic * support not align4 reludropout * simplify relu dropout ld logic * Add naive relu dropout grad kernel * add simple relu dropout grad kernel * Rename * support relu_dropout bitmask backward * add vectorized optimization * fix tmp buffer * add to amp list * add lazy backward logic * Refine kernel * add indextype dispatch * simplify functor logic * fix cublas fused mlp aux_ld shape bug * Add more relu dropout kernel * add full unittest * fix bug in skip final activation * refine * Remove dump func * fix format * Remove cmake * remove redundant divide * add padded version * fix dropout * oneflow curand * refine * remove redundant kernel * add unroll logic * add unroll and ballot sync * refine format * Remove fast curand * Refine python interface * Add if branch for memset * fix python logic * just for debug * not use matmul bias add grad * add launch 1 block limit * fix unittest * Refine * fix graph backward bug * limit to 11060 * change to use int32_t dtype for cublas aux * Fix jc comment * fix comment * fix convert * fix static_analysis * fix at * fix userops td * fix userops td * fix const ref * fix compile error for bfloat16 * limit to 11060 * fix bug Co-authored-by: Juncheng <liujuncheng1022@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix gather 0-dim tensor bug (#8376) * fix 0-dim tensor bug * refine * support input 0-dim tensor for gather * refine * refine * refine dim_scatter_kernel check * 
refine * refine check * fix clang_tidy error Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add api to apply external job pass (#8370) * Add condition to find-test-cache-distributed (#8387) * add condition to find-test-cache-distributed * fix * warp dim util (#8382) * warp dim util * format * use more maybe_wrap_dim * refine array functor * add more * refine math_functor * fix_bug_in_broadcast_min_max_grad_and_broadcast_like (#8379) * fix_bug_in_broadcast_min_max_grad_and_broadcast_like * refine * fix static check error * fix bug about index (#8388) * fix bug about index * add test case Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * LogicalSliceAssign support full slice sbp (#8344) * feat(SliceOp): slice ops support 2d sbp * fix(SliceOp): fix [B, P] 2d sbp bug * refine error message * fix bug in parallel_num == 1 * add comment * add warning and format * add NOLINT for boxing check * feat(LogicalSliceOps): support all nd_sbp * feat(LogicalSlice): support nd_sbp * add error message * fix(AutoTest): fix auto_test bug in module.parameter pass * auto format by CI * fix(LogicalSliceAssign): skip test when 1n1d * fix SliceParams memset error * remove memset * add CHECK_JUST * fix(*): make sure split_axis >= 0 or equal to SPLIT_AXIS_FOR_NON_SPLIT * remove memset * fix spilit_info.axis bug * feat(LogicalSliceOps): support grad * add logical_slice gradient_funcs * feat(LogicalSliceAssign): LogicalSliceAssign support full slice sbp * auto format by CI * test(LogicalSlice): fix logical_slice dims Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * fix_tensor_from_numpy_mem_leak_bug (#8391) * fix_tensor_from_numpy_mem_leak_bug * add note * refine note * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Make of_pyext_obj 
static only to make sure only a python ext so has python symbols (#8393) * make of_pyext_obj static only * refine note Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Adjust tolerance setting in embedding_renorm unit test (#8394) * support front end compile for job to iree (#8249) * support frontend dev version * polish name * add tosa-to-elf.mlir * tosa to elf by llvm * conv2d partial * an enhanced frontend runner * support numpy as input * enable multiple using nn graph with different input(jobname make it it cd /home/yuhao/frontend/oneflow ; /usr/bin/env /usr/bin/python3 /home/yuhao/.vscode-server/extensions/ms-python.python-2022.6.2/pythonFiles/lib/python/debugpy/launcher 40873 -- /home/yuhao/frontend/oneflow/oneflow/ir/test/Frontend/runner.py ) * enable multiple input * enable cpu and cuda * change full_name to _full_name * support exchange cuda with cpu seamlessly * remove pip * lit config * polish * trim * auto format by CI * modify * auto format by CI * last line polish * use unittest * auto format by CI * use allclose * auto format by CI * pulish * optimize convert oneflow to tosa * conv2d * conv2d enhanced && conv2d examples add * add road map * add add_n2Op and boardcast_addOp conversion * add matmulOp conversion * support converting normailzation op to tosa(partically) * update roadmap * support i64 tensor to dense elem attr * support 100% resnet op conversion * add test mlir * add test iree resnet python script * auto format by CI * done * enhance iree resnet test script * auto format by CI * rebuild code * auto format by CI * rebuild test script * update * auto format by CI * pub * trim test scripts * move * move * input and output add block arg judgement * emit error in variable conversion * error handle for ci * modify err info * auto format by CI * merge * auto format by CI * output not block * flow ones * rm const * trim maybe * trim maybe with header file * const auto * solve clangd error Co-authored-by: 
oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Feat/zero mix with mp (#8036) * add zero limit * add debug * add mix zero test * refactor zero api * zero test with mp * add 2d test * add zero nd * add nd zero * add sbp cast * test passed soft limit consumer * refine size api * zero use stage 2 * add limit consumer api * add new api * refine zero s select * fix index out of range * rm zero limit on device type * zero test with activation checkpointing * add indentity when dp sequence len is 1 * move to base with master * fix * fix * fix * add test * debug bad case * refine test for eager and graph boxing * test case ready * simplify * refine test * fix buff size * fix conflict * refine zero nd * refine * add full test * revert change * refine split check * fix typo * rm log * spit long func * restore test * Update optimizer_placement_optimization_pass.cpp * auto format by CI * auto format by CI * fix static check * add tips for zero api change * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Revert embedding normal path and fix amp list (#8374) * revert embedding normal path, fix amp list * fix amp * fix memset bug in gather cpu kernel Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * replace fixed_vector with small_vector and make Shape inherit from it (#8365) * Replace fixed_vector with llvm::SmallVector Signed-off-by: daquexian <daquexian566@gmail.com> * Shape inherited from llvm::SmallVector Signed-off-by: daquexian <daquexian566@gmail.com> * refine cmake Signed-off-by: daquexian <daquexian566@gmail.com> * rename fixed_vector to small_vector Signed-off-by: daquexian <daquexian566@gmail.com> * fix reviews Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update Shape constructor Signed-off-by: daquexian <daquexian566@gmail.com> * 
add 'PUBLIC' keyword to all target_link_libraries Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * set is_initialized_ default to true Signed-off-by: daquexian <daquexian566@gmail.com> * override some methods to set is_initialized_ Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Light plan for debug (#8396) * Light plan for debug * fix note * disable terminfo to fix missing terminfo symbols (#8400) * disable terminfo to fix missing terminfo symbols Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix bug of ZeRO MP in complex case (#8404) * Remove redundant output_lbns in ir (#8409) * mv case * remove redundant info * Dev FusedCrossInteraction[OneEmbedding] (#8335) * add simple fused cross interaction forward * add packed fused * Add cross interaction grad * simplify code * fix bug * support crossnet v2 * support cross interaction v2 * add lazy backward * Rename and add test * fix jc comment * fix comment * fix bug * fix userops td elem_cnt for FUSED Group * fix header file * fix clang static analysis * fix unittest Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add exe graph physical shape check msg (#8002) * fix index select op in graph * add exe graph physical shape check msg * improve the debug information for the python stack trace 1. add a parameter 'max_stack_depth' to specify the max depth for the stack trace 2. refactor other debug related classes. 
* remove parens * update * resolve PR comments * update * update graph debug test file. * restore self._debug in class Graph and class ModuleBlock * Do not shorten the stack frame string if it is in debug mode * delete TODOs * disable conv3d test (#7969) Signed-off-by: daquexian <daquexian566@gmail.com> * skip layernorm random_data_warp test (#7941) * skip layernorm random_data_warp test * warp/block/uncached case only test gpu Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Lock click version (#7967) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add global avgpool unittest (#7585) * fix (#7978) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Support negative dim in scatter op (#7934) * support negative dim in scatter op * refine scatter test * refine scatter test again Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand (#7702) * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand * lock gil in vm Callback thread * more comments for VirtualMachineEngine::Callback() * the Env is never destroyed. * export Env into python * more unittests * wait shared_ptr.use_count() == 0 * export unittest.TestCase in framework/unittest.py * SwitchToShuttingDownPhase * optional is_normal_exit * VirtualMachine::CloseVMThreads * Delete env_api.h env_api.h is deleted by master * reshape_only_one_dim_infered * address pr comments * fix a ref-cnt bug in TryRunBarrierInstruction. 
* rollback flow.env.all_device_placement * no distributed running test_shutting_down.py * auto format by CI * expand lifetime of module oneflow in test_shutting_down.py * refine del depend on of * capture oneflow._oneflow_internal.eager when calling sync in __del__ * add try in flaky test Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: chengtbf <472491134@qq.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com> * Fix one hot scalar tensor bug (#7975) * fix reduce_sum scalar check bug * fix one_hot scalar tensor bug * fix clang tidy error Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * support ctor np array from of tensor (#7970) * support ctor np array from of tensor * add test case constructing np array from tensor * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add_manual_seed_all_api (#7957) * add_manual_seed_all_api * Update conf.py * refine * add test case * auto format by CI * Update random_generator.cpp * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * one_embedding add doc string (#7902) * add doc string * add example * add * fix doc * refine * address review * mb to MB * add make_table_option * option to options * refine * add forward Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Support numpy scalar parameters (#7935) * feat(functional): support numpy scalar parameters * rename inferface * feat(*): TensorIndex support numpy scalar * feat(TensorIndex): support advance indexing * add unittest and int32 support for branch feat-param_support_np_scalar (#7939) * add unittest * refactor unittest * add todo for int16 advanced indexing * add int32 supporting for advance indexing * auto format by 
CI Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * fix tensor_scatter_nd_update (#7953) * fix tensor_scatter_nd_update * auto backward Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix one_embedding adam (#7974) * fix one_embedding adam * fix tidy * fix normal Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * speed test with score (#7990) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Feat/graph del by ref (#7857) * remove IsMultiClient() and single client logic Signed-off-by: daquexian <daquexian566@gmail.com> * rename eager.multi_client to eager Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * add py ref * refine new session * clean code * make scope api inner use * use session with ref cnt * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand * test pass * lock gil in vm Callback thread * more comments for VirtualMachineEngine::Callback() * merge * merge rm single client * rm initenv * merge and fix master * refactor env c api * add debug code * fix and serving test pass * test passed * rm useless * rm useless code * format * rm useless include * rm sync in py * the Env is never destroyed. 
* export Env into python * more unittests * fix and pass tests * revert virtual_machine.cpp * revert core/vm * remove outdated python class oneflow.unittest.TestCase * graph test passed * wait shared_ptr.use_count() == 0 * export unittest.TestCase in framework/unittest.py * SwitchToShuttingDownPhase * optional is_normal_exit * VirtualMachine::CloseVMThreads * Delete env_api.h env_api.h is deleted by master * address pr comments * rm is env init * Clear empty thread when graph destroy (#7633) * Revert "Clear empty thread when graph destroy (#7633)" (#7860) This reverts commit 3e8585e5fa20b97229d6b0be46a7ff814dc8cd83. * fix a ref-cnt bug in TryRunBarrierInstruction. * rm env_api * fix clang-tidy error * fix clang-tidy in env_imp * refine env api * format * refine graph del and sync at shuttingdown * fix typo * add comment * rm useless * rm useless Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: cheng cheng <472491134@qq.com> * [PersistentTable] Fix num blocks (#7986) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add auto benchmark for flowvision (#7806) * update yml * update workflow * add resnet50 * [PersistentTable] Async write (#7946) * [PersistentTable] Async write * fix Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * save log in separate dir by default (#7825) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix index select op in graph * add exe graph physical shape check msg * improve the debug information for the python stack trace 1. 
add a parameter 'max_stack_depth' to specify the max depth for the stack trace 2. refactor other debug related classes. * remove parens * update * resolve PR comments * update * update graph debug test file. * restore self._debug in class Graph and class ModuleBlock * Do not shorten the stack frame string if it is in debug mode * delete TODOs * Revert "Merge branch 'master' into fea/graph_check_msg" This reverts commit 28833b73a8041463e5e3d130784be386ee248bd8, reversing changes made to baadf6045f2fce69c090e442a755229c1c949773. * Revert "Revert "Merge branch 'master' into fea/graph_check_msg"" This reverts commit 1d5e196d8530ffd2b9bf781abcf168b94ff9ca41. * update * resolve conflicts * resolve conflicts Co-authored-by: Cijie Xia <cijie.xia@mail.utoronto.ca> Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: guo ran <360112263@qq.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> Co-authored-by: Peihong Liu <mosout@qq.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: chengtbf <472491134@qq.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> Co-authored-by: liufengwei0103 <2472937968@qq.com> Co-authored-by: binbinHan <han_binbin@163.com> Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com> Co-authored-by: Shijie <821898965@qq.com> Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: Juncheng <liujuncheng1022@gmail.com> * add batch_matmul sbp (#8385) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * suppress gcc11 false positive warning (#8401) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] 
<37929162+mergify[bot]@users.noreply.github.com> * fix variable op conversion to tosa error in ninja c1 (#8412) * pub * move test iree resnet python script to oneflow_iree repo * add bracket * rename const_val to const_val_ and restore resnet.py test script Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> * nccl send/recv support different placement * refine * auto format by CI * rm out ctrl * auto format by CI Co-authored-by: guo-ran <360112263@qq.com> Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: liufengwei0103 <2472937968@qq.com> Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com> Co-authored-by: ZZK <359521840@qq.com> Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com> Co-authored-by: Juncheng <liujuncheng1022@gmail.com> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: binbinHan <han_binbin@163.com> Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> Co-authored-by: Yao Zihang <1162526220@qq.com> Co-authored-by: yuhao <72971170+howin98@users.noreply.github.com> Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com> Co-authored-by: cheng cheng <472491134@qq.com> Co-authored-by: Cijie Xia <cijie.xia@mail.utoronto.ca> Co-authored-by: Peihong Liu <mosout@qq.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: Shijie <821898965@qq.com> Co-authored-by: lixinqi <lixinqi0703106@163.com> * Support different hierarchy * Merge branch 'master' into feat-general_basic_communication (#8477)
* Add row col bitmask logic * support not align4 reludropout * simplify relu dropout ld logic * Add naive relu dropout grad kernel * add simple relu dropout grad kernel * Rename * support relu_dropout bitmask backward * add vectorized optimization * fix tmp buffer * add to amp list * add lazy backward logic * Refine kernel * add indextype dispatch * simplify functor logic * fix cublas fused mlp aux_ld shape bug * Add more relu dropout kernel * add full unittest * fix bug in skip final activation * refine * Remove dump func * fix format * Remove cmake * remove redundant divide * add padded version * fix dropout * oneflow curand * refine * remove redundant kernel * add unroll logic * add unroll and ballot sync * refine format * Remove fast curand * Refine python interface * Add if branch for memset * fix python logic * just for debug * not use matmul bias add grad * add launch 1 block limit * fix unittest * Refine * fix graph backward bug * limit to 11060 * change to use int32_t dtype for cublas aux * Fix jc comment * fix comment * fix convert * fix static_analysis * fix at * fix userops td * fix userops td * fix const ref * fix compile error for bfloat16 * limit to 11060 * fix bug Co-authored-by: Juncheng <liujuncheng1022@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix gather 0-dim tensor bug (#8376) * fix 0-dim tensor bug * refine * support input 0-dim tensor for gather * refine * refine * refine dim_scatter_kernel check * refine * refine check * fix clang_tidy error Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add api to apply external job pass (#8370) * Add condition to find-test-cache-distributed (#8387) * add condition to find-test-cache-distributed * fix * warp dim util (#8382) * warp dim util * format * use more maybe_wrap_dim * refine array functor * add more * refine math_functor * fix_bug_in_broadcast_min_max_grad_and_broadcast_like (#8379) * 
fix_bug_in_broadcast_min_max_grad_and_broadcast_like * refine * fix static check error * fix bug about index (#8388) * fix bug about index * add test case Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * LogicalSliceAssign support full slice sbp (#8344) * feat(SliceOp): slice ops support 2d sbp * fix(SliceOp): fix [B, P] 2d sbp bug * refine error message * fix bug in parallel_num == 1 * add comment * add warning and format * add NOLINT for boxing check * feat(LogicalSliceOps): support all nd_sbp * feat(LogicalSlice): support nd_sbp * add error message * fix(AutoTest): fix auto_test bug in module.parameter pass * auto format by CI * fix(LogicalSliceAssign): skip test when 1n1d * fix SliceParams memset error * remove memset * add CHECK_JUST * fix(*): make sure split_axis >= 0 or equal to SPLIT_AXIS_FOR_NON_SPLIT * remove memset * fix spilit_info.axis bug * feat(LogicalSliceOps): support grad * add logical_slice gradient_funcs * feat(LogicalSliceAssign): LogicalSliceAssign support full slice sbp * auto format by CI * test(LogicalSlice): fix logical_slice dims Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * fix_tensor_from_numpy_mem_leak_bug (#8391) * fix_tensor_from_numpy_mem_leak_bug * add note * refine note * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Make of_pyext_obj static only to make sure only a python ext so has python symbols (#8393) * make of_pyext_obj static only * refine note Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Adjust tolerance setting in embedding_renorm unit test (#8394) * support front end compile for job to iree (#8249) * support frontend dev version * polish name * add tosa-to-elf.mlir * tosa to elf by llvm * conv2d partial * an enhanced frontend runner * support numpy as input * 
enable multiple using nn graph with different input(jobname make it it cd /home/yuhao/frontend/oneflow ; /usr/bin/env /usr/bin/python3 /home/yuhao/.vscode-server/extensions/ms-python.python-2022.6.2/pythonFiles/lib/python/debugpy/launcher 40873 -- /home/yuhao/frontend/oneflow/oneflow/ir/test/Frontend/runner.py ) * enable multiple input * enable cpu and cuda * change full_name to _full_name * support exchange cuda with cpu seamlessly * remove pip * lit config * polish * trim * auto format by CI * modify * auto format by CI * last line polish * use unittest * auto format by CI * use allclose * auto format by CI * pulish * optimize convert oneflow to tosa * conv2d * conv2d enhanced && conv2d examples add * add road map * add add_n2Op and boardcast_addOp conversion * add matmulOp conversion * support converting normailzation op to tosa(partically) * update roadmap * support i64 tensor to dense elem attr * support 100% resnet op conversion * add test mlir * add test iree resnet python script * auto format by CI * done * enhance iree resnet test script * auto format by CI * rebuild code * auto format by CI * rebuild test script * update * auto format by CI * pub * trim test scripts * move * move * input and output add block arg judgement * emit error in variable conversion * error handle for ci * modify err info * auto format by CI * merge * auto format by CI * output not block * flow ones * rm const * trim maybe * trim maybe with header file * const auto * solve clangd error Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Feat/zero mix with mp (#8036) * add zero limit * add debug * add mix zero test * refactor zero api * zero test with mp * add 2d test * add zero nd * add nd zero * add sbp cast * test passed soft limit consumer * refine size api * zero use stage 2 * add limit consumer api * add new api * refine zero s select * fix index out of range * rm zero limit on device type * 
zero test with activation checkpointing * add indentity when dp sequence len is 1 * move to base with master * fix * fix * fix * add test * debug bad case * refine test for eager and graph boxing * test case ready * simplify * refine test * fix buff size * fix conflict * refine zero nd * refine * add full test * revert change * refine split check * fix typo * rm log * spit long func * restore test * Update optimizer_placement_optimization_pass.cpp * auto format by CI * auto format by CI * fix static check * add tips for zero api change * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Revert embedding normal path and fix amp list (#8374) * revert embedding normal path, fix amp list * fix amp * fix memset bug in gather cpu kernel Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * replace fixed_vector with small_vector and make Shape inherit from it (#8365) * Replace fixed_vector with llvm::SmallVector Signed-off-by: daquexian <daquexian566@gmail.com> * Shape inherited from llvm::SmallVector Signed-off-by: daquexian <daquexian566@gmail.com> * refine cmake Signed-off-by: daquexian <daquexian566@gmail.com> * rename fixed_vector to small_vector Signed-off-by: daquexian <daquexian566@gmail.com> * fix reviews Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update Shape constructor Signed-off-by: daquexian <daquexian566@gmail.com> * add 'PUBLIC' keyword to all target_link_libraries Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * set is_initialized_ default to true Signed-off-by: daquexian <daquexian566@gmail.com> * override some methods to set is_initialized_ 
Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Light plan for debug (#8396) * Light plan for debug * fix note * disable terminfo to fix missing terminfo symbols (#8400) * disable terminfo to fix missing terminfo symbols Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix bug of ZeRO MP in complex case (#8404) * Remove redundant output_lbns in ir (#8409) * mv case * remove redundant info * Dev FusedCrossInteraction[OneEmbedding] (#8335) * add simple fused cross interaction forward * add packed fused * Add cross interaction grad * simplify code * fix bug * support crossnet v2 * support cross interaction v2 * add lazy backward * Rename and add test * fix jc comment * fix comment * fix bug * fix userops td elem_cnt for FUSED Group * fix header file * fix clang static analysis * fix unittest Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add exe graph physical shape check msg (#8002) * fix index select op in graph * add exe graph physical shape check msg * improve the debug information for the python stack trace 1. add a parameter 'max_stack_depth' to specify the max depth for the stack trace 2. refactor other debug related classes. * remove parens * update * resolve PR comments * update * update graph debug test file. 
* restore self._debug in class Graph and class ModuleBlock * Do not shorten the stack frame string if it is in debug mode * delete TODOs * disable conv3d test (#7969) Signed-off-by: daquexian <daquexian566@gmail.com> * skip layernorm random_data_warp test (#7941) * skip layernorm random_data_warp test * warp/block/uncached case only test gpu Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Lock click version (#7967) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add global avgpool unittest (#7585) * fix (#7978) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Support negative dim in scatter op (#7934) * support negative dim in scatter op * refine scatter test * refine scatter test again Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand (#7702) * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand * lock gil in vm Callback thread * more comments for VirtualMachineEngine::Callback() * the Env is never destroyed. * export Env into python * more unittests * wait shared_ptr.use_count() == 0 * export unittest.TestCase in framework/unittest.py * SwitchToShuttingDownPhase * optional is_normal_exit * VirtualMachine::CloseVMThreads * Delete env_api.h env_api.h is deleted by master * reshape_only_one_dim_infered * address pr comments * fix a ref-cnt bug in TryRunBarrierInstruction. 
* rollback flow.env.all_device_placement * no distributed running test_shutting_down.py * auto format by CI * expand lifetime of module oneflow in test_shutting_down.py * refine del depend on of * capture oneflow._oneflow_internal.eager when calling sync in __del__ * add try in flaky test Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: chengtbf <472491134@qq.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com> * Fix one hot scalar tensor bug (#7975) * fix reduce_sum scalar check bug * fix one_hot scalar tensor bug * fix clang tidy error Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * support ctor np array from of tensor (#7970) * support ctor np array from of tensor * add test case constructing np array from tensor * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add_manual_seed_all_api (#7957) * add_manual_seed_all_api * Update conf.py * refine * add test case * auto format by CI * Update random_generator.cpp * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * one_embedding add doc string (#7902) * add doc string * add example * add * fix doc * refine * address review * mb to MB * add make_table_option * option to options * refine * add forward Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Support numpy scalar parameters (#7935) * feat(functional): support numpy scalar parameters * rename inferface * feat(*): TensorIndex support numpy scalar * feat(TensorIndex): support advance indexing * add unittest and int32 support for branch feat-param_support_np_scalar (#7939) * add unittest * refactor unittest * add todo for int16 advanced indexing * add int32 supporting for advance indexing * auto format by 
CI Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * fix tensor_scatter_nd_update (#7953) * fix tensor_scatter_nd_update * auto backward Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix one_embedding adam (#7974) * fix one_embedding adam * fix tidy * fix normal Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * speed test with score (#7990) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Feat/graph del by ref (#7857) * remove IsMultiClient() and single client logic Signed-off-by: daquexian <daquexian566@gmail.com> * rename eager.multi_client to eager Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * add py ref * refine new session * clean code * make scope api inner use * use session with ref cnt * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand * test pass * lock gil in vm Callback thread * more comments for VirtualMachineEngine::Callback() * merge * merge rm single client * rm initenv * merge and fix master * refactor env c api * add debug code * fix and serving test pass * test passed * rm useless * rm useless code * format * rm useless include * rm sync in py * the Env is never destroyed. 
* export Env into python * more unittests * fix and pass tests * revert virtual_machine.cpp * revert core/vm * remove outdated python class oneflow.unittest.TestCase * graph test passed * wait shared_ptr.use_count() == 0 * export unittest.TestCase in framework/unittest.py * SwitchToShuttingDownPhase * optional is_normal_exit * VirtualMachine::CloseVMThreads * Delete env_api.h env_api.h is deleted by master * address pr comments * rm is env init * Clear empty thread when graph destroy (#7633) * Revert "Clear empty thread when graph destroy (#7633)" (#7860) This reverts commit 3e8585e5fa20b97229d6b0be46a7ff814dc8cd83. * fix a ref-cnt bug in TryRunBarrierInstruction. * rm env_api * fix clang-tidy error * fix clang-tidy in env_imp * refine env api * format * refine graph del and sync at shuttingdown * fix typo * add comment * rm useless * rm useless Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: cheng cheng <472491134@qq.com> * [PersistentTable] Fix num blocks (#7986) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add auto benchmark for flowvision (#7806) * update yml * update workflow * add resnet50 * [PersistentTable] Async write (#7946) * [PersistentTable] Async write * fix Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * save log in separate dir by default (#7825) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Fix eval error in FusedMLP (#8413) Fix eval error * Init NCCL communicator in graph mode unifiedly (#8263) * centralized comm init * address review * revert * rename * ref nccl logical send recv * fix cpu only Co-authored-by: cheng cheng <472491134@qq.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix dim_scatter 0-dim tensor bug (#8418) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * target based external libraries (#8421) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Refine hardcoded attr setting/getting in ir (#8420) * use names in trait static func * more changes on op name attr * use wrapped func * Replace cu115 with cu116 in nightly (#8423) update workflows Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix repeat interleave 0-size tensor bug (#8414) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Autotest support print input in ci (#8383) * support print tensor value in autotest to provide more details in ci * revert * refine * auto format by CI * control precision to 1e-5 when record * fix bug * auto format by CI * relax tensor_size_mb * fix bug * fix bug * refine * releax * refinew * refine * fix bug * relax * refine * restruct * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Modify sbp.split()'s karg: axis to dim (#8411) * Modify sbp.split()'s axis karg to dim * Refine * Refine * Refine * Refine *
Feat/graph logical op debug repr (#8131) * add zero limit * add debug * add mix zero test * refactor zero api * zero test with mp * add 2d test * add zero nd * add nd zero * add sbp cast * test passed soft limit consumer * refine size api * add module config * save nn.Module info in job.proto for better debugging * add new line * add ModuleBlock.ops_proto() API * zero use stage 2 * print operators' info when print ModuleBlock * handle VariableOpConf * update * update * fix * move operators repr method to graph util * add limit consumer api * add new api * refine zero s select * add module block * fix * refact for rm op in module conf * fix * add sbp debug * add sbp repr * add shape * refine * add sys op in repr * add full op debug * fix index out of range * rm zero limit on device type * add no scope op to graph * zero test with activation checkpointing * fix order * add indentity when dp sequence len is 1 * add debug repr * refine repr of op * refine and fix * rm useless log * move to base with master * fix * fix * fix * fix proto * refine test * fix type * add test * debug bad case * refine test for eager and graph boxing * test case ready * simplify * refine test * fix buff size * fix conflict * refine zero nd * refine * add full test * revert change * refine split check * fix typo * rm log * spit long func * refine * restore test * refine pass and mem debug * merge master * repr dtype * add placement * Update optimizer_placement_optimization_pass.cpp * auto format by CI * auto format by CI * fix static check * add tips for zero api change * auto format by CI * fix merge * auto format by CI * auto format by CI * refine get job api * refine graph util import order * auto format by CI * fix static check * auto format by CI * fix special case * refine level print and add full dtype repr * rm useless Co-authored-by: Cijie Xia <cijie.xia@mail.utoronto.ca> Co-authored-by: Cijie Xia <xiacijie1998@163.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> 
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * rm some test case in test_fused_dot_feature_interaction_pooling_sum (#8425) rm some case in test * Remove unused linkages (#8426) remove unused linkages * refactor stride (#8402) * Stride inherits DimVector Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * fix argument type of OFStrideToNumpyStride Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Move Tensor.__setitem__ and global related api to Python/C api (#8375) * add local_to_global, global_to_global, to_global. global_to_global still have bugs * fix bug of global_to_global * remove python api * add setitem * remove local_to_global sbp pack, format code * format code * remove redundant code * add error msg, refine check of to_global * fix bug of check * add error msg * fix clang static check error * remove useless api in tensor.py, remove redundant code, remove useless CHECK * add to_local * fix wrong exception type in unittest for to_local exception message * cuda add default error msg (#8427) default error Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> * Refactor ShapeView (#8422) * update Signed-off-by: daquexian <daquexian566@gmail.com> * update and add docs Signed-off-by: daquexian <daquexian566@gmail.com> * turn on view slice (#8302) * turn_on_view_slice * inplace scalar math hnandle non-contiguous input * fix clang check * add docs * refactor * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Add flow env init rdma api (#8415) * add_flow_env_init_rdma_api * adjust persistent_workers logic for RDMA support * adjust persistent_workers logic for RDMA support * add rmda_inited api * minro fix * add docs * Update python/oneflow/utils/data/dataloader.py Co-authored-by: daquexian <daquexian566@gmail.com> * fix typo * refine * fix RDMAIsInitialized * minor fix * refine * rename InitRdma to InitRDMA * refine 
Co-authored-by: Flowingsun007 <flowingsun007@163.com> Co-authored-by: daquexian <daquexian566@gmail.com> * add 1d send recv in nccl logical (#8355) * add 1d send recv in nccl logical * Update insert_nccl_logical_op_pass.cpp * auto format by CI Co-authored-by: cheng cheng <472491134@qq.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Support iree ci (#8419) * create mlir cpu and modify build gcc 7 shell script * fix the bug of test_iree_resnet.py cuda test in cpu version error * fix constant folding tests * suport oneflow_test_cpu_only * pub * build script add flag * modify test yml * add python3 into \PATH * don't use pretrain model * install flowvision Co-authored-by: mosout <mosout@qq.com> Co-authored-by: jackalcooper <jackalcooper@gmail.com> * Feat straighten task nodes (#8347) * Add a fast topological traversal * Add an initial implementation of straighen nodes * Add the straighen nodes algorithm * Change algorithm structure * Remove some debug information * Finalize the straighten algorithm after deciding the parameters by experiments * Notify the usage of straighten algorithm * Of format * Update oneflow/core/graph/straighten_nodes.cpp Of format Co-authored-by: daquexian <daquexian566@gmail.com> * Of format * Stop using visual string before we find a better key * Remove magic numbers and Of format * Remove starts * Of format * Fix a bug of using GetMaxVal<int32_t>() as an initial number for comparing * Refactor add straighten algo interface (#8435) * feat(*): export straighten nodes algorithm inferface * export documentation * Update python/oneflow/nn/graph/graph_config.py Co-authored-by: Yipeng Li <jamesonli1313@gmail.com> Co-authored-by: Yipeng Li <jamesonli1313@gmail.com> * Use TopoForEachNodeFast as default. (#8436) * Use TopoForEachNodeFast as default. 
What this PR does:

The keyword argument of `flow.sbp.split()` is renamed from `axis` to `dim`, while remaining backward compatible with `axis`. This requires the following changes (only the parameter names of the relevant interfaces change; the names in the C++ code stay the same — for example, the dimension is still obtained via `split_parallel().axis()`):

Python-layer and interface changes:

C++-layer changes:

The issue this PR originates from: https://github.com/Oneflow-Inc/OneTeam/issues/897#issuecomment-1138109022
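As a rough sketch of the Python-layer compatibility handling this PR describes (the standalone helper below and its return value are illustrative assumptions — the real function constructs an SBP object rather than returning an `int`):

```python
import warnings


def split_sbp(dim=None, **kwargs):
    """Hypothetical sketch of the dim/axis compatibility shim:
    accept the new `dim` keyword, keep accepting the deprecated
    `axis` keyword for backward compatibility, and reject any
    other keyword argument."""
    for key in kwargs:
        if key != "axis":
            raise TypeError(
                "split_sbp() got an unexpected keyword argument '%s'." % key
            )
    if "axis" in kwargs:
        if dim is not None:
            raise TypeError("split_sbp() received both 'dim' and 'axis'.")
        warnings.warn(
            "The parameter 'axis' is deprecated, please use 'dim' instead.",
            DeprecationWarning,
        )
        dim = kwargs["axis"]
    assert isinstance(dim, int)
    # The real implementation would build and return the split SBP here.
    return dim
```

With this shape, `split_sbp(dim=0)`, `split_sbp(0)`, and the legacy `split_sbp(axis=0)` all resolve to the same dimension, while `split_sbp(foo=0)` raises a `TypeError` matching the error message added in this PR.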