* Multi Tensor apply Optimizer (#8373): add optim_cast and modify SGD; add a fuse-update-cast pass (in-place banned because of a bug, SGD update fixed); fix regst num; add an env var; remove a wrong CUDA Graph use and add graph support; add a functional impl and a simple job rewrite; delete the redundant SGD update kernel; support half; use a single-loop kernel with unroll logic; group multi-tensor ops by attributes (including parallel conf and model-diff data type in the optimizer key); restructure to use a pointer list; add a multi-tensor Adam update (CPU removed); support skip_if and scale-by-tensor; restructure TensorParams; support fused cast in multi-tensor SGD/Adam update (train mode only, data-parallel only) and fix the model-update cast pass check logic; rewrite the fuse-update-cast pass logic; turn multi-tensor update off in eval mode; fix a multi-tensor Adam update bug; use hash_combine; fix a clang-analysis error; guard with a WITH_CUDA macro; fix the env var in the unit test; add SGD/Adam unit tests with more types, including multi-tensor-update-with-cast tests (co-authored: mergify[bot])
* Fix doc and ops template auto gen (#8546): fix docs, add an op calculator, and fix gen_ops
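The multi-tensor optimizer work above replaces one kernel launch per parameter with a single fused update over a group of tensors that share the same attributes (dtype, placement, and so on). A minimal pure-Python sketch of the grouping-plus-fused-SGD idea (illustrative only; `multi_tensor_sgd` and the grouping key are hypothetical names, not OneFlow's actual API):

```python
from collections import defaultdict

def multi_tensor_sgd(params, grads, dtypes, lr):
    """Group parameters by dtype (a stand-in for the real (dtype, placement,
    ...) key) and update each group in one fused pass, instead of issuing
    one update call per tensor."""
    groups = defaultdict(list)
    for i, dt in enumerate(dtypes):
        groups[dt].append(i)
    for dt, idxs in groups.items():
        # One "kernel launch" per group: walk the pointer list in a single loop.
        for i in idxs:
            p, g = params[i], grads[i]
            for j in range(len(p)):
                p[j] -= lr * g[j]
    return len(groups)  # number of fused launches instead of len(params)

params = [[1.0, 2.0], [3.0], [4.0]]
grads = [[0.5, 0.5], [1.0], [2.0]]
launches = multi_tensor_sgd(params, grads, ["fp32", "fp32", "fp16"], lr=0.1)
```

With three parameters in two dtype groups, the sketch performs two fused "launches" rather than three per-tensor updates, which is the saving the real CUDA kernel exploits.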
* fix diag 0-size tensor shape infer bug (#8557) (co-authored: oneflow-ci-bot, mergify[bot])
* Format tensor on cpu (#8548): use tensor.detach
* Remove useless WITH_CUDAs (#8562)
* unique identity (#8509): add identity name; move identity from class to graph; fix unique identity with multiple stages (co-authored: cheng cheng, oneflow-ci-bot, mergify[bot])
* Add GenericStreamContext (#8560)
* Modify some file and add test (#8556): align format and test-function names with PyTorch; delete prints (co-authored: mergify[bot], oneflow-ci-bot)
* Move some op into amp gray list (#8545): enlarge the gray list (co-authored: cheng cheng)
* Refine inplace expand runtime_error (#8561): refine the message and add a note
* OneEmbedding use malloc async (#8543): pass in/out ptrs; prefetch tmp buffer; embedding-shuffle and gradient-shuffle tmp buffers; memory pool (CUDA 11.2); add id_shuffle to set num_unique in update tests; dynamic alloc off by default; add a fused op; init tmp_buffer; move memset
* one_embedding fused_lookup_init_cast and fused_update_put (#8564): add the fused op; remove the fullcache n_missing check (co-authored: mergify[bot])
* fix cpu aligned_alloc size (#8569) (signed-off: daquexian; co-authored: mergify[bot])
* Add flow norm (#8535): remove a stray import and doctest.testmod
* fix pad_packed_sequence method input requires_grad==True (#8574): also fix an append error when batch_first=True (co-authored: mergify[bot])
* fix embedding manager tmp buffer (#8585)
* fix reduce_ops 0-size bug (#8551) (co-authored: oneflow-ci-bot, mergify[bot])
* Align Momentum Optimizer (#8549): fix the momentum update and finish the eager unit test; support the Graph optimizer; refine beta (co-authored: mergify[bot])
* FillOp: fix GetSbp bug and consistent test bug (#8576) (co-authored: mergify[bot])
* Dev Fully fused MLP Grad [OneEmbedding] (#8462): support fully fused MLP grad in eager plus lazy backward; fall back to tmp_buf when the ones buffer is not enough; build SBP; overlap allreduce and fix the overlap order; CUDA Graphs delayed capture; create comms for graph; fix a dbias-allreduce error; add a CUDA 11.6 limit; use kernel state; init the nccl comm before kernel execution (co-authored: liujuncheng)
* rename mirrored to local (#8503): rename files; revert the package_mirror.py change; rename LocalObject to Dependence; handle clang check (co-authored: oneflow-ci-bot)
* Implement BroadcastElementwiseUnary primitive (#8384): use it in the fill_tensor_ kernel; handle the pack tail statically; fix SimplifyBroadcastDims; revert fill_kernel (co-authored: Juncheng)
* skip cpu autotest for graph global (#8593) (co-authored: mergify[bot])
* Add function_library.h Exception (#8241): add RuntimeError checking to CHECK_EQ (co-authored: oneflow-ci-bot)
* Refactor shrink (#8573): caching allocator; EpDeviceCtx with CachingAllocator; remove the outdated allocator; simplify the thread-safe guard and avoid returning a mutex (co-authored: oneflow-ci-bot, mergify[bot])
* Speed up SliceKernel (#8589): decrease the number of CUDA kernels; use the old kernel when a small tensor is all full-slice; use std::copy for contiguous memory; fix a CPU kernel bug
* Update readme and
version for 0.8.0 (#8600): update version; remove py3.6
* modify some file and improve error message (#8592): scalar_by_tensor_op.cpp, slice_op.cpp, test_slice_op.py (co-authored: oneflow-ci-bot, mergify[bot])
* rename consistent to global (#8505): rename files; remove the to_consistent docs; fix clang check (co-authored: oneflow-ci-bot)
* add module related container docs (#8580) (co-authored: oneflow-ci-bot, mergify[bot])
* fix rnn util extra memory usage when requires_grad=False (#8603): add comments (co-authored: mergify[bot])
* use bracket format slice in tensor str (#8489) (co-authored: mergify[bot])
* Perf TensorInfo constructor (#8606) (co-authored: mergify[bot])
* print operators' python location when print nn_graph (#8558): add nn.Graph.debug() flags print_op_loc (print operator locations) and only_print_user_code_loc (only print users' code locations)
* Add randint_like (#8598): add the randint_like op, docs for random, and a randint_like global test (co-authored: oneflow-ci-bot, mergify[bot])
* Add full_like api (#8595): add the full_like op and a consistent_full test; change the scalar SBP return value from list to tuple (co-authored: oneflow-ci-bot, mergify[bot])
* fix cumsum GenBackwardOpConfFn (#8604): add a test case (co-authored: mergify[bot])
* revert change (#8613)
* fix test graph optimization conf CI bug (#8617): restore the resource config after random tests
* Release pod tensor (#8552): ThreadLocalGuard; split ReleaseTensor into ReleasePodTensor and ReleaseNonPodTensor; rename (co-authored: luyang, mergify[bot])
* Add param group for optimizer (#8611): add the add_param_group interface for Optimizer plus a test (co-authored: oneflow-ci-bot, mergify[bot])
* fix broadcast_elementwise_binary cpu (#8625) (co-authored: mergify[bot])
* align exception msg to torch (#8627) (co-authored: oneflow-ci-bot)
* skip unstable global test in ci, reduce failure rate (#8635)
* fuse embedding interaction (#8586) (co-authored: mergify[bot])
* fix flip gen backward opconf (#8605): use the new opconf api (co-authored: mergify[bot])
* Add ONEFLOW_ONE_EMBEDDING_PERSISTENT_TABLE_SNAPSHOT_LOAD_MMAP_LOCKED (#8597): use MAP_POPULATE (co-authored: mergify[bot])
* Profiling main thread (#8601): ThreadLocalGuard; refactor EagerBlobObjectList; op_args_reserved_size; remove useless comments (co-authored: binbinHan, mergify[bot])
* Fully Memory Log V2 with more details (#8565): refine the log and long op names; fix clang-tidy and tests (co-authored: mergify[bot], Xiaoyu Xu)
* Stream policy
(#8590): ThreadLocalGuard; refactor the signature of StreamType::InitDeviceCtx; add StreamPolicy and remove DeviceCtx args; refine OpCallInstructionUtil::Prepare & Compute; merge EpDeviceCtx and LazyJobDeviceCtx into StreamPolicy; fix a segfault and compile errors; delete methods belonging to subclasses (co-authored: binbinHan, mergify[bot])
* Add fully support for broadcast matmul (#6937): fix an arange bug; fully support broadcast matmul with more checks and full SBP; fix broadcast-matmul grad (when B has 2 axes, use broadcast_gradB instead of matmul+reduce and only save tensor A; restrict bmatmul_grad_b to transpose_a=false); add lazy backward; fix the single-client dispatch matmul logic and eager functional matmul backward; support matrix-vector and vector-matrix products, rewritten via the matmul primitive with SBP fixes; remove single-client matmul code; add eager and consistent unit tests; fix clang static analysis; raise the multiplier to 4 for the unit test (co-authored: mergify[bot])
* Revert "skip cpu autotest for graph global" (#8608): reverts commit b076be782fd8f21e50ee4915f2d1562f3a9ab4c0 (#8593); cherry-pick from master (co-authored: mergify[bot])
* OneEmbedding add tmp_buffer allocator (#8588): refine the embedding_manager tmp_buffer allocator (co-authored: mergify[bot], oneflow-ci-bot)
* refine error msg for some user ops (#8579) (co-authored: oneflow-ci-bot, mergify[bot])
* Add tril fill value (#8655) (co-authored: mergify[bot])
* fix_non_pod_data_allocate_bug (#8657) (co-authored: Li Xinqi, mergify[bot])
* Fix norm (#8629): add docs and a note; add bool &; update math_functor.cpp
* fix_decorate_mem_leak_bug_in_eager_boxing (#8661) (co-authored: mergify[bot])
* add higher order derivative for leaky_relu and negative op (#8643): fix a typo; remove a functor; initialize alpha; fix incorrect dim sizes in the global test (co-authored: mergify[bot])
* update oneflow intro to show the difference (#8669)
* Stacked error (#8671): ThreadLocalGuard; StackedError (co-authored: Shenghang Tsai)
* Refactor tensor initializer (#8626): fix xavier_initializer; refactor Initializer; fix the tensor.py interface; fix a trunc_normal_ init bug and add tests, including oneflow.nn.init.normal_ (co-authored: mergify[bot], oneflow-ci-bot)
* Fix nn doc (#8650): fix the hsplit doc; add a module doc; fix dtype, a formula, and row length; add a reference
* Fix reduce max min bool dtype bug (#8651) (co-authored: oneflow-ci-bot, mergify[bot])
* Remove redundant exception wrapper (#8631): remove the redundant ExceptionWrapper; refine KeyErrorMessage (co-authored: oneflow-ci-bot)
* Refactor MemoryCase to eliminate determine statements of device_type (#7727): refactor memory_case_util and BlobObject::CheckMemCase; rename namespace memcase to memory; cpu device_id is always 0; raise the CI timeout to 50 minutes and the thread limit in containers; skip 2x2 TestEinsumConsistent and failing distributed-test cases (co-authored: Li Xinqi, mergify[bot], tsai, oneflow-ci-bot, clackhan)
* fix
some data races in the C++ API and SteadyVector (#8654): also skip self-copy in MutShapeView::ToShape (signed-off: daquexian; co-authored: mergify[bot], oneflow-ci-bot)
* Fix sin/cos higher order derivative (#8648): fix the grad-grad calculation; refine the autograd global test; fix static analysis (co-authored: oneflow-ci-bot, Ping Zhu, Zhu Ping, mergify[bot])
* refine_eager_boxing_to_adapt_ep (#8568): refine symmetric-acyclic-nd-sbp-to-nd-sbp; fix a static check; add NOLINT (co-authored: mergify[bot])
* Fix repeat bug (#8645): make the result contiguous; add a test case (co-authored: oneflow-ci-bot)
* Instruction policy (#8583): ThreadLocalGuard; vm::InstructionPolicy; fix a compile error (#8623); change MirroredObject to Dependence and modify DependenceVector; fix the stream-type includes (co-authored: Yu OuYang, mergify[bot], oneflow-ci-bot)
* handle non-contiguous input (#8665) (co-authored: oneflow-ci-bot)
* rename define CONSISTENT to GLOBAL (#8652) (co-authored: mergify[bot])
* Refine naive interpret (#8672): refactor EagerBlobObjectList and rename one::EagerBlobObjectList* to vm::EagerBlobObject*; refactor the signature of InstructionsBuilder::Call and PhysicalRun; remove the unused StatefulOpKernel::need_check_mem_case and EagerLocalTensorImpl::is_shape_synced_; refactor SoftSync; move SmallVector from common/container_util.h to framework/instructions_builder.cpp; explicit scalar initialization (co-authored: clackhan)
* Rebuild Docs V0.8.0 (#8392): rebuild five modules and fix the doctree and content in oneflow, oneflow.nn, Tensor, and more; document oneflow.utils.data and refine the DataLoader/optimizer docstrings (#8485); review the nn and linalg documents (#8515); change attributes.rst, review nn.functional.rst, and reconstruct oneflow.cuda (#8514); fix cuda and rebuild the comm demo (#8582); add distributed, OneEmbedding, and nn.graph docs; fix rnn.py (#8616); refactor the oneflow.nn.init document (#8622); refine the documents overall (#8646); refactor the oneflow.autograd document, including "Default gradient layouts" (#8594); change autoclass to autosummary for Tensor; add oneflow.linalg.diagonal (#8653); update the environment-variable and distributed descriptions; revise docstrings and move PyTorch references to the end (#8656); modify all reference links to version 1.10 (#8663); fix all warnings (co-authored: Guoliang Cheng, liu xuan, laoliu97, Yao Chi, Houjiang Chen)
* Fix zeros_like and ones_like api (#8632): fix the tensor_slice_view physical-shape inference and add a test; fix an import error (co-authored: oneflow-ci-bot, mergify[bot])
* Fix sbp print bug (#8689): add a normal priority with no transfer but different SBP; fix printing of no-boxing edges; do not use P for weights (co-authored: oneflow-ci-bot)
* eager_local_interpreter_with_infer_cache (#8619): refactor EagerBlobObjectList and InstructionsBuilder::Call; remove the unused StatefulOpKernel::need_check_mem_case and EagerLocalTensorImpl::is_shape_synced_; refactor TensorMeta::TensorMeta(const TensorMeta); use a small vector and add kMaxNumDims; fix a split Symbol LocalTensorMeta error; refactor SoftSync; move ONEFLOW_EAGER_ENABLE_LOCAL_INFER_CACHE to eager.h; fix a static check error (co-authored: lixinqi, Li Xinqi, oneflow-ci-bot, mergify[bot])
* fix gelu nn.Module bug and support tanh mode.
(#8693): add the gelu2 api; refine tests and docs; restructure; remove the tensor.gelu doc (#8696) (co-authored: Shanshan Zhong, mergify[bot])
* Fix bug in CrossFeatureInteraction LazyBackward (#8677) (co-authored: mergify[bot])
* fix floating-point scalar tensor in arange (#8673) (co-authored: mergify[bot])
* Add nn functional fold (#8667): add fold with tests and docs (co-authored: mergify[bot])
* modify some file and improve the error message (#8566): roi_align_op.cpp, reshape_user_op_util.cpp (co-authored: mergify[bot], oneflow-ci-bot)
* [OneEmbedding] add id_shuffle_copy_out (#8683) (co-authored: mergify[bot])
* fix add_param_group step key not match error (#8698) (co-authored: oneflow-ci-bot, mergify[bot])
* add env ONEFLOW_EP_CUDA_DEVICE_FLAGS and ONEFLOW_EP_CUDA_STREAM_FLAGS (#8703) (co-authored: mergify[bot])
* fix for docs v0.8 (#8710)
* fix repeat op 0-size related bug (both in FW and AD) (#8707): refine the static check (co-authored: oneflow-ci-bot, mergify[bot])
* Support Dropout Scale in FusedMLPGrad [OneEmbedding] (#8633): support an alpha list (co-authored: mergify[bot])
* Fix bug of Tensor.type (#8697): fix tensor.type(flow.Tensor), including the device; fix the tensor.type doc (#8699); move PyTensorMetaCls_CheckExact to a header file (co-authored: Shanshan Zhong, mergify[bot])
* ONEFLOW_GRAPH_PLACE_TRAINING_STATE_ON_ALL_RANKS (#8706) (co-authored: liujuncheng, oneflow-ci-bot, mergify[bot])
* define_mut_output_shape_and_mut_output_stride_in_infer_ctx (#8709): fix a merge-master error and a typo (co-authored: mergify[bot])
* Add qat conv modules (#8368): add quantization-related modules to the docs; add qat conv module tests (co-authored: mergify[bot])
* add unsqueeze_multiple_op (#8714): update functional_api.yaml (co-authored: mergify[bot])
* modify broadcast_like_op.cpp and add test (#8720) (co-authored: Yinggang Wang, mergify[bot])
* JIT LR (#8500): add example code and update cosine_annealing_lr.py; pass the Python AST to a C++ API and enable a JIT backend for LR schedules; global registration and invocation (convert Global to Singleton); walk the Python AST via pybind11 and test all existing get_lr implementations in Python; pass the AST from Python to MLIR via py_ast_wrapper; define the Python AST partially (trimmed ASDL); MLIR gen with a symbol table, AST to JIT end to end; switch llvm::errs() to mlir::emitError and convert switch to TypeSwitch; support or_else comparison, if-with-return in branches, and mutable variables; register pybind11 AST nodes for shared_ptr; add addf/subf ops and maxf/minf conversion to LLVM IR; works on StepLR, LinearLR, ExponentialLR, CosineDecayLR, CosineAnnealingLR, and ConstantLR; rename LR_JIT to LRJITRegister and return Invoke as a std::function from lookup; add bisect_right/bisect_left transformers with a dump option; drop the C++17 standard; naming conventions (co-authored: leaves-zwx
Co-authored-by: yuhao <1171760467@qq.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: yuhao <72971170+howin98@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add logspace (#8599) * add logspace * add global test * restore rand * fix doc * rename consistent to global * adjust import order * add todo * Add hann_window (#8615) * add hann_window * rm useless include * add check * adjust import order * add ONEFLOW_VM_PENDING_HANDLE_WINDOW_SIZE (#8730) * add ONEFLOW_VM_PENDING_HANDLE_WINDOW_SIZE * add environment to vm.h Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Fix as strided bool type and view bug (#8713) * fix as_stride bug * refine * refine * refine * delete useless head file * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add functional binary cross entropy (#8708) * add gelu2 api * refine test * refine docs * refine * restuct * delete useless headfile * format * rm doc of tensor.gelu * add functional binary cross entropy Co-authored-by: BBuf <1182563586@qq.com> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * support map_location in flow.load (#8666) * support map_location in flow.load Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * fix tests Signed-off-by: daquexian <daquexian566@gmail.com> * fix bug when map_location is None Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Add addcdiv (#8581) * add addcdiv * fix tensor_functions * fix inplace * add test number * rename consistent to global * Inner most dim case for cumsum cumprod op (#8403) * cumsum use cub scansum in some case * prod use cub scan * 
refine name * refine * optimize cum op * format * fix * get device properties by cuda stream class * revert useless code * refine * outer dim use parallel sweep algo * refine * fix a fraction of threads hit __syncthreads * revert * refine kernel define * refine * refine * refine * refine * move comment * fix * fix * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Define mut output dtype and mut output is dynamic in infer ctx (#8716) * define_mut_output_shape_and_mut_output_stride_in_infer_ctx * fix merge master error * fix typo * define_mut_output_dtype_and_mut_output_is_dynamic_in_infer_ctx * replce const DataType& with DataType * replace const DataType& with DataType ret * split TensorDesc4ArgNameAndIndex and MutTensorDesc4ArgNameAndIndex * refine * minor fix * refine * fix static check error * Update op_expr.cpp * Update op_expr.cpp * Update stateful_opkernel.cpp * refine * fix static check error * refine * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Dev refactor fuse instruction policy (#8624) * ThreadLocalGuard * vm::InstructionPolicy * refactor fuse instruction policy * fix compile error (#8623) * fix compile error * change MirroredObject to Dependence * Modify DependenceVector * add instruction policy util * add instruction policy util * remove include * add include * rm fuse instruction type * Modifying variable properties * add stream_sequential_dependence_ to instruction_policy Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix bug of batchnorm num_batches_tracked global error when loading state_dict (#8723) add condition for assign num_batches_tracked Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add launch master port limit (#8563) * add launch master port limit * Update 
python/oneflow/distributed/launch.py Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Fix docs import distance (#8691) * fix import distance * add functional apis * add smooth_l1_loss docs * refine activation.py * add deleted api * review * Add interfaces missing from the docs of oneflow, nn and other modules (#8704) * docs: add api * docs(nn): refactor nn * review Co-authored-by: Guoliang Cheng <lmyybh_lazy@163.com> Co-authored-by: ChenQiaoling <48576019+Chenqll@users.noreply.github.com> * refactor control stream type (#8647) * refactor control stream type * auto format by CI * Add method implementation * refine * refine Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Define mut output tensor desc (#8717) * define_mut_output_shape_and_mut_output_stride_in_infer_ctx * fix merge master error * fix typo * define_mut_output_dtype_and_mut_output_is_dynamic_in_infer_ctx * define_mut_output_dtype_and_mut_output_tensor_desc * replace const DataType& with DataType * replace const DataType& with DataType ret * split TensorDesc4ArgNameAndIndex and MutTensorDesc4ArgNameAndIndex * refine * minor fix * fix merge error * fix warning error * refine * fix static check error * Update op_expr.cpp * Update op_expr.cpp * Update stateful_opkernel.cpp * refine * fix static check error * refine * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Symbolic local tensor meta (#8662) * ThreadLocalGuard * refactor EagerBlobObjectList * op_args_reserved_size * remove useless comments * rename one::EagerBlobObjectList* to vm::EagerBlobObject* * refactor signature of InstructionsBuiler::Call * PhysicalRun * refactor InstructionsBuilder::Call * remove unused StatefulOpKernel::need_check_mem_case * remove 
EagerLocalTensorImpl::is_shape_synced_ * eager_local_interpreter_with_infer_cache * remove useless code * reslove comments * refactor TensorMeta::TensorMeta(const TensorMeta) * use small vector * Symbolic LocalTensorMeta * check shape in critical_sectio * add kMaxNumDims * fix error include * fix split Symbol LocalTensorMeta error * fix split cache and symbolic local tensor meta error * refactor SoftSync * move SmallVector from common/container_util.h to framework/instructions_builder.cpp * mone ONEFLOW_EAGER_ENABLE_LOCAL_INFER_CACHE to eager.h * add blank line * reslove comments * minor fix * refine * explicit scalar initialization * fix static check error * auto format by CI * of_format * reslove comment * refine * refine * refine * fix error * define MutOutputShape and MutOutputStride in InferContext * define_mut_output_shape_and_mut_output_stride_in_infer_ctx * fix merge master error * fix typo * fix static check error * define_mut_output_dtype_and_mut_output_is_dynamic_in_infer_ctx * define_mut_output_dtype_and_mut_output_tensor_desc * replce const DataType& with DataType * split const and mut func in LocalTensorMeta * replace const DataType& with DataType ret * split TensorDesc4ArgNameAndIndex and MutTensorDesc4ArgNameAndIndex * refine * minor fix * fix merge error * fix warning error * refine * fix static check error * Update op_expr.cpp * Update op_expr.cpp * split MutTensorMeta and MutLocalTensorMeta * Update stateful_opkernel.cpp * refine * fix static check error * refine * refine * reslove comment * refine * fix typo Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> * fxi typo * use OpArgsVector Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> * Feat general basic communication (#8437) * Add a slight cost for 
B->S and B->P in 2d sbp * Add penalty for P in consumer * Fix a slight bug * Add at most 1 middle node for general basic communication * Add the cost for general basic communication * Add the slight penalty for eager * Skip initialization of boxing collector if not needed * Fix a bug * Dev nd nccl send recv boxing (#8467) * nd nccl_send_recv_boxing * rm print * support num_axes > 2 * Add distributed optional run (#8372) * Add * change deps * add install * add skip * autoprof supports bandwidth (#8367) * autoprof supports bandwidth Signed-off-by: daquexian <daquexian566@gmail.com> * print bandwidth Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * remove tmp buffer of cumprod cpu backward kernel (#8369) * remove tmp buffer of cumprod cpu backward kernel * refine * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Move tensor api to cpython part3 (#8342) * add tensor_functions * concat py methods * add hash, restore tensor.py * check replacement * refine code, remove commented tensor.py * refine code * move some api * add cpu and cuda api * add triu tril norm and etc. 
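Several entries above optimize the cumsum and cumprod kernels (CUB scan-sum, an inner-most-dim specialization, removing the cumprod backward tmp buffer). Stripped of the CUDA details, the forward op is an inclusive prefix scan along the last dimension; a plain-Python reference with hypothetical names looks like this:

```python
def cumsum_last_dim(rows):
    # Inclusive prefix scan over the innermost dimension of a 2-D list,
    # the access pattern the inner-most-dim CUDA kernel specializes.
    out = []
    for row in rows:
        total, scanned = 0, []
        for v in row:
            total += v
            scanned.append(total)
        out.append(scanned)
    return out
```

On GPU the same per-row scan is done cooperatively (e.g. a block-wide scan), which is why the inner-most-dim case is worth a dedicated code path.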
* remove tensor_functions.h * move more api * move more api, refine size * fix typo * format code, remove useless include * refine code * refine code, fix typo * align .cuda to python * refine code * split some api to part3 for review * remove positional only arguments of argmax and argmin * remove arguments parse * modify arguments name in matmul and floor_divide * rename BINARY_FUNC to DIRECT_PASS_FUNC, modify some functions * refine code, format code * add inplace /=, add comments * remove name in macros * remove python api * remove redundant include * remove cout * format code * refactor tensor.size by directly call shape.at, refactor tensor.sub_ by calling nb_sub_ * remove redundant code * auto format by CI * fix typo, fix wrong call * modify idx datatype from int32 to int64 in tensor.size * add some DIRECT_PASS_FUNC * add cpu cuda var pow and etc. * add masked_fill any all * make REDUCE_FUNC macro, add reduce_* functions * add 0dim check in ReduceSumWhole, refine yaml * fix bug * restore add add_ sub sub_ * add unittest for tensor.half tensor.add tensor.add_ * refine code * refine code * fix typo * fix bug of tensor.std() * refactor var std and cuda, using c++ functional api * add beta and threshold in softplus * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add nn_functor Check (#7910) * add bias_add_check * add bias_add error test * fix conv2d nhwc bias_add error * add nhwc conv test * add bias_add_error test * Add bias add error check * Rename * add batch matmul error check * add matmul check error msg * remove annotation * add fused mlp error msg check * Add pixel shuffle check test * add more test until normalization add relu functor * refine error message * finish all nnfunctor check msg * handle type error * remove useless symbol * modify back to TypeError * fix all comment * Remove redundant code * Remove pad ndim check * fix bias add space * 
fix check logic cause ci gpu not always gpu:0 Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add FusedMatmulBiasAddReluDropout [OneEmbedding] (#8222) * previous version for fused_matmul_bias_add_relu_dropout * add op infer * fix detail * finish forward * support dropout rate list * add forward test * fix bug for output buffer * Configurable alpha params * try to add bit mask logic * Add bitmask first version! * Add row col bitmask logic * support not align4 reludropout * simplify relu dropout ld logic * Add naive relu dropout grad kernel * add simple relu dropout grad kernel * Rename * support relu_dropout bitmask backward * add vectorized optimization * fix tmp buffer * add to amp list * add lazy backward logic * Refine kernel * add indextype dispatch * simplify functor logic * fix cublas fused mlp aux_ld shape bug * Add more relu dropout kernel * add full unittest * fix bug in skip final activation * refine * Remove dump func * fix format * Remove cmake * remove redundant divide * add padded version * fix dropout * oneflow curand * refine * remove redundant kernel * add unroll logic * add unroll and ballot sync * refine format * Remove fast curand * Refine python interface * Add if branch for memset * fix python logic * just for debug * not use matmul bias add grad * add launch 1 block limit * fix unittest * Refine * fix graph backward bug * limit to 11060 * change to use int32_t dtype for cublas aux * Fix jc comment * fix comment * fix convert * fix static_analysis * fix at * fix userops td * fix userops td * fix const ref * fix compile error for bfloat16 * limit to 11060 * fix bug Co-authored-by: Juncheng <liujuncheng1022@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix gather 0-dim tensor bug (#8376) * fix 0-dim tensor bug * refine * support input 0-dim tensor for gather * refine * refine * refine dim_scatter_kernel check * 
refine * refine check * fix clang_tidy error Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add api to apply external job pass (#8370) * Add condition to find-test-cache-distributed (#8387) * add condition to find-test-cache-distributed * fix * warp dim util (#8382) * warp dim util * format * use more maybe_wrap_dim * refine array functor * add more * refine math_functor * fix_bug_in_broadcast_min_max_grad_and_broadcast_like (#8379) * fix_bug_in_broadcast_min_max_grad_and_broadcast_like * refine * fix static check error * fix bug about index (#8388) * fix bug about index * add test case Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * LogicalSliceAssign support full slice sbp (#8344) * feat(SliceOp): slice ops support 2d sbp * fix(SliceOp): fix [B, P] 2d sbp bug * refine error message * fix bug in parallel_num == 1 * add comment * add warning and format * add NOLINT for boxing check * feat(LogicalSliceOps): support all nd_sbp * feat(LogicalSlice): support nd_sbp * add error message * fix(AutoTest): fix auto_test bug in module.parameter pass * auto format by CI * fix(LogicalSliceAssign): skip test when 1n1d * fix SliceParams memset error * remove memset * add CHECK_JUST * fix(*): make sure split_axis >= 0 or equal to SPLIT_AXIS_FOR_NON_SPLIT * remove memset * fix spilit_info.axis bug * feat(LogicalSliceOps): support grad * add logical_slice gradient_funcs * feat(LogicalSliceAssign): LogicalSliceAssign support full slice sbp * auto format by CI * test(LogicalSlice): fix logical_slice dims Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * fix_tensor_from_numpy_mem_leak_bug (#8391) * fix_tensor_from_numpy_mem_leak_bug * add note * refine note * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Make of_pyext_obj 
static only to make sure only a python ext so has python symbols (#8393) * make of_pyext_obj static only * refine note Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Adjust tolerance setting in embedding_renorm unit test (#8394) * support front end compile for job to iree (#8249) * support frontend dev version * polish name * add tosa-to-elf.mlir * tosa to elf by llvm * conv2d partial * an enhanced frontend runner * support numpy as input * enable multiple using nn graph with different input(jobname make it it cd /home/yuhao/frontend/oneflow ; /usr/bin/env /usr/bin/python3 /home/yuhao/.vscode-server/extensions/ms-python.python-2022.6.2/pythonFiles/lib/python/debugpy/launcher 40873 -- /home/yuhao/frontend/oneflow/oneflow/ir/test/Frontend/runner.py ) * enable multiple input * enable cpu and cuda * change full_name to _full_name * support exchange cuda with cpu seamlessly * remove pip * lit config * polish * trim * auto format by CI * modify * auto format by CI * last line polish * use unittest * auto format by CI * use allclose * auto format by CI * pulish * optimize convert oneflow to tosa * conv2d * conv2d enhanced && conv2d examples add * add road map * add add_n2Op and boardcast_addOp conversion * add matmulOp conversion * support converting normailzation op to tosa(partically) * update roadmap * support i64 tensor to dense elem attr * support 100% resnet op conversion * add test mlir * add test iree resnet python script * auto format by CI * done * enhance iree resnet test script * auto format by CI * rebuild code * auto format by CI * rebuild test script * update * auto format by CI * pub * trim test scripts * move * move * input and output add block arg judgement * emit error in variable conversion * error handle for ci * modify err info * auto format by CI * merge * auto format by CI * output not block * flow ones * rm const * trim maybe * trim maybe with header file * const auto * solve clangd error Co-authored-by: 
oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Feat/zero mix with mp (#8036) * add zero limit * add debug * add mix zero test * refactor zero api * zero test with mp * add 2d test * add zero nd * add nd zero * add sbp cast * test passed soft limit consumer * refine size api * zero use stage 2 * add limit consumer api * add new api * refine zero s select * fix index out of range * rm zero limit on device type * zero test with activation checkpointing * add indentity when dp sequence len is 1 * move to base with master * fix * fix * fix * add test * debug bad case * refine test for eager and graph boxing * test case ready * simplify * refine test * fix buff size * fix conflict * refine zero nd * refine * add full test * revert change * refine split check * fix typo * rm log * spit long func * restore test * Update optimizer_placement_optimization_pass.cpp * auto format by CI * auto format by CI * fix static check * add tips for zero api change * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Revert embedding normal path and fix amp list (#8374) * revert embedding normal path, fix amp list * fix amp * fix memset bug in gather cpu kernel Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * replace fixed_vector with small_vector and make Shape inherit from it (#8365) * Replace fixed_vector with llvm::SmallVector Signed-off-by: daquexian <daquexian566@gmail.com> * Shape inherited from llvm::SmallVector Signed-off-by: daquexian <daquexian566@gmail.com> * refine cmake Signed-off-by: daquexian <daquexian566@gmail.com> * rename fixed_vector to small_vector Signed-off-by: daquexian <daquexian566@gmail.com> * fix reviews Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update Shape constructor Signed-off-by: daquexian <daquexian566@gmail.com> * 
add 'PUBLIC' keyword to all target_link_libraries Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * set is_initialized_ default to true Signed-off-by: daquexian <daquexian566@gmail.com> * override some methods to set is_initialized_ Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Light plan for debug (#8396) * Light plan for debug * fix note * disable terminfo to fix missing terminfo symbols (#8400) * disable terminfo to fix missing terminfo symbols Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix bug of ZeRO MP in complex case (#8404) * Remove redundant output_lbns in ir (#8409) * mv case * remove redundant info * Dev FusedCrossInteraction[OneEmbedding] (#8335) * add simple fused cross interaction forward * add packed fused * Add cross interaction grad * simplify code * fix bug * support crossnet v2 * support cross interaction v2 * add lazy backward * Rename and add test * fix jc comment * fix comment * fix bug * fix userops td elem_cnt for FUSED Group * fix header file * fix clang static analysis * fix unittest Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add exe graph physical shape check msg (#8002) * fix index select op in graph * add exe graph physical shape check msg * improve the debug information for the python stack trace 1. add a parameter 'max_stack_depth' to specify the max depth for the stack trace 2. refactor other debug related classes. 
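For the fused matmul/bias/ReLU/dropout kernels mentioned above, an unfused reference computation clarifies what the single fused kernel has to produce. This sketch is framework-agnostic and purely illustrative: the real kernel generates its dropout bitmask on-device (the log mentions cuRAND), so the mask drawn here from Python's random will not match OneFlow's.

```python
import random

def matmul_bias_relu_dropout(x, w, b, p, seed=0):
    # Unfused reference: y = dropout(relu(x @ w + b)), with
    # inverted-scale dropout so the expectation is preserved.
    rng = random.Random(seed)
    scale = 1.0 / (1.0 - p) if p < 1.0 else 0.0
    out = []
    for row in x:
        acc = []
        for j in range(len(b)):
            s = b[j] + sum(row[k] * w[k][j] for k in range(len(row)))
            s = max(s, 0.0)                  # ReLU
            keep = rng.random() >= p         # Bernoulli keep decision
            acc.append(s * scale if keep else 0.0)
        out.append(acc)
    return out
```

With p = 0 this degenerates to matmul + bias + ReLU, which is the easy sanity check before comparing the fused kernel against a masked reference.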
* remove parens * update * resolve PR comments * update * update graph debug test file. * restore self._debug in class Graph and class ModuleBlock * Do not shorten the stack frame string if it is in debug mode * delete TODOs * disable conv3d test (#7969) Signed-off-by: daquexian <daquexian566@gmail.com> * skip layernorm random_data_warp test (#7941) * skip layernorm random_data_warp test * warp/block/uncached case only test gpu Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Lock click version (#7967) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add global avgpool unittest (#7585) * fix (#7978) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Support negative dim in scatter op (#7934) * support negative dim in scatter op * refine scatter test * refine scatter test again Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand (#7702) * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand * lock gil in vm Callback thread * more comments for VirtualMachineEngine::Callback() * the Env is never destroyed. * export Env into python * more unittests * wait shared_ptr.use_count() == 0 * export unittest.TestCase in framework/unittest.py * SwitchToShuttingDownPhase * optional is_normal_exit * VirtualMachine::CloseVMThreads * Delete env_api.h env_api.h is deleted by master * reshape_only_one_dim_infered * address pr comments * fix a ref-cnt bug in TryRunBarrierInstruction. 
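Negative-axis support (the scatter change above, and the maybe_wrap_dim utility earlier in the log) boils down to mapping dim from [-ndim, ndim) into [0, ndim). A hypothetical wrap_dim helper:

```python
def wrap_dim(dim, ndim):
    # Map a possibly negative axis into [0, ndim), mirroring what a
    # maybe_wrap_dim-style utility does before indexing into shapes.
    if not -ndim <= dim < ndim:
        raise IndexError(f"dimension {dim} out of range for {ndim}-d tensor")
    return dim % ndim
```

Centralizing this in one helper is what lets many functors accept dim=-1 without each op re-implementing (and mis-implementing) the bounds check.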
* rollback flow.env.all_device_placement * no distributed running test_shutting_down.py * auto format by CI * expand lifetime of module oneflow in test_shutting_down.py * refine del depend on of * capture oneflow._oneflow_internal.eager when calling sync in __del__ * add try in flaky test Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: chengtbf <472491134@qq.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com> * Fix one hot scalar tensor bug (#7975) * fix reduce_sum scalar check bug * fix one_hot scalar tensor bug * fix clang tidy error Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * support ctor np array from of tensor (#7970) * support ctor np array from of tensor * add test case constructing np array from tensor * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add_manual_seed_all_api (#7957) * add_manual_seed_all_api * Update conf.py * refine * add test case * auto format by CI * Update random_generator.cpp * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * one_embedding add doc string (#7902) * add doc string * add example * add * fix doc * refine * address review * mb to MB * add make_table_option * option to options * refine * add forward Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Support numpy scalar parameters (#7935) * feat(functional): support numpy scalar parameters * rename inferface * feat(*): TensorIndex support numpy scalar * feat(TensorIndex): support advance indexing * add unittest and int32 support for branch feat-param_support_np_scalar (#7939) * add unittest * refactor unittest * add todo for int16 advanced indexing * add int32 supporting for advance indexing * auto format by 
CI Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * fix tensor_scatter_nd_update (#7953) * fix tensor_scatter_nd_update * auto backward Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix one_embedding adam (#7974) * fix one_embedding adam * fix tidy * fix normal Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * speed test with score (#7990) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Feat/graph del by ref (#7857) * remove IsMultiClient() and single client logic Signed-off-by: daquexian <daquexian566@gmail.com> * rename eager.multi_client to eager Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * add py ref * refine new session * clean code * make scope api inner use * use session with ref cnt * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand * test pass * lock gil in vm Callback thread * more comments for VirtualMachineEngine::Callback() * merge * merge rm single client * rm initenv * merge and fix master * refactor env c api * add debug code * fix and serving test pass * test passed * rm useless * rm useless code * format * rm useless include * rm sync in py * the Env is never destroyed. 
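For the one_embedding Adam fix above, the baseline any Adam kernel must reproduce is the standard update with bias-corrected moment estimates. A scalar pure-Python sketch (names and the scalar shape are illustrative; the real kernels operate on tensors and fuse the loop):

```python
import math

def adam_step(p, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # One scalar Adam update, step count t starting at 1.
    m = beta1 * m + (1 - beta1) * g          # first moment
    v = beta2 * v + (1 - beta2) * g * g      # second moment
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    return p - lr * m_hat / (math.sqrt(v_hat) + eps), m, v
```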
* export Env into python * more unittests * fix and pass tests * revert virtual_machine.cpp * revert core/vm * remove outdated python class oneflow.unittest.TestCase * graph test passed * wait shared_ptr.use_count() == 0 * export unittest.TestCase in framework/unittest.py * SwitchToShuttingDownPhase * optional is_normal_exit * VirtualMachine::CloseVMThreads * Delete env_api.h env_api.h is deleted by master * address pr comments * rm is env init * Clear empty thread when graph destroy (#7633) * Revert "Clear empty thread when graph destroy (#7633)" (#7860) This reverts commit 3e8585e5fa20b97229d6b0be46a7ff814dc8cd83. * fix a ref-cnt bug in TryRunBarrierInstruction. * rm env_api * fix clang-tidy error * fix clang-tidy in env_imp * refine env api * format * refine graph del and sync at shuttingdown * fix typo * add comment * rm useless * rm useless Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: cheng cheng <472491134@qq.com> * [PersistentTable] Fix num blocks (#7986) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add auto benchmark for flowvision (#7806) * update yml * update workflow * add resnet50 * [PersistentTable] Async write (#7946) * [PersistentTable] Async write * fix Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * save log in separate dir by default (#7825) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix index select op in graph * add exe graph physical shape check msg * improve the debug inform…