Improve neighbor allreduce #78

Merged: 40 commits, Apr 11, 2021

Commits on Feb 8, 2021

  1. 61b472c

Commits on Feb 11, 2021

  1. 7e8ea66

Commits on Feb 19, 2021

  1. Fixed the self_weight under empty receiving case

    BichengYing authored and Hanbin Hu committed Feb 19, 2021
    1d43096
  2. c082ce9
  3. Merge branch 'improve_neighbor_allreduce' of https://github.com/ybc1991/bluefog into improve_neighbor_allreduce

    Hanbin Hu committed Feb 19, 2021
    50933fb
  4. Rename neighbor_weights to src_weights, and send_neighbors to dst_weights for neighbor_allreduce

    Hanbin Hu committed Feb 19, 2021
    a3365b3
  5. A script to test existing examples

    Hanbin Hu committed Feb 19, 2021
    cf17875
  6. 923df09
  7. Reorganize CheckNeighborSendRecvPattern

    Hanbin Hu committed Feb 19, 2021
    42d0af7
  8. Fix timeline_ptr for NCCL

    Hanbin Hu committed Feb 19, 2021
    3184e17
  9. Fix timeline_ptr for NCCL

    Hanbin Hu committed Feb 19, 2021
    c85cbd5
  10. Merge branch 'improve_neighbor_allreduce' of https://github.com/ybc1991/bluefog into improve_neighbor_allreduce

    Hanbin Hu committed Feb 19, 2021
    8d9fe31
  11. Put dst_weights information into TensorTableEntry

    Hanbin Hu committed Feb 19, 2021
    7d60585

Commits on Feb 27, 2021

  1. First Version of neighbor_allreduce dst_weight, existing problem: Fusion Not Implemented, CUDA data_weight problem

    Hanbin Hu committed Feb 27, 2021
    79c57b3
  2. 29a239c
  3. CPU Fusion for dst_weighted added

    Hanbin Hu committed Feb 27, 2021
    97517b3

Commits on Feb 28, 2021

  1. 364f5fd

Commits on Mar 4, 2021

  1. cc9faf4
  2. 7f47ce4
  3. Add cuda source for scalebuffer

    Hanbin Hu committed Mar 4, 2021
    d84138b
  4. Scale buffer to modify itself

    Hanbin Hu committed Mar 4, 2021
    06c375d
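
The two Mar 4 commit titles mention a scale-buffer routine that modifies the buffer itself. As a rough CPU-side illustration of what in-place scaling means (the actual source is CUDA and is not shown on this page), here is a minimal sketch. The names `ScaleBufferInPlace` and `ScaleVectorInPlace` are hypothetical and are not taken from the BlueFog code base.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (assumption, not the BlueFog API): multiplies every
// element of `buffer` by `scale`, overwriting the buffer instead of writing
// into a separate output allocation.
template <typename T>
void ScaleBufferInPlace(T* buffer, std::size_t num_elements, double scale) {
  for (std::size_t i = 0; i < num_elements; ++i) {
    buffer[i] = static_cast<T>(buffer[i] * scale);
  }
}

// Convenience overload for std::vector, also hypothetical.
template <typename T>
void ScaleVectorInPlace(std::vector<T>& values, double scale) {
  ScaleBufferInPlace(values.data(), values.size(), scale);
}
```

Overwriting the buffer avoids a second allocation of the same size, at the cost of losing the unscaled values.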

Commits on Mar 5, 2021

  1. Add .o file to .gitignore

    Hanbin Hu committed Mar 5, 2021
    e4db92f
  2. ec2a3fc
  3. make clean *.o files generated by nvcc

    Hanbin Hu committed Mar 5, 2021
    adaa191

Commits on Mar 15, 2021

  1. Add fix for NCCL single entry

    Hanbin Hu committed Mar 15, 2021
    46e6431
  2. Make setup.py more robust

    Hanbin Hu committed Mar 15, 2021
    ae57e75

Commits on Mar 17, 2021

  1. Add timeout and cuda check

    Hanbin Hu committed Mar 17, 2021
    62dc99d
  2. Move test example

    Hanbin Hu committed Mar 17, 2021
    a836b02

Commits on Mar 18, 2021

  1. Fix NCCL side dst_weight fusion bug

    Hanbin Hu committed Mar 18, 2021
    e02a3d0
  2. Add agg to make matplotlib more stable

    Hanbin Hu committed Mar 18, 2021
    8e42ca2

Commits on Mar 21, 2021

  1. Address comments for setup.py

    Hanbin Hu committed Mar 21, 2021
    e5f8722
  2. 9f2f55d
  3. Better consideration for weight buffer size

    Hanbin Hu committed Mar 21, 2021
    d7a9310

Commits on Mar 26, 2021

  1. 2fe9621
  2. Make src_weights as std::map, and simplify logic for PerformNeighborAllreduceCallback

    Hanbin Hu committed Mar 26, 2021
    562b231
  3. Add TODO #80 and #81, and simplify the logic for dst_weight

    Hanbin Hu committed Mar 26, 2021
    6764efa
  4. Wrap CheckNeighborSendRecvPattern again

    Hanbin Hu committed Mar 26, 2021
    1e8db23
  5. Add two more TODOs

    Hanbin Hu committed Mar 26, 2021
    3d90087
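
One of the Mar 26 commits stores src_weights as a std::map. To give a feel for why a rank-keyed map suits a weighted neighbor combination, the sketch below computes `self_weight * self + sum over sources of src_weights[rank] * received[rank]`. This is an assumption about the general shape of the computation, not a copy of PerformNeighborAllreduceCallback; the function name `CombineNeighbors` and the `received` container are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper (assumption, not the BlueFog implementation).
// Combines the local tensor with tensors received from source ranks:
//   result = self_weight * self + sum_r src_weights[r] * received[r]
std::vector<double> CombineNeighbors(
    const std::vector<double>& self, double self_weight,
    const std::map<int, double>& src_weights,
    const std::map<int, std::vector<double>>& received) {
  std::vector<double> result(self.size());
  for (std::size_t i = 0; i < self.size(); ++i) {
    result[i] = self_weight * self[i];
  }
  // std::map iterates in ascending key (rank) order, so the accumulation
  // order does not depend on the order in which weights were inserted.
  for (const auto& kv : src_weights) {
    const auto it = received.find(kv.first);
    // Every weighted source rank must have delivered a same-sized tensor.
    assert(it != received.end() && it->second.size() == self.size());
    for (std::size_t i = 0; i < self.size(); ++i) {
      result[i] += kv.second * it->second[i];
    }
  }
  return result;
}
```

With an empty `src_weights` map the loop body never runs, so the result reduces to `self_weight * self`.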

Commits on Mar 27, 2021

  1. Address review comments

    Hanbin Hu committed Mar 27, 2021
    96887b5

Commits on Apr 11, 2021

  1. Add condition variable to control the loop (#88)

    * Add condition variable to control the loop
    
    * Minor update on topology_setting in global_state
    
    * Add missing <condition_variable> header
    
    * Change cv.wait to cv.wait_for 10 seconds
    
    * Address comment and remove adjusting resetVersionWinMem in ibfrun
    BichengYing authored Apr 11, 2021
    36c073a
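
The last commit describes controlling a loop with a condition variable and switching from cv.wait to cv.wait_for with a 10 second timeout. The sketch below shows that general pattern using the standard library only. The class `LoopSignal` and its members are hypothetical and are not taken from the BlueFog source; the timeout is left as a parameter rather than fixed at 10 seconds.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical illustration (assumption, not the BlueFog code): a loop thread
// sleeps until another thread signals new work or until a timeout elapses.
class LoopSignal {
 public:
  // Called by a producer when there is something for the loop to do.
  void Notify() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      pending_ = true;
    }
    cv_.notify_one();
  }

  // Blocks until Notify() has been called or `timeout` elapses.
  // Returns true if work was signaled, false if the wait timed out.
  template <class Rep, class Period>
  bool WaitForWork(const std::chrono::duration<Rep, Period>& timeout) {
    std::unique_lock<std::mutex> lock(mu_);
    // The predicate guards against spurious wakeups; with a predicate,
    // wait_for returns the predicate's value when it stops waiting.
    const bool signaled =
        cv_.wait_for(lock, timeout, [this] { return pending_; });
    pending_ = false;
    return signaled;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool pending_ = false;
};
```

A bounded wait such as `signal.WaitForWork(std::chrono::seconds(10))` lets the loop wake up periodically even if a notification is missed, whereas an unbounded wait could block forever in that case.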