Training SwAV with decentralized averaging

This code trains a SwAV model on ImageNet using collaborative SGD. It builds on vissl and ClassyVision with some modifications.

Requirements (for all participants):

  • Install vissl from the root folder by following its installation-from-source guide.
  • Install ClassyVision from the root folder.
  • Install hivemind (see main README).

How to run

  1. Get ImageNet by following the vissl guide.
  2. Run the first DHT peer (the "coordinator") on a node that is reachable by all trainers: python run_initial_dht_node.py --listen_on [::]:1337. Its stdout will report INITIAL_DHT_ADDRESS and INITIAL_DHT_PORT, which the trainers need in the next step.
  3. For all GPU trainers, run
python vissl/tools/run_distributed_engines.py \
    hydra.verbose=true config=pretrain/swav/swav_1node_resnet_submit \
    config.CHECKPOINT.CHECKPOINT_ITER_FREQUENCY=30000 \
    +config.OPTIMIZER.batch_size_for_tracking=64 \
    config.DATA.TRAIN.BATCHSIZE_PER_REPLICA=64  \
    +config.OPTIMIZER.lr=2.4 +config.OPTIMIZER.warmup_start_lr=0.3 \
    +config.OPTIMIZER.warmup_epochs=500 +config.OPTIMIZER.max_epochs=5000 \
    +config.OPTIMIZER.eta_min=0.0048 \
    +config.OPTIMIZER.exp_prefix="test_resnet50_swav_collaborative_experiment" \
    +config.OPTIMIZER.target_group_size=4 \
    +config.OPTIMIZER.max_allowed_epoch_difference=1 \
    +config.OPTIMIZER.total_steps_in_epoch=640 config.LOSS.swav_loss.queue.start_iter=98000 \
    +config.OPTIMIZER.report_progress_expiration=600 +config.DATA.TRAIN.DATA_PATHS=["${IMAGENET_PATH}/train"] \
    config.OPTIMIZER.dht_listen_on_port=1124 config.OPTIMIZER.averager_listen_on_port=1125 \
    +config.OPTIMIZER.dht_initial_peers=["${INITIAL_DHT_ADDRESS}:${INITIAL_DHT_PORT}"]
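
The command above expects the coordinator's address, port, and the ImageNet location as shell variables. As a minimal sketch (the variable names and default values below are illustrative assumptions, not part of the training code), a small wrapper can assemble the two hydra override values before launching each trainer:

```shell
#!/usr/bin/env bash
# Illustrative launcher sketch. INITIAL_DHT_ADDRESS and INITIAL_DHT_PORT are
# the values printed by run_initial_dht_node.py in step 2; IMAGENET_PATH points
# at your ImageNet root. The defaults below are placeholders for this example.
set -euo pipefail

INITIAL_DHT_ADDRESS="${INITIAL_DHT_ADDRESS:-192.0.2.1}"
INITIAL_DHT_PORT="${INITIAL_DHT_PORT:-1337}"
IMAGENET_PATH="${IMAGENET_PATH:-/datasets/imagenet}"

# Assemble the list-valued hydra overrides used in the trainer command above.
DHT_PEERS="[\"${INITIAL_DHT_ADDRESS}:${INITIAL_DHT_PORT}\"]"
DATA_PATHS="[\"${IMAGENET_PATH}/train\"]"

echo "dht_initial_peers=${DHT_PEERS}"
echo "train_data_paths=${DATA_PATHS}"
```

These two strings can then be passed verbatim as the +config.OPTIMIZER.dht_initial_peers and +config.DATA.TRAIN.DATA_PATHS overrides, so each trainer bootstraps from the same coordinator.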