5. Run Dorylus
Previous page: 4. Setup Lambda Functions | Next page: None | Home: Home
Before running Dorylus, make sure that the system has been built properly (see 2. Build Dorylus), the dataset has been uploaded to all graphserver nodes (see 3. Prepare Input Dataset), and the Lambda functions are ready on AWS cloud (see 4. Setup Lambda Functions). Remote machines within the same context should be running the same executable. Additionally, all graphservers should use the same full copy of the input dataset.
Don't forget to rebuild whenever you change something in the source code.
Refer to the 2. Build Dorylus chapter for how to send the parameter files.
We do a setup-cluster before building the system because the master node needs a proper dshmachines file to sync the executable. All graphserver nodes receive the same copy of the graph dshmachines file, and all weightserver nodes receive the same copy of the weight dshmachines file. The coordserver runs alone and does not need this information.
The layerconfig file provided by the user depends on the dataset and specifies the graph convolutional network (GCN) you will be running. The file contains l lines, each holding a single number, as shown below. This means your GCN has l layers, where the i-th layer has a feature dimension (the length of the feature vector per vertex) given by the number on the i-th line. The number of propagation steps is l - 1.
602
300
2
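As a quick sanity check before launching, the format described above can be verified with a small Bash function. This is a sketch under our own assumptions and not part of Dorylus: the name check_layerconfig is hypothetical, and ignoring blank lines and requiring at least two layers are our guesses rather than documented rules.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Dorylus): sanity-check a layerconfig file
# and report how many layers and propagation steps it describes.
check_layerconfig() {
  local file="${1:-}" line layers=0
  if [ ! -r "$file" ]; then
    echo "cannot read '$file'" >&2
    return 1
  fi
  # Read one feature dimension per line; '|| [ -n "$line" ]' also handles
  # a last line that has no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    # Skip blank (whitespace-only) lines; an assumption, not a documented rule.
    [[ "$line" =~ ^[[:space:]]*$ ]] && continue
    # Each remaining line must hold a single positive integer
    # (surrounding whitespace, including a Windows '\r', is tolerated).
    if ! [[ "$line" =~ ^[[:space:]]*[1-9][0-9]*[[:space:]]*$ ]]; then
      echo "invalid dimension '$line' in $file" >&2
      return 1
    fi
    layers=$((layers + 1))
  done < "$file"
  # Assumption: a usable GCN needs at least an input and an output layer.
  if [ "$layers" -lt 2 ]; then
    echo "need at least 2 layers, found $layers" >&2
    return 1
  fi
  # Number of propagation steps = l - 1.
  echo "layers=$layers propagation_steps=$((layers - 1))"
}
```

Running check_layerconfig on the three-line example above prints layers=3 propagation_steps=2.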
Remember to do a parameter config update whenever you change any of the configurations. Only the ports info and the layerconfig info need to be set manually; other info, such as dshmachines and IPs, is handled automatically by the setup-cluster command.
You can run the system from remote shells. Each context is invoked from its master node [0]. The coordination server and the weight servers should be started before the graph servers.
To run the weight servers, on the weight context master node [0]:
Weight$ cd dorylus/
Weight$ ./run/run-onnode weight <dataset>
To run the graph servers, on the graph context master node [0]:
Graph$ cd dorylus/
Graph$ ./run/run-onnode graph <Dataset> <--l=num_lambdas> [--e=num_epochs] [--p] [--s=staleness_bound] # The dataset name should match the one you specified with 'send-dataset'.
--l: number of Lambdas
--e: number of epochs
--p: enable the asynchronous pipeline
--s: staleness bound
Graph$ ./run/run-onnode graph <Dataset> cpu # For CPU version
Graph$ ./run/run-onnode graph <Dataset> gpu # For GPU version
NOTE: To use the CPU or GPU version, you have to build the graph servers with ./gnnman/build-system graph [MODE] first.
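Reading the graph usage line with the usual convention (<...> must be supplied, [...] is optional, and the brackets themselves are not typed), a Lambda-mode command line can be assembled as in this sketch. The helper build_graph_cmd and the dataset name mydata are hypothetical; the flag order simply mirrors the usage line, and we have not checked whether run-onnode depends on that order.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Dorylus): print a run-onnode command line
# for the Lambda-based graph servers.
# Usage: build_graph_cmd <dataset> <num_lambdas> [num_epochs] [yes|no] [staleness_bound]
#   Pass "" to leave an optional value unset; "yes" as the 4th argument adds --p.
build_graph_cmd() {
  local dataset="${1:-}" lambdas="${2:-}" epochs="${3:-}" pipeline="${4:-}" staleness="${5:-}"
  if [ -z "$dataset" ] || [ -z "$lambdas" ]; then
    echo "usage: build_graph_cmd <dataset> <num_lambdas> [epochs] [yes|no] [staleness]" >&2
    return 1
  fi
  # Required parts first, then optional flags in the documented order.
  local cmd=(./run/run-onnode graph "$dataset" "--l=$lambdas")
  [ -n "$epochs" ] && cmd+=("--e=$epochs")
  [ "$pipeline" = "yes" ] && cmd+=("--p")
  [ -n "$staleness" ] && cmd+=("--s=$staleness")
  # Join the words with single spaces and print instead of executing.
  printf '%s\n' "${cmd[*]}"
}
```

For example, build_graph_cmd mydata 100 10 yes 2 prints ./run/run-onnode graph mydata --l=100 --e=10 --p --s=2. Printing rather than executing lets you review the line before running it on the graph master node.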
When the graph servers finish, termination messages are automatically sent to the coordserver and the weightservers.
If you run into errors like "Text file busy" or "Port / Address in use", your server processes are probably still running on the worker nodes. Kill them with:
Remote$ ./gnnman/kill-zombies <Context> # Specify the corresponding context you want to kill.
The run-onnode script automatically kills zombie processes before it starts, so you do not need to do this manually before every run.
Run logs and output results are stored on the graph master node [0]. Check them with:
Graph$ vim ~/logfiles/<RunMark>.<IK>.log # Log file of run # <RunMark>.
Graph$ vim ~/outfiles/<RunMark>.<IK>/output_<Idx> # Output results of node <Idx> of run # <RunMark>.
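Because each run writes a new log file, it can be handy to open the newest one without typing its <RunMark>. The following Bash function is a hypothetical convenience (latest_log is not a Dorylus script); it only assumes that logs end in .log and live in ~/logfiles, as the paths above indicate, and it picks the file by modification time.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Dorylus): print the most recently modified
# .log file in a directory (default: ~/logfiles).
latest_log() {
  local dir="${1:-$HOME/logfiles}" newest="" f
  for f in "$dir"/*.log; do
    [ -e "$f" ] || continue   # the glob stays literal when nothing matches
    # Keep the file if it is the first one seen or newer than the current pick.
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest="$f"
    fi
  done
  if [ -z "$newest" ]; then
    echo "no .log files in $dir" >&2
    return 1
  fi
  printf '%s\n' "$newest"
}
```

For example, vim "$(latest_log)" on the graph master node opens the log of the most recent run.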
Clear the output, log, and temporary files on the graph master node with:
Graph$ ./gnnman/clear-out