This repository contains a demo for reproducing the experiments of D3: Differential Testing of Distributed Deep Learning with Model Generation.
To run the TensorFlow experiment, first run cd tensorflow/src
and then follow the instructions in the README.md.
The experiment first generates models and inputs, then runs them under the distributed settings, and finally analyzes the results and generates a CSV file listing all the inconsistencies.
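The final analysis step can be pictured with a minimal sketch like the one below. It is not the repo's actual analysis code: the function names, the setting names (`single_device`, `two_workers`), the CSV columns, and the tolerance of `1e-4` are all assumptions chosen for illustration. It compares each model's output under a distributed setting against a reference setting and records every pair whose difference exceeds the tolerance.

```python
import csv
import io
import math


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two flat output vectors."""
    if len(a) != len(b):
        return math.inf  # treat a length mismatch as an inconsistency
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def find_inconsistencies(outputs, reference="single_device", tol=1e-4):
    """Compare every setting against the reference setting, per model.

    outputs: {model_id: {setting_name: [float, ...]}} (hypothetical layout).
    Returns one row per (model, setting) pair whose difference exceeds tol.
    """
    rows = []
    for model_id, by_setting in sorted(outputs.items()):
        ref = by_setting[reference]
        for setting, out in sorted(by_setting.items()):
            if setting == reference:
                continue
            diff = max_abs_diff(ref, out)
            if diff > tol:
                rows.append(
                    {"model": model_id, "setting": setting, "max_abs_diff": diff}
                )
    return rows


def write_csv(rows, stream):
    """Write the inconsistency rows as CSV with a header line."""
    writer = csv.DictWriter(stream, fieldnames=["model", "setting", "max_abs_diff"])
    writer.writeheader()
    writer.writerows(rows)


# Toy data: model_1 disagrees between the two settings, model_0 does not.
outputs = {
    "model_0": {"single_device": [0.10, 0.20], "two_workers": [0.10, 0.20]},
    "model_1": {"single_device": [0.10, 0.20], "two_workers": [0.10, 0.35]},
}
rows = find_inconsistencies(outputs)
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

In this sketch only `model_1` under `two_workers` ends up in the CSV. The real experiment's comparison metric and threshold may differ; consult the README.md in each src directory for the actual behavior.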
We use prior work, Muffin [1], to generate sequential models. A modified version of Muffin is included in the repo (link).
We use prior work from this repo [2] to build DLRM-like models in TensorFlow.
We recommend running the TensorFlow experiment first, because the PyTorch experiment needs to convert the sequential models generated by Muffin from TensorFlow to PyTorch.
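To give a sense of what such a conversion involves, here is a small sketch of one well-known detail: a Keras Dense layer stores its kernel with shape (in_features, units), while torch.nn.Linear stores its weight with shape (out_features, in_features), so the kernel must be transposed. This is only an illustration in plain Python lists; the helper names are hypothetical and the repo's own conversion code may work differently.

```python
def keras_dense_to_torch_linear(kernel, bias):
    """Convert Dense parameters to the torch.nn.Linear layout.

    kernel: nested list of shape (in_features, units), Keras Dense layout.
    Returns (weight, bias) with weight of shape (units, in_features).
    """
    weight = [list(column) for column in zip(*kernel)]  # transpose
    return weight, list(bias)


def dense_forward(x, kernel, bias):
    """Keras Dense semantics (no activation): y = x @ kernel + bias."""
    units = len(bias)
    return [
        sum(x[i] * kernel[i][j] for i in range(len(x))) + bias[j]
        for j in range(units)
    ]


def linear_forward(x, weight, bias):
    """torch.nn.Linear semantics: y = x @ weight.T + bias."""
    return [
        sum(x_i * w_i for x_i, w_i in zip(x, row)) + b
        for row, b in zip(weight, bias)
    ]


# A layer with 2 inputs and 3 units.
kernel = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
bias = [0.5, -0.5, 0.0]
x = [1.0, 2.0]

weight, torch_bias = keras_dense_to_torch_linear(kernel, bias)
tf_out = dense_forward(x, kernel, bias)
torch_out = linear_forward(x, weight, torch_bias)
print(tf_out, torch_out)
```

Both forward passes produce the same values once the kernel is transposed, which is the property a converted model needs so that any remaining output differences come from the distributed settings rather than from the conversion.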
To run the PyTorch experiment, first run cd pytorch/src
and then follow the instructions in the README.md.
The experiment first generates or converts models and inputs, then runs them under the distributed settings, and finally analyzes the results and generates a CSV file listing all the inconsistencies.
[1] Gu Jiazhen, Luo Xuchuan, Zhou Yangfan, Wang Xin. Muffin: Testing Deep Learning Libraries via Neural Architecture Fuzzing. 2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE).
[2] https://github.com/NodLabs/tensorflow-dlrm/tree/master