
Incorrect example command in README, broken MNIST gen_dataset.py script, and documentation improvements needed in key parts of the code #7

Open
rmcavoy opened this issue Nov 5, 2019 · 3 comments

rmcavoy commented Nov 5, 2019

The README says to run the following command for dSNE:
python main_sda.py --method dsnet --cfg cfg/digits-a.json --bb lenetplus --bs 256 --src MT --tgt MM --nc 10 --size 32 --log-itv 100 --dropout --hybridize
This needs a minor correction: "dsnet" should be replaced with "dsne", since dsnet is not currently a valid option for main_sda.py:
python main_sda.py --method dsne --cfg cfg/digits-a.json --bb lenetplus --bs 256 --src MT --tgt MM --nc 10 --size 32 --log-itv 100 --dropout --hybridize

A larger problem is that the MNIST gen_dataset.py script transforms the dataset into an incorrect format in several ways, and consequently errors out immediately when one tries to load the dataset with the example commands (I used dsne with train_sda.py):

  • It creates images shaped (batch, height, width) via .reshape(-1, rows, cols) (line 16 of gen_dataset.py) instead of (batch, height, width, channel) via .reshape(-1, rows, cols, 1), which causes the resize transforms to fail.
  • It stores a pickle of a dictionary with keys TR and TE, each containing its data as a list [img, lbl]. While this is the correct shape for code deeper in the pipeline (e.g. training_sda.py), load_pkl in utils/io.py requires that the data be unpacked as tr_x, tr_y, te_x, te_y = pkl.load(f).
  • By default it uses self.__class__.__name__ instead of self.__class__.__name__.lower(), even though the cfg file expects a lowercase pickle file name and the class name is all-caps by default.
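To make the three fixes concrete, here is a minimal sketch of what a corrected gen_dataset.py would do. The names and dummy data below are illustrative, not the repository's actual code:

```python
import pickle
import numpy as np

rows, cols = 32, 32

# Dummy raw image bytes standing in for the decoded MNIST arrays
# (2 train images + 2 test images).
raw_train = np.zeros(2 * rows * cols, dtype=np.uint8)
raw_test = np.zeros(2 * rows * cols, dtype=np.uint8)

# Fix 1: keep an explicit channel axis so the resize transforms
# receive (batch, height, width, channel) data.
tr_x = raw_train.reshape(-1, rows, cols, 1)   # not .reshape(-1, rows, cols)
te_x = raw_test.reshape(-1, rows, cols, 1)
tr_y = np.zeros(len(tr_x), dtype=np.int64)
te_y = np.zeros(len(te_x), dtype=np.int64)

class MNIST:  # stand-in for the dataset class
    pass

# Fix 3: lowercase the class name so the file matches the cfg entry.
fname = MNIST.__name__.lower() + ".pkl"   # "mnist.pkl", not "MNIST.pkl"

# Fix 2: store a flat 4-tuple, matching the unpacking in utils/io.py
#   tr_x, tr_y, te_x, te_y = pkl.load(f)
# instead of a {"TR": [img, lbl], "TE": [img, lbl]} dictionary.
with open(fname, "wb") as f:
    pickle.dump((tr_x, tr_y, te_x, te_y), f)

with open(fname, "rb") as f:
    tr_x2, tr_y2, te_x2, te_y2 = pickle.load(f)
```

With this layout, load_pkl's four-way unpacking succeeds and the images carry the channel axis the transforms expect.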

As a more minor correction, it would also be nice if gen_dataset.py lived in the datasets folder, since that is where the datasets are supposed to be located. It would also be good to provide a fully documented README example that walks through every step of the calculation (including moving and converting the data), so users don't have to dig through the code and example input files to figure out where to put the datasets and under which directory names.

Also, for future maintainability and extensibility, it would be good to expand and clarify the comments and variable naming in the important pieces of the code.

For example, at first glance, someone new to mxnet might read the loop inside dSNE's train_batch (line 1018 of training_sda.py) as using zip to iterate over the individual samples in the minibatch, when it is actually iterating over the list of ctx that the batch is split across. That may be obvious to someone very familiar with mxnet, but it will not necessarily be obvious or readable for new users coming from e.g. PyTorch, where the GPU context split is handled seamlessly in the background. This confusion is not helped by the naming scheme in e.g. dSNELoss (in the custom layers), which refers to sizes as "bs_src" and "bs_tgt" or "N, K" and "M, K"; these names are only obvious if (a) the user has not assumed the loop in train_batch is over the batch index, and (b) the user already has a good idea of which variable refers to what.
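To illustrate the pattern in question without requiring mxnet, here is a plain-NumPy sketch of how a batch gets split across devices; "ctx" is just a string label here, whereas in mxnet it would be an mx.gpu(i) context and the split would come from gluon.utils.split_and_load:

```python
import numpy as np

num_devices = 2
ctx_list = [f"gpu({i})" for i in range(num_devices)]

batch_src = np.arange(8 * 3).reshape(8, 3)   # 8 source samples, 3 features
batch_tgt = np.arange(8 * 3).reshape(8, 3)   # 8 target samples, 3 features

# Each batch is split into one shard per device.
src_shards = np.array_split(batch_src, num_devices)
tgt_shards = np.array_split(batch_tgt, num_devices)

# The zip below iterates over DEVICE SHARDS, not individual samples:
# each x_src / x_tgt is a (batch_size_per_device, features) array.
for ctx, x_src, x_tgt in zip(ctx_list, src_shards, tgt_shards):
    assert x_src.shape == (8 // num_devices, 3)
    # the forward/backward pass for this shard would run on `ctx` here
```

The loop body executes once per device, which is why the loss code inside sees whole per-device sub-batches rather than single samples.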


kailigo commented Dec 27, 2019

@rmcavoy, that is exactly what I ran into. I spent several days figuring these issues out. I doubt the authors tested the code before uploading it.

Btw, @rmcavoy, did you manage to have the code run for the other datasets?


rmcavoy commented Dec 30, 2019

No, I didn't bother trying to get it working for the other datasets, as I moved on to other things. As for the lack of testing, I would assume they have a server where all the data is laid out and formatted correctly, so there was never any need for them to get these pieces running in order to use the code.

@joshuacwnewton commented

Thank you kindly for taking the time to point these issues out, @kailigo and @rmcavoy! You've potentially saved me a lot of time and troubleshooting.
