Sorry, there is no CPU-only mode. You'll need an NVIDIA GPU to train the models.
Test environment:
- GTX 1070
- 16 GB RAM + 8 GB Swap
- At least 30 GB of free disk space (it can be less if you turn off some of the joblib disk caching)
- Docker 17.12.0-ce
- Nvidia-docker 2.0
- The Transformer model comes from Yu-Hsiang Huang's implementation. His repo is included in the "attention-is-all-you-need-pytorch" folder via git subtree.
- The LSTNet model is largely inspired by Guokun Lai's implementation.
- The model structure is inspired by the work of Sean Vasquez and Arthur Suilin.
First, build the image. Example command:

```bash
docker build -t favorita .
```
Then spin up a Docker container:

```bash
docker run --runtime=nvidia --rm -ti \
    -v /mnt/Data/favorita_cache:/home/docker/labs/cache \
    -v /mnt/Data/favorita_data:/home/docker/labs/data \
    -p 6006:6006 favorita bash
```
- It is recommended to mount the data and cache folders manually.
- Port 6006 is for running TensorBoard inside the container.
Download and extract the data files from Kaggle into the `data` folder.
The rest of this README assumes you are working from the bash prompt inside the container.
```bash
python prepare_seq_data.py
```
For now there are two types of models ready to be trained:
- Transformer (`fit_transformer.py`)
- LSTNet (`fit_lstnet.py`)
The training scripts use Sacred to manage experiments. It is recommended to set a seed explicitly via the CLI:

```bash
python fit_transformer.py with seed=93102
```
You can also use Mongo to save experiment results and hyper-parameters for each run. Please refer to the Sacred documentation for more details.
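As a rough sketch (not the repo's actual setup), attaching Sacred's MongoDB observer inside a training script could look like the following; the experiment name, database name, and connection URL are assumptions:

```python
# Hypothetical sketch of a Sacred experiment with a MongoDB observer;
# the experiment/database names and the URL are assumptions, not the
# repo's actual configuration.
from sacred import Experiment
from sacred.observers import MongoObserver

ex = Experiment("favorita_transformer")  # assumed experiment name
ex.observers.append(
    MongoObserver(url="mongodb://localhost:27017", db_name="favorita")
)

@ex.automain
def main(seed):
    # Sacred injects the (CLI-settable) seed automatically.
    print(f"training with seed {seed}")
```

Sacred also provides a `--mongo_db` command-line flag for the same purpose; see its documentation for the exact syntax.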
The CSV outputs will be saved in `cache/preds/val/` and `cache/preds/test/`, respectively.
Training and validation loss curves, as well as some of the embeddings, are logged in TensorBoard format. Launch TensorBoard via:

```bash
tensorboard --logdir runs
```

Then visit http://localhost:6006 for the web interface.
- Ensembling script: I made some changes to the outputs of the model training scripts so they are more readable, but that means the ensembling script needs to be updated as well. (For those who want to try: the ground truth for the validation set is stored in `cache/yval_seq.npy`; see the sketch after this list.)
- Encoder/Decoder and Encoder/MLP models with LSTM, GRU, QRNN, and SRU units: I tried a lot of different things for this competition, but I feel the code could use some refactoring, so these are removed for now.
- Tabular data preparation and models: My GBM models are mediocre at best, so they're not really worth sharing here. But as I mentioned in the blog post, for those store/item combinations that were removed by the 56-day nonzero filter, using a GBM model to predict their values will give you a better score than predicting zeros.
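As a rough starting point for a do-it-yourself ensemble, the sketch below averages validation predictions and scores them against the stored ground truth. The prediction file names, array shapes, and the plain RMSLE metric are assumptions; the actual files in `cache/preds/val/` may be laid out differently.

```python
# Hypothetical ensembling sketch -- file names, shapes, and the metric
# are assumptions, not guaranteed to match the repo's actual outputs.
from pathlib import Path

import numpy as np
import pandas as pd

# Ground truth for the validation set (shape assumed: series x days).
y_val = np.load("cache/yval_seq.npy")

# Average all prediction CSVs found in the validation folder
# (assumes headerless CSVs with the same shape as y_val).
pred_files = sorted(Path("cache/preds/val").glob("*.csv"))
preds = np.mean(
    [pd.read_csv(f, header=None).values for f in pred_files], axis=0
)

# Simple RMSLE as a sanity check; the competition metric (NWRMSLE)
# additionally weights perishable items.
rmsle = np.sqrt(
    np.mean((np.log1p(np.clip(preds, 0, None)) - np.log1p(y_val)) ** 2)
)
print(f"Averaged {len(pred_files)} models, validation RMSLE: {rmsle:.4f}")
```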