Code for ACM MM'17 paper "Learning Fashion Compatibility with Bidirectional LSTMs" [paper].
Parts of the code are adapted from an older version of TensorFlow's im2txt repository on GitHub.
The corresponding dataset can be found on GitHub or Google Drive.
Author: Xintong Han
Contact: xintong@umd.edu
Polyvore.com is a popular fashion website where users can create and upload outfit data. Here is an example.
- TensorFlow 0.10.0 (instructions)
- NumPy (instructions)
- scikit-learn
Newer versions of TensorFlow prevent me from running inference with my old code and from restoring models trained with this version. However, I have a commit (dd1e03e) that supports training with TensorFlow 1.0 or greater.
Download the dataset and put it in the ./data folder:
- Decompress polyvore.tar.gz into ./data/label/
- Decompress polyvore-images.tar.gz to ./data/, so that all outfit image folders are in ./data/images/
- Run the following command to generate TFRecords in ./data/tf_records/:
python data/build_polyvore_data.py
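If you want to sanity-check the generated data, a minimal sketch like the following counts the examples in each TFRecord file. This is not part of the repo, and the glob pattern is an assumption; adjust it to match the shard names that build_polyvore_data.py actually writes.

```python
# Minimal sketch: count examples in the generated TFRecord shards.
# The glob pattern is an assumption -- adjust it to whatever files
# build_polyvore_data.py writes under data/tf_records/.
import glob
import tensorflow as tf

for shard in sorted(glob.glob("data/tf_records/*")):
    num_records = sum(1 for _ in tf.python_io.tf_record_iterator(shard))
    print("%s: %d examples" % (shard, num_records))
```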
This model requires a pretrained Inception v3 checkpoint file to initialize the network.
This checkpoint file is provided by the TensorFlow-Slim image classification library which provides a suite of pre-trained image classification models. You can read more about the models provided by the library here.
Run the following commands to download the Inception v3 checkpoint.
# Save the Inception v3 checkpoint in the model folder.
INCEPTION_DIR="model"
wget "http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz"
tar -xvf "inception_v3_2016_08_28.tar.gz" -C ${INCEPTION_DIR}
rm "inception_v3_2016_08_28.tar.gz"
./train.sh
The models will be saved in model/bi_lstm
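To confirm that checkpoints are being written during training, a small convenience sketch like this prints the most recent one:

```python
# Convenience sketch: show the most recent checkpoint written by train.sh.
import tensorflow as tf

print(tf.train.latest_checkpoint("model/bi_lstm"))
```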
Download the trained model from the final_model folder on Google Drive and place it at ./model/final_model/model.ckpt-34865.
To perform the three tasks described in the paper, we first need to extract the features of the test images:
./extract_features.sh
The image features will be saved in data/features/test_features.pkl.
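The snippet below is a minimal sketch for peeking at the extracted features. It assumes the pickle holds a dict keyed by image id, which may differ from the actual structure written by extract_features.sh:

```python
# Minimal sketch: inspect the extracted test features.
# Assumes a dict mapping image ids to feature vectors -- adjust if the
# pickle written by extract_features.sh is structured differently.
import pickle

with open("data/features/test_features.pkl", "rb") as f:
    features = pickle.load(f)

print(type(features), len(features))
key = next(iter(features))
print(key, getattr(features[key], "shape", features[key]))
```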
You can also perform end-to-end inference by modifying the corresponding code, e.g., taking a sequence of images as input and outputting their compatibility score.
./fill_in_blank.sh
Note that we further optimized some design choices in the released model. It can achieve 73.5% accuracy, which is higher than the number reported in our paper.
./predict_compatibility.sh
Different from the training process, where the loss is calculated on each mini-batch, during testing we compute the loss against the whole test set. This is pretty slow; a faster method could be used instead, e.g., using the distance between the LSTM-predicted representation and the target image embedding, as sketched below.
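A minimal sketch of that faster alternative follows. The variable names are hypothetical; it only illustrates selecting the candidate whose embedding is closest (by cosine similarity) to the LSTM-predicted representation:

```python
# Minimal sketch of the distance-based alternative (names are hypothetical):
# pick the candidate whose image embedding is closest to the representation
# predicted by the LSTM.
import numpy as np

def pick_answer(predicted_rep, candidate_embeddings):
    """predicted_rep: (d,) array; candidate_embeddings: (num_choices, d) array."""
    pred = predicted_rep / np.linalg.norm(predicted_rep)
    cands = candidate_embeddings / np.linalg.norm(candidate_embeddings, axis=1, keepdims=True)
    return int(np.argmax(cands.dot(pred)))  # index of the closest candidate
```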
We found that a late fusion of different single models (Bi-LSTM w/o VSE + VSE + Siamese) can achieve superior results on all tasks.
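One simple way to implement such a late fusion is a weighted average of the per-model scores. The sketch below is only illustrative and is not necessarily the exact scheme used for the reported results:

```python
# Illustrative late-fusion sketch: weighted average of per-model candidate
# scores (e.g., from Bi-LSTM w/o VSE, VSE, and Siamese models).
import numpy as np

def fuse_scores(score_lists, weights=None):
    """score_lists: list of (num_candidates,) arrays, one per model."""
    scores = np.stack(score_lists)              # (num_models, num_candidates)
    if weights is None:
        weights = np.ones(len(score_lists))
    return np.average(scores, axis=0, weights=weights)
```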
- Add multiple choice inference code.
- Add compatibility prediction inference code.
- Add image outfit generation code. It is very similar to compatibility prediction, so you can try implementing it yourself if you are in a hurry.
- Release trained models.
- Release Siamese/VSE models for comparison.
- Polish the code.
If this dataset helps your research, please cite our paper:
@inproceedings{han2017learning,
  author    = {Han, Xintong and Wu, Zuxuan and Jiang, Yu-Gang and Davis, Larry S.},
  title     = {Learning Fashion Compatibility with Bidirectional LSTMs},
  booktitle = {ACM Multimedia},
  year      = {2017},
}