Team Members: Suzy Anil, Isha Singh, Alisa Tian, Dingkun Yang
Product matching is a competitive feature among retail platforms: it lets companies price products competitively against other retailers selling similar items. Many methods combine deep learning and traditional machine learning to analyze image and text information and calculate similarity between products; however, there is little research comparing the effectiveness of integrating multimodal data (product images and descriptions) in this domain (Łukasik et al., 2021). Here, we compare the performance of unimodal and multimodal models. We trained separate models for text (SBERT and DistilBERT) and images (ResNet50 and MobileNet); DistilBERT and ResNet50 were the stronger models for text and images respectively. The multimodal model used joint embeddings from DistilBERT and MobileNet to predict product labels and outperformed every unimodal model. Integrating product images and titles offers the most useful information for finding product matches on a particular platform.
Shopee is the leading e-commerce platform in Southeast Asia and Taiwan; their platform contains products from vendors all over the world, predominantly in Singapore and Indonesia. In 2021, the company launched a Kaggle competition aimed at improving product matching algorithms to optimize their customers’ online shopping experience (Dane et al., 2021).
We used the following methods to train our models:
- SBERT
- DistilBERT
- ResNet50
- MobileNet
- Joint Embeddings of DistilBERT and MobileNet
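The last item, joint embeddings, can be illustrated with a minimal sketch. It assumes the joint embedding is a concatenation of L2-normalized text and image vectors, which is one common way to fuse modalities; the notebooks may combine them differently. The function names and the toy vectors below are hypothetical stand-ins for DistilBERT and MobileNet outputs.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0 else [x / norm for x in vec]


def joint_embedding(text_vec, image_vec):
    """Concatenate the normalized text and image vectors (assumed fusion rule)."""
    return l2_normalize(text_vec) + l2_normalize(image_vec)


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 0.0 if norm_a == 0 or norm_b == 0 else dot / (norm_a * norm_b)


# Toy vectors standing in for real model outputs (hypothetical values).
text_a, image_a = [3.0, 4.0], [1.0, 0.0, 0.0]
text_b, image_b = [6.0, 8.0], [0.0, 1.0, 0.0]

joint_a = joint_embedding(text_a, image_a)
joint_b = joint_embedding(text_b, image_b)

# Same text direction, orthogonal images: the joint similarity sits in between.
similarity = cosine_similarity(joint_a, joint_b)
print(round(similarity, 3))
```

Normalizing each modality before concatenation keeps one embedding from dominating the other purely because of its scale.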
The following table shows the performance of the models trained on the Shopee dataset. Among the text models, DistilBERT had the higher F1 score (accuracy was tied at 0.45); among the image models, ResNet50 outperformed MobileNet on both F1 score and accuracy. The multimodal model used joint embeddings from DistilBERT and MobileNet* to predict product labels and outperformed every unimodal model on both metrics. Integrating product images and titles offers the most useful information for finding product matches on a particular platform.
Note: Due to computational restrictions, we substituted MobileNet for ResNet50 in the multimodal model.
Performance on Test Set
Model Type | Model | F1 Score | Accuracy |
---|---|---|---|
Text | SBERT | 0.43 | 0.45 |
Text | DistilBERT | 0.48 | 0.45 |
Image | ResNet50 | 0.45 | 0.48 |
Image | MobileNet | 0.38 | 0.40 |
Text & Image | Multimodal | 0.50 | 0.53 |
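For readers who want to check how metrics like those in the table are computed, here is a minimal sketch of accuracy and F1 over predicted product labels. The README does not state how F1 was averaged, so macro averaging over labels is an assumption, and the toy labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 scores (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); guard against an empty denominator.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical label groups for four products.
y_true = ["a", "a", "b", "c"]
y_pred = ["a", "b", "b", "c"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(acc, round(f1, 4))
```

Accuracy and F1 can rank models differently, which is why SBERT and DistilBERT can tie on one metric while differing on the other.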
To reproduce our results, please follow the steps below:
- Clone the repository
- Install the requirements in `requirements.txt` using `pip install -r requirements.txt`
- If you cannot access data in `00_source_data` in this repo, download the data from the Shopee Kaggle competition
- Under `10_code`, run `01_train_test_split.ipynb` to split the data into train, validation and test sets
- Under `10_code`, run `02_Bert_Model.ipynb` to train and use the embeddings from SBERT and DistilBERT
- Under `10_code`, run `03_ResNet50_Embeddings.ipynb` to train and use the embeddings from ResNet50
- Under `10_code`, run `04_MobileNet_Embeddings.ipynb` to train and use the embeddings from MobileNet
- Under `10_code`, run `05_Multimodal_Model_Embeddings.ipynb` to train and use the embeddings from DistilBERT and MobileNet
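As a rough illustration of the split step above, the sketch below shuffles row indices with a fixed seed and cuts them into train, validation and test sets. The 80/10/10 fractions, the seed, and the function name `split_indices` are assumptions for illustration; the notebook may use different ratios or a stratified or group-aware split.

```python
import random


def split_indices(n_rows, val_frac=0.1, test_frac=0.1, seed=42):
    """Return (train, val, test) index lists from a seeded shuffle.

    The fractions and seed are hypothetical defaults, not the notebook's values.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # local RNG keeps the split reproducible
    n_test = int(n_rows * test_frac)
    n_val = int(n_rows * val_frac)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))
```

Fixing the seed means every model is evaluated on the same held-out rows, which keeps the comparison between text, image and multimodal models fair.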