Shopee - Price Match Guarantee: Match products with descriptions and images

Machine learning project

Duke University (MIDS) - Spring 2023

Team Members: Suzy Anil, Isha Singh, Alisa Tian, Dingkun Yang

Project Overview

Product matching is a competitive feature among retail platforms: it lets a company offer products at rates competitive with those of other retailers selling similar products. Many methods combine deep learning and traditional machine learning to analyze image and text information and calculate similarity between products; however, there is little research comparing the effectiveness of integrating multimodal data (product images and descriptions) in this domain (Łukasik et al., 2021). Here, we compare the performance of unimodal and multimodal models. We trained separate models for text (SBERT and DistilBERT) and images (ResNet50 and MobileNet). Within each modality, DistilBERT and ResNet50 achieved the higher F1 score (0.48 vs. 0.43 for text, 0.45 vs. 0.38 for images); ResNet50 also had the higher accuracy, while DistilBERT and SBERT tied on accuracy at 0.45. The multimodal model used joint embeddings from DistilBERT and MobileNet to predict product labels and outperformed every unimodal model on both metrics. In our experiments, integrating product images and titles offered the most useful information for finding product matches on a platform.

Presentation

Click on the image to watch the presentation

Data

Shopee is the leading e-commerce platform in Southeast Asia and Taiwan; their platform contains products from vendors all over the world, predominantly in Singapore and Indonesia. In 2021, the company launched a Kaggle competition aimed at improving product matching algorithms to optimize their customers’ online shopping experience (Dane et al., 2021).

Link to Data

Data Split

Methods

We used the following methods to train our models:

SBERT
DistilBERT
ResNet50
MobileNet
Joint Embeddings of DistilBERT and MobileNet
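
As a rough illustration of what a joint embedding can look like, the sketch below L2-normalizes a text vector and an image vector and concatenates them. This is an assumption for illustration only: the README does not state how the DistilBERT and MobileNet embeddings were combined, and the function names and toy vectors are hypothetical stand-ins for real model outputs.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; return a zero vector unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def joint_embedding(text_vec, image_vec):
    """Concatenate L2-normalized text and image vectors (assumed fusion strategy)."""
    return l2_normalize(text_vec) + l2_normalize(image_vec)


# Toy vectors standing in for a DistilBERT title embedding and a MobileNet image embedding.
text_vec = [3.0, 4.0]
image_vec = [0.0, 5.0, 0.0]

joint = joint_embedding(text_vec, image_vec)
print(len(joint), joint)
```

Normalizing each modality before concatenation keeps one embedding from dominating the other purely because of its scale; a classifier or similarity search can then operate on the combined vector.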

Results

The following table shows the performance of the models trained on the Shopee dataset. Among the unimodal models, DistilBERT has the higher F1 score for text (it ties with SBERT on accuracy), and ResNet50 has the higher F1 score and accuracy for images. The multimodal model used joint embeddings from DistilBERT and MobileNet* to predict product labels and outperformed all four unimodal models on both metrics. Integrating product images and titles offered the most useful information for finding product matches on a platform.

*Note: Due to computational restrictions, we substituted MobileNet for ResNet50 in the multimodal model.

Performance on Test Set

Model Type	Model	F1 Score	Accuracy
Text	SBERT	0.43	0.45
Text	DistilBERT	0.48	0.45
Image	ResNet50	0.45	0.48
Image	MobileNet	0.38	0.40
Text & Image	Multimodal	0.50	0.53
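
For readers unfamiliar with the two metrics, here is a minimal sketch of how accuracy and F1 can be computed from predicted product labels. It assumes macro-averaged F1 (the unweighted mean of per-label F1 scores); the README does not say which averaging the reported scores use, and the toy labels below are made up, not project data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 scores (assumed averaging scheme)."""
    labels = set(y_true) | set(y_pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[t] += 1  # true label t was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for illustration only.
y_true = ["a", "a", "b", "c"]
y_pred = ["a", "b", "b", "c"]

print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```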

Reproducibility

To reproduce our results, please follow the steps below:

Clone the repository
Install the requirements in requirements.txt using pip install -r requirements.txt
If you cannot access data in 00_source_data in this repo, download the data from the Shopee Kaggle competition
Under 10_code, run 01_train_test_split.ipynb to split the data into train, validation and test sets
Under 10_code, run 02_Bert_Model.ipynb to train and use the embeddings from SBERT and DistilBERT
Under 10_code, run 03_ResNet50_Embeddings.ipynb to train and use the embeddings from ResNet50
Under 10_code, run 04_MobileNet_Embeddings.ipynb to train and use the embeddings from MobileNet
Under 10_code, run 05_Multimodal_Model_Embeddings.ipynb to train and use the embeddings from DistilBERT and MobileNet
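
The split performed in step 4 can be sketched as follows. This is a simplified, hypothetical version: the 70/15/15 ratios, the seed, and the function name are assumptions, and 01_train_test_split.ipynb may use different proportions or a different strategy (for example, a stratified split).

```python
import random


def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle rows reproducibly and cut them into train, validation and test lists."""
    rows = list(rows)
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    rng.shuffle(rows)
    n_test = round(len(rows) * test_frac)
    n_val = round(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test


# Hypothetical product identifiers standing in for rows of the dataset.
posting_ids = [f"item_{i}" for i in range(100)]
train, val, test = train_val_test_split(posting_ids)
print(len(train), len(val), len(test))
```

Fixing the seed means every notebook downstream sees the same partition, which matters when embeddings from separate text and image models are later compared on the same test set.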

Yer1k/Shopee-Product-Price-Match-Guarantee
