This repository has been archived by the owner on Jan 19, 2024. It is now read-only.

coveooss/shopper-intent-prediction-nature-2020


Shopper Intent Prediction from Clickstream E‑Commerce Data with Minimal Browsing Information

Public Data Release 1.0.0

Overview

This repo contains the description of the data released in conjunction with our Nature Scientific Reports paper Shopper Intent Prediction from Clickstream E‑Commerce Data with Minimal Browsing Information.

Data Download

The dataset is available for research and educational purposes here. To obtain the dataset, you are required to fill out a form with information about you and your institution, and agree to the Terms And Conditions for fair usage of the data.

For convenience, the Terms And Conditions are also included in this repo as a plain-text file: usage of the data implies acceptance of these Terms And Conditions.

Data Structure

The dataset is provided as a single large text file (.csv), inside a zip archive that also contains a copy of the Terms And Conditions. The final dataset contains 5,433,611 individual events, and it is the first dataset of this kind to be released to the research community. A sample file is included in this repository, showcasing the data structure.

| Field | Type | Description |
|---|---|---|
| session_id_hash | string | Hashed identifier of the shopping session. A session groups together events that are at most 30 minutes apart: for example, if the same user comes back to the target website 31 minutes after the last interaction, a new session identifier is assigned. |
| event_type | enum | The type of event according to the Google Protocol, one of { pageview, event }; for example, an add event can happen on a page load, or as a stand-alone event. |
| product_action | enum | One of { detail, add, purchase, remove, click }. If the field is empty, the event is a simple page view (e.g. the FAQ page) without associated products. |
| product_skus_hash | string | If the event is a product event, hashed identifiers of all products in the event (e.g. all the products in a transaction), pipe-separated. |
| server_timestamp_epoch_ms | int | Epoch time, in milliseconds. The epoch time has been shifted in time to further anonymize the data. |
| hashed_url | string | Hashed URL of the current web page. |
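
As an illustration of the fields above, here is a minimal Python sketch that reads events and groups them by session. It assumes a comma-delimited file with a header row whose column names match the field names listed above; check the sample file before relying on that. The inline `SAMPLE` rows and the helper names (`parse_event`, `load_sessions`, `max_gap_minutes`, `has_purchase`) are hypothetical and are not part of the released data or of the paper's code.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows that mimic the documented columns; this is NOT real data.
SAMPLE = """session_id_hash,event_type,product_action,product_skus_hash,server_timestamp_epoch_ms,hashed_url
s1,pageview,,,1000,u1
s1,pageview,detail,skuA,61000,u2
s1,event,add,skuA,62000,u2
s1,pageview,purchase,skuA|skuB,120000,u3
s2,pageview,,,5000,u1
"""


def parse_event(row):
    """Convert one raw CSV row (a dict of strings) into typed values."""
    skus = row["product_skus_hash"]
    return {
        "session": row["session_id_hash"],
        "event_type": row["event_type"],
        # An empty product_action means a plain page view with no products.
        "action": row["product_action"] or None,
        # Product identifiers are pipe-separated within a single field.
        "skus": skus.split("|") if skus else [],
        "ts_ms": int(row["server_timestamp_epoch_ms"]),
        "url": row["hashed_url"],
    }


def load_sessions(fileobj):
    """Group events by session id and sort each session by timestamp."""
    sessions = defaultdict(list)
    for row in csv.DictReader(fileobj):
        event = parse_event(row)
        sessions[event["session"]].append(event)
    for events in sessions.values():
        events.sort(key=lambda e: e["ts_ms"])
    return dict(sessions)


def max_gap_minutes(events):
    """Largest gap between consecutive events of a sorted session, in minutes."""
    gaps = [b["ts_ms"] - a["ts_ms"] for a, b in zip(events, events[1:])]
    return max(gaps, default=0) / 60000


def has_purchase(events):
    """True if any event in the session carries the purchase action."""
    return any(e["action"] == "purchase" for e in events)


sessions = load_sessions(io.StringIO(SAMPLE))
```

To use it on the real file, pass an open file handle instead of `io.StringIO(SAMPLE)`, for example `open(path, newline="")`. Because the timestamps have been shifted for anonymization, only differences between timestamps are meaningful, not absolute dates; given the session definition above, `max_gap_minutes` would be expected to stay at or below 30 for every session.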

We refer the reader to the original paper for an extended explanation of how to use the dataset for the clickstream prediction challenge. Usage of this data implies acceptance of the Terms And Conditions as set forth on the download page.

Contacts

For questions about the paper, please contact the corresponding author, Lucas Lacasa.

For questions about the dataset, please reach out to Jacopo Tagliabue.

Acknowledgments

The original paper is the product of a collaboration between industry and academia, based on a dataset kindly provided by Coveo. The authors of the paper are:

  • Borja Requena - Institut de Ciencies Fotoniques, The Barcelona Institute of Science and Technology
  • Giovanni Cassani - Department of Cognitive Science and Artificial Intelligence, Tilburg University
  • Jacopo Tagliabue - Coveo AI Labs
  • Ciro Greco - Coveo AI Labs
  • Lucas Lacasa - School of Mathematical Sciences, Queen Mary University of London

The authors wish to thank Richard Tessier and Coveo's legal team for supporting our research and believing in this data sharing initiative.

How to Cite our Work

If you make use of this dataset, please cite our work:

@article{Requena2020,
author = {Requena, Borja and Cassani, Giovanni and Tagliabue, Jacopo and Greco, Ciro and Lacasa, Lucas},
title = {Shopper intent prediction from clickstream e-commerce data with minimal browsing information},
year = {2020},
journal = {Scientific Reports},
issn    = {2045-2322},
volume  = {10},
doi = {10.1038/s41598-020-73622-y}
}
