
SasRec tutorial #186

Merged

Conversation

spirinamayya
Contributor

Added SASRec tutorial

@blondered
Collaborator

What does `os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"` do? Let's write a comment explaining it.
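For context, the comment could say something like this (the standard PyTorch reproducibility recipe; exact wording in the notebook is up to you):

```python
import os

# cuBLAS on CUDA >= 10.2 may select non-deterministic algorithms.
# Restricting its workspace with this env var makes GPU results reproducible;
# PyTorch requires it when torch.use_deterministic_algorithms(True) is set.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
```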

@blondered
Collaborator

Let's process data exactly the same as in the baselines tutorial, with no changes in code or outputs.

@blondered
Collaborator

The image of the model is too small, it's barely readable. Let's please make it more user friendly.
Also we need to remake it in MTS boards, since Miro is going to be shut off soon.
Loss should be dropped from the scheme because we are going to support a bunch of losses with the same model architecture.
And we can drop the timeline mask because we plan to stop multiplying by it and start passing it to multi-head attention together with the causal mask.
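For reference, merging the two masks in multi-head attention might look roughly like this (a minimal sketch with toy shapes; right-padding and the variable names are my assumptions, not the final RecTools code):

```python
import torch

seq_len, d = 5, 32
# Causal mask: True blocks attention, so position i only sees positions <= i.
causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
# Timeline/padding mask: True marks padded positions (right-padded here).
key_padding_mask = torch.tensor([[False, False, False, True, True]])

mha = torch.nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)
x = torch.randn(1, seq_len, d)
# PyTorch combines both masks internally, so no multiplication step is needed.
out, _ = mha(x, x, x, attn_mask=causal_mask, key_padding_mask=key_padding_mask)
```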

@blondered
Collaborator

Let's use the following structure for the tutorial (which is different from the baselines tutorial because we have much more information for the user):

  • Prepare data
  • Model description (just a few paragraphs and maybe an image from the paper, no preprocessing for now). Let's write that it's a causal model, in contrast with BERT4Rec. I can give you some papers to look at; we will also add them to the links.
  • Recommendations (one paragraph, maybe an image). Just to give the overall idea of the model
  • RecTools implementation (one paragraph on our features). Here we write that we followed the authors' architecture exactly. We give your image of the net here. We also write about supported losses here (in contrast to the original model, for now we have only cross-entropy loss, but we will support other variants). And everything that is in your "additional details" section goes here.
  • Model application. You're good here. But let's pick a user with exactly one interaction in this section. Let's show that the user was not present in train at all, and write a few words on why (and when) this is possible. Let's also add an example of a user that was present in the original train dataset but could not get recommendations (their item is rare and unknown). We have the on_unsupported_targets flag in the recommend method so that we don't get an error; see the sketch after this list.
  • Links
  • Under the hood: Dataset processing (your "Preprocessing")
  • Under the hood: Transformer layers (your "Self-attention block structure")
  • Under the hood: ... (whatever hardcore we want to show further)
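For the cold-user example in the "Model application" item, the call could look roughly like this (a minimal sketch; the variable names are assumptions, and I'm going off the current `recommend` signature):

```python
# `model` is a fitted SASRec model, `dataset` is a rectools Dataset built on train.
# `cold_user_id` never appeared in train; without the flag below, recommend()
# would raise on a target it cannot serve.
recos = model.recommend(
    users=[cold_user_id],
    dataset=dataset,
    k=10,
    filter_viewed=True,
    on_unsupported_targets="warn",  # assumption: "warn" skips the user with a warning
)
```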

@blondered
Collaborator

Let's add a table of contents. Here's an example: https://github.com/MobileTeleSystems/RecTools/blob/experimental/sasrec/examples/8_debiased_metrics.ipynb
Links do not work on GitHub, so we don't add them for now. But we still need to show the structure of the tutorial.

@blondered
Collaborator

We will also add some basic Lightning functionality to this tutorial, plus custom blocks usage. But in the next PRs.

I really liked your Preprocessing sections btw. Looks great

@blondered
Collaborator

blondered commented Sep 6, 2024

Links:
Turning dross into gold loss: https://arxiv.org/abs/2309.07602
gSASRec: https://arxiv.org/pdf/2308.07192

I think we should also rename SasRec to SASRec everywhere in the tutorial since it's more common. We will rename the class too at some point.

As for the model description, I like the description from Sasha Petrov's gSASRec paper:

"Transformer [38]-based models have recently outperformed other
models in Sequential Recommendation [17, 24–26, 29, 36]. Two
of the most popular Transformer-based recommender models are
BERT4rec [36] and SASRec [17]. The key differences between the
modelsinclude different attention mechanism (bi-directional vs. unidirectional), different training objective (Item Masking vs. Shifted
Sequence), different loss functions (Softmax loss vs. BCE loss), and,
RecSys ’23, September 18–22, 2023, Singapore, Singapore Aleksandr Petrov and Craig Macdonald
importantly, different negative sampling strategies (BERT4Rec does
not use sampling, whereas SASRec samples 1 negative per positive)"

So we can say that SASRec is a transformer-based sequential model with a unidirectional attention mechanism and the "Shifted Sequence" training objective.
In our implementation we don't provide negative sampling for now and use softmax loss instead.
Also you need to explain in words what is happening in the model (we take item embeddings from the user's interactions sequence and feed them to multi-head self-attention to acquire a latent representation of the user sequence). You can rephrase this, but you need to explain it anyway :)
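To make that paragraph concrete, the idea could be illustrated with a toy sketch like this (an assumption-level illustration, not the RecTools code; the dimensions, the single attention layer, and the absence of positional encodings are simplifications):

```python
import torch

n_items, d = 1000, 64
item_emb = torch.nn.Embedding(n_items, d)

# "Shifted Sequence" objective: inputs are items 0..n-1, targets are items 1..n.
session = torch.tensor([[3, 17, 42, 8]])    # one user's interaction sequence
inputs, targets = session[:, :-1], session[:, 1:]

x = item_emb(inputs)                         # (1, 3, d) item embeddings
causal = torch.triu(torch.ones(3, 3, dtype=torch.bool), diagonal=1)
mha = torch.nn.MultiheadAttention(d, num_heads=4, batch_first=True)
hidden, _ = mha(x, x, x, attn_mask=causal)   # latent user-sequence representation

# Softmax (cross-entropy) loss over the full catalog, as in our implementation.
logits = hidden @ item_emb.weight.T          # (1, 3, n_items)
loss = torch.nn.functional.cross_entropy(
    logits.reshape(-1, n_items), targets.reshape(-1)
)
```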

@blondered
Collaborator

Let's fix the image and we are merging

@blondered blondered self-requested a review September 13, 2024 12:09
@blondered blondered merged commit afd0463 into MobileTeleSystems:experimental/sasrec Sep 13, 2024
7 checks passed
@spirinamayya spirinamayya deleted the tutorial/sasrec2 branch October 1, 2024 11:17