-
Notifications
You must be signed in to change notification settings - Fork 625
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unable to use TO_RS eval setting with sequential model #718
Comments
Sorry, we can only support TO_LS eval setting with sequential model currently, TO_RS is not supported. We will consider it in the later version. |
@chenyushuo I am wondering if this enhancement has been assigned to any collaborator, if not I would be happy to take it. In general, there are multiple temporal associated data partitions:
Would you mind clarifying which one(s) we would like to implement? |
@rowedenny Firstly, thanks so much for your enthusiasm for making contributions to RecBole. |
I assume we can directly reimplement
|
@rowedenny Hi! Firstly, thanks for your warm-hearted advice and we have read your code carefully. If my understand is correct, in your implemention, the data splitting is base on the user number, which is a little different from the split strategy in RecBole. To be more specific, let's take an example: If the split ratio is [0.8, 0.1, 0.1], and the original data is like this (NOTE: it is not the atomic file, just an example):
In your strategy, data will be splitted as:
In RecBole, data will be splitted as:
(I must admit that it's quite hard to describe the difference between them with words, hope you can understand. ) By the way, the problem of data split (by ratio) for sequential models is that: compared with other models, we need to make data augmentation for sequential models. And there are some details need to be discussed about data augmentation and data split (by ratio) like: should the data augmentation done before or after the data split (by ratio)? That's why we don't support data split (by ratio) for sequential models until now. If you find some referrence (paper or other projects' implemention) about this feature or you have some advice about it, please let us know, thx! |
Hi @2017pxy, firstly thank you for your time in checking the code. Given the discussion above, here are my replies.
Yes, I assume we need data augmentation before the split. So it means that the total number of sequence (train + valid + test) will be the same with leave-one-out. By the way, I notice that the data augmentation for the sequential dataset adds a new attribute |
@rowedenny Hi, I think your split strategy is reasonable, we will carefully consider your design and discuss the implementation details, and it may take some time. About the
Following your design, we create a variable called |
I see. So please allow me to confirm two things:
Thanks for pointing out my mistake and sorry for the inconvience. |
I have looked through my implementation of TO_RS, with the fixation of Thank you! |
I am sorry to tell you that your understanding is not right. The index of If you read the code, you will find that we rewrite the
Then, the
And in Hope you can understand. |
Yes. Thank you for the clarification. For |
@rowedenny Yes, you are right. |
I want my sequential model to train on the first 75% of ML dataset and test on the rest. The split should be temporal ordered and ratio-based. In the end, it throws an error.
Steps to reproduce:
returns
I thought this should work. I tried same script with SASRec and SHAN (both sequential), same result. It works though with general NeuMF.
I've got a feeling that I don't understand sth, however nothing related is mentioned in the documentation...
The text was updated successfully, but these errors were encountered: