[Question] Add features to .inter atomic file #608

mayaKaplansky · 2020-12-22T21:24:49Z

Hi
I see I can have user features and item features, but my dataset has interaction features.
Can I express them in the .inter file as additional columns?

linzihan-backforward · 2020-12-23T01:55:20Z

Yes, you can add any feature columns in the .inter file.
There are also a few dataset parameters you can set to control the pipeline, such as:

load_col:
    inter: [user_id, item_id, rating, timestamp]

More instructions can be found in our documentation.
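To make the "extra columns + selective loading" idea concrete, here is a plain-Python sketch. It is not RecBole's loader. It only imitates what `load_col` does, and it assumes the usual atomic-file layout: tab-separated values with `name:type` headers. The `device` column is a made-up extra interaction feature used for illustration.

```python
import csv
import io

# A tiny .inter atomic file: tab-separated, header entries are "name:type".
# "device" is a hypothetical extra interaction feature column.
INTER_TEXT = (
    "user_id:token\titem_id:token\trating:float\ttimestamp:float\tdevice:token\n"
    "1\t1193\t5\t978302107\tmobile\n"
    "1\t661\t3\t978302108\tdesktop\n"
)

def load_inter(text, load_col):
    """Keep only the columns named in load_col, mimicking selective loading."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    names = [h.split(":")[0] for h in header]  # strip the ":type" part
    keep = [i for i, n in enumerate(names) if n in load_col]
    rows = [[row[i] for i in keep] for row in reader]
    return [names[i] for i in keep], rows

cols, rows = load_inter(INTER_TEXT, ["user_id", "item_id", "timestamp", "device"])
print(cols)  # ['user_id', 'item_id', 'timestamp', 'device']
print(rows)
```

Here `rating` is still in the file but is dropped because it is not listed, which is the same effect as leaving it out of `load_col: inter: [...]`.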

hyp1231 · 2020-12-23T01:59:07Z

Thanks for Zihan's comment about selectively loading interaction features from atomic files into the Dataset.

As for accessing features in models, generally, you can just fetch them from Interaction.
Besides, if you are developing sequential models and want to get historical interaction features, please see #546 for details.

mayaKaplansky · 2020-12-23T07:00:15Z

Thanks
Is 'rating' a mandatory field?
I have other features.
Also, as for the timestamp: is it OK to just use the sequential order as the time (1, 2, 3)? Is it OK if all the sessions are marked like this, just to indicate the order within each session?

rowedenny · 2020-12-23T19:48:32Z

To the best of my knowledge, 'rating' is NOT a mandatory field, as long as you do not specify it in the config file.
For the timestamps, you can mark each interaction with a numeric value. If you create your dataset object as an instance of SequentialDataset, it will sort by the user field and the time field in ascending order (user first), which gives the order.

Just a reminder: if a user in your dataset has multiple sessions, the interactions from different sessions will be mixed, since every session reuses the same timestamps (1, 2, 3, ...). Other than this minor tip, I think it should be fine.

For more details, please refer to the following code: https://github.com/RUCAIBox/RecBole/blob/master/recbole/data/dataset/sequential_dataset.py#L74
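The ordering rule and the mixing caveat can be checked with a few lines of plain Python. This is a sketch of the ordering only (a stable ascending sort on user first, then time), not the `SequentialDataset` code linked above. Grouping by a session id instead of the user is an assumption, offered as one possible workaround.

```python
# Each record: (user_id, session_id, item_id, timestamp). The timestamp only
# encodes the position inside a session (1, 2, 3, ...), as asked above.
interactions = [
    ("u1", "s1", "A", 1), ("u1", "s1", "B", 2), ("u1", "s1", "C", 3),
    ("u1", "s2", "D", 1), ("u1", "s2", "E", 2),
]

def order_items(records, group_index):
    """Sort by (group field, timestamp) ascending and return the item order."""
    ordered = sorted(records, key=lambda r: (r[group_index], r[3]))
    return [r[2] for r in ordered]

# Grouping by user: the two sessions of u1 get interleaved.
print(order_items(interactions, 0))  # ['A', 'D', 'B', 'E', 'C']
# Grouping by session instead keeps each session's order intact.
print(order_items(interactions, 1))  # ['A', 'B', 'C', 'D', 'E']
```

Using globally increasing timestamps would avoid the interleaving as well, because the user-first sort would then keep whole sessions together.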

mayaKaplansky · 2020-12-24T18:53:14Z

Thanks for the timestamp tip!

Can you explain what you mean by "As for accessing features in models, generally, you can just fetch them from Interaction." And what's the difference between features and historical features?
If I have features in the interaction file and I want to use GRU4Rec, should I use the original GRU4Rec or GRU4RecF?
Thanks!

rowedenny · 2020-12-24T21:36:14Z

Since I happened to write this part, I think I can reply:
The tricky part is related to the dataloader implementation:

If I remember correctly, in general recommendation models the implementation automatically fetches all interaction features. In sequential recommendation, however, the original dataloader only fetched the user, item, and timestamp fields and ignored the rest. That is the part my PR addresses. So, going back to your question: if your model is a sequential model and the feature is an interaction feature, then it is the historical feature we talked about.
If I remember correctly, the features used in GRU4RecF are user/item profile features, not interaction features. To get access to them (the interaction features), you need to 1) specify a field-name attribute for the model, and 2) in calculate_loss, use interaction[self.field_name] to fetch the features.

mayaKaplansky · 2020-12-25T09:03:39Z

Thanks!
For #1: I understand that after the PR I can add new features to the interaction file, but I couldn't figure out whether I should specify that somewhere in the code.

For #2: so if my features are only on the interaction (and not on the user or item), I should use GRU4Rec and not GRU4RecF. What do you mean by "to get access to them"? Do you mean that the model will consider them?

rowedenny · 2020-12-25T16:02:10Z

Please allow me to reply with an example. Say we have the following atomic file:

user_id:token	item_id:token	rating:float	timestamp:float
1	1193	5	978302107
1	661	3	978302108
1	743	2	978302109

After data augmentation, we expect to generate the following two sequences:

user_id:token	item_id_sequence:token	ITEM_ID	FEATURE_SEQUENCE_FIELD_NAME
1	1193	661	5
1	1193, 661	743	5, 3
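The prefix-based augmentation in the table above can be sketched in plain Python. It only shows the shape of the generated data and is not RecBole's implementation. It assumes that the rows are already sorted by user and timestamp and that `rating` is the extra interaction feature. The history columns are named `name + LIST_SUFFIX` with `LIST_SUFFIX = "_list"`, so the item history appears as `item_id_list` instead of the illustrative `item_id_sequence` header.

```python
LIST_SUFFIX = "_list"  # assumed value, matching "LIST_SUFFIX: _list"

def augment(rows, item_field="item_id", feature_field="rating"):
    """Build one training example per prefix of each user's history.

    rows must already be sorted by user and timestamp. History fields are
    named by appending LIST_SUFFIX, e.g. item_id -> item_id_list.
    """
    examples = []
    history = {}  # user_id -> list of earlier rows
    for row in rows:
        past = history.setdefault(row["user_id"], [])
        if past:  # the first interaction has no history to predict from
            examples.append({
                "user_id": row["user_id"],
                item_field + LIST_SUFFIX: [r[item_field] for r in past],
                feature_field + LIST_SUFFIX: [r[feature_field] for r in past],
                item_field: row[item_field],  # target item
            })
        past.append(row)
    return examples

rows = [
    {"user_id": 1, "item_id": 1193, "rating": 5, "timestamp": 978302107},
    {"user_id": 1, "item_id": 661, "rating": 3, "timestamp": 978302108},
    {"user_id": 1, "item_id": 743, "rating": 2, "timestamp": 978302109},
]
for ex in augment(rows):
    print(ex)
# {'user_id': 1, 'item_id_list': [1193], 'rating_list': [5], 'item_id': 661}
# {'user_id': 1, 'item_id_list': [1193, 661], 'rating_list': [5, 3], 'item_id': 743}
```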

Going back to your questions:

You need to specify the suffix used to generate the necessary sequence field name(s), for example the item sequence used to predict the target item. RecBole has implemented this mapping: as long as you specify "LIST_SUFFIX: _list" in the config, it generates each sequence field name by appending the suffix to the corresponding field name, e.g. item_id --> item_id_list. I think that, given the PR above, it will do the same for the other interaction features.
Next, we need to fetch it within the model to calculate the loss. First, assign an attribute to the model that holds the field name, then access the feature sequence from interaction. More concretely, here is an example of accessing the feature sequence:

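from recbole.model.abstract_recommender import SequentialRecommender

class MyModel(SequentialRecommender):
    def __init__(self, config, dataset):
        super(MyModel, self).__init__(config, dataset)
        self.FEATURE = config['FEATURE_FIELD']  # your own config key, e.g. FEATURE_FIELD: rating
        self.FEATURE_SEQ = self.FEATURE + config['LIST_SUFFIX']  # e.g. rating_list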

Then you can access it, for example, within the function calculate_loss:

def calculate_loss(self, interaction): 
    feature_seq = interaction[self.FEATURE_SEQ]

mayaKaplansky · 2020-12-27T19:35:50Z

Thank you!
Is it correct that the changes you suggest above are needed only if I want to predict additional features?
If I am only interested in predicting the next item_id, but want to use the features as additional information that influences the model (used for learning), do I not need these changes?

rowedenny · 2020-12-27T22:09:48Z

Please allow me to confirm your use case. Are you dealing with a case like this:
for movie recommendation, you would like to consider not only the movie a user commented on, but also the rating, which indicates how much the user likes the movie? (I believe this is the "use the features as additional info" case you describe.)
In that case, you definitely need the changes.

rowedenny · 2020-12-28T19:40:41Z

Feel free to correct me if I am wrong.

Only the basic fields of the dataframe are pre-registered in abstract_recommender.py.
For example, if your model inherits from SequentialRecommender, then the attributes

self.USER_ID = config['USER_ID_FIELD']
self.ITEM_ID = config['ITEM_ID_FIELD']
self.ITEM_SEQ = self.ITEM_ID + config['LIST_SUFFIX']
self.ITEM_SEQ_LEN = config['ITEM_LIST_LENGTH_FIELD']
self.POS_ITEM_ID = self.ITEM_ID
self.NEG_ITEM_ID = config['NEG_PREFIX'] + self.ITEM_ID

give access to the corresponding fields in the dataframe. Beyond those, every customized field needs to be specified explicitly by the user.

Say you want to fetch a field named NUM_OF_TIMES; then you need to:
1) register the field name in the config,
2) assign an attribute within your customized model, e.g. self.NUM_OF_TIMES = config['NUM_OF_TIMES_FIELD'],
3) fetch the field via interaction[self.NUM_OF_TIMES].
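The three steps can be sketched without RecBole itself, using a plain dict in place of the config and of the `Interaction` batch (both are indexed by field name). The key `NUM_OF_TIMES_FIELD`, its value `num_of_times`, and the `num_of_times_list` history field are hypothetical. Whether the history field is actually generated depends on the sequential dataloader handling extra interaction features, as discussed earlier in this thread.

```python
# Step 1 (hypothetical config entry): register the field name.
config = {"LIST_SUFFIX": "_list", "NUM_OF_TIMES_FIELD": "num_of_times"}

class MyModelSketch:
    """Stand-in for a custom model; a real one would subclass SequentialRecommender."""

    def __init__(self, config):
        # Step 2: store the field names as attributes.
        self.NUM_OF_TIMES = config["NUM_OF_TIMES_FIELD"]
        self.NUM_OF_TIMES_SEQ = self.NUM_OF_TIMES + config["LIST_SUFFIX"]

    def calculate_loss(self, interaction):
        # Step 3: fetch the fields by name. A plain dict stands in for
        # the Interaction object here.
        current = interaction[self.NUM_OF_TIMES]
        history = interaction[self.NUM_OF_TIMES_SEQ]
        return current, history

batch = {"num_of_times": [4], "num_of_times_list": [[1, 2, 3]]}
print(MyModelSketch(config).calculate_loss(batch))  # ([4], [[1, 2, 3]])
```

In a real model, `calculate_loss` would go on to feed these values into the network and return a loss; the sketch stops at the lookup, which is the part the steps above describe.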

mayaKaplansky · 2020-12-28T19:52:23Z

Thanks, I guess we are waiting for an answer in the other thread :)

ShanleiMu · 2020-12-29T03:37:26Z

Thanks for @rowedenny 's replies.

@mayaKaplansky If you want to use the additional inter feature fields in your sequential recommender, you can follow this #608 (comment) by rowedenny.

mayaKaplansky · 2020-12-29T06:54:11Z

Thanks! Use them in a way that the model learns from them, or in a way that they can be predicted?
I don't want to predict them, just use them for learning.
If I don't make the changes in #608 (comment) by rowedenny, then what do your changes do in the model?

mayaKaplansky · 2020-12-29T20:45:46Z

Thank you for all your help.
So in abstract_recommender.py, this is how it looks now:

class SequentialRecommender(AbstractRecommender):
    """
    This is a abstract sequential recommender. All the sequential model should implement This class.
    """
    type = ModelType.SEQUENTIAL

    def __init__(self, config, dataset):
        super(SequentialRecommender, self).__init__()

        # load dataset info
        self.USER_ID = config['USER_ID_FIELD']
        self.ITEM_ID = config['ITEM_ID_FIELD']
        self.ITEM_SEQ = self.ITEM_ID + config['LIST_SUFFIX']
        self.ITEM_SEQ_LEN = config['ITEM_LIST_LENGTH_FIELD']
        self.POS_ITEM_ID = self.ITEM_ID
        self.NEG_ITEM_ID = config['NEG_PREFIX'] + self.ITEM_ID
        self.max_seq_length = config['MAX_ITEM_LIST_LENGTH']
        self.n_items = dataset.num(self.ITEM_ID)
        self.JobGroup = config['JobGroup_FIELD']
        self.JobGroup_SEQ = self.JobGroup + config['LIST_SUFFIX']
        self.AgeGroup = config['AgeGroup_FIELD']
        self.AgeGroup_SEQ = self.AgeGroup + config['LIST_SUFFIX']
        self.GenderID = config['GenderID_FIELD']
        self.GenderID_SEQ = self.GenderID + config['LIST_SUFFIX']
        self.PatientLocationID = config['PatientLocationID_FIELD']
        self.PatientLocationID_SEQ = self.PatientLocationID + config['LIST_SUFFIX']

Is this OK?

You also explained that I need to specify this in the config file, which I assume means:

# Selectively Loading
load_col:
    inter: [session_id, item_id, timestamp, PatientLocationID, GenderID, AgeGroup, JobGroup]
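Besides `load_col`, the modified class above reads `config['JobGroup_FIELD']` and the other `*_FIELD` keys, so those keys presumably need to be defined in the config as well (in YAML, e.g. `JobGroup_FIELD: JobGroup`). The check below is a plain-Python sketch written for illustration, not a RecBole utility. The dict simply mirrors the YAML, and the `*_FIELD` values are assumptions.

```python
# Hypothetical config values mirroring the YAML; the *_FIELD keys are the ones
# the modified SequentialRecommender reads, so they must be defined too.
config = {
    "LIST_SUFFIX": "_list",
    "load_col": {"inter": ["session_id", "item_id", "timestamp",
                           "PatientLocationID", "GenderID", "AgeGroup", "JobGroup"]},
    "JobGroup_FIELD": "JobGroup",
    "AgeGroup_FIELD": "AgeGroup",
    "GenderID_FIELD": "GenderID",
    "PatientLocationID_FIELD": "PatientLocationID",
}

FIELD_KEYS = ["JobGroup_FIELD", "AgeGroup_FIELD", "GenderID_FIELD",
              "PatientLocationID_FIELD"]

def missing_fields(config, keys):
    """Return keys that are undefined or that name a column not in load_col."""
    loaded = set(config["load_col"]["inter"])
    return [k for k in keys if config.get(k) not in loaded]

print(missing_fields(config, FIELD_KEYS))  # []
```

If one of the keys were left out, the model's `config['...'] + config['LIST_SUFFIX']` line would fail, so a check like this helps catch the problem before training.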

And your last instruction was: fetch the field via interaction[self.NUM_OF_TIMES].
I couldn't find where I should make that change. Can you elaborate?

Many thanks!

rowedenny · 2020-12-30T19:10:33Z

The customized model looks OK to me. A minor tip: you may create your own model inheriting from SequentialRecommender instead of AbstractRecommender, and then add the additional fields you need there. The benefit is that SequentialRecommender already provides sequential-specific functionality, e.g. data augmentation, and the corresponding sampler class, RepeatableSampler, is pre-selected based on the model type.
As for NUM_OF_TIMES, I notice you raised another thread about sessions, but I replied in this one. Either way, the idea of fetching a customized field still works: for example, if you want to fetch job_group in the calculate_loss function of your customized model, you can call interaction[self.JobGroup], and that should work.
Finally, I strongly suggest you examine the field values. I would first shrink the dataset into a smaller one, say a single user with several interactions, and print out the dataframe. Then check whether the field values fetched from interaction are identical.
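The "shrink the dataset" step could start from a filtered copy of the atomic file. This is a minimal sketch that assumes the tab-separated `name:type` layout. The sample columns and values are invented for illustration.

```python
# Hypothetical sample in the .inter layout (tab-separated, "name:type" headers).
INTER_TEXT = (
    "user_id:token\titem_id:token\tJobGroup:token\ttimestamp:float\n"
    "1\t10\tnurse\t1\n"
    "2\t11\tdoctor\t1\n"
    "1\t12\tnurse\t2\n"
)

def shrink_inter(text, id_field, id_value):
    """Keep the header line plus the rows whose id_field equals id_value."""
    lines = text.splitlines()
    names = [h.split(":")[0] for h in lines[0].split("\t")]  # drop ":type"
    col = names.index(id_field)
    kept = [ln for ln in lines[1:] if ln.split("\t")[col] == id_value]
    return "\n".join([lines[0]] + kept) + "\n"

small = shrink_inter(INTER_TEXT, "user_id", "1")
print(small)
```

You could save `small` as the .inter file of a new, tiny dataset folder (for session data, pass `session_id` as `id_field`), run the model on it, and compare these rows with what `interaction[self.JobGroup]` returns inside `calculate_loss`.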

mayaKaplansky · 2021-01-03T13:30:13Z

Thank you!

mayaKaplansky added the bug Something isn't working label Dec 22, 2020

hyp1231 added FAQ Frequently Asked Questions and removed bug Something isn't working labels Dec 23, 2020

mayaKaplansky mentioned this issue Dec 28, 2020

[Question] How does a session start and end? #617 (Closed)

chenyushuo closed this as completed Jan 13, 2021

mayaKaplansky mentioned this issue Jan 13, 2021

[💡SUG] Get a prediction #632 (Closed)

Sherry-XLL added the dataset label Feb 7, 2023
