Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FEA: Add Interaction features of the history items for sequential dataloader #547

Merged
merged 4 commits into from
Dec 5, 2020
Merged

Conversation

rowedenny
Copy link
Contributor

@rowedenny rowedenny commented Dec 5, 2020

I have added the function as we discussed, and confirm that the feature loaded is correct.

The only issue is, I have not decided how to add SEQ features of the history items, say a number of comments for several movies in the Moview-Len. I am not sure about two aspects:

  1. Whether we need to define the max length of token sequences
  2. Whether we need to represent the history sequences as a 2D or 3D tensor.

@hyp1231 hyp1231 changed the title Add Interaction features of the history items for sequential dataloader FEA: Add Interaction features of the history items for sequential dataloader Dec 5, 2020
Copy link
Member

@hyp1231 hyp1231 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution! It looks really good.
There is only one small issue, which can be fixed easily (referring to the inline reviews below). After fixing, we'll be delightful to merge this PR. :D

As for the concerns about SEQ features:

  1. The max length of SEQ features has been stored in Dataset.field2seqlen, referring to docs about property field2seqlen and arg seq_len.
  2. Maybe it's appropriate to be represented as a 3D tensor.

HOWEVER, as we are refactoring DataLoader in our 0.2.x branch, where the codes may be changed (but also accelerate a lot), we recommend this part remains NotImplemented temporarily. Thanks for your cautious!

@@ -57,6 +57,21 @@ def __init__(self, config, dataset,
self.target_time_field = self.time_field
self.item_list_length_field = config['ITEM_LIST_LENGTH_FIELD']

for field in dataset.inter_feat:
if field != self.iid_field and field != self.time_field:
Copy link
Member

@hyp1231 hyp1231 Dec 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If field == self.uid_field, the corresponding attribute f'{field}_list_field' should not be setted.

Maybe this line can be replaced by:

if field not in [self.uid_field, self.iid_field, self.time_field]:

@@ -136,6 +151,18 @@ def augmentation(self, uid_list, item_list_index, target_index, item_list_length
for field in self.dataset.inter_feat:
if field != self.iid_field and field != self.time_field:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the aforementioned review.

@rowedenny rowedenny requested a review from hyp1231 December 5, 2020 03:58
@rowedenny
Copy link
Contributor Author

I agree that 3D tensor could be better, yet looking through the function _dict_to_interaction,
https://github.com/RUCAIBox/RecBole/blob/master/recbole/data/dataset/dataset.py#L1592
it currently cannot handle the 3D tensor, and that's is the reason I would treat it NotImplemented temporarily.

Probably, some new FeatureType need to be registered, and then to be applied.

Having said that, I may point out that the current bottleneck is we cannot easily incorporate the customized dataloader, and datasampler, since they all reply on the ModelType, which I think is not flexible. I am wondering if Recbole would refactorize the API to adapt to customized essential modules in near future?

@hyp1231
Copy link
Member

hyp1231 commented Dec 5, 2020

I agree that 3D tensor could be better, yet looking through the function _dict_to_interaction,
https://github.com/RUCAIBox/RecBole/blob/master/recbole/data/dataset/dataset.py#L1592
it currently cannot handle the 3D tensor, and that's is the reason I would treat it NotImplemented temporarily.

Probably, some new FeatureType need to be registered, and then to be applied.

Having said that, I may point out that the current bottleneck is we cannot easily incorporate the customized dataloader, and datasampler, since they all reply on the ModelType, which I think is not flexible. I am wondering if Recbole would refactorize the API to adapt to customized essential modules in near future?

Thanks for pointing this issue out! We are considering refactoring APIs about DataLoader/Sampler selection to make it easier to incorporate custom modules.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Model cannot get access to interaction features other than item and time.
2 participants