FEA: Add docs/ to RecBole #735

Merged
merged 1 commit into from
Feb 26, 2021
1 change: 1 addition & 0 deletions .gitignore
Expand Up @@ -9,3 +9,4 @@
saved/
*.lprof
*.egg-info/
docs/build/
20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
Binary file added docs/source/asset/afm.jpg
Binary file added docs/source/asset/autoint.png
Binary file added docs/source/asset/bert4rec.png
Binary file added docs/source/asset/bpr.png
Binary file added docs/source/asset/caser.png
Binary file added docs/source/asset/cdae.png
Binary file added docs/source/asset/cke.png
Binary file added docs/source/asset/convncf.png
Binary file added docs/source/asset/data_flow_en.png
Binary file added docs/source/asset/dcn.png
Binary file added docs/source/asset/deepfm.png
Binary file added docs/source/asset/dgcf.jpg
Binary file added docs/source/asset/din.png
Binary file added docs/source/asset/dmf.jpg
Binary file added docs/source/asset/dssm.png
Binary file added docs/source/asset/enmf.jpg
Binary file added docs/source/asset/evaluation.png
Binary file added docs/source/asset/fdsa.png
Binary file added docs/source/asset/ffm.png
Binary file added docs/source/asset/fm.png
Binary file added docs/source/asset/fnn.png
Binary file added docs/source/asset/fossil.jpg
Binary file added docs/source/asset/fpmc.png
Binary file added docs/source/asset/fwfm.png
Binary file added docs/source/asset/gcmc.png
Binary file added docs/source/asset/gcsan.png
Binary file added docs/source/asset/gru4rec.png
Binary file added docs/source/asset/gru4recf.png
Binary file added docs/source/asset/hgn.jpg
Binary file added docs/source/asset/hrm.jpg
Binary file added docs/source/asset/kgat.png
Binary file added docs/source/asset/kgcn.png
Binary file added docs/source/asset/kgnnls.png
Binary file added docs/source/asset/ksr.jpg
Binary file added docs/source/asset/ktup.png
Binary file added docs/source/asset/lightgcn.png
Binary file added docs/source/asset/line.png
Binary file added docs/source/asset/lr.png
Binary file added docs/source/asset/macridvae.png
Binary file added docs/source/asset/mkr.png
Binary file added docs/source/asset/multidae.png
Binary file added docs/source/asset/multivae.png
Binary file added docs/source/asset/nais.png
Binary file added docs/source/asset/narm.png
Binary file added docs/source/asset/neumf.png
Binary file added docs/source/asset/nextitnet.png
Binary file added docs/source/asset/nfm.jpg
Binary file added docs/source/asset/ngcf.jpg
Binary file added docs/source/asset/nncf.png
Binary file added docs/source/asset/npe.jpg
Binary file added docs/source/asset/pnn.jpg
Binary file added docs/source/asset/repeatnet.jpg
Binary file added docs/source/asset/ripplenet.jpg
Binary file added docs/source/asset/s3rec.png
Binary file added docs/source/asset/sasrec.png
Binary file added docs/source/asset/shan.jpg
Binary file added docs/source/asset/spectralcf.png
Binary file added docs/source/asset/srgnn.png
Binary file added docs/source/asset/stamp.png
Binary file added docs/source/asset/transrec.png
Binary file added docs/source/asset/widedeep.png
Binary file added docs/source/asset/xdeepfm.png
74 changes: 74 additions & 0 deletions docs/source/conf.py
@@ -0,0 +1,74 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import sphinx_rtd_theme
import os
import sys
sys.path.insert(0, os.path.abspath('../..'))


# -- Project information -----------------------------------------------------

project = 'RecBole'
copyright = '2020, RecBole Contributors'
author = 'AIBox RecBole group'

# The full version, including alpha/beta/rc tags
release = '0.2.0'


# -- General configuration ---------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.napoleon',
    'sphinx.ext.viewcode',
    'sphinx_copybutton',
]

autodoc_mock_imports = ["pandas", "pyecharts"]
# autoclass_content = 'both'

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = 'en'

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = []


# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
# html_theme = 'alabaster'


html_theme = 'sphinx_rtd_theme'
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
201 changes: 201 additions & 0 deletions docs/source/developer_guide/customize_dataloaders.rst
@@ -0,0 +1,201 @@
Customize DataLoaders
======================
Here, we present how to develop a new DataLoader and apply it in our tool. If a new model has
special requirements for loading the data, then we need to design a new DataLoader.


Abstract DataLoader
--------------------------
In this project, there are three abstract classes: :class:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader`,
:class:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin` and :class:`~recbole.data.dataloader.neg_sample_mixin.NegSampleByMixin`.

In general, a new dataloader should inherit from one of the above three abstract classes.
If you only need to modify an existing DataLoader, you can also inherit from it.
For details, see the dataloader documentation: :doc:`../../recbole/recbole.data.dataloader`


AbstractDataLoader
^^^^^^^^^^^^^^^^^^^^^^^^^^
:class:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader` is the most basic abstract class,
which includes three functions: :meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.pr_end`,
:meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader._shuffle`
and :meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader._next_batch_data`.
:meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.pr_end` is the maximum value of
:attr:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.pr` plus 1.
:meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader._shuffle` is used to permute the dataset,
and is invoked by :meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.__iter__`
if the parameter :attr:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.shuffle` is ``True``.
:meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader._next_batch_data` loads
the next batch of data and returns it as an :class:`~recbole.data.interaction.Interaction`;
it is invoked in :meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.__next__`.
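
The interplay of these three functions can be illustrated with a minimal sketch. ``ToyDataLoader`` below is a hypothetical stand-in written for this guide, not RecBole code: it keeps plain lists instead of an ``Interaction``, and the pointer reset in ``__next__`` is an assumption about how a reusable loader might behave.

```python
import random


class ToyDataLoader:
    """Hypothetical stand-in that mimics the pointer-based iteration described above."""

    def __init__(self, data, batch_size=2, shuffle=False):
        self.data = list(data)
        self.step = batch_size  # number of rows consumed per batch
        self.shuffle = shuffle
        self.pr = 0             # pointer to the next unread row

    @property
    def pr_end(self):
        # One past the largest valid value of `pr`.
        return len(self.data)

    def _shuffle(self):
        # Permute the underlying data in place.
        random.shuffle(self.data)

    def _next_batch_data(self):
        # Slice the next batch and advance the pointer.
        batch = self.data[self.pr:self.pr + self.step]
        self.pr += self.step
        return batch

    def __iter__(self):
        if self.shuffle:
            self._shuffle()
        return self

    def __next__(self):
        if self.pr >= self.pr_end:
            self.pr = 0  # assumption: reset so the loader can be iterated again
            raise StopIteration
        return self._next_batch_data()


loader = ToyDataLoader(range(5), batch_size=2)
batches = list(loader)
print(batches)  # [[0, 1], [2, 3], [4]]
```

The last batch is shorter than ``step`` because slicing simply stops at ``pr_end``.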

In :class:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader`,
two functions assist the conversion in :meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader._next_batch_data`:
:meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader._dataframe_to_interaction`
and :meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader._dict_to_interaction`.
Both call the functions of the same name in :attr:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.dataset`,
which convert a :class:`pandas.DataFrame` or :class:`dict` into an :class:`~recbole.data.interaction.Interaction`.

In addition to the above three functions, two other functions can also be overridden:
:meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.setup`
and :meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.data_preprocess`.

:meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.setup` handles preparation work other than initializing the parameters,
for example, resetting the :attr:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.batch_size`
or checking the :attr:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.shuffle` setting.
All of these can be overridden in the subclass.
:meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.data_preprocess` is used to process the data,
e.g., negative sampling.

At the end of :meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.__init__`,
:meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.setup` is invoked;
then, if :attr:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.real_time` is ``True``,
:meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.data_preprocess` is called.
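
The call order of the two hooks can be sketched as follows. ``ToyBase`` and ``ToyChild`` are hypothetical classes, and the ``real_time`` condition is copied from the description above rather than from the RecBole source, so check the source before relying on it.

```python
class ToyBase:
    """Hypothetical base class: records which hooks run during construction."""

    def __init__(self, batch_size=1, shuffle=False, real_time=True):
        # Parameter initialization comes first.
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.real_time = real_time
        self.calls = []
        # Hooks run at the very end of __init__, in this order.
        self.setup()
        if self.real_time:  # condition as stated in the text above
            self.data_preprocess()

    def setup(self):
        self.calls.append("setup")

    def data_preprocess(self):
        self.calls.append("data_preprocess")


class ToyChild(ToyBase):
    def setup(self):
        # A subclass can adjust settings here, e.g. force shuffling.
        super().setup()
        if not self.shuffle:
            self.shuffle = True


eager = ToyChild(real_time=True)
lazy = ToyChild(real_time=False)
print(eager.calls)  # ['setup', 'data_preprocess']
print(lazy.calls)   # ['setup']
```

Because the hooks run after the attributes are assigned, a subclass can override ``setup`` without touching ``__init__``.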

NegSampleMixin
^^^^^^^^^^^^^^^^^^^^^^^^^^
:class:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin` inherits from
:class:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader` and is used for negative sampling.
It adds three functions to those of its parent class:
:meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin._batch_size_adaptation`,
:meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin._neg_sampling`
and :meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin.get_pos_len_list`.

Since the positive and negative samples should be placed in the same batch,
the original batch size may not be appropriate.
:meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin._batch_size_adaptation` is used to reset the batch size,
such that the positive and negative samples can be in the same batch.
:meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin._neg_sampling` is used for negative sampling,
and should be implemented by the subclass.
:meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin.get_pos_len_list` returns the number of positive samples for each user.

In addition, :meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin.setup`
and :meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin.data_preprocess` are also changed.
:meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin.setup` calls
:meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin._batch_size_adaptation`, while
:meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin.data_preprocess` is used for negative sampling
and should be implemented in the subclass.
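
To see why the batch size needs adapting, consider one plausible rule, sketched here under stated assumptions: every positive sample expands into ``1 + neg_sample_by`` rows, so the number of positives read per step is the batch size divided by that factor. The function ``adapt_batch_size`` and its parameter names are hypothetical and are not taken from the RecBole source.

```python
def adapt_batch_size(batch_size, neg_sample_by):
    """Return (step, new_batch_size) for a hypothetical point-wise loader.

    step: number of positive samples read per batch.
    new_batch_size: rows per batch after each positive is expanded with its negatives.
    """
    times = 1 + neg_sample_by           # rows produced by one positive sample
    step = max(batch_size // times, 1)  # always read at least one positive
    return step, step * times


step, new_batch_size = adapt_batch_size(batch_size=2048, neg_sample_by=3)
print(step, new_batch_size)  # 512 2048
```

With this rule a batch always holds whole groups of one positive and its negatives, even when the requested batch size is not a multiple of the group size.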

NegSampleByMixin
^^^^^^^^^^^^^^^^^^^^^^^^^^
:class:`~recbole.data.dataloader.neg_sample_mixin.NegSampleByMixin` inherits
from :class:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin`
and is used for negative sampling by ratio.
It supports two strategies: ``pair-wise sampling`` and ``point-wise sampling``.
On top of the parent class, two functions are added:
:meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleByMixin._neg_sample_by_pair_wise_sampling`
and :meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleByMixin._neg_sample_by_point_wise_sampling`.
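
The difference between the two strategies is mainly the layout of the resulting batch. The sketch below uses plain dictionaries and hypothetical helper functions (``pair_wise``, ``point_wise``) with pre-drawn negative items; the field names ``neg_item`` and ``label`` are assumptions for illustration, not the names RecBole uses.

```python
def pair_wise(pos_rows, neg_items, neg_sample_by):
    """Repeat each positive row once per negative and attach the negative as an extra field."""
    out = []
    for idx, (user, item) in enumerate(pos_rows):
        for j in range(neg_sample_by):
            neg = neg_items[idx * neg_sample_by + j]
            out.append({"user": user, "item": item, "neg_item": neg})
    return out


def point_wise(pos_rows, neg_items, neg_sample_by):
    """Keep positives with label 1.0 and append negatives as separate rows with label 0.0."""
    out = [{"user": user, "item": item, "label": 1.0} for user, item in pos_rows]
    for idx, (user, _) in enumerate(pos_rows):
        for j in range(neg_sample_by):
            neg = neg_items[idx * neg_sample_by + j]
            out.append({"user": user, "item": neg, "label": 0.0})
    return out


pos = [(1, 10), (2, 20)]   # (user, item) positive interactions
negs = [11, 12, 21, 22]    # two pre-drawn negatives per positive
pairs = pair_wise(pos, negs, neg_sample_by=2)
points = point_wise(pos, negs, neg_sample_by=2)
print(len(pairs), len(points))  # 4 6
```

Pair-wise output suits losses that compare a positive with a negative directly, while point-wise output suits losses that score each row against its label.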


Example
--------------------------
Here, we take :class:`~recbole.data.dataloader.user_dataloader.UserDataLoader` as an example.
This dataloader returns user ids, which are used to train the user representations.


Implement __init__()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:meth:`__init__` can be used to initialize some of the necessary parameters.
Here, we just need to record :attr:`uid_field`.

.. code:: python

    def __init__(self, config, dataset,
                 batch_size=1, dl_format=InputType.POINTWISE, shuffle=False):
        self.uid_field = dataset.uid_field

        super().__init__(config=config, dataset=dataset,
                         batch_size=batch_size, dl_format=dl_format, shuffle=shuffle)

Implement setup()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Because of training requirements, :attr:`self.shuffle` should be ``True``.
We can check and correct :attr:`self.shuffle` in :meth:`~recbole.data.dataloader.user_dataloader.UserDataLoader.setup`.


.. code:: python

    def setup(self):
        if self.shuffle is False:
            self.shuffle = True
            self.logger.warning('UserDataLoader must shuffle the data')

Implement pr_end() and _shuffle()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Since this dataloader only returns user ids, these functions can be implemented readily.

.. code:: python

    @property
    def pr_end(self):
        return len(self.dataset.user_feat)

    def _shuffle(self):
        self.dataset.user_feat = self.dataset.user_feat.sample(frac=1).reset_index(drop=True)

Implement _next_batch_data
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This function only needs to return user ids from :attr:`user_feat`,
so we select that one column and use :meth:`_dataframe_to_interaction` to convert the
:class:`pandas.DataFrame` into an :class:`~recbole.data.interaction.Interaction`.


.. code:: python

    def _next_batch_data(self):
        cur_data = self.dataset.user_feat[[self.uid_field]][self.pr: self.pr + self.step]
        self.pr += self.step
        return self._dataframe_to_interaction(cur_data)


Complete Code
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: python

    class UserDataLoader(AbstractDataLoader):
        """:class:`UserDataLoader` will return a batch of data which only contains user-id when it is iterated.

        Args:
            config (Config): The config of dataloader.
            dataset (Dataset): The dataset of dataloader.
            batch_size (int, optional): The batch_size of dataloader. Defaults to ``1``.
            dl_format (InputType, optional): The input type of dataloader. Defaults to
                :obj:`~recbole.utils.enum_type.InputType.POINTWISE`.
            shuffle (bool, optional): Whether the dataloader will be shuffled after a round. Defaults to ``False``.

        Attributes:
            shuffle (bool): Whether the dataloader will be shuffled after a round.
                However, in :class:`UserDataLoader`, it's guaranteed to be ``True``.
        """
        dl_type = DataLoaderType.ORIGIN

        def __init__(self, config, dataset,
                     batch_size=1, dl_format=InputType.POINTWISE, shuffle=False):
            self.uid_field = dataset.uid_field

            super().__init__(config=config, dataset=dataset,
                             batch_size=batch_size, dl_format=dl_format, shuffle=shuffle)

        def setup(self):
            """Make sure that :attr:`shuffle` is True. If :attr:`shuffle` is False, it will be changed to True
            and a warning will be given to the user.
            """
            if self.shuffle is False:
                self.shuffle = True
                self.logger.warning('UserDataLoader must shuffle the data')

        @property
        def pr_end(self):
            return len(self.dataset.user_feat)

        def _shuffle(self):
            self.dataset.user_feat = self.dataset.user_feat.sample(frac=1).reset_index(drop=True)

        def _next_batch_data(self):
            cur_data = self.dataset.user_feat[[self.uid_field]][self.pr: self.pr + self.step]
            self.pr += self.step
            return self._dataframe_to_interaction(cur_data)


For the development of more complex DataLoaders, refer to the source code.