All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project uses Semantic Versioning.
- Versions of PyTorch Lightning between `1.6.0` and `1.9.X` are now supported
- `HybridModel` and `HybridPretrainedModel` now take additional optional parameters `user_metadata` and `user_metadata_layers_dims`
- `get_data.py` now includes `get_user_metadata`
- Added `item_metadata_layers_dims` and `user_metadata_layers_dims` parameters to `HybridPretrainedModel` and `HybridModel` and removed `metadata_layers_dims`
- Updated notebooks and examples to include usage of `user_metadata`
- a `ValueError` is now raised when `item_metadata` contains nulls
- Deprecated PyTorch Lightning unit test
- `force_split` option for `stratified_split`
- better type hints for `Callable`s
- added methods `get_user_predictions` and `user_user_similarity` to the `BasePipeline`
- added `_get_user_embeddings` method to all model classes
- default `Dockerfile` image to be `torch@1.10.0` with CUDA 11.3
- check that the index is in bounds for `get_item_predictions` and `item_item_similarity` before calling the model
- added `enable_model_summary` and `detect_anomaly` parameters to `CollieMinimalTrainer` and deprecated `weights_summary` and `terminate_on_nan` to more closely match the new `pytorch_lightning` API
- clarified the error message raised when a user has a single interaction when using `stratified_split`
- updated all examples, tests, and notebooks with post-1.5.0 PyTorch Lightning APIs
- device error when running metrics for `MultiStagePipeline` models
- `CollieMinimalTrainer` model summary to work with later versions of PyTorch Lightning
- default `num_workers` for `Interactions` DataLoaders
- string support for Adagrad optimizer in model pipelines
- added property `max_epochs` to `CollieTrainer` and `CollieMinimalTrainer` with `setter` method
- `CollieTrainer`'s default `max_epochs` from `1000` to `10`
- used new API for setting verbosity in `ModelSummary` in `CollieMinimalTrainer`
- multi-stage model template `MultiStagePipeline`
- multi-stage models `HybridModel` and `ColdStartModel`
- optimizers and learning rate schedulers are now reset upon each call of `fit` in `CollieMinimalTrainer`, matching the behavior in `CollieTrainer`
- `HybridPretrainedModel` now includes bias terms from the `MatrixFactorizationModel` in score calculation
- `item_item_similarity` now uses a more efficient, on-device calculation for item-item similarity
- optimizers are now stepped using the `optimizer_step` method for `CollieMinimalTrainer`
- `_get_item_embeddings` methods now return a `torch.Tensor` on device
- `_move_external_data_to_device` optional method to all models
- `MultiOptimizer.step` method
- GitHub URL in `read_movielens_posters_df` to point to new repo name
- name of library to `collie`!
- name change warning from `collie_recs` -> `collie`
- support for explicit data with `ExplicitInteractions` and `explicit_evaluate_in_batches`
- warnings for invalid adaptive loss vs. `num_negative_samples` combinations
- default `Dockerfile` image to be `torch@1.9.0` with CUDA 10.2
- `hybrid_matrix_factorization_model.py` deprecated filename
- new model architectures `CollaborativeMetricLearningModel`, `MLPMatrixFactorizationModel`, and `DeepFM`
- filename for `HybridPretrainedModel` to `hybrid_pretrained_matrix_factorization.py`. The former model filepath is now deprecated and will be removed in version `0.6.0`
- `collie.model.base` is now split into its own directory with the same name
- reduced boilerplate docstrings required for models
- all `model.freeze()` calls changed to `model.eval()`
- bumped version of `sphinx-rtd-theme` to `0.5.2`
- `CollieMinimalTrainer` for a faster, simpler version of `CollieTrainer`
- `remove_duplicate_user_item_pairs` argument to `Interactions`
- renamed `BasePipeline.hparams.n_epochs_completed_` -> `BasePipeline.hparams.num_epochs_completed`
- a proper `ValueError` is now raised if no `train` data is passed into a model
- loss docstrings that incorrectly stated `**kwargs` would be accepted
- disable automated batching in `ApproximateNegativeSamplingInteractionsDataLoader` and `HDF5InteractionsDataLoader`
- `convert_to_implicit` will now remove duplicate user/item pairs in DataFrame
- duplicate user/item pairs in `Interactions` are now dropped from the COO matrix during instantiation
- ability to run `stratified_split` without any `joblib.Parallel` parallelization
- data quality checks to `Interactions.__init__` to assert that `users`, `items`, and `mat` are not `None` and that `ratings` does not contain any `0`s (if it does, those rows are now automatically filtered out)
- increased test coverage
- header table to all Jupyter notebooks with links to Colab and GitHub
- default `processes` for `stratified_split` is now `-1`
- default `k` value in `mapk` is now set to `10`
- when GPU is available but not set, `CollieTrainer` now sets it to `1` rather than `-1`
- all models now check that `train_loader` and `val_loader` attributes are consistent during initialization
- default `unseen_items_only` in `BasePipeline.get_item_predictions` method is now `False`
- docs in `get_recommendation_visualizations` to be clearer
- `get_recommendation_visualizations` data quality checks have been moved to the beginning of the function to potentially fail faster
- `create_ratings_matrix` no longer raises `ValueError` if `users` and `items` do not start at `0`
- refactored `adaptive_hinge_loss`
- `kwargs` option for methods that did not explicitly need them
- typo in `cross_validation.py` error message
- `head` and `tail` methods in `interactions/datasets.py` to no longer error with `n < 1` or large `n` values
- `num_users` and `num_items` are no longer incorrectly incremented when `meta` key is provided in `HDF5Interactions`
- type hints for `device` now also include instances of `torch.device`
- the types of metadata tensors sent to `HybridPretrainedModel` are now consistent across all input options
- removed ineffective quality checks in `HybridPretrainedModel.save_model`
- no longer use deprecated `nn.sigmoid` in library
- a `relu` final activation layer now works in `NeuralCollaborativeFiltering` model
- `df_to_html` now outputs proper HTML when multiple `html_tags` options are specified
- tutorial notebooks now fully run on Colab without relying only on previously saved data
- added `1e-11` to the `BasePipeline.get_item_predictions` denominator to avoid potential `NaN`s
- various badges to `README`
- links to documentation where relevant
- Colab links to tutorial notebooks and `README` quickstart
- `ApproximateNegativeSamplingInteractionsDataLoader` uses a `sampler` instead of a `batch_sampler` for greater speed
- base `Dockerfile` image to the `torch@1.8.1` version
- `read_movielens_posters_df` now works as expected even when the `data/movielens_posters.csv` file does not exist
- renamed `LICENSE.txt` -> `LICENSE`
- renamed `pull_request_template.md` -> `PULL_REQUEST_TEMPLATE.md`
- GitHub Actions and templates in `.github`
- Collie library for open sourcing
- Initial project scaffolding