
PyTorch migration: Remove tensorflow components, add FATE estimators #164

Merged — 24 commits merged from pytorch-poc into kiudee:pytorch-migration on Apr 8, 2021

Conversation

@timokau (Collaborator) commented Oct 9, 2020

Description

See this comment for a description of the current status.

Motivation and Context

TensorFlow 1 is deprecated and we need to move away from it. This PR is an attempt to evaluate PyTorch as an alternative. For now I am not trying to fit the existing API (at least not yet).

How Has This Been Tested?

Lints & tests.

Does this close/impact existing issues?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.

@timokau marked this pull request as draft on October 9, 2020 15:37
@timokau changed the title from "Pytorch poc" to "Proof of Concept: FETA in pytorch" on Oct 9, 2020
@kiudee (Owner) commented Oct 13, 2020

Already looks very clean - well done!

@timokau (Collaborator, Author) commented Oct 29, 2020

Just as a little status update, since this has been going on for a while: I think this is turning out quite nicely. I have implemented FETA ranking and (nearly, not using the proper loss function yet) FETA discrete choice. That shows flexibility on one axis (result type). I plan to also implement the same for FATE to show the flexibility on the other axis and finish the proof of concept. At that point we could evaluate and see where to take it from there.

So in summary, things are moving along but are not quite ready for review/discussion yet. Hopefully soon-ish.

@timokau (Collaborator, Author) commented Oct 31, 2020

Okay, I think this is sufficient as a proof of concept now. I have implemented FATE and FETA, each in the ranking and discrete choice variant.

I replaced a lot of the inheritance in the current tensorflow implementation with composition. I have split the code into "scoring modules" and estimators. The scoring modules are themselves composed of smaller modules, which makes them easier to reuse/understand/test.

I have based the estimator implementation on skorch, which takes care of a lot of the boilerplate for us. We no longer have to care about training loops, instantiating optimizers or passing the parameters to uninitialized classes. We get #116 basically for free.
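For readers unfamiliar with skorch, here is a minimal sketch of the kind of boilerplate it absorbs. The module below is a toy stand-in, not one of the scoring modules from this PR:

```python
import torch
from torch import nn
from skorch import NeuralNet

class ToyScorer(nn.Module):
    """Toy stand-in for a scoring module; any nn.Module works here."""

    def __init__(self, n_features):
        super().__init__()
        self.linear = nn.Linear(n_features, 1)

    def forward(self, x):
        return self.linear(x)

# skorch instantiates the uninitialized module class itself and routes
# constructor arguments via the ``module__`` prefix, so we never write a
# training loop or create the optimizer by hand.
net = NeuralNet(
    module=ToyScorer,
    module__n_features=10,
    criterion=nn.MSELoss,
    optimizer=torch.optim.Adam,
    max_epochs=20,
    lr=0.01,
)
# net.fit(X, y) and net.predict(X) then follow the scikit-learn conventions.
```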

The actual "heavy lifing" of the computation (the pairwise utilities) is disentangled from the FETA/FATE architecture (the "data flow" part), so its easy to modify or replace. For now its just a simple 1-layer linear network. This decomposition of scorer/estimator/utility removes a lot of duplication. It would be very easy to add a new scorer (for example based on graph neural networks) and "throw" it at the existing Ranking/Discrete choice estimators. It would also be very easy to derive a new utility function architecture and "throw" that at the FATE module.

If you want to look at the implementation, here are the most interesting files:

  • modules/scoring.py takes care of the high level assembly for the FATE and FETA scoring modules.
  • estimators.py derives ranking and discrete choice estimators from the scorers.
  • *_losses.py and *_datasets.py define some losses and test datasets. modules/feta_support.py and modules/embedding.py contain the more low-level aspects of FETA and FATE.

What do you think @kiudee? There are still things to improve of course, but I think it's sufficient as a proof of concept.

@timokau (Collaborator, Author) commented Oct 31, 2020

Also CC @prithagupta if you are interested in this.

@timokau changed the title from "Proof of Concept: FETA in pytorch" to "Proof of Concept: FATE & FETA ranking and discrete choice in pytorch" on Oct 31, 2020
Review thread on poc/modules/scoring.py (outdated):
instances
)
pairs = torch.cat((instances, context_per_object), dim=-1)
utilities = self.pairwise_utility_module(pairs)
@kiudee (Owner):
Very clean decomposition.

@timokau (Collaborator, Author):
Thank you!

Another review thread:

# TODO use a more powerful pairwise utility module
def __init__(self, n_features, pairwise_utility_module=PairwiseLinearUtility):
    super().__init__()
    self.mean_aggregated_utilty = MeanAggregatedUtility(
@kiudee (Owner):
In principle we can even think about making this modular. One could use different aggregation functions here or even a learned aggregation operator.

@timokau (Collaborator, Author):
Yes, I think that would be interesting :) The composition architecture should make experiments like that much easier.
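For illustration, a pluggable aggregation could look roughly like this. This is a hypothetical variant of the MeanAggregatedUtility above, not the actual code in this PR:

```python
import torch
from torch import nn

class AggregatedUtility(nn.Module):
    """Hypothetical FETA-style building block where the aggregation over the
    other objects is a plug-in callable instead of a hard-coded mean."""

    def __init__(self, pairwise_utility_module, aggregation=None):
        super().__init__()
        self.pairwise_utility_module = pairwise_utility_module
        # mean by default; could be a max, a sum, or a small learned module
        self.aggregation = aggregation or (lambda utilities: utilities.mean(dim=-1))

    def forward(self, instances):
        # instances: (batch, n_objects, n_features)
        n_objects = instances.shape[1]
        first = instances.unsqueeze(2).expand(-1, -1, n_objects, -1)
        second = instances.unsqueeze(1).expand(-1, n_objects, -1, -1)
        pairs = torch.cat((first, second), dim=-1)
        # pairwise utilities of every ordered pair, including (i, i) for simplicity
        utilities = self.pairwise_utility_module(pairs).squeeze(-1)
        return self.aggregation(utilities)  # (batch, n_objects)
```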

@timokau (Collaborator, Author) commented Nov 4, 2020

What is your general verdict @kiudee? Should I continue down this path, implementing more of the existing learners and functionality and eventually replacing the current implementation? Or rather try something else?

@timokau force-pushed the pytorch-poc branch 2 times, most recently from 6911866 to e08acfc on November 18, 2020 18:17
timokau added a commit to timokau/cs-ranking that referenced this pull request Nov 19, 2020
I think default values for internal functions just hinder understanding.
Changed the parameter names to be less domain specific, since we are
just talking about a point in the ball for the purposes of this
function. Since this is an internal function, we can require an already
initialized random state.

Result of this discussion / explanation:
kiudee#164 (comment)
timokau added a commit to timokau/cs-ranking that referenced this pull request Nov 19, 2020
Thereby fixing a bug when the number of instances is not a multiple of
10.

Result of this discussion
kiudee#164 (comment)
@timokau (Collaborator, Author) commented Nov 24, 2020

Another status update: I'm experimenting with experiments. We should be able to reproduce the experiments of the main papers with the new implementation, and I'd like to be able to do that in an easily reproducible way (possibly repeated on each release). I'm trying to use Sacred for this purpose. I'm abusing the "named configuration" system a bit, but currently you can do things like

python3 experiment.py -m sacred with feta_variable_choice_estimator pareto_choice_problem dataset_params.n_instances=10000

You can pick "named configurations" for an estimator and a dataset, and then override all parameters on the command line. Sacred will run the experiment, store everything that is needed to reproduce it, and also store the results in a database:
[screenshot of a Sacred experiment run and its stored results]
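For reference, the named-configuration mechanism looks roughly like this. This is a simplified, hypothetical sketch rather than the actual poc/experiment.py:

```python
from sacred import Experiment

ex = Experiment("csrank-poc")

@ex.config
def defaults():
    estimator = None
    dataset = None
    estimator_params = {}
    dataset_params = {"n_instances": 1000}

@ex.named_config
def feta_variable_choice_estimator():
    estimator = "feta_variable_choice"

@ex.named_config
def pareto_choice_problem():
    dataset = "pareto_choice"
    dataset_params = {"n_objects": 30}

@ex.automain
def run(estimator, dataset, estimator_params, dataset_params):
    # build the dataset and estimator here, fit, and log metrics;
    # Sacred records the configuration, seeds, and results for reproducibility
    ...
```

Named configurations are selected after `with` on the command line, and individual values such as dataset_params.n_instances can still be overridden there, as in the invocation above.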

@timokau (Collaborator, Author) commented Dec 1, 2020

Some more progress: I've added some metrics and played with the experiments a bit. Here I was trying to see how far I could push the current FETA implementation with its defaults and just 1000 Pareto instances (which was further than expected):
[training plot]
I ended up stopping the training even though the informedness still seemed to be rising very slightly. I ran the experiment with

python3 poc/experiment.py -m sacred with feta_variable_choice_estimator pareto_choice_problem dataset_params.n_instances=1000 dataset_params.n_objects=30

I also created an upstream PR for the Sacred logger for skorch: skorch-dev/skorch#725

Review thread on poc/experiment.py (outdated):
"n_instances": int(1e5),
}
dataset_type = "variable_choice"
# TODO set cross-validation parameters as in the paper
@timokau (Collaborator, Author):

I'm also not sure if I understand the training and validation strategy of "Learning Choice Functions" correctly. I understand that it uses an outer 5-fold cross-validation for hyper-parameter optimization, but I'm not sure how the "inner" validation works. From Table 3 I would guess that it works like this:

  • Generate (110000 / 4) * 5 = 137,500 instances (to get that 100,000 + 10,000 split in each cross-validation step).
  • Split this into 5 folds of 27,500 instances each.
  • For each fold F (outer cross-validation loop):
    • Combine the other 4 folds into one dataset of 110,000 instances.
    • Split this into training (100,000) and test (10,000) data
    • For each set of hyperparameters
      • Train a model, test it on the test data.
    • Validate the best (according to test data) model on fold F
  • Report the average performance of the outer cross-validation.

However, I'm almost sure that is incorrect 😄 Could you clarify @kiudee?
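Just to make that guess concrete, here is how the above would translate into code. This is only my reading of Table 3; generate_instances, hyperparameter_grid, and make_estimator are hypothetical placeholders:

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

n_total = (110_000 // 4) * 5            # 137,500 instances overall
X, y = generate_instances(n_total)      # hypothetical dataset generator

outer_scores = []
for pool_idx, fold_idx in KFold(n_splits=5, shuffle=True).split(X):
    # the four combined folds hold 110,000 instances ...
    X_pool, y_pool = X[pool_idx], y[pool_idx]
    # ... which are split into 100,000 training and 10,000 test instances
    X_train, X_test, y_train, y_test = train_test_split(
        X_pool, y_pool, test_size=10_000
    )
    best_model, best_score = None, -np.inf
    for params in hyperparameter_grid:   # hypothetical hyper-parameter search
        model = make_estimator(**params).fit(X_train, y_train)
        score = model.score(X_test, y_test)
        if score > best_score:
            best_model, best_score = model, score
    # validate the selected model on the held-out fold F
    outer_scores.append(best_model.score(X[fold_idx], y[fold_idx]))

mean_performance = np.mean(outer_scores)
```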

Python 3.7 is not officially supported anymore. Python 3.9 is released already, but let's update to 3.8 first.

In preparation for the pytorch migration.

In preparation for adding new entries to the list.
@timokau (Collaborator, Author) commented Apr 8, 2021

I have fixed the copy and paste issue that you found.

Just for completeness, I'll summarize the results of our private discussions here too:

  • I have removed the poetry2nix specific workaround.
  • I have restructured and reduced the history to avoid adding components just to remove them again in the same PR (related to the first open question here). Sorry for all the email notifications and force pushes.
  • I have "lifted" the level of abstraction of the specialized estimator classes (related to the second open question here).
  • I have removed the train/validation split in the tests (related to this comment).

The PR is ready for review again.

Edit: I forgot to mention the module names. We also discussed alternatives for the names of the first_order and zeroth_order modules. In the end I decided to go with instance_reduction and object_mapping.

@timokau requested a review from @kiudee on April 8, 2021 10:26
@kiudee (Owner) left a review comment:
Great clean commit history. Looks ready to be merged.

@timokau (Collaborator, Author) commented Apr 8, 2021

Thank you for the reviews :)

Commit messages:

This is part of the ongoing pytorch migration. We will use skorch as the basis for our pytorch estimators. That will make it easier to be compliant with the scikit-learn estimator API.

Ranking and (general/discrete) choice estimators are often based on some sort of scoring. The task specific estimators make it easy to derive concrete estimators from a scoring module.

This adds a scoring module and the derived estimators for the FATE approach. The architecture is modular, so it should be easy to experiment with new ways to put the estimators together. This is a big commit. Multiple smaller ones that add the separate components (some of which are structural or can be useful outside of FATE and therefore could be considered features on their own) would probably have been better. Splitting it up now would take more time and is not worth it in this case though.

This simplifies interchangeable use of pytorch estimators and other estimators.

The "linear" implementations have been removed. The existing estimators do not expect `epochs` or `validation_split` parameters. The `verbose` parameter is accepted by some estimators, but defaults to `False` and is not expected by any of the ranking or discrete choice estimators.

The configuration is based on the configuration of the old (tensorflow based) FATE estimators in the tests. The tensorflow tests used a 10% validation split, but still verified the performance in-sample. The validation was not actually used. Therefore I haven't kept that behavior.

The performance isn't the same. Especially the performance on the choice
task seems worse if we trust the test results. We shouldn't read too
much into that yet. The test is mostly for basic functionality and not a
reliable performance indicator. The sample size is small.
The binder logo (badge.svg vs badge_logo.svg) differs between the two files, but either should be good.

There is now a pytorch implementation of the FATE estimators.

This is similar to the "optimizer_common_args" dictionary that used to exist. This version contains skorch-specific arguments, which also include the train split and the number of epochs. There are only the FATE based estimators now, but this would get repetitive when the other approaches are included again.
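For context, such a shared dictionary of skorch arguments could look roughly like this. The values are hypothetical; max_epochs, train_split, optimizer, and lr are standard skorch NeuralNet parameters, and the estimator name in the comment is only a placeholder:

```python
import torch

skorch_common_args = {
    "max_epochs": 10,          # number of training epochs
    "train_split": None,       # disable skorch's internal validation split
    "optimizer": torch.optim.Adam,
    "lr": 0.01,
}

# hypothetical usage: estimator = SomeFateEstimator(**skorch_common_args)
```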
@timokau (Collaborator, Author) commented Apr 8, 2021

I wanted to run the checks and lints once more before merging and noticed some formatting issues. I did not run black consistently because I had a newer version in my environment and that made a lot of unrelated formatting changes. I have fixed the formatting with the black version that is defined in the .pre-commit-config.yaml. The linters (including black) give a thumbs-up now and the test suite passes.

Please take another look.

@kiudee self-requested a review on April 8, 2021 11:03
@timokau (Collaborator, Author) commented Apr 8, 2021

Thanks again 🚀

@timokau merged commit ac836cb into kiudee:pytorch-migration on Apr 8, 2021
@timokau deleted the pytorch-poc branch on April 8, 2021 11:05