Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Reduce and explicitly define our public API #115

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

timokau
Copy link
Collaborator

@timokau timokau commented Apr 17, 2020

Description

This is a work in progress. I haven't quite figured out how githubs draft PRs work.

With the absolute imports in the tests we could have csrank/__init__.py empty now. The test suite would still pass. I think that would actually be my preference: Start with an empty top-level, add re-exports when experience shows that it would be more convenient.

However @kiudee advocated for some top-level imports on slack

[...] We should think about what our library looks like for a user. In scikit-learn, they group everything by topic (e.g. model_selection, linear_model etc) and inside each sub-module the hierarchy is flattened.

My opinion (please feel free to discuss): If we opt for a flat hierarchy, then there should be as few top-level class imports as possible. For example we should hide:

  • Dataset generators like DiscreteChoiceDatasetGenerator etc
  • Model cores like FATENetworkCore etc
    behind their appropriate modules.
    To summarize: Only the directly usable learners should be visible as top-level imports. Everything else is hidden in sub-modules

[...] the idea is that to a user it looks like he is importing learners directly from csrank. For dataset generators (s)he needs to use a module. The same thing for abstract classes like the cores
That way the immediate tab completion is very clean
accessing the sub modules is of course always possible

Which is also also fine by me.

Motivation and Context

Currently we re-export everything at the toplevel. That is suboptimal. It pollutes our public namespace with a lot of irrelevant definitions. It provides multiple ways to import things, while there should be one preferred way, ideally only one way to do it. It makes tab import less useful. It necessitates star imports.

Therefore we want to reduce our public API to an explicitly defined subset.

How Has This Been Tested?

Ran the testsuite and the pre-commit hooks.

Does this close/impact existing issues?

Closes #105.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.

@timokau
Copy link
Collaborator Author

timokau commented Apr 17, 2020

@kiudee, I would take your slack comments to mean that the top-level csrank.__all__ should contain

  • Choice Functions
    • AllPositive
    • FATEChoiceFunction
    • FATELinearChoiceFunction
    • FETAChoiceFunction
    • FETALinearChoiceFunction
    • GeneralizedLinearModel
    • PairwiseSVMChoiceFunction
    • RankNetChoiceFunction
  • Discrete Choice Functions
    • RandomBaselineDC
    • CmpNetDiscreteChoiceFunction
    • FATEDiscreteChoiceFunction
    • FATELinearDiscreteChoiceFunction
    • FETADiscreteChoiceFunction
    • FETALinearDiscreteChoiceFunction
    • GeneralizedNestedLogitModel
    • MixedLogitModel
    • ModelSelector
    • MultinomialLogitModel
    • NestedLogitModel
    • PairedCombinatorialLogit
    • PairwiseSVMDiscreteChoiceFunction
    • RankNetDiscreteChoiceFunction
  • Object Rankers
    • CmpNet
    • ExpectedRankRegression
    • FATEObjectRanker
    • FATELinearObjectRanker
    • FETAObjectRanker
    • FETALinearObjectRanker
    • ListNet
    • RankNet
    • RankSVM
    • RandomBaselineRanker

Is that understanding correct?

@codecov
Copy link

codecov bot commented Apr 17, 2020

Codecov Report

Merging #115 into master will decrease coverage by 0.03%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #115      +/-   ##
==========================================
- Coverage   60.01%   59.97%   -0.04%     
==========================================
  Files         116      116              
  Lines        7642     7636       -6     
==========================================
- Hits         4586     4580       -6     
  Misses       3056     3056              
Impacted Files Coverage Δ
csrank/__init__.py 100.00% <100.00%> (ø)
csrank/tests/test_fate.py 92.59% <100.00%> (ø)
csrank/tests/test_ranking.py 100.00% <100.00%> (ø)
csrank/tests/test_util.py 100.00% <100.00%> (ø)

@kiudee kiudee marked this pull request as draft May 30, 2020 08:01
@timokau
Copy link
Collaborator Author

timokau commented May 30, 2020

For the public record, @kiudee agreed in slack. Also the python conventions seem to agree with a two-level hierarchy: https://www.python.org/dev/peps/pep-0423/#avoid-deep-nesting

@timokau
Copy link
Collaborator Author

timokau commented May 30, 2020

I'm sorry, its been a while and I don't quite remember what I meant myself with comment #115 (comment). Does this mean all those functions should be directly accessible from csrank or below their respective submodules?

I.e. do we want

Option 1

from csrank.choicefunction import FATEChoiceFunction
from csrank.objectranking import FATEObjectRanker

Option 2

from csrank.choicefunction import FATE
from csrank.objectranking import FATE as FATEObject

Option 3

from csrank import FATEChoiceFunction
from csrank import FATEObjectRanker

I have kidn of operated under the assumption (1) and implemented that for now. But the comment could also be read as option (3). If we go this way, I think we should

  • move most other top-level modules (callbacks, constants, layers, learner, losses, metrics_np, metrics, numpy_util, tensorflow_util, tunable, tuning) under the util namespace
  • maybe reneame dataset_reader to datasets, since it also provides generation functionality

Then we would have no direct exports from csrank, and only the top-level modules

  • choicefunction
  • objectranking
  • discretechoice
  • util
  • datasets

I think this would improve tab-completion and discoverability of our API.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Organize imports
2 participants