Commit
* initial commit with some ideas on how to do multilabel.
* Additional hacking on multilabel model.
* Added multilabel type aliases. Updated MultilabelModel but it's broken.
* Added additional comments.
* Made SparseLabelDepFeatures type alias a little nicer but abused notation a little.
* New plan. No label-dependent features for now.
* moved MultilabelModel to multilabel package.
* updated comments
* Removed the requirement that SparseMultiLabelPredictor is Closeable.
* Added comment to predictor.
* Updated skeleton of MultilabelModel and small change to RegressionFeatures.
* Added B <: U in helper methods.
* Added a few comments to model and added test skeleton.
* removed parameters to Auditor. Just use defaults: the values don't matter because there are no sub models or super models.
* Added import to companion object Label type and Auditor.
* Added additional test.
* SparseMultiLabelPredictor was made package private for testing.
* updated privacy of type aliases in multilabel package object
* Serialization test
* First test passing
* Hopefully code complete for the case class.
* Added comment.
* lessened privileges.
* Adding more multi-label tests
* Success report test case
* Addressing JMorra's PR comments.
* More tests
* java Serializable needs to be here
* Adding MultilabelModel parsing stuff, plugins, VW version, etc.
* Empty label problems test
* More explicit val name
* MultilabelModel parsing is compiling.
* Changed from Iterable[(String, Double)] to Sparse.
* new line at EOF.
* VW compiling but still a few holes to fill in. Added Namespaces trait.
* VW compiling. Added test shell. Fill in shell.
* Number of new changes
* Test passing. It appears we don't need the dummy classes in test mode.
* Updated VwSparseMultilabelPredictor. It now seems to be fully working as shown in the test VwMultilabelModelTest.
* Added some test comments.
* Added comments to VwSparseMultilabelPredictor.
* One more passing test, moving to performance testing now
* updated split
* Merging updates
* Figured out the gist of a few more tests, terrible code though
* First pass over all tests
* Simplified some of the tests
* Refactoring
* Refactoring common patterns into the companion object
* committing VwMultilabelRowCreator and updating other stuff to use it.
* All tests pass, code structured a bit better
* Wasn't compiling after merge
* labels not in training set should be reported
* Renamed missingLabels->labelsNotInTrainingSet to conform to new signature
* Added some unit tests. Still plenty more to do.
* Adding PR template
* Addressing comments
* Getting everything to compile. Still some work to be done.
* VW multi-label model parsing working correctly. Tests prove it!
* exposed VW parameters to VwSparseMultilabelPredictor
* removed TODO.
* updated tests to add coverage.
* Added additional tests.
* End to end testing working. Need to clean it up.
* a little cleanup.
* Removed implicit fn com.eharmony.aloha.factory.ScalaJsonFormats.lift(JsonReader). It was turning JsonFormats to JsonReaders and back to JsonFormats without writing capabilities.
* simplifying tests.
* Updated Travis and made docs:tut succeed.
* Bumped stack to 3M for tut until 0.5.5+ can be used.
* Bumped stack to 4M for tut until 0.5.5+ can be used.
* partial EitherAuditor implementation
* Added documentation in project/plugins.sbt about stack overflows.
* working
* some comments
* Lots of tests, documentation, etc. Ready for PR.
* updated scaladoc
* whitespace
* Added -Xss4m to tut scalac options, updated docs deps to: ioProto % "compile->test;compile->compile"
* add sbt -v to docs:tut in .travis.yml
* vw param function skeleton.
* Trying -Xss8m
* trying sbt-microsites 0.7.3 with no -Xss params.
* Changed comments in project/plugins.sbt. Attempting Ubuntu Trusty on Travis.
* updated the VW SHA-256 hash because we're using trusty and VW wasn't being cached.
* non working VwMultilabelModel.updatedVwParams. Skeleton laid out.
* quadratics and cubics seem to be working.
* removed println
* made ignore_linear more concise
* lots of stuff working. More tests to write for VwMultilabelParamAugmentation.
* tested higher order interactions.
* removed extra whitespace in string output.
* working but will change regex padding to use zero-width positive lookahead.
* added different padding.
* Updated documentation and tests. Looks good.
* Updated VW label NS algo. Added test for when a NS can't be found.
* hacky solution to flags with options referencing files. Use tmp files :-(
* Looks good.
* Precompute positive and negative dummy class strings.
* Adding numUniqueLabels parameter to updatedVwParams to add VW's --ring_size parameter
* stateful row creator and reservoir sampling.
* provided concrete implementations of iterator and vector apply method.
* more purity in test.
* separated pure and impure code.
* no more. good enough.
* It's never good enough, even on a football Saturday.
* Seq -> List
* VwDownsampledMultilabelRowCreator and supporting infrastructure and tests.
* Made Rand use Int indices, made k < 2^15 in neg label sampling. Updated StatefulRowCreator API.
* Addressing PR comments. Removed VW params from VW multi-label model code.
* Downsampling can now operate over 2^31 - 1 (2 billion) labels.
* changed Iterator.isEmpty to hasNext. Updated docs.
* forgot logical not in if statement.
* removed toShort from Rand.
* addressing PR comments. Changed name of multilabel model to 'SparseMultilabel'. Generalized StatefulRowCreator.
* 5.0.1-SNAPSHOT -> 5.1.0-SNAPSHOT
* removed SCM section from pom content because new sbt-git version pulled in from microsites plugin automatically adds it. Leaving it in breaks deploy. See sbt/sbt-git#117 for details.