Below are some frequently asked questions about JSAT.
JSAT is meant to be a general-purpose machine learning library that is easy to develop code for and with. This differs from libraries like Weka and Orange, which are designed more for use through a GUI by non-programmers.
It could be! The original reason JSAT has no dependencies is that I like implementing things. But as JSAT got bigger, the lack of dependencies became an unintended feature. A number of users started with JSAT because it had no dependencies, while other libraries were causing dependency conflicts in very large enterprise-style projects. I now keep JSAT dependency-free so that it is easy to include in any project and will never cause a dependency conflict.
Yes! That is in the plans, but was delayed for medical reasons.
No, I will not be changing the license of JSAT away from GPL. I have spent many years working on JSAT and made it available publicly (under the GPL) without any compensation. I consider the sharing requirements of the GPL to be my "compensation" for the code I've released: if you use JSAT to make something and distribute it, you must release your code as well. I am aware that the GPL "doesn't work" for some people, and they are free to ask me about alternative licensing if they wish.
You can always ask about things that are on my TODO list, or ask whether ideas you have would work. If you have small typo or bug fixes, feel free to just open a pull request. If you are making a small change, copyright does not generally apply. If you are going to contribute more significant code, I follow a policy similar to the GNU projects: I ask that you either contribute under the Public Domain or ask me about signing an ownership and licensing agreement. I'll end up with ownership of the code, and you will have a license to do as you please with the code you contributed. This makes my life much easier.
JSAT has a number of slightly more niche algorithms implemented in it that are often not available in other libraries. Below is a list of some of the particularly useful ones that I think have high utility, and other implementations that I'm aware of. I don't stalk other projects, so if I missed something please assume it is an accident - and let me know so I can update the table!
Algorithm | Utility | Also Available In |
---|---|---|
Extra Random Trees | Classification and Regression problems | Scikit-learn, Weka |
DC-SVM | Fast multi-threaded approximate and exact SVM solver | only author's webpage |
NewGLMNET | Fast L1 and Elastic Net regularized logistic regression | LibLinear has L1 version, but not elastic net |
Support Passive Aggressive | Native multi-class version of popular Passive Aggressive classifier | no other implementation exists |
t-SNE | Popular data visualization algorithm | Scikit-learn |
LargeViz | New and easier to use data visualization algorithm, related to t-SNE | No other larger libraries |
Elkan & Hamerly k-means | Faster exact k-means clustering | Scikit-learn has Elkan version |
Elkan Kernel K-Means | Faster exact kernel k-means clustering | no other implementations exist |
Adaptive Multi-Hyperplane Machine | Non-linear classifier with training time similar to linear algorithms | BudgetedSVM |
RBF Kernel Merging Approximation | Fast and useful budgeted kernel method | BudgetedSVM |
Modest AdaBoost | Version of AdaBoost that tends to overfit less | none |
DCDs & LogisticRegressionDCD | Fast linear SVM and LR solvers | LibLinear, Scikit-learn |
HDBSCAN | Useful clustering algorithm that improves upon DBSCAN | random independent implementations, but not in any libraries |
LSDBC | Useful clustering algorithm that improves upon DBSCAN | no other implementations |
KernelRLS | Regression algorithm | dlib |