tSNE: Update to use new implementation #292
Merged
Issue
Once biolab/orange3#3192 is merged, the tSNE widget here will no longer work due to the API change.
Should not be merged until biolab/orange3#3192.
Description of changes
The tSNE widget now works. I've also removed the MulticoreTSNE code, since the current implementation supports Barnes-Hut as well and is about as fast as MulticoreTSNE.
I've set the step size to 50 (chosen for no reason in particular), so the visualization is updated every 50 iterations. This is primarily done so the optimization can be stopped in between, otherwise one would have to wait until all iterations finished before the widget became responsive again.
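To illustrate the stepping idea, here is a minimal sketch of running an optimization in chunks so a caller can redraw or stop between them. It is not the widget's code: `run_in_chunks`, `step`, and `should_stop` are hypothetical names, and `step(n)` stands in for whatever call advances the embedding by `n` iterations.

```python
from typing import Callable, Iterator


def run_in_chunks(
    step: Callable[[int], None],
    total_iters: int,
    chunk_size: int = 50,
    should_stop: Callable[[], bool] = lambda: False,
) -> Iterator[int]:
    """Advance an optimization `chunk_size` iterations at a time.

    `step(n)` is a hypothetical callback that runs n iterations.
    After each chunk the generator yields the number of iterations
    completed so far, so the caller can update the plot. The stop
    flag is checked before every chunk.
    """
    done = 0
    while done < total_iters and not should_stop():
        n = min(chunk_size, total_iters - done)  # last chunk may be shorter
        step(n)
        done += n
        yield done


if __name__ == "__main__":
    # Stand-in for the real optimizer: just report each chunk.
    for completed in run_in_chunks(lambda n: None, total_iters=120):
        print(f"completed {completed} iterations, redraw here")
```

A smaller chunk size makes the widget more responsive to a stop request at the cost of more frequent redraws.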
One thing to note is the early exaggeration phase. The previous version was limited in that the number of early exaggeration iterations was hardcoded to 250 (in sklearn and MulticoreTSNE themselves). The early exaggeration factor was set to 1, so this phase behaved like the regular optimization. However, this meant that we could not optimize for fewer than 250 steps. With the new implementation there is no such limitation, so we can run a single iteration if we want. Both the previous and the current implementation effectively remove the early exaggeration phase. This is generally not a good idea and can lead to worse visualizations. The point of early exaggeration is to correct poor initializations and clump similar points together.
I've set the optimization scheme to switch automatically to FFT interpolation when the number of points exceeds 10k. This cutoff was chosen arbitrarily and is likely not the best switching point for Barnes-Hut. However, I am certain that FFT is faster than BH at 10k points.
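The switching rule can be sketched as below. This is only an illustration of the threshold logic described above: `choose_gradient_method`, `FFT_CUTOFF`, and the `"bh"` / `"fft"` labels are hypothetical names, not identifiers taken from the widget or from Orange.

```python
FFT_CUTOFF = 10_000  # arbitrary cutoff from the description, not a tuned value


def choose_gradient_method(n_points: int, cutoff: int = FFT_CUTOFF) -> str:
    """Return "fft" when the data set has more than `cutoff` points,
    otherwise "bh" (Barnes-Hut). Both labels are illustrative."""
    return "fft" if n_points > cutoff else "bh"


if __name__ == "__main__":
    for n in (500, 10_000, 10_001):
        print(n, choose_gradient_method(n))
```

Keeping the cutoff as a parameter makes it easy to revisit once the actual crossover point between the two methods has been measured.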
Also, I've removed the old code with the TODO to remove once merged into core. I am fairly certain that code has long since been merged into core.
Also, from what I could tell, the previous implementation always
Includes