tSNE: Update to use new implementation #292

Merged (3 commits) on Nov 8, 2018

Conversation

pavlin-policar
Collaborator

Issue

Once biolab/orange3#3192 is merged, the tSNE widget here will no longer work due to the API change.

Should not be merged until biolab/orange3#3192.

Description of changes

The tSNE widget now works. I've also removed the MulticoreTSNE code, since the current implementation supports Barnes-Hut as well and is about as fast as MulticoreTSNE.

I've set the step size to 50 (chosen for no reason in particular), so the visualization is updated every 50 iterations. This is primarily done so the optimization can be stopped in between, otherwise one would have to wait until all iterations finished before the widget became responsive again.
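To illustrate the chunked scheme described above, here is a minimal sketch of running an optimizer in steps of 50 so the caller can update the view or stop between chunks. The names `run_in_steps` and `optimize` are hypothetical and are not taken from the widget or from any t-SNE library; `optimize` stands in for whatever callable advances the embedding by a given number of iterations.

```python
from typing import Callable, Iterator


def run_in_steps(
    optimize: Callable[[int], None],
    total_iterations: int,
    step_size: int = 50,
) -> Iterator[int]:
    """Run `optimize` in chunks and yield the number of iterations done so far.

    `optimize(n)` is a hypothetical callable that advances the embedding by
    `n` iterations. Because this is a generator, the caller regains control
    after every chunk and can redraw the plot or simply stop iterating.
    """
    done = 0
    while done < total_iterations:
        # The last chunk may be shorter than step_size.
        n = min(step_size, total_iterations - done)
        optimize(n)
        done += n
        yield done
```

A caller would loop over `run_in_steps(...)`, refresh the visualization on each yielded value, and `break` out of the loop when the user asks to stop; no further chunks are run after the break.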

One thing to note is the early exaggeration phase. The previous version was limited in that the number of early exaggeration iterations was hardcoded to 250 (in sklearn and MulticoreTSNE themselves). The early exaggeration factor was set to 1, so that phase behaved like the regular optimization. However, this meant that we could not optimize for fewer than 250 steps. The new implementation has no such limitation, so we can run a single iteration if we want. Both the previous and the current implementation effectively remove the early exaggeration phase. This is generally not a good idea and can lead to worse visualizations: the point of early exaggeration is to correct poor initializations and clump similar points together.

I've set the optimization scheme to switch automatically from Barnes-Hut to FFT interpolation when the number of points exceeds 10k. This cutoff was chosen arbitrarily and is likely not the best crossover point. However, I am certain that FFT is faster than BH at 10k points.
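The cutoff rule above amounts to a one-line decision. The following sketch only illustrates it; the function name, the constant, and the `"bh"` / `"fft"` labels are placeholders chosen for this example and are not claimed to match the widget's code or any library's parameter values.

```python
# Hypothetical constant: the 10k cutoff mentioned in the description.
FFT_THRESHOLD = 10_000


def choose_gradient_method(n_points: int, threshold: int = FFT_THRESHOLD) -> str:
    """Pick the gradient approximation based on the dataset size.

    Returns the placeholder label "fft" when the number of points exceeds
    the threshold, and "bh" (Barnes-Hut) otherwise.
    """
    return "fft" if n_points > threshold else "bh"
```

Keeping the threshold as a parameter makes it easy to revisit the cutoff later if benchmarks suggest a better crossover point.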

Also, I've removed the old code with the TODO to remove once merged into core. I am fairly certain that code has long since been merged into core.

Also, from what I could tell, the previous implementation always

Includes
  • Code changes
  • Tests
  • Documentation

@pavlin-policar changed the title from "[NOMERGE] tSNE: Update to use new implementation" to "tSNE: Update to use new implementation" on Sep 12, 2018
@mstrazar
Contributor

@lanzagar @pavlin-policar
It would be really useful to have this soon, to handle larger datasets. Is there any news on this?

@codecov-io

codecov-io commented Oct 21, 2018

Codecov Report

Merging #292 into master will increase coverage by 0.04%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #292      +/-   ##
==========================================
+ Coverage   61.36%   61.41%   +0.04%     
==========================================
  Files          28       28              
  Lines        6264     6240      -24     
==========================================
- Hits         3844     3832      -12     
+ Misses       2420     2408      -12

create_annotated_table, create_groups_table, ANNOTATED_DATA_SIGNAL_NAME)
create_annotated_table, create_groups_table, ANNOTATED_DATA_SIGNAL_NAME,
get_unique_names,
)


RE_FIND_INDEX = r"(^{} \()(\d{{1,}})(\)$)"
Contributor

This is not used anymore.

@lanzagar lanzagar merged commit 7383050 into biolab:master Nov 8, 2018
@pavlin-policar pavlin-policar deleted the tsne-update branch November 9, 2018 11:28