[SPARK-3001][MLLIB] Improve Spearman's correlation #1917
Jenkins, test this please.
QA tests have started for PR 1917. This patch merges cleanly.
QA results for PR 1917:
Jenkins, retest this please.
QA tests have started for PR 1917. This patch merges cleanly.
QA results for PR 1917:
QA tests have started for PR 1917. This patch merges cleanly.
QA results for PR 1917:
Jenkins, test this please.
QA tests have started for PR 1917. This patch merges cleanly.
QA results for PR 1917:
Jenkins, retest this please.
QA tests have started for PR 1917. This patch merges cleanly.
QA results for PR 1917:
QA tests have started for PR 1917. This patch merges cleanly.
QA results for PR 1917:
QA tests have started for PR 1917. This patch merges cleanly.
QA results for PR 1917:
Merged into master and branch-1.1.
The current implementation requires sorting individual columns, which could be done with a global sort.

result on a 32-node cluster:

m | n | prev | this
---|---|-------|-----
1000000 | 50 | 55s | 9s
10000000 | 50 | 97s | 76s
1000000 | 100 | 119s | 15s

Author: Xiangrui Meng <meng@databricks.com>

Closes #1917 from mengxr/spearman and squashes the following commits:

4d5d262 [Xiangrui Meng] remove unused import
85c48de [Xiangrui Meng] minor updates
a048d0c [Xiangrui Meng] remove cache and set a limit to cachedIds
b98bb18 [Xiangrui Meng] add comments
0846e07 [Xiangrui Meng] first version

(cherry picked from commit 2e069ca)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
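The current implementation sorts each column individually; the same ranking can be done with a single global sort.

Results on a 32-node cluster:

m | n | prev | this
---|---|-------|-----
1000000 | 50 | 55s | 9s
10000000 | 50 | 97s | 76s
1000000 | 100 | 119s | 15s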
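
To make the "global sort" idea concrete, here is a minimal single-machine sketch in plain Python. It is not the MLlib code from this PR: the function names (`column_ranks_global_sort`, `pearson`, `spearman_matrix`) are hypothetical, and the input is assumed to be a small rectangular list of rows. The point is only that one sort over `(column, value, row)` triples yields the ranks of every column at once, after which Spearman's correlation is Pearson's correlation on those ranks.

```python
from math import sqrt


def column_ranks_global_sort(rows):
    """Average ranks for every column, computed with ONE sort (illustrative only)."""
    m = len(rows)  # number of rows; the matrix is assumed rectangular
    # Flatten the matrix into (column index, value, row index) triples.
    triples = [(j, v, i) for i, row in enumerate(rows) for j, v in enumerate(row)]
    # A single global sort groups entries by column and orders them by value.
    triples.sort()
    ranks = [[0.0] * len(row) for row in rows]
    start = 0
    while start < len(triples):
        col, val, _ = triples[start]
        end = start
        # Extend the run over all tied values within the same column.
        while end < len(triples) and triples[end][0] == col and triples[end][1] == val:
            end += 1
        # Column `col` occupies positions [col * m, (col + 1) * m) after sorting,
        # so the 1-based ranks of this run are start - col*m + 1 .. end - col*m.
        avg_rank = (start + end + 1) / 2 - col * m
        for _, _, i in triples[start:end]:
            ranks[i][col] = avg_rank
        start = end
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with nonzero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman_matrix(rows):
    """Spearman correlation matrix: Pearson correlation of the column ranks."""
    ranks = column_ranks_global_sort(rows)
    cols = list(zip(*ranks))  # transpose to get one rank vector per column
    return [[pearson(a, b) for b in cols] for a in cols]


if __name__ == "__main__":
    data = [[1, 10, 3], [2, 20, 2], [3, 30, 1]]
    for line in spearman_matrix(data):
        print(line)
```

In a distributed setting the flatten-and-sort step would map onto a shuffle-based sort keyed by column and value, but how the PR handles tie runs and maps ranks back to rows is not shown on this page, so the sketch should not be read as a description of it.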