Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get raw vectors for further processing in Word2Vec #3309

Closed
wants to merge 4 commits into from

Conversation

tkaessmann
Copy link

e.g. clustering

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

}

/**
* Returns the strings with it´s raw vectors for further processing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it´s -> its (btw, the character ´ does not belong to ascii, which should be avoid in the code base)

@@ -461,4 +461,13 @@ class Word2VecModel private[mllib] (
.tail
.toArray
}

/**
* Returns the strings with it's raw vectors for further processing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's -> its


/**
* Returns the strings with its raw vectors for further processing
* (e.g. clustering)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for making one more iteration! The doc could be more concise, e.g.

Returns a map of words to their vector representations.

for further processing (e.g. cluster) is not necessary because we don't restrict how users will use it.

Usually, if Returns ... contains all the information, we don't need @return.

Another question I have is on the return type. We use Array[Float] to save space in computation. Shall we return Map[String, Vector] here, just to be consistent with others? @Ishiihara

@mengxr
Copy link
Contributor

mengxr commented Nov 24, 2014

ok to test

@mengxr
Copy link
Contributor

mengxr commented Nov 24, 2014

test this please

@SparkQA
Copy link

SparkQA commented Nov 24, 2014

Test build #23790 has started for PR 3309 at commit e3a3142.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 24, 2014

Test build #23790 has finished for PR 3309 at commit e3a3142.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23790/
Test PASSed.

mengxr pushed a commit to mengxr/spark that referenced this pull request Nov 25, 2014
e.g. clustering

Author: tkaessmann <tobias.kaessmann@s24.com>

Closes apache#3309 from tkaessmann/branch-1.2 and squashes the following commits:

e3a3142 [tkaessmann] changes the comment for getVectors
58d3d83 [tkaessmann] removes sign from comment
a5be213 [tkaessmann] fixes getVectors to fit code guidelines
3782fa9 [tkaessmann] get raw vectors for further processing
@mengxr
Copy link
Contributor

mengxr commented Nov 25, 2014

LGTM. I've merged this into branch-1.2. I made a new PR #3437 with the same content to the master branch. Usually a PR should be sent to the master branch unless it is for a specific branch.

andrewor14 pushed a commit to andrewor14/spark that referenced this pull request Nov 25, 2014
e.g. clustering

Author: tkaessmann <tobias.kaessmann@s24.com>

Closes apache#3309 from tkaessmann/branch-1.2 and squashes the following commits:

e3a3142 [tkaessmann] changes the comment for getVectors
58d3d83 [tkaessmann] removes sign from comment
a5be213 [tkaessmann] fixes getVectors to fit code guidelines
3782fa9 [tkaessmann] get raw vectors for further processing
@asfgit asfgit closed this in 9ce2bf3 Nov 25, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants