Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add word2vec.PathLineSentences for reading a directory as a corpus (#1364) (#1423)

* issue #1364 first commit, corpus from a directory: added method models.word2vec.LineSentencePath to read an entire directory's files in the same style as models.word2vec.LineSentence
* test for word2vec.LineSentencePath (issue #1364): initial attempt at a test, including files. The test splits the lee_background.cor file into two parts, puts them in a directory, then checks that they match the unsplit file as loaded by word2vec.LineSentence
* better handling of input for LineSentencePath: no longer sensitive to an input without a trailing OS-specific slash
* LineSentencePath renamed PathLineSentences in word2vec.py. Test updated as well
* LineSentencePath renamed to PathLineSentences in models.word2vec. Tests also updated
* fix whitespace style error: there was only 1 space before an inline comment, flagged by the Travis CI build
* updated PathLineSentences test and test data: removed the LineSentencePath directory, since the Lee corpus duplicates it contained were wasting space; created a PathLineSentences directory holding a new small test corpus; changed the test to read both files manually, combine them, and compare the result to PathLineSentences (rather than keeping a separate single file matching the entire contents of the PathLineSentences test_data directory)
* word2vec.PathLineSentences single file support: changed PathLineSentences to accept a single file as well as a directory; it raises a warning to use LineSentence when a single file is given as a parameter. Added a corresponding test.
* fixing style issues
* fix style issue
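The commit message describes the directory reader only in prose. Below is a minimal, stdlib-only sketch of that behaviour, assuming whitespace tokenization, one sentence per line, files read in sorted name order, and blank lines skipped. `path_line_sentences` is a hypothetical name used for illustration; it is not the code added in this commit and not gensim's API.

```python
import logging
import os
import tempfile

logger = logging.getLogger(__name__)


def path_line_sentences(source):
    """Yield whitespace-tokenized lines from a directory of files (or one file).

    Hypothetical sketch of the idea behind word2vec.PathLineSentences,
    not the gensim implementation.
    """
    if os.path.isfile(source):
        # A single file is accepted, but a LineSentence-style reader fits better.
        logger.warning("single file given as source; consider LineSentence instead")
        paths = [source]
    elif os.path.isdir(source):
        # os.path.join behaves the same whether or not `source` ends in a separator.
        paths = [os.path.join(source, name) for name in sorted(os.listdir(source))]
        paths = [p for p in paths if os.path.isfile(p)]  # skip subdirectories
    else:
        raise ValueError("source must be a file or a directory: %r" % source)

    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                tokens = line.split()
                if tokens:  # assumption: blank lines produce no sentence
                    yield tokens


if __name__ == "__main__":
    # Usage: build a two-file corpus in a temporary directory and stream it.
    with tempfile.TemporaryDirectory() as corpus_dir:
        files = [("a.txt", "this is important text\nit is very important\n"),
                 ("b.txt", "thank you.\n")]
        for name, text in files:
            with open(os.path.join(corpus_dir, name), "w", encoding="utf-8") as f:
                f.write(text)
        # A trailing separator on the directory path makes no difference.
        for sentence in path_line_sentences(corpus_dir + os.sep):
            print(sentence)
```

Joining paths with `os.path.join` instead of string concatenation is one simple way to get the "no longer sensitive to a trailing slash" behaviour the commit mentions.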
1 parent 3e38e33, commit b818c91

Showing 4 changed files with 79 additions and 1 deletion.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,7 @@ |
| | 1 | this is important text |
| | 2 | it is very important |
| | 3 | you are learning a lot |
| | 4 | from reading this text. |
| | 5 | it much be hard to be so special! |
| | 6 | we envy you, with your knowledge of this text file, |
| | 7 | thank you. |
Binary file not shown.