Hello, dear Medallia staff. Thank you for your nice Java code. It is beautiful and neat, but it does not seem to be precise.
I computed the accuracy, and it is about 20 percentage points lower than that of the original C version. I trained both on text8 with the same hyperparameters (only the thread count differs), which are:
Java
File f = new File("text8");
if (!f.exists())
    throw new IllegalStateException("Please download and unzip the text8 example from http://mattmahoney.net/dc/text8.zip");
List<String> read = Common.readToList(f);
List<List<String>> partitioned = Lists.transform(read, new Function<String, List<String>>() {
    @Override
    public List<String> apply(String input) {
        return Arrays.asList(input.split(" "));
    }
});
Word2VecModel model = Word2VecModel.trainer()
        .setMinVocabFrequency(5)
        .useNumThreads(20)
        .setWindowSize(8)
        .type(NeuralNetworkType.CBOW)
        .setLayerSize(200)
        .useNegativeSamples(25)
        .setDownSamplingRate(1e-4)
        .setNumIterations(15)
        .setListener(new TrainingProgressListener() {
            @Override
            public void update(Stage stage, double progress) {
                System.out.println(String.format("%s is %.2f%% complete", Format.formatEnum(stage), progress * 100));
            }
        })
        .train(partitioned);
try (final OutputStream os = Files.newOutputStream(Paths.get("vectors.bin"))) {
    model.toBinFile(os);
}
C
./word2vec -train text8 -output vectors.bin -cbow 1 -size 200 -window 8 -negative 25 -hs 0 -sample 1e-4 -threads 8 -binary 1 -iter 15
I used the same evaluation program and test file for both:
./compute-accuracy vectors.bin 30000 < questions-words.txt
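For readers unfamiliar with the evaluation, here is a minimal sketch of the kind of analogy test that produces a TOP1 score: for a question "a b c d", the vector b - a + c is compared by dot product against the unit-length vectors of all other words, and the question counts as correct if the best match is d. The class name, the method names, and the toy three-dimensional vectors are invented for illustration. The sketch does not reproduce the real tool's file parsing or its vocabulary threshold (the `30000` argument above).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AnalogySketch {
    /** Scales a vector to unit length so that dot products equal cosine similarity. */
    static double[] normalize(double[] v) {
        double norm = 0;
        for (double x : v) norm += x * x;
        norm = Math.sqrt(norm);
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) out[i] = v[i] / norm;
        return out;
    }

    /** Returns the top-1 answer to "a is to b as c is to ?", never returning a, b or c. */
    static String answer(Map<String, double[]> vectors, String a, String b, String c) {
        double[] va = normalize(vectors.get(a));
        double[] vb = normalize(vectors.get(b));
        double[] vc = normalize(vectors.get(c));
        double[] query = new double[va.length];
        for (int i = 0; i < query.length; i++) query[i] = vb[i] - va[i] + vc[i];

        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> e : vectors.entrySet()) {
            String word = e.getKey();
            if (word.equals(a) || word.equals(b) || word.equals(c)) continue;
            double[] candidate = normalize(e.getValue());
            double score = 0;
            for (int i = 0; i < query.length; i++) score += query[i] * candidate[i];
            if (score > bestScore) {
                bestScore = score;
                best = word;
            }
        }
        return best;
    }

    /** Toy vectors, invented for this example only. */
    static Map<String, double[]> toyVectors() {
        Map<String, double[]> m = new LinkedHashMap<>();
        m.put("france", new double[] {1, 0, 0});
        m.put("paris", new double[] {1, 0, 1});
        m.put("germany", new double[] {0, 1, 0});
        m.put("berlin", new double[] {0, 1, 1});
        m.put("banana", new double[] {1, 1, 0});
        return m;
    }

    public static void main(String[] args) {
        // Question line "france paris germany berlin": predict the fourth word.
        System.out.println(answer(toyVectors(), "france", "paris", "germany"));
    }
}
```

Because both models are scored by the same procedure on the same question file, a difference in TOP1 accuracy should reflect the vectors themselves rather than the evaluation.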
Your Java implementation:
capital-common-countries:
ACCURACY TOP1: 58.30 % (295 / 506)
Total accuracy: 58.30 % Semantic accuracy: 58.30 % Syntactic accuracy: nan %
capital-world:
ACCURACY TOP1: 36.78 % (534 / 1452)
Total accuracy: 42.34 % Semantic accuracy: 42.34 % Syntactic accuracy: nan %
currency:
ACCURACY TOP1: 12.69 % (34 / 268)
Total accuracy: 38.77 % Semantic accuracy: 38.77 % Syntactic accuracy: nan %
city-in-state:
ACCURACY TOP1: 25.21 % (396 / 1571)
Total accuracy: 33.16 % Semantic accuracy: 33.16 % Syntactic accuracy: nan %
family:
ACCURACY TOP1: 55.23 % (169 / 306)
Total accuracy: 34.80 % Semantic accuracy: 34.80 % Syntactic accuracy: nan %
gram1-adjective-to-adverb:
ACCURACY TOP1: 8.07 % (61 / 756)
Total accuracy: 30.64 % Semantic accuracy: 34.80 % Syntactic accuracy: 8.07 %
gram2-opposite:
ACCURACY TOP1: 9.48 % (29 / 306)
Total accuracy: 29.39 % Semantic accuracy: 34.80 % Syntactic accuracy: 8.47 %
gram3-comparative:
ACCURACY TOP1: 38.25 % (482 / 1260)
Total accuracy: 31.13 % Semantic accuracy: 34.80 % Syntactic accuracy: 24.63 %
gram4-superlative:
ACCURACY TOP1: 23.91 % (121 / 506)
Total accuracy: 30.60 % Semantic accuracy: 34.80 % Syntactic accuracy: 24.50 %
gram5-present-participle:
ACCURACY TOP1: 22.08 % (219 / 992)
Total accuracy: 29.53 % Semantic accuracy: 34.80 % Syntactic accuracy: 23.87 %
gram6-nationality-adjective:
ACCURACY TOP1: 63.17 % (866 / 1371)
Total accuracy: 34.50 % Semantic accuracy: 34.80 % Syntactic accuracy: 34.25 %
gram7-past-tense:
ACCURACY TOP1: 26.35 % (351 / 1332)
Total accuracy: 33.47 % Semantic accuracy: 34.80 % Syntactic accuracy: 32.64 %
gram8-plural:
ACCURACY TOP1: 44.25 % (439 / 992)
Total accuracy: 34.39 % Semantic accuracy: 34.80 % Syntactic accuracy: 34.17 %
gram9-plural-verbs:
ACCURACY TOP1: 18.15 % (118 / 650)
Total accuracy: 33.53 % Semantic accuracy: 34.80 % Syntactic accuracy: 32.90 %
Questions seen / total: 12268 19544 62.77 %
Original C implementation:
capital-common-countries:
ACCURACY TOP1: 82.81 % (419 / 506)
Total accuracy: 82.81 % Semantic accuracy: 82.81 % Syntactic accuracy: nan %
capital-world:
ACCURACY TOP1: 62.26 % (904 / 1452)
Total accuracy: 67.57 % Semantic accuracy: 67.57 % Syntactic accuracy: nan %
currency:
ACCURACY TOP1: 23.13 % (62 / 268)
Total accuracy: 62.22 % Semantic accuracy: 62.22 % Syntactic accuracy: nan %
city-in-state:
ACCURACY TOP1: 44.68 % (702 / 1571)
Total accuracy: 54.96 % Semantic accuracy: 54.96 % Syntactic accuracy: nan %
family:
ACCURACY TOP1: 75.82 % (232 / 306)
Total accuracy: 56.52 % Semantic accuracy: 56.52 % Syntactic accuracy: nan %
gram1-adjective-to-adverb:
ACCURACY TOP1: 17.20 % (130 / 756)
Total accuracy: 50.40 % Semantic accuracy: 56.52 % Syntactic accuracy: 17.20 %
gram2-opposite:
ACCURACY TOP1: 21.90 % (67 / 306)
Total accuracy: 48.71 % Semantic accuracy: 56.52 % Syntactic accuracy: 18.55 %
gram3-comparative:
ACCURACY TOP1: 64.60 % (814 / 1260)
Total accuracy: 51.83 % Semantic accuracy: 56.52 % Syntactic accuracy: 43.54 %
gram4-superlative:
ACCURACY TOP1: 39.72 % (201 / 506)
Total accuracy: 50.95 % Semantic accuracy: 56.52 % Syntactic accuracy: 42.86 %
gram5-present-participle:
ACCURACY TOP1: 39.52 % (392 / 992)
Total accuracy: 49.51 % Semantic accuracy: 56.52 % Syntactic accuracy: 41.99 %
gram6-nationality-adjective:
ACCURACY TOP1: 87.24 % (1196 / 1371)
Total accuracy: 55.08 % Semantic accuracy: 56.52 % Syntactic accuracy: 53.94 %
gram7-past-tense:
ACCURACY TOP1: 38.21 % (509 / 1332)
Total accuracy: 52.96 % Semantic accuracy: 56.52 % Syntactic accuracy: 50.73 %
gram8-plural:
ACCURACY TOP1: 67.54 % (670 / 992)
Total accuracy: 54.21 % Semantic accuracy: 56.52 % Syntactic accuracy: 52.95 %
gram9-plural-verbs:
ACCURACY TOP1: 37.38 % (243 / 650)
Total accuracy: 53.32 % Semantic accuracy: 56.52 % Syntactic accuracy: 51.71 %
Questions seen / total: 12268 19544 62.77 %
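The size of the gap can be checked directly from the per-category counts reported above. The following sketch sums the correct answers for each implementation and divides by the 12268 questions seen. The class and constant names are made up for this example; the numbers are copied from the two outputs.

```java
public class AccuracyGap {
    // Correct answers per category, in the order the categories are printed above.
    static final int[] JAVA_CORRECT =
            {295, 534, 34, 396, 169, 61, 29, 482, 121, 219, 866, 351, 439, 118};
    static final int[] C_CORRECT =
            {419, 904, 62, 702, 232, 130, 67, 814, 201, 392, 1196, 509, 670, 243};
    // "Questions seen" is identical in both runs.
    static final int QUESTIONS_SEEN = 12268;

    /** Overall accuracy in percent: total correct answers over questions seen. */
    static double percent(int[] correct, int total) {
        int sum = 0;
        for (int c : correct) sum += c;
        return 100.0 * sum / total;
    }

    public static void main(String[] args) {
        double java = percent(JAVA_CORRECT, QUESTIONS_SEEN);
        double c = percent(C_CORRECT, QUESTIONS_SEEN);
        System.out.printf("Java: %.2f %%, C: %.2f %%, gap: %.2f points%n", java, c, c - java);
    }
}
```

The results agree with the final "Total accuracy" lines of the two runs (33.53 % and 53.32 %), a difference of roughly 19.8 percentage points.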
Can you give me any suggestions or ideas about this? I am ready to help you if needed.
Thank you.