Skip to content

jasoncox576/Object-Placement

Repository files navigation

Object-Placement

Generating Text Corpus with Bigram Tokens:
To generate a corpus with bigrams, you will need to run fix_file.py with the path in the source pointing to the desired text corpus.

Generating Train/Test sets:
To generate all of the train and test files, run

python3 data_gen.py

The resulting csv files in the directory will be labeled n_train.csv or n_test.csv depending on the set that it belongs to.

Here's how each set is generated:

1:
Train: Only 4/4 agree cases, 75% of original data randomly sampled. All four of those cases are merged into one.
Test: Only 4/4 agree cases, the other 25% of the data. All four of those cases are merged into one.

2:
Train: Same as previous 1_train.csv.
Test: All of the 3/4 agreement cases from the original dataset. Expect slightly lower accuracy when testing on this set.

3:
Train: Same as before, still 75% of 4/4 cases.
Test: All of the 2/4 agreement cases. Because there's so much disagreement among annotators, expect significantly hindered accuracy of no fault of the model.

4:
Train: Remove all instances that have a 'similar' item anywhere in them.
Test: Put all of those instances that had a 'similar' item in the test set.

5: Train: 75% of everything, randomly sampled
Test: 25% of everything, randomly sampled

Training and Evaluation

Then, you can use python3 word2vec_train.py
to generate all of the models, trained with validation, which will be saved to separate directories. python3 word2vec_test.py to load and evaluate all models on all train/test sets. The results can be found in accs_final.csv.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published