A Python-based tool for generating a corpus of pseudo-English sentences with experimenter-controlled statistics.
This repository generates a corpus for training and evaluating distributional semantic models on a task that requires inferring a missing adjunct (e.g. instrument, location).
Install the repository into your virtual environment, e.g. with pip (`pip install .` from the repository root).
To use the corpus in your own project, see `load_corpus.py`.
Then, iterate over the generated sentences:

```python
for sentence in corpus.get_sentences():
    print(sentence)
```
Note: Use a different random seed to produce corpora that differ in their random draws from the uniform distributions over agents and themes. Generating multiple corpora in this way enables statistical hypothesis testing.
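To illustrate the role of the seed, here is a minimal, self-contained sketch of seeded sentence generation. The function `make_corpus`, the word lists, and the sentence template are hypothetical stand-ins for this repository's actual generator, not its API:

```python
import random

def make_corpus(seed, num_sentences=5):
    """Generate toy agent-verb-theme sentences, drawing agents and
    themes uniformly at random with a fixed seed (hypothetical
    stand-in for the repository's generator)."""
    rng = random.Random(seed)
    agents = ["cook", "farmer", "nurse", "pilot"]
    verbs = ["cut", "grow", "carry"]
    themes = ["bread", "corn", "crate", "fruit"]
    return [
        f"{rng.choice(agents)} {rng.choice(verbs)} {rng.choice(themes)} ."
        for _ in range(num_sentences)
    ]

# The same seed reproduces a corpus exactly; different seeds yield
# corpora that differ only in their random draws.
assert make_corpus(seed=1) == make_corpus(seed=1)
assert make_corpus(seed=1) != make_corpus(seed=2)
```

Because each corpus is fully determined by its seed, a set of seeds defines a set of exchangeable corpus samples over which test statistics can be computed.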
Developed on Ubuntu 18.04 and Python 3.7