Grex: automatic grammar extraction from treebank

Grex is a tool for automatic grammar rule extraction from treebanks, created by Santiago Herrera, Caio Corro and Sylvain Kahane. A full description of the method can be found in the paper: https://arxiv.org/abs/2403.17534

If you use this software, please cite the following work:

@inproceedings{herrera2024grex,
    title = "Sparse Logistic Regression with High-order Features for Automatic Grammar Rule Extraction from Treebanks",
    author = "Herrera, Santiago and Corro, Caio and Kahane, Sylvain",
    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation",
    month = may,
    year = "2024",
    address = "Torino, Italia",
    url = "https://arxiv.org/abs/2403.17534",
}

Usage

You first need to install the skglm library: https://contrib.scikit-learn.org/skglm/auto_examples/index.html

At the moment, there are only two scripts:

autogramm_agreement.py: searches for rules of morphological agreement (e.g. when is there number agreement between a word and its head?)
autogramm_activation.py: searches for rules that activate a feature (e.g. when is the subject after the verb?)

Examples are given in run.sh. The extracted rules are exported in JSON format.
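The JSON schema of the exported rules is not documented in this README, so the following is only a minimal sketch for opening an output file and inspecting its top-level shape. The helper names load_rules and describe are hypothetical and are not part of Grex.

```python
import json
from pathlib import Path


def load_rules(path):
    """Load a Grex output file and return the parsed JSON value."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def describe(value):
    """Summarize the top-level shape without assuming any schema."""
    if isinstance(value, dict):
        return f"object with keys: {sorted(value)}"
    if isinstance(value, list):
        return f"array with {len(value)} items"
    return f"scalar of type {type(value).__name__}"
```

For example, describe(load_rules("rules.json")) reports whether the file holds an object or an array, where rules.json stands for whatever path was passed as the output file.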

Common arguments

--treebank: directory where treebanks are stored
--treebank-filter: list of treebanks to use (matched by partial name)
--feature-filter: features to remove (must be lowercase; matched by partial name)
--dep-filter: restricts the dataset to dependencies that satisfy the given constraints. For example, --dep-filter=head_upos=VERB,mod_upos=NOUN keeps only dependencies whose head is a VERB and whose modifier is a NOUN
--json: output file
--error: error file
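To make the filter semantics concrete, here is a minimal sketch of how a --dep-filter value and a partial-match filter could be interpreted. It is not the code used by Grex: the function names are hypothetical, the representation of a dependency as a flat dict is an assumption, and "partial match" is assumed to mean substring match.

```python
def parse_dep_filter(spec):
    """Turn 'head_upos=VERB,mod_upos=NOUN' into a dict of constraints."""
    constraints = {}
    if not spec:
        return constraints
    for item in spec.split(","):
        key, sep, value = item.partition("=")
        key, value = key.strip(), value.strip()
        if not sep or not key or not value:
            raise ValueError(f"malformed constraint: {item!r}")
        constraints[key] = value
    return constraints


def dependency_matches(dependency, constraints):
    """True when every constrained attribute has the required value.

    `dependency` is assumed to be a flat dict such as
    {'head_upos': 'VERB', 'mod_upos': 'NOUN'} (hypothetical layout).
    """
    return all(dependency.get(k) == v for k, v in constraints.items())


def partial_match(name, patterns):
    """True when any pattern occurs as a substring of `name` (assumed semantics)."""
    return any(pattern in name for pattern in patterns)
```

Under these assumptions, parse_dep_filter("head_upos=VERB,mod_upos=NOUN") yields two constraints, and partial_match("UD_French-GSD", ["French"]) selects that treebank directory (the name is only an example).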

autogramm_agreement.py

Searches for rules that force two features to agree.

--feature1: first feature
--feature2: second feature
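As an illustration of what "agreement" means for a single dependency, the sketch below compares one feature of the dependent with one feature of its head. It assumes CoNLL-U-style FEATS strings (e.g. Gender=Fem|Number=Sing) and assumes that the first feature is read on the dependent and the second on the head; neither assumption is confirmed by this README, and the function names are hypothetical.

```python
def parse_feats(feats):
    """Parse a CoNLL-U FEATS string such as 'Gender=Fem|Number=Sing'."""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def agrees(mod_feats, head_feats, feature1, feature2):
    """Compare feature1 on the dependent with feature2 on the head.

    Returns True or False when both features are present, and None when
    either one is missing (agreement is then undefined).
    """
    mod_value = parse_feats(mod_feats).get(feature1)
    head_value = parse_feats(head_feats).get(feature2)
    if mod_value is None or head_value is None:
        return None
    return mod_value == head_value
```

With these definitions, a dependent tagged Number=Sing and a head tagged Number=Sing agree in Number, while a dependent without any Number feature gives None rather than False.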

autogramm_activation.py

--feature-name: name of the feature
--feature-value: value of the feature we are interested in
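The sketch below assembles a command line for autogramm_activation.py from the arguments listed in this README, using the --flag=value form shown for --dep-filter. It only builds the argument list and does not run anything. All paths and the feature name and value are hypothetical placeholders, and whether every flag is required is not stated here; run.sh remains the reference for real invocations.

```python
import shlex
import sys


def build_activation_command(treebank_dir, json_path, error_path,
                             feature_name, feature_value):
    """Return an argv list for autogramm_activation.py (not executed here)."""
    return [
        sys.executable,
        "autogramm_activation.py",
        f"--treebank={treebank_dir}",
        f"--json={json_path}",
        f"--error={error_path}",
        f"--feature-name={feature_name}",
        f"--feature-value={feature_value}",
    ]


if __name__ == "__main__":
    # Placeholder values only: replace them with real paths and features.
    command = build_activation_command(
        "treebanks/", "rules.json", "errors.txt", "position", "after"
    )
    print(shlex.join(command))
```

The resulting list can be passed to subprocess.run once the placeholders are replaced.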
