simplified format (shorthands) for dictionary entries #6

Open
jaumeortola opened this issue Feb 12, 2024 · 3 comments

@jaumeortola
Member

jaumeortola commented Feb 12, 2024

A simplified format makes it easier to edit and maintain the dictionary.

We need:

  • to define a format
  • rules to expand the inflected forms from the simplified format (inflected forms for regular verbs, plurals for nouns, etc.)
  • ways to write the exceptions (everything that doesn't fit the regular inflected forms).

To be sure that everything works as expected, we need scripts to convert from the simplified format to the expanded format, and vice versa. A round trip in either direction must reproduce the original file exactly.

Verbs

simplified format: recharge=verb=all
expanded format: recharge=recharge/VB,recharged/VBD,recharging/VBG,recharged/VBN,recharge/VBP,recharges/VBZ=all
The rules are defined here (I will re-write and improve those rules)
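
As a rough illustration of the verb expansion, here is a minimal Python sketch. The inflection rules in it are a deliberately small stand-in (final -e, sibilant endings), not the project's actual rules, which are the ones linked above. The function names are hypothetical, and the third field (`all` in the example) is simply passed through unchanged.

```python
# Sketch only: simplified regular-verb rules, NOT the project's real rule set.
# Consonant doubling (stop -> stopped), -y -> -ied, -ie -> -ying, etc. are not handled.

SIBILANT_ENDINGS = ("s", "x", "z", "ch", "sh")


def regular_verb_forms(lemma: str) -> list[tuple[str, str]]:
    """Return (form, tag) pairs for a verb that follows the simplified rules."""
    if lemma.endswith("e"):
        past = lemma + "d"
        # Keep the final e before -ing only for -ee verbs (agree -> agreeing).
        gerund = lemma + "ing" if lemma.endswith("ee") else lemma[:-1] + "ing"
    else:
        past = lemma + "ed"
        gerund = lemma + "ing"
    third = lemma + ("es" if lemma.endswith(SIBILANT_ENDINGS) else "s")
    return [
        (lemma, "VB"),
        (past, "VBD"),
        (gerund, "VBG"),
        (past, "VBN"),
        (lemma, "VBP"),
        (third, "VBZ"),
    ]


def expand_verb_entry(line: str) -> str:
    """Expand 'lemma=verb=field3'; return any other line unchanged."""
    lemma, middle, last = line.strip().split("=")
    if middle != "verb":
        return line.strip()
    forms = ",".join(f"{form}/{tag}" for form, tag in regular_verb_forms(lemma))
    return f"{lemma}={forms}={last}"


print(expand_verb_entry("recharge=verb=all"))
```

Any verb whose real forms differ from what these rules generate would have to stay in the expanded format.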

Nouns

All tagging possibilities for nouns are here: NN-counted.txt
If we come up with a format for the 8 most common cases, we cover 99% of the nouns in the dictionary. [But only those that are regular, or that can be derived with simple rules.]

  55516 NN,NNS
  23761 NN
   8560 NN:UN,NNS
   8260 NN:U,NNS
   4533 NN:UN
   2757 NN:U
   1218 NN,NNS,NNS
    531 NN,NN:U,NNS

For nouns with only one form and one tag (rows 2, 5 and 6 in the list above: NN, NN:UN and NN:U), we can use just the actual tag.
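
Counts like the ones listed above could be derived from the expanded dictionary by reducing each entry to its sequence of tags and tallying the results. The sketch below assumes every entry has the shape `lemma=form/TAG,form/TAG,...=all`, as in the examples in this issue. The sample lines are made up for illustration and the function names are hypothetical.

```python
from collections import Counter

# Illustrative sample only; not taken from the real dictionary file.
SAMPLE = [
    "bicycle=bicycle/NN,bicycles/NNS=all",
    "zipper=zipper/NN,zippers/NNS=all",
    "admiration=admiration/NN:U=all",
    "addendum=addendum/NN,addenda/NNS,addendums/NNS=all",
]


def tag_pattern(expanded_line: str) -> str:
    """Reduce one expanded entry to its comma-separated tag sequence."""
    _lemma, forms, _last = expanded_line.strip().split("=")
    # rsplit keeps any '/' inside the word form itself intact.
    return ",".join(form.rsplit("/", 1)[1] for form in forms.split(","))


def count_patterns(lines) -> Counter:
    """Tally tag patterns over all non-empty lines."""
    return Counter(tag_pattern(line) for line in lines if line.strip())


counts = count_patterns(SAMPLE)
for pattern, n in counts.most_common():
    print(f"{n:7d} {pattern}")
```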

NN     Noun, singular count noun: bicycle, earthquake, zipper
NNS    Noun, plural: bicycles, earthquakes, zippers
NN:U   Noun, always uncountable (new tag, deviation from Penn): admiration, Afrikaans
NN:UN  Noun that may take a plural form and an indefinite article, depending on its meaning (new tag, deviation from Penn): establishment, wax, afternoon
NNP    Proper noun, singular: Denver, DORAN, Alexandra
NNPS   Proper noun, plural: Buddhists, Englishmen
@jaumeortola
Member Author

Proposal: shorthands for nouns

=noun= nouns NN with a regular plural NNS
=noun_UN= nouns NN:UN with a regular plural NNS
=noun_U= nouns NN:U with a regular plural NNS (is this contradictory? U means always uncountable)

For lemmas with only one form, use just the POS tag:
=NN= (Most of these words are usually adjectives, tagged as nouns as well)
=NN:U=
=NN:UN=

For all other cases (irregular plurals, more than one plural, etc.) use the full inflected forms with tags.
addendum=addendum/NN,addenda/NNS,addendums/NNS=all
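
To make the proposal concrete, here is a minimal Python sketch of expanding and collapsing the noun shorthands, with the fallback to the full form for anything irregular. The plural rules are a simplified assumption (add -s, -es after sibilants, -y to -ies after a consonant), not an agreed rule set, and all function names are hypothetical.

```python
# Sketch only: shorthand names follow the proposal above; plural rules are simplified.
SHORTHAND_TO_TAG = {"noun": "NN", "noun_UN": "NN:UN", "noun_U": "NN:U"}
TAG_TO_SHORTHAND = {tag: name for name, tag in SHORTHAND_TO_TAG.items()}
SINGLE_FORM_TAGS = {"NN", "NN:U", "NN:UN"}


def regular_plural(lemma: str) -> str:
    """Simplified regular plural; real rules would need more cases."""
    if lemma.endswith(("s", "x", "z", "ch", "sh")):
        return lemma + "es"
    if len(lemma) > 1 and lemma.endswith("y") and lemma[-2] not in "aeiou":
        return lemma[:-1] + "ies"
    return lemma + "s"


def expand(line: str) -> str:
    """Simplified -> expanded. Lines that are already expanded pass through."""
    lemma, middle, last = line.strip().split("=")
    if middle in SHORTHAND_TO_TAG:
        forms = f"{lemma}/{SHORTHAND_TO_TAG[middle]},{regular_plural(lemma)}/NNS"
    elif middle in SINGLE_FORM_TAGS:
        forms = f"{lemma}/{middle}"
    else:
        return line.strip()
    return f"{lemma}={forms}={last}"


def collapse(line: str) -> str:
    """Expanded -> simplified. Anything irregular keeps its full form."""
    lemma, forms, last = line.strip().split("=")
    if "/" not in forms:  # already a shorthand or a bare tag
        return line.strip()
    pairs = [tuple(item.rsplit("/", 1)) for item in forms.split(",")]
    if len(pairs) == 1 and pairs[0][0] == lemma and pairs[0][1] in SINGLE_FORM_TAGS:
        return f"{lemma}={pairs[0][1]}={last}"
    if (
        len(pairs) == 2
        and pairs[0][0] == lemma
        and pairs[0][1] in TAG_TO_SHORTHAND
        and pairs[1] == (regular_plural(lemma), "NNS")
    ):
        return f"{lemma}={TAG_TO_SHORTHAND[pairs[0][1]]}={last}"
    return line.strip()


# Round-trip check on a few entries (the irregular one must survive untouched).
for entry in [
    "bicycle=bicycle/NN,bicycles/NNS=all",
    "addendum=addendum/NN,addenda/NNS,addendums/NNS=all",
]:
    assert expand(collapse(entry)) == entry
```

Because `collapse` only emits a shorthand when the stored plural equals the generated one, the round trip stays lossless even when the plural rules are incomplete: entries the rules cannot reproduce simply remain in the expanded format.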

@AzadehSafakish

@AzadehSafakish
Collaborator

=noun_U= nouns NN:U with a regular plural NNS (is this contradictory? U means always uncountable)

It is contradictory. By the book, anything tagged with NN:U should not have a plural form; otherwise it's NN:UN.
But in reality, our dictionary has plenty of NN:U/NNS pairs, so having a label for that makes sense.

Is there a label for proper nouns (NNP/NNPS), or do those fall under =noun=?
The distinction between NN and NNP, NNS and NNPS has always been very useful.

@jaumeortola
Member Author

We would need to distinguish proper nouns in some way. Would this be coherent?
=proper_noun= nouns NNP with a regular plural NNPS
For lemmas with only one form, use just the POS tag:
=NNP=
=NNPS=

But if NNP+NNPS pairs are not very frequent, a dedicated shorthand may be misleading. In that case, just write out both forms.
