fr-compromise

modest computational linguistics

npm install fr-compromise

work in progress!

see also: Italian • German • Spanish • English • Portuguese

fr-compromise is a port of compromise to French.

The goal of this project is to provide a small, basic, rule-based POS tagger.
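As a rough illustration of what "rule-based" tagging means, here is a toy sketch in plain JavaScript. It is not fr-compromise's implementation: the lexicon, the suffix rules, and the names `tagWord` and `tagSentence` are all hypothetical, chosen only to show the general idea of looking a word up first and falling back to pattern rules.

```javascript
// Toy lexicon: exact word -> tag (hypothetical data, not the library's)
const lexicon = new Map([
  ['le', 'Determiner'], ['la', 'Determiner'], ['les', 'Determiner'],
  ['je', 'Pronoun'], ['nous', 'Pronoun'],
  ['sur', 'Preposition'], ['à', 'Preposition'],
])

// Toy suffix rules, tried in order when the lexicon has no entry
const suffixRules = [
  [/ons$/, 'Verb'],    // e.g. "jetons"
  [/ment$/, 'Adverb'], // e.g. "rapidement"
]

// Tag one word: lexicon first, then suffix rules, then a fallback guess
function tagWord(word) {
  const w = word.toLowerCase()
  if (lexicon.has(w)) return lexicon.get(w)
  const rule = suffixRules.find(([pattern]) => pattern.test(w))
  return rule ? rule[1] : 'Noun'
}

// Split on whitespace and tag each word
function tagSentence(text) {
  return text.split(/\s+/).filter(Boolean).map((w) => [w, tagWord(w)])
}

console.log(tagSentence('Nous jetons les chaussures'))
```

A real tagger needs far more data and context-sensitive rules; the sketch only shows the lookup-then-fallback shape.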

import tal from 'fr-compromise'

let doc = tal(`Je m'baladais sur l'avenue le cœur ouvert à l'inconnu`)
doc.match('#Noun').out('array')
// [ 'je', 'avenue', 'cœur', 'inconnu' ]

or on the client side:

<script src="https://unpkg.com/fr-compromise"></script>
<script>
  let txt = `J'avais envie de dire bonjour à n'importe qui`
  let doc = frCompromise(txt) // global namespace
  console.log(doc.sentences(1).json())
  // { text:'J'avais...', terms:[ ... ] }
</script>

API

fr-compromise includes all the methods of compromise/one:

Output

.text() - return the document as text
.json() - return the document as data
.debug() - pretty-print the interpreted document
.out() - a named or custom output
.html({}) - output custom html tags for matches
.wrap({}) - produce custom output for document matches

Utils

.found [getter] - true if the document has any results (is it non-empty?)
.docs [getter] - get term objects as json
.length [getter] - count the # of matches in the document
.isView [getter] - identify a compromise object
.compute() - run a named analysis on the document
.clone() - deep-copy the document, so that no references remain
.termList() - return a flat list of all Term objects in match
.cache({}) - freeze the current state of the document, for speed-purposes
.uncache() - un-freezes the current state of the document, so it may be transformed

Accessors

.all() - return the whole original document ('zoom out')
.terms() - split-up results by each individual term
.first(n) - use only the first result(s)
.last(n) - use only the last result(s)
.slice(n,n) - grab a subset of the results
.eq(n) - use only the nth result
.firstTerms() - get the first word in each match
.lastTerms() - get the end word in each match
.fullSentences() - get the whole sentence for each match
.groups() - grab any named capture-groups from a match
.wordCount() - count the # of terms in the document
.confidence() - an average score for pos tag interpretations

Match

(match methods use the match-syntax.)

.match('') - return a new Doc, with this one as a parent
.not('') - return all results except for this
.matchOne('') - return only the first match
.if('') - return each current phrase, only if it contains this match ('only')
.ifNo('') - filter-out any current phrases that have this match ('notIf')
.has('') - return a boolean if this match exists
.before('') - return all terms before a match, in each phrase
.after('') - return all terms after a match, in each phrase
.union() - return combined matches without duplicates
.intersection() - return only duplicate matches
.complement() - get everything not in another match
.settle() - remove overlaps from matches
.growRight('') - add any matching terms immediately after each match
.growLeft('') - add any matching terms immediately before each match
.grow('') - add any matching terms before or after each match
.sweep(net) - apply a series of match objects to the document
.splitOn('') - return a Document with three parts for every match ('splitOn')
.splitBefore('') - partition a phrase before each matching segment
.splitAfter('') - partition a phrase after each matching segment
.lookup([]) - quick find for an array of string matches
.autoFill() - create type-ahead assumptions on the document

Tag

.tag('') - Give all terms the given tag
.tagSafe('') - Only apply tag to terms if it is consistent with current tags
.unTag('') - Remove this tag from the given terms
.canBe('') - return only the terms that can be this tag

Case

.toLowerCase() - turn every letter of every term to lower case
.toUpperCase() - turn every letter of every term to upper case
.toTitleCase() - upper-case the first letter of each term
.toCamelCase() - remove whitespace and title-case each term

Whitespace

.pre('') - add this punctuation or whitespace before each match
.post('') - add this punctuation or whitespace after each match
.trim() - remove start and end whitespace
.hyphenate() - connect words with hyphen, and remove whitespace
.dehyphenate() - remove hyphens between words, and set whitespace
.toQuotations() - add quotation marks around these matches
.toParentheses() - add brackets around these matches

Loops

.map(fn) - run each phrase through a function, and create a new document
.forEach(fn) - run a function on each phrase, as an individual document
.filter(fn) - return only the phrases that return true
.find(fn) - return a document with only the first phrase that matches
.some(fn) - return true or false if there is one matching phrase
.random(n) - sample a subset of the results

Insert

.replace(match, replace) - search and replace match with new content
.replaceWith(replace) - substitute-in new text
.remove() - fully remove these terms from the document
.insertBefore(str) - add these new terms to the front of each match (prepend)
.insertAfter(str) - add these new terms to the end of each match (append)
.concat() - add these new things to the end
.swap(fromLemma, toLemma) - smart replace of root-words, using proper conjugation

Transform

.sort('method') - re-arrange the order of the matches (in place)
.reverse() - reverse the order of the matches, but not the words
.normalize({}) - clean-up the text in various ways
.unique() - remove any duplicate matches

Lib

(these methods are on the main nlp object)

nlp.tokenize(str) - parse text without running POS-tagging
nlp.lazy(str, match) - scan through a text with minimal analysis
nlp.plugin({}) - mix in a compromise-plugin
nlp.parseMatch(str) - pre-parse any match statements into json
nlp.world() - grab or change library internals
nlp.model() - grab all current linguistic data
nlp.methods() - grab or change internal methods
nlp.hooks() - see which compute methods run automatically
nlp.verbose(mode) - log our decision-making for debugging
nlp.version - current semver version of the library
nlp.addWords(obj) - add new words to the lexicon
nlp.addTags(obj) - add new tags to the tagSet
nlp.typeahead(arr) - add words to the auto-fill dictionary
nlp.buildTrie(arr) - compile a list of words into a fast lookup form
nlp.buildNet(arr) - compile a list of matches into a fast match form

docs

Numbers:

fr-compromise can parse both written-out and numeric numbers:

import nlp from 'fr-compromise'

let doc = nlp(`j'ai moins quarante dollars`).debug()
doc.numbers().add(50)
doc.text()
// "j'ai dix dollars"

number docs
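The arithmetic in the example above is -40 + 50 = 10, which is why "moins quarante" becomes "dix". The following toy sketch spells that out in plain JavaScript. It does not use fr-compromise and says nothing about how the library parses numbers; the word tables and the `parseNumber` helper are hypothetical and cover only a handful of words.

```javascript
// Tiny hypothetical word tables (0-10 and a few round tens)
const units = ['zéro', 'un', 'deux', 'trois', 'quatre', 'cinq', 'six', 'sept', 'huit', 'neuf', 'dix']
const tens = new Map([['vingt', 20], ['trente', 30], ['quarante', 40], ['cinquante', 50], ['soixante', 60]])

// Parse a very small subset of written French numbers, e.g. "moins quarante"
function parseNumber(str) {
  let words = str.trim().toLowerCase().split(/[\s-]+/)
  let sign = 1
  if (words[0] === 'moins') {
    sign = -1
    words = words.slice(1)
  }
  let total = 0
  for (const w of words) {
    if (w === 'et') continue // as in "vingt et un"
    if (tens.has(w)) total += tens.get(w)
    else if (units.includes(w)) total += units.indexOf(w)
    else return null // unknown word: give up
  }
  return sign * total
}

const sum = parseNumber('moins quarante') + 50
console.log(sum, units[sum]) // 10 'dix'
```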

Lemmatization:

it can reduce conjugated words to their root form:

import nlp from 'fr-compromise'

let doc = nlp('Nous jetons les chaussures')
doc.compute('root')
doc.has('{jeter} les {chaussure}') // .has() returns a boolean; .found is a getter
// true

root docs
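To make the `{root}` idea concrete, here is a toy sketch of matching on root forms in plain JavaScript. It is only an illustration under stated assumptions: the `roots` table and the `rootOf` and `hasRootPattern` helpers are hypothetical, and the library's own lemmatizer and match engine work differently and cover far more.

```javascript
// Hypothetical inflected-form -> root table
const roots = new Map([
  ['jetons', 'jeter'],
  ['jette', 'jeter'],
  ['chaussures', 'chaussure'],
])

// Look up a word's root, falling back to the lower-cased word itself
const rootOf = (word) => roots.get(word.toLowerCase()) ?? word.toLowerCase()

// True if some run of consecutive words fits the pattern.
// "{x}" matches any word whose root is x; anything else matches literally.
function hasRootPattern(text, pattern) {
  const words = text.split(/\s+/).filter(Boolean)
  const parts = pattern.split(/\s+/).filter(Boolean)
  for (let i = 0; i + parts.length <= words.length; i++) {
    const ok = parts.every((part, j) => {
      const m = /^\{(.+)\}$/.exec(part)
      return m ? rootOf(words[i + j]) === m[1] : words[i + j].toLowerCase() === part.toLowerCase()
    })
    if (ok) return true
  }
  return false
}

console.log(hasRootPattern('Nous jetons les chaussures', '{jeter} les {chaussure}')) // true
```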

Date parsing:

with the fr-compromise-dates plugin, it can turn natural-language dates into ISO-format dates:

import nlp from 'fr-compromise'
import plg from 'fr-compromise-dates'
nlp.plugin(plg)
let opts = { timezone: 'UTC', today: '2023-03-02' }

let doc = nlp('Je peux emprunter votre voiture entre le 2 mai et le 14 juillet')
// assumption: opts is passed to .dates(), as in compromise-dates
let res = doc.dates(opts).json()[0]
/*
  {
    text: 'entre le 2 mai et le 14 juillet',
    dates: [
      {
        start: '2023-05-02T00:00:00.000Z',
        end: '2023-07-14T23:59:59.999Z'
      }
    ]
  }
*/

root docs
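The `start` and `end` values in the output above span whole days in UTC: the range opens at the first millisecond of 2 May and closes at the last millisecond of 14 July. The sketch below reproduces those two strings with the built-in `Date` object. It is not the plugin's code, and `utcDayRange` is a hypothetical helper name.

```javascript
// Build an inclusive, whole-day range in UTC and format it as ISO 8601.
// Months are 1-based here; Date.UTC expects 0-based months.
function utcDayRange(year, startMonth, startDay, endMonth, endDay) {
  const start = new Date(Date.UTC(year, startMonth - 1, startDay, 0, 0, 0, 0))
  const end = new Date(Date.UTC(year, endMonth - 1, endDay, 23, 59, 59, 999))
  return { start: start.toISOString(), end: end.toISOString() }
}

console.log(utcDayRange(2023, 5, 2, 7, 14))
// { start: '2023-05-02T00:00:00.000Z', end: '2023-07-14T23:59:59.999Z' }
```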

Contributing

Please join in to help!

help with first PR

git clone https://github.com/nlp-compromise/fr-compromise.git
cd fr-compromise
npm install
npm test
npm run watch

See also

benob/french-tagger - Python French tagger
opennlp-french - Java tagger w/ French model
TreeTagger - Perl tagger w/ French model

MIT

