Skip to content

Latest commit

 

History

History
37 lines (25 loc) · 1.36 KB

README.md

File metadata and controls

37 lines (25 loc) · 1.36 KB

Free Text Tagger

This repository contains all scripts to extract contextual information from a free text.

Given a free text, the script is able to extract information about 4 categories: activities, emotions, interactions and places. For each of these categories there is a dictionary, which contains a list of sub-categories.

Text given in input is parsed and then matched to the sub-categories by handwritten rules, which take into account syntactic information (lemmas, Parts-Of-Speech, dependency structure, ...).

Requirements

  • Requires Python 3.x
  • Requires the following Python libraries:

Input

  • Text (string)

-- choose how to pass string to the main script --

Output

For each category returns a matches list containing:

  • a numeric id for the matched sub-category
  • a number that states the point in the sentence where the match starts
  • a number that states the point in the sentence where the match ends

e.g. "We're playing games" will return this output:

  • [(5133706519360878345, 2, 3), (5133706519360878345, 2, 4), (5133706519360878345, 3, 4)]

  • 5133706519360878345 is the id for the sub-category 'leisure'

  • 2,3 is the span for 'playing'

  • 2,4 is the span for 'playing games'

  • 3,4 is the span for 'games'

! notice that in the span interval, the first number is included, the second one is NOT included