quentinms/French-Whine

How it works

Getting the news

We search the French edition of Google News for the word "grève" (strike). This gives us a list of articles (title + newspaper link) that we can fetch as an RSS feed.
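As a rough illustration of this step, here is a minimal sketch in Python. The URL format and its `hl`/`gl`/`ceid` parameters are an assumption about the Google News RSS search endpoint, not something taken from this project, and `build_feed_url`, `parse_feed` and `SAMPLE_FEED` are hypothetical names. The parsing is shown on an inline sample so it runs without network access.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_feed_url(query="grève"):
    """Build an RSS search URL (assumed format) for the French edition."""
    params = {"q": query, "hl": "fr", "gl": "FR", "ceid": "FR:fr"}
    return "https://news.google.com/rss/search?" + urlencode(params)


def parse_feed(xml_text):
    """Return a list of (title, link) pairs, one per <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


# Hand-written sample standing in for a real feed response.
SAMPLE_FEED = (
    '<rss version="2.0"><channel>'
    "<item><title>Grève des cheminots : trafic perturbé</title>"
    "<link>https://example.com/a</link></item>"
    "<item><title>Les enseignants en grève jeudi</title>"
    "<link>https://example.com/b</link></item>"
    "</channel></rss>"
)

for title, link in parse_feed(SAMPLE_FEED):
    print(title, "->", link)
```

In a real run, the text passed to `parse_feed` would come from fetching `build_feed_url()` with an HTTP client of your choice.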

Extracting the info

From the list of titles, we try to determine who is on strike. We run each title through a part-of-speech (POS) tagger, which lets us identify the subject, the verb, and so on in the sentence. It is then relatively easy to extract the name of the striker with a regex.
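The idea can be sketched as follows, assuming the tagger's output is a list of `(word, tag)` pairs. The tag names (`NOUN`, `ADP`, `DET`, `ADJ`, `PROPN`), the pattern, and the function name `extract_striker` are all hypothetical and chosen for illustration; the project's actual tagger and regex are not shown here.

```python
import re

# Illustrative pattern over "word/TAG" tokens: the word "grève", a preposition,
# an optional determiner, then one or more nouns/adjectives naming the striker.
STRIKER_PATTERN = re.compile(
    r"\bgrève/NOUN \S+/ADP (?:\S+/DET )?((?:\S+/(?:NOUN|ADJ|PROPN)(?: |$))+)",
    re.IGNORECASE,
)


def extract_striker(tagged):
    """Return the striker phrase from a tagged title, or None if nothing matches."""
    # Flatten the tagged title into one string so a regex can scan it.
    encoded = " ".join(f"{word}/{tag}" for word, tag in tagged)
    match = STRIKER_PATTERN.search(encoded)
    if match is None:
        return None
    # Drop the "/TAG" suffix from each captured token.
    words = [token.rsplit("/", 1)[0] for token in match.group(1).split()]
    return " ".join(words)


# Tags written by hand here; a real tagger would produce them.
title = [
    ("Grève", "NOUN"), ("des", "ADP"), ("contrôleurs", "NOUN"),
    ("aériens", "ADJ"), (":", "PUNCT"), ("vols", "NOUN"), ("annulés", "ADJ"),
]
print(extract_striker(title))
```

Encoding the tags into the string keeps the extraction a single regex search, at the cost of depending on how consistently the tagger labels each word.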

There are still a few issues, however. First, article titles are often not grammatically correct sentences, which sometimes confuses the POS tagger. Second, POS taggers are pretty good for English, but not so much for French.

Finally, the list of strikers is translated by Bing for the English version of the website.

Libraries used

TODO

  • Write a better regex to extract useful information
  • Find a better-suited POS tagger
  • Reduce RAM usage so it can run on Heroku's free tier
