Fantasy Sports NLG

About

A system for hierarchical natural language generation (NLG) in the domain of fantasy sports news. It automatically generates templates from a source corpus and uses data-to-text strategies to generate summaries of a player’s performance in a game.

We gathered a parallel corpus that pairs data (game statistics) with text (news updates) about a particular player and game, and used it to generate these templates. We then generated text hierarchically in three stages: document planning, microplanning, and surface realization.

Templates improve microplanning and surface realization by preserving the syntactic and semantic structure of the source corpus. Where text aligns with data in our table, we create slots in the template so that future data can be inserted to generate new text. The downside of templates is that they lack variation, so we break the templates from the source text into chunks that can be recombined, balancing structural fidelity against variety.
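To make the slot-and-chunk idea concrete, here is a minimal Python sketch. The chunk inventory, chunk type names, and slot names (`player`, `receptions`, `rec_yards`, `week`, `opponent`) are hypothetical and not taken from the repository; the sketch only shows how filled chunks of different types might be recombined into one sentence.

```python
import random

# Hypothetical chunk inventory: each chunk is a fragment of a source sentence
# in which the words that aligned with the data were replaced by {slots}.
CHUNKS = {
    "receptions": [
        "{player} caught {receptions} passes for {rec_yards} yards",
        "{player} hauled in {receptions} balls for {rec_yards} yards",
    ],
    "touchdown": ["and a touchdown", "and found the end zone"],
    "game": ["in Week {week} against the {opponent}."],
}


def realize(record, plan, rng):
    """Fill one randomly chosen chunk per chunk type in `plan` and join them."""
    parts = []
    for chunk_type in plan:
        template = rng.choice(CHUNKS[chunk_type])
        # str.format raises KeyError if the record lacks a slot's field.
        parts.append(template.format(**record))
    return " ".join(parts)


record = {"player": "Tavon Austin", "receptions": "two", "rec_yards": 79,
          "week": 2, "opponent": "Giants"}
plan = ["receptions", "touchdown", "game"]
print(realize(record, plan, random.Random(0)))
```

Because each chunk type offers alternatives, different random seeds can yield different surface forms for the same record. Converting numbers to words (2 to "two") is assumed to happen before slot filling.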

Dataset

We scraped rotoworld.com for news updates about players and footballdb.com for player statistics, which lets us construct a parallel corpus for learning data-to-text generation. We annotated one week of news updates as a gold corpus against which to compare our approaches. This yielded 2,118 training samples and 160 labeled test samples.
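A minimal sketch of how scraped news updates and stat lines might be paired into a parallel corpus, assuming both sources can be keyed by player name and week. The dict keys and toy records are illustrative assumptions; the actual scrapers and their output schema may differ.

```python
def build_parallel_corpus(news_updates, stat_lines):
    """Pair each news update with the stat line for the same player and week.

    Returns (pairs, unmatched): pairs is a list of (stats, text) tuples and
    unmatched holds updates with no stat line for that player and week.
    """
    stats_by_key = {(s["player"], s["week"]): s for s in stat_lines}
    pairs, unmatched = [], []
    for update in news_updates:
        key = (update["player"], update["week"])
        if key in stats_by_key:
            pairs.append((stats_by_key[key], update["text"]))
        else:
            unmatched.append(update)
    return pairs, unmatched


# Toy records for illustration only.
news = [
    {"player": "Player A", "week": 2, "text": "Player A caught three passes."},
    {"player": "Player B", "week": 2, "text": "Player B is questionable."},
]
stats = [{"player": "Player A", "week": 2, "receptions": 3, "rec_yards": 41}]
pairs, unmatched = build_parallel_corpus(news, stats)
print(len(pairs), len(unmatched))
```

Keeping the unmatched updates makes it easy to inspect news items (injury reports, for example) that have no corresponding stat line.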

Methods

For the document planning, microplanning, and surface realization tasks, we create features following the guidelines in Section 3 of Angeli et al. (2010).
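As a rough illustration of the sparse indicator features a log-linear decision model (such as logistic regression) might use, the sketch below builds features for choosing the next record to describe. The feature templates and record layout are a loose, hypothetical approximation, not a transcription of Angeli et al. (2010) or of this project's feature set.

```python
def record_choice_features(candidate, previous_type, used_ids):
    """Sparse indicator features for scoring `candidate` as the next record.

    candidate: dict with "id", "type", and "fields" (field name -> value).
    previous_type: type of the previously chosen record, or "START".
    used_ids: set of record ids already described in this summary.
    """
    rtype = candidate["type"]
    feats = {
        "type=" + rtype: 1.0,
        # Record-type transition, so the model can learn preferred orderings
        # (e.g. a receiving line followed by a game-context line).
        "type=" + rtype + "|prev=" + previous_type: 1.0,
        # Lets the model learn how strongly to discourage repeating a record.
        "repeat=" + str(candidate["id"] in used_ids): 1.0,
    }
    # Mark which numeric fields are nonzero for this record.
    for field, value in candidate["fields"].items():
        if isinstance(value, (int, float)) and value != 0:
            feats["type=" + rtype + "|nonzero=" + field] = 1.0
    return feats


candidate = {"id": 7, "type": "receiving",
             "fields": {"receptions": 2, "rec_yards": 79, "fumbles": 0}}
print(sorted(record_choice_features(candidate, "START", set())))
```

A classifier would score each candidate record from these features and pick the highest-scoring one, repeating until it selects a stop decision.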

We explored several approaches to template generation, including:

MsApriori Frequent-Chunksets
Combinatorics
Recurrent Neural Networks
Logistic Regression
Naive Bayes
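MSApriori mines frequent itemsets under multiple minimum supports: each item gets its own minimum item support (MIS), and an itemset counts as frequent when its support reaches the smallest MIS among its items, so rare but meaningful chunks are not discarded by a single global threshold. The sketch below checks that criterion by brute-force enumeration over small chunksets, treating each source template as a transaction of chunk ids. It sidesteps MSApriori's sorted candidate generation (needed because plain Apriori's downward-closure pruning no longer holds under multiple minimum supports), and the chunk ids and MIS values are hypothetical.

```python
from itertools import combinations


def frequent_chunksets(transactions, mis, max_size=3):
    """Chunksets whose support reaches the minimum MIS among their chunks.

    transactions: list of sets of chunk ids (e.g. one set per source template).
    mis: dict mapping each chunk id to its minimum item support in [0, 1].
    Returns a dict mapping frozenset(chunkset) -> support.
    Brute force over all chunksets up to max_size, so only suitable for
    small inventories.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, max_size + 1):
        for chunkset in combinations(items, size):
            members = set(chunkset)
            support = sum(1 for t in transactions if members <= t) / n
            # Multiple-minimum-support criterion: compare against the
            # smallest MIS of the chunks in the set, not one global threshold.
            if support >= min(mis[c] for c in chunkset):
                frequent[frozenset(chunkset)] = support
    return frequent


# Toy example: chunk "c" is rare, so it is given a lower MIS.
templates = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"b"}]
mis = {"a": 0.5, "b": 0.5, "c": 0.2}
for chunkset, support in sorted(frequent_chunksets(templates, mis).items(),
                                key=lambda kv: sorted(kv[0])):
    print(sorted(chunkset), support)
```

In the toy data, {"a", "c"} qualifies because its support (0.25) clears the low MIS of the rare chunk "c" (0.2), even though it falls below the MIS of "a".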

Execution

Script location: ./analysis/src

template_generation_script.py
news_nlg_script.py

Execute template_generation_script.py. This script:

Generates templates
Chunks templates
Extracts and generates facts dataset
Trains model
Pickles model

Sample execution:

Processing documents to create training data
Output intermediate templates to (../data/intermediate_templates.csv)
Vectorized data shapes: (13511, 703) (13511, 21) (13511,)
Context n-gram naive bayes accuracy: 0.9818666271926578
Context tags naive bayes accuracy: 0.7793649618829103
Output templates to (../data/output_templates.csv)
Pickling template tag disambiguation model
Training chunk type selection
record (8920, 48) (8920, 1)
passing (305, 60) (305, 1)
receptions (1185, 60) (1185, 1)
rushing (518, 60) (518, 1)
game (1731, 60) (1731, 1)
Chunk type accuracy: 0.9345405724508477
Pickling template chunk type model
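The log reports naive Bayes accuracies for context n-gram and context tag features in the template tag disambiguation step. Assuming that step decides which data field (tag) a number in the source text refers to from the words around it, a minimal multinomial naive Bayes with add-one smoothing could look like this. The tag names, feature scheme, and two training sentences are hypothetical and do not reproduce the project's vectorized features.

```python
import math
from collections import Counter, defaultdict


def context_ngrams(tokens, i, window=2):
    """Features from the words around tokens[i]: unigrams plus one bigram."""
    left = tokens[max(0, i - window):i]
    right = tokens[i + 1:i + 1 + window]
    feats = ["L:" + w for w in left] + ["R:" + w for w in right]
    if len(right) == 2:
        feats.append("R2:" + "_".join(right))
    return feats


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, feature_lists, labels):
        self.label_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, label in zip(feature_lists, labels):
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, feats):
        total = sum(self.label_counts.values())
        best, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            denom = sum(self.feat_counts[label].values()) + len(self.vocab)
            score = math.log(count / total)  # log prior
            for f in feats:
                # Smoothed log likelihood; unseen features get count 0 + 1.
                score += math.log((self.feat_counts[label][f] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best


# Hypothetical training data: token index -> data field the number refers to.
train = [
    ("Smith caught 5 passes for 60 yards", {2: "receptions", 5: "rec_yards"}),
    ("Jones rushed 10 times for 40 yards", {2: "rush_att", 5: "rush_yards"}),
]
X, y = [], []
for sentence, tags in train:
    tokens = sentence.split()
    for i, tag in tags.items():
        X.append(context_ngrams(tokens, i))
        y.append(tag)
model = NaiveBayes().fit(X, y)

tokens = "Brown caught 3 passes for 20 yards".split()
print(model.predict(context_ngrams(tokens, 2)), model.predict(context_ngrams(tokens, 5)))
```

The same "yards" token appears after both receiving and rushing numbers, so it is the surrounding context ("passes" versus "times") that separates the two yardage tags.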

Execute news_nlg_script.py. This script uses the trained models to generate news articles.

Default output file: /data/text_output.csv
Sample output:

Artificial News Article using NLG:

Tavon Austin caught two passes for 79 yards and a touchdown in Week 2 against the Giants.
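The two scripts appear to hand off work through pickled models (the first pickles them, the second uses them), and the generation script writes its articles to a CSV file. A minimal sketch of that save, load, and write pattern with the standard library follows; the model contents, file names, and CSV columns are assumptions, not the repository's actual format.

```python
import csv
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Serialize a trained model (any picklable object) to `path`."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load a model written by save_model. Only unpickle trusted files."""
    with open(path, "rb") as f:
        return pickle.load(f)


def write_articles(rows, path):
    """Write (player, week, text) rows to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["player", "week", "text"])
        writer.writerows(rows)


with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "chunk_type_model.pkl"
    # Stand-in "model": the chunk types listed in the training log.
    save_model({"chunk_types": ["record", "passing", "receptions", "rushing", "game"]},
               model_path)
    model = load_model(model_path)
    out_path = Path(tmp) / "text_output.csv"
    write_articles([("Player A", 2, "Player A caught three passes.")], out_path)
    with open(out_path, newline="", encoding="utf-8") as f:
        print(list(csv.reader(f)))
```

Using the csv module rather than joining strings by hand keeps generated sentences that contain commas in a single quoted field.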

View project presentation

https://bit.ly/2Eq9VuP

Paper

docs/Maxwell-Singh - fantasy nlg.pdf

Repository: CpuKnows/fantasy-nlg