6.8610 Final Project Proposal: A Library Learning Approach to Investigate the Usefulness of Conceptual Abstractions in Educational Videos

Hramir/educational_concept_librarian

educational_concept_librarian

6.8610 Final Project: Analyzing Educational Video Content through Hierarchical Graphs of Activities and Concepts

Abstract

The rise of digital education platforms has increased the quantity of online educational resources, but not necessarily their quality. Identifying the features of the most effective resources could improve online education by helping instructors who create content maximize its value to learners. This project aims to identify the linguistic and structural features of high-quality educational content through a library learning-inspired approach. We use large language models (LLMs) to extract conceptual hierarchies (graphs) from the transcripts of educational YouTube videos. We extract features from the graphs using (1) Fully Hyperbolic Neural Networks, (2) fine-tuned BERT, and (3) latent Dirichlet allocation, and use these features for supervised prediction of perceived teaching quality (estimated from like and view counts and from inferred comment sentiment). We also analyze the graphs jointly with teaching quality metrics to derive insights about which strategies for organizing video content maximize its perceived value to viewers. We find that successful videos describe the main concepts being taught in terms of supporting concepts that are elementary (tending to first appear early in playlist "curricula") and widely referenced.
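
To make the two concept properties named in the abstract concrete, here is a minimal Python sketch, not the project's implementation. It assumes a hypothetical input format in which each video, listed in playlist order, maps a main concept to the supporting concepts used to explain it. The names `playlist` and `concept_statistics` are illustrative only and do not come from the repository.

```python
from collections import defaultdict

# Hypothetical input: one dict per video, in playlist order. Each dict maps a
# main concept to the supporting concepts used to explain it.
playlist = [
    {"fractions": ["division", "whole numbers"]},
    {"ratios": ["fractions", "division"]},
    {"percentages": ["fractions", "ratios"]},
]


def concept_statistics(playlist):
    """Return (first_seen, reference_count) for every concept in the playlist.

    first_seen[c]      -> index of the first video in which concept c appears
                          (a small index suggests an "elementary" concept).
    reference_count[c] -> number of times c is used as a supporting concept
                          (a large count suggests a "widely referenced" concept).
    """
    first_seen = {}
    reference_count = defaultdict(int)
    for position, video in enumerate(playlist):
        for main_concept, supporting in video.items():
            # setdefault keeps the earliest position at which a concept appears.
            first_seen.setdefault(main_concept, position)
            for concept in supporting:
                first_seen.setdefault(concept, position)
                reference_count[concept] += 1
    return first_seen, dict(reference_count)


first_seen, reference_count = concept_statistics(playlist)
for concept in sorted(first_seen, key=first_seen.get):
    print(concept, first_seen[concept], reference_count.get(concept, 0))
```

In this toy playlist, "fractions" first appears in the first video and later supports two other main concepts, so it is both elementary and widely referenced in the sense used above.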

Overview of code

  • Run data_scraper/playlist_data_scraper.py to collect the dataset from YouTube. Note that our dataset is also available for direct download (see "Accompanying dataset" below).
  • Run data_scraper/comment_sentiment.py to perform sentiment analysis on comment data and calculate average sentiment. Use parse_csv() to operate on an existing CSV file, or parse_current_directory() to generate a CSV file from .txt files.
  • Feature extraction for the baseline Transcript-LDA model (LDA topic modeling on video transcripts) is done using lda_baseline/lda.py
  • Extract activity-concept hierarchies using the OpenAI API: llm_librarian/gpt_librarian_v2.py
  • Curate the concept library with BERT and perform hypothesis tests using the scripts in library_postprocessing/. A detailed README for this part of the codebase, along with the resulting processed dataset, is in the Google Drive folder linked under "Accompanying dataset" below.
  • Extract Concept-LDA graph-based features for regression using library_postprocessing/conceptual_lda.py
  • Extract Fully Hyperbolic Neural Network (FHNN) graph-based features using code in fhnn/
  • Extract Concept-BERT graph-based features using code in sentence_representation/
  • Run supervised regression models to predict like-to-view ratio and average comment sentiment using transcript_score_regression.ipynb
  • Fine-tune BERT for regression using finetune.py in sentence_representation/. Our experiments used an A100 GPU for fine-tuning; weaker GPUs or a CPU also work but greatly increase runtime.
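
The two regression targets mentioned in the steps above, like-to-view ratio and average comment sentiment, can be sketched as follows. This is an illustration under stated assumptions, not the code in transcript_score_regression.ipynb or comment_sentiment.py. The record fields (`likes`, `views`, `comment_sentiment`), the function names, and the assumption that sentiment scores lie in [-1, 1] are all hypothetical.

```python
from statistics import mean


def like_to_view_ratio(likes, views):
    """Likes per view, or None when the video has no recorded views."""
    if views <= 0:
        return None
    return likes / views


def average_sentiment(scores):
    """Mean of per-comment sentiment scores, or None when there are no comments."""
    return mean(scores) if scores else None


# Hypothetical records; per-comment scores are assumed to lie in [-1, 1].
videos = [
    {"id": "video_a", "likes": 50, "views": 1000,
     "comment_sentiment": [0.8, 0.4, -0.2]},
    {"id": "video_b", "likes": 0, "views": 0, "comment_sentiment": []},
]

# Map each video id to its (like-to-view ratio, average sentiment) pair.
targets = {
    v["id"]: (
        like_to_view_ratio(v["likes"], v["views"]),
        average_sentiment(v["comment_sentiment"]),
    )
    for v in videos
}

for video_id, (ratio, sentiment) in targets.items():
    print(video_id, ratio, sentiment)
```

Returning None for videos without views or comments keeps missing targets explicit, so they can be filtered out before fitting a regression model instead of being silently treated as zero.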

Accompanying dataset
