Clip2Story

Table of Contents

About The Project
Getting Started
Usage
Roadmap
Contributing
License
Contact

About The Project

Clip2Story is a prototype web application that transcribes news video clips, summarizes transcripts using OpenAI, and feeds summaries as the first draft of a story into a CMS.

This project was originally built for KSAT-TV in San Antonio, Texas. The Associated Press and Stanford University collaborated to develop this application as part of the Local News AI Initiative, funded by the John S. and James L. Knight Foundation, which aims to leverage AI for the benefit of local news.

The development team thanks the staff at KSAT-TV and Graham Media Group for proposing this project, and for their participation, feedback, and encouragement.

Project Objectives

Reduce Workloads: Decrease the burden on journalists of posting new stories to digital platforms.
Build New Capabilities: Allow journalists to experiment with generating text content from different types of videos.
Cost Effectiveness: Running the system should not cost more than the savings or revenue gains it generates.

How It Operates

Clip2Story functions through this process:

Video Upload: The system takes pre-edited video clips as input, uploaded either through the web application or through the Trint transcription service.
Transcription: A transcript of the video is generated by a call to the Trint API.
Approval of Transcript: After a transcript is completed, the system waits for a journalist to review and, if needed, edit the transcript for accuracy. (The dashboard is shown in the image above.)
Summarization: The validated transcript is summarized by an API call to OpenAI's GPT-3.5 Turbo model. (GPT prompts are controlled through the Django Administration interface; see the image below.)
Keywording: Relevant tags for the transcript are generated by an API call to OpenAI's GPT-3.5 Turbo model.
Publication: The summary and keywords are uploaded to the Arc XP CMS via API as a draft story for review by a journalist.
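
The process above can be read as a small state machine in which a clip only reaches summarization after a journalist has approved the transcript. The sketch below is illustrative only: the ClipStatus names, the NEXT_STAGE table, and the advance function are hypothetical and are not taken from the Clip2Story codebase, which may model these stages differently.

```python
from enum import Enum


class ClipStatus(Enum):
    """Hypothetical stage names mirroring the process described above."""

    UPLOADED = "uploaded"
    TRANSCRIBED = "transcribed"
    APPROVED = "approved"
    SUMMARIZED = "summarized"
    KEYWORDED = "keyworded"
    PUBLISHED = "published"


# Each stage may only move to the single stage that follows it.
NEXT_STAGE = {
    ClipStatus.UPLOADED: ClipStatus.TRANSCRIBED,
    ClipStatus.TRANSCRIBED: ClipStatus.APPROVED,
    ClipStatus.APPROVED: ClipStatus.SUMMARIZED,
    ClipStatus.SUMMARIZED: ClipStatus.KEYWORDED,
    ClipStatus.KEYWORDED: ClipStatus.PUBLISHED,
}


def advance(current: ClipStatus, target: ClipStatus) -> ClipStatus:
    """Return the new status, or raise if the step skips a stage."""
    if NEXT_STAGE.get(current) is not target:
        raise ValueError(
            f"cannot move from {current.value} to {target.value}"
        )
    return target
```

Under this assumed model, a transcript that has not been approved cannot be summarized: advance(ClipStatus.TRANSCRIBED, ClipStatus.SUMMARIZED) raises a ValueError.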

Getting Started

For initial configuration, including setup of third-party apps, prompts, and user accounts, see the administrators documentation.

Production

This application was originally designed to be hosted on Google Cloud Platform; see GCP Deployment for details. A job on Cloud Run runs every 30 minutes to check for and delete old videos; see Management Tasks and Jobs for details.
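
As a rough illustration of what a periodic cleanup of this kind does, the sketch below deletes files older than a retention window. It assumes videos are plain files in a local folder; the function names and the 24-hour MAX_AGE_SECONDS value are hypothetical, and the actual job (documented in Management Tasks and Jobs) may use a different threshold and storage backend.

```python
import time
from pathlib import Path

# Hypothetical retention window: the real threshold is not stated in this README.
MAX_AGE_SECONDS = 24 * 60 * 60


def find_old_videos(folder, max_age_seconds=MAX_AGE_SECONDS, now=None):
    """Return files in `folder` last modified more than `max_age_seconds` ago."""
    now = time.time() if now is None else now
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds
    )


def delete_old_videos(folder, max_age_seconds=MAX_AGE_SECONDS, now=None):
    """Delete old files and return the paths that were removed."""
    removed = find_old_videos(folder, max_age_seconds, now)
    for path in removed:
        path.unlink()
    return removed
```

Separating the selection step from the deletion step makes it possible to log or dry-run the list of candidates before anything is removed.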

Prerequisites

Django
Postgres
Trint API access
Arc XP API access
OpenAI API access
Google Cloud Platform (for production hosting)

Installation

Install Postgres.app.

Create a local version of the database. For this step, you may need to locate the createdb command on your computer. This will vary depending on the version of Postgres.app that you installed.

# First try plain old createdb

createdb summarizerdb

# If the above doesn't work, try locating the createdb command in
# Postgres.app folder. Below is an example if you're on Postgres.app version 15

/Applications/Postgres.app/Contents/Versions/15/bin/createdb summarizerdb

Grab a copy of the codebase and install Python requirements.

git clone git@github.com:associatedpress/local-ai-ksat.git
cd local-ai-ksat
pipenv install --dev

NOTE: All of the following commands should be executed on the command line, from the top-level local-ai-ksat/ directory, unless otherwise specified.

Set up a .env file to store secrets and other project-specific environment variables.

cp env.template .env

Add your database username to the .env file:

echo DJANGO_DB_USER=$(whoami) >> .env

Migrate your local database (this will create all tables, fields, etc.).

# If you haven't already done so, activate the virtual environment
# by running "pipenv shell" from the top-level "local-ai-ksat/" directory

# Then navigate to the clip2story/ directory and update the database
cd clip2story/
python manage.py migrate

Create a superuser for the Django admin.

python manage.py createsuperuser

IMPORTANT: Any time you install a new application dependency using pipenv install, you must regenerate the requirements.txt file used in the production deployment by running pipenv lock -r > requirements.txt. (Newer pipenv releases have removed lock -r; there, use pipenv requirements > requirements.txt instead.) Commit that update along with any code changes so that the new dependencies are available in production.
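
If the migrate step fails with a database authentication error, it can help to confirm that the .env file created earlier actually contains the expected values. The following is a minimal sketch of such a check, not part of the project: read_env and missing_keys are hypothetical helpers, the parser handles only simple KEY=value lines, and DJANGO_DB_USER is the only variable name taken from this README.

```python
from pathlib import Path


def read_env(path):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env, required):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        print(missing_keys(read_env(env_file), ["DJANGO_DB_USER"]))
```

Run from the top-level local-ai-ksat/ directory, this prints an empty list when DJANGO_DB_USER is set.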

Usage

For day-to-day use, run the commands below.

Note: you may also need to occasionally migrate your database, per the migration step in Installation above.

# Activate the virtual environment
cd local-ai-ksat/
pipenv shell

# Fire up the dev server
cd clip2story/
python manage.py runserver

Use your superuser credentials to log in to:

The app: http://localhost:8000
Django admin: http://localhost:8000/admin

Roadmap

At the end of the MVP development period, these were the features that we thought would be useful to have in the future:

Integration with a variety of other transcription services, especially an OpenAI Whisper model, which performed well in a separate project for Michigan Radio.
Integration with a variety of other content management systems.
Native UI management and help guides that do not use the Django interface.

Contributing

Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

Fork the Project
Create your Feature Branch (git checkout -b feature/AmazingFeature)
Commit your Changes (git commit -m 'Add some AmazingFeature')
Push to the Branch (git push origin feature/AmazingFeature)
Open a Pull Request

License

Distributed under the GNU GENERAL PUBLIC LICENSE. See LICENSE for more information.

Contact

The Associated Press does not provide technical support for this open-source application.

Serdar Tumgoren - @zstumgoren - tumgoren@stanford.edu

Project Link: https://github.com/associatedpress/local-ai-ksat

Original Developers

Ryan Leahy - @RyanLeahy - Gonzaga University
Ozge Terzioglu - @ozterz - Stanford University
Kalyn Epps - Stanford University
