Clip2Story is a prototype web application that transcribes news video clips, summarizes transcripts using OpenAI, and feeds summaries as the first draft of a story into a CMS.
This project was originally built for KSAT-TV in San Antonio, Texas. The Associated Press and Stanford University collaborated to develop this application as part of the Local News AI Initiative, funded by the John S. and James L. Knight Foundation, which aims to leverage AI for the benefit of local news.
The development team thanks the staff at KSAT-TV and Graham Media Group for proposing this project, and for their participation, feedback, and encouragement.
- Reduce Workloads: Decreasing the burden for journalists to post new stories on digital platforms.
- Build New Capabilities: Allowing journalists to experiment with generating text content from different types of videos.
- Cost Effectiveness: Running the system should not incur expenses greater than the savings or revenue gains generated by using it.
Clip2Story functions through this process:
- Video Upload: The system takes an input of pre-edited video clips uploaded via the web application or via the Trint transcription service.
- Transcription: A transcript of the video is generated via a call to the Trint API.
- Approval of Transcript: After a transcript is completed, the system waits for a journalist to review the transcript and, if needed, edit it for accuracy. (The dashboard is seen in the above image.)
- Summarization: The validated transcript is summarized by calling OpenAI's GPT-3.5 Turbo model through its API. (GPT prompts are controlled via the Django Administration interface; see image below.)
- Keywording: Relevant tags for the transcript are generated by calling OpenAI's GPT-3.5 Turbo model through its API.
- Publication: The summary and keywords are uploaded to the Arc XP CMS via API as a draft story for review by a journalist.
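The steps above form a linear pipeline with one human gate between transcription and summarization. A minimal sketch of that ordering follows. The stage names, the Clip class, and the advance function are hypothetical illustrations, not names from the Clip2Story codebase, and the real application calls Trint, OpenAI, and Arc XP where this sketch only changes a status.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Hypothetical stage names mirroring the six steps listed above."""
    UPLOADED = 1
    TRANSCRIBED = 2
    APPROVED = 3
    SUMMARIZED = 4
    KEYWORDED = 5
    PUBLISHED = 6


@dataclass
class Clip:
    """Hypothetical record for one video clip moving through the pipeline."""
    title: str
    stage: Stage = Stage.UPLOADED
    approved_by: Optional[str] = None


def advance(clip: Clip, reviewer: Optional[str] = None) -> Clip:
    """Move a clip to the next stage, enforcing the journalist review gate."""
    if clip.stage is Stage.PUBLISHED:
        raise ValueError("clip has already been sent to the CMS as a draft")
    next_stage = Stage(clip.stage.value + 1)
    if next_stage is Stage.APPROVED:
        # Summarization must not start until a person has signed off.
        if not reviewer:
            raise PermissionError("a journalist must approve the transcript")
        clip.approved_by = reviewer
    clip.stage = next_stage
    return clip
```

Calling advance(clip) on a transcribed clip without a reviewer raises PermissionError, which models the point in the process where the system waits for a journalist.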
For initial configuration and setup of third-party apps, prompts, and user accounts, see the administrator's documentation.
This application was originally designed to be hosted on Google Cloud Platform. See GCP Deployment for the details. There is a job running every 30 minutes on Cloud Run that checks for and deletes old videos. See Management Tasks and Jobs for details.
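The core decision in a cleanup job like this is an age check against a cutoff. The sketch below illustrates only that check. The function name select_expired, the (name, uploaded_at) tuple shape, and the 7-day retention value are assumptions made for illustration; the actual retention period and implementation are covered in Management Tasks and Jobs.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Tuple

# Assumed retention window, for illustration only; not taken from the project.
RETENTION = timedelta(days=7)


def select_expired(
    videos: Iterable[Tuple[str, datetime]],
    now: datetime,
    retention: timedelta = RETENTION,
) -> List[str]:
    """Return the names of videos uploaded strictly before now - retention."""
    cutoff = now - retention
    return [name for name, uploaded_at in videos if uploaded_at < cutoff]


# Example call with made-up data: one clip older than the window, one newer.
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
videos = [
    ("old.mp4", now - timedelta(days=8)),
    ("new.mp4", now - timedelta(days=1)),
]
expired = select_expired(videos, now)
```

Passing now in as an argument, rather than reading the clock inside the function, keeps the check easy to test with fixed dates.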
- Django
- Postgres
- Trint API access
- Arc XP API access
- OpenAI API access
- Google Cloud Platform (for production hosting)
Install Postgres.app.
Create a local version of the database. For this step, you may need to locate the createdb command on your computer. Its location varies depending on the version of Postgres.app that you installed.
# First try plain old createdb
createdb summarizerdb
# If the above doesn't work, try locating the createdb command in the
# Postgres.app folder. Below is an example if you're on Postgres.app version 15
/Applications/Postgres.app/Contents/Versions/15/bin/createdb summarizerdb
Grab a copy of the codebase and install Python requirements.
git clone git@github.com:associatedpress/local-ai-ksat.git
cd local-ai-ksat
pipenv install --dev
NOTE: All of the following commands should be executed on the command line, from the top-level local-ai-ksat/ directory, unless otherwise specified.
Set up a .env file to store secrets and other project-specific environment variables.
cp env.template .env
Add your database username to the .env file:
echo DJANGO_DB_USER=$(whoami) >> .env
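At this point the .env file is a list of KEY=VALUE lines. To illustrate how such a file maps to settings, here is a minimal sketch of a parser. parse_env is a hypothetical helper, and this README does not say which loader the project actually uses, so treat it as an illustration of the file format rather than the application's own code.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


# Example with a made-up username.
settings = parse_env("# local settings\nDJANGO_DB_USER=alice\n")
```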
Migrate your local database (this will create all tables, fields, etc.).
# If you haven't already done so, activate the virtual environment
# by running "pipenv shell" from the top-level "local-ai-ksat/" directory
# Then navigate to the clip2story/ directory and update the database
cd clip2story/
python manage.py migrate
Create a superuser for the Django admin.
python manage.py createsuperuser
IMPORTANT: Any time you install a new application dependency using pipenv install, you must regenerate the requirements.txt file used in the production deployment by running pipenv lock -r > requirements.txt. (Newer pipenv releases have replaced lock -r with pipenv requirements > requirements.txt.) Commit that update along with any code updates so the new software dependencies are available in production.
For day-to-day usage, use the commands below.
Note that you may also need to occasionally migrate your database, per the instructions above in Setup.
# Activate the virtual environment
cd local-ai-ksat/
pipenv shell
# Fire up the dev server
cd clip2story/
python manage.py runserver
Use your superuser credentials to log into:
- the app at http://localhost:8000
- Django admin: http://localhost:8000/admin
At the end of the MVP development period, these were the features that we thought would be useful to have in the future:
- Integration with a variety of other transcription services, especially an OpenAI Whisper model because it performed really well in a separate project for Michigan Radio.
- Integration with a variety of other content management systems
- Native UI management and help guides that do not use the Django interface
Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Distributed under the GNU General Public License. See LICENSE for more information.
The Associated Press does not provide technical support for this open-source application.
Serdar Tumgoren - @zstumgoren - tumgoren@stanford.edu
Project Link: https://github.com/associatedpress/local-ai-ksat
- Ryan Leahy - @RyanLeahy - Gonzaga University
- Ozge Terzioglu - @ozterz - Stanford University
- Kalyn Epps - Stanford University