"Automatic Meeting Summarizer" model is a machine learning model that is used to summarize the input video file and provide the "minutes of meeting" to the client. The model is trained to provide summary in desired dialect.
The MDPI research paper provided us with an approach for speaker diarization and transcription of the input file. Using SyncNet proved challenging: the model expects a specific type of input that did not match our input files, and it was also very slow, with a long training time. After evaluating multiple transcription models, Whisper turned out to be the most accurate and efficient. For summarization and translation we used LangChain, a framework for building applications around GPT-family LLMs; both summarization and translation can be driven through it.
- https://www.kaggle.com/datasets/wiradkp/mini-speech-diarization
- https://huggingface.co/datasets/knkarthick/AMI#dataset-creation
- http://groups.inf.ed.ac.uk/ami/download/
Transcription Model: The model takes a video file, an audio file, or a YouTube video as input and generates its transcript in English using the Whisper model.
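A minimal transcription sketch using the openai-whisper package (the checkpoint name, file path, and YouTube URL below are illustrative placeholders, not the repository's actual configuration):

```python
import whisper
from pytube import YouTube  # one option for fetching YouTube audio

# Load a pretrained Whisper checkpoint; "base" trades accuracy for speed,
# while "small"/"medium"/"large" are more accurate but slower.
model = whisper.load_model("base")

# Whisper accepts an audio or video path directly and extracts the
# audio track internally via ffmpeg.
result = model.transcribe("meeting_recording.mp4", language="en")
print(result["text"])

# For a YouTube input, download the audio stream first, then transcribe it.
audio_path = (
    YouTube("https://www.youtube.com/watch?v=EXAMPLE")  # placeholder URL
    .streams.filter(only_audio=True)
    .first()
    .download(filename="meeting_audio.mp4")
)
yt_result = model.transcribe(audio_path, language="en")
```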
Summarization Model: The summarization model distills the most relevant points of a given video file into a concise summary. LangChain is used for summarization and Facebook's M2M-100 model for translation. The model can take a video or audio file, as well as a YouTube video, in English, as input. The summary is generated in English and can then be translated into another language according to the user's needs.
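A sketch of the summarization step using LangChain's map-reduce summarize chain over the Whisper transcript. Import paths and chain APIs vary across LangChain versions (this follows the legacy layout), and the chunk sizes are illustrative assumptions:

```python
from langchain.llms import OpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter

transcript = result["text"]  # Whisper output from the transcription step

# Split long transcripts into chunks that fit the model's context window.
splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
docs = [Document(page_content=chunk) for chunk in splitter.split_text(transcript)]

# "map_reduce" summarizes each chunk, then merges the partial summaries
# into one set of meeting minutes. Requires OPENAI_API_KEY in the environment.
llm = OpenAI(temperature=0)
chain = load_summarize_chain(llm, chain_type="map_reduce")
summary = chain.run(docs)
print(summary)
```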
Our model is capable of taking audio, video, and YouTube video files as input, transcribing them, and generating a summary. The output is multilingual, and the user can choose the target language as per requirement.
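Translation of the English summary into the user's chosen language can be done with M2M-100 via Hugging Face Transformers. A minimal sketch (the 418M checkpoint and the Hindi target are assumptions for illustration; any M2M-100 language code works):

```python
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

checkpoint = "facebook/m2m100_418M"  # assumed size; larger variants exist
model = M2M100ForConditionalGeneration.from_pretrained(checkpoint)
tokenizer = M2M100Tokenizer.from_pretrained(checkpoint)

def translate(text: str, src_lang: str = "en", tgt_lang: str = "hi") -> str:
    """Translate `text` from src_lang to tgt_lang with M2M-100."""
    tokenizer.src_lang = src_lang
    encoded = tokenizer(text, return_tensors="pt")
    # Force the decoder to start in the target language.
    generated = model.generate(
        **encoded, forced_bos_token_id=tokenizer.get_lang_id(tgt_lang)
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]

print(translate("The team agreed to finalize the release next Friday."))
```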