A simple tool to detect whether an audio file was generated by NotebookLM.
At Listen Notes, we've encountered a growing number of spammers submitting fake, NotebookLM-generated podcasts to our platform. Check out this list of fake podcasts generated by NotebookLM.
We hoped the NotebookLM team would provide a tool to help detect NotebookLM-generated audio. However, after a week of back-and-forth emails, we lost patience.
It's now Friday (Oct 4, 2024), and since we won't hear back from the NotebookLM team until next week, we decided to put together this simple script. Luckily, it seems to work!
Update:
- October 9, 2024: After further emails with the NotebookLM team, it’s become evident that they are unable to provide tools or guidance to curb the spread of spammy, fake podcasts generated by NotebookLM. This is understandable, as they are typical 9-to-5 Google employees who enjoy a healthy work-life balance. NotebookLM remains an experimental project, and if it fails, the team members can easily transition to another project or team within Google, continuing their careers without significant disruption. There's little incentive for them to address issues that don't directly impact their performance reviews. Unfortunately, this leaves the podcasting industry vulnerable, but it's not a pressing concern for a handful of Googlers.
- Notebook LM: A threat to the Podcasting World
$ pip install -r requirements.txt
To detect whether an audio file is AI-generated or human-produced, run the following command:
$ python notebooklm_detector.py --action predict --file_path [filename].mp3
You’ll see output like this:
The audio is: AI Generated
or
The audio is: Human
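If you need to check many files, the command above can be wrapped in a short script. The following is a minimal sketch, not part of this repository: it assumes the detector prints its verdict to stdout in exactly the `The audio is: ...` form shown above, and the helper names `build_predict_command`, `parse_label`, and `classify_folder` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Assumption: the detector prints a line starting with this prefix to stdout.
PREFIX = "The audio is:"


def build_predict_command(file_path, script="notebooklm_detector.py"):
    """Build the same predict command shown above, as an argument list."""
    return [sys.executable, script, "--action", "predict", "--file_path", str(file_path)]


def parse_label(output):
    """Return the text after 'The audio is:', or None if no such line exists."""
    for line in output.splitlines():
        line = line.strip()
        if line.startswith(PREFIX):
            return line[len(PREFIX):].strip()
    return None


def classify_folder(folder):
    """Run the detector on every .mp3 in a folder and map file name to label."""
    results = {}
    for path in sorted(Path(folder).glob("*.mp3")):
        proc = subprocess.run(
            build_predict_command(path),
            capture_output=True,
            text=True,
            check=False,  # keep going even if one file fails
        )
        results[path.name] = parse_label(proc.stdout)
    return results
```

Calling `classify_folder("episodes")` from the repository root would return a dictionary such as file name to `"AI Generated"` or `"Human"`, with `None` for any file whose output could not be parsed.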
You can train the model and regenerate model.pkl by following these steps:
- Place NotebookLM-generated audio files (mp3, wav, or mp4) in the datasets/ai/ folder.
- Place human-produced audio files in the datasets/human/ folder.
To train the model, run:
$ python notebooklm_detector.py --action train --dataset_path datasets
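Before training, it can help to confirm that both folders actually contain audio. The sketch below is an illustration only, not code from this repository: the function name `count_audio_files` is hypothetical, and it assumes the human folder accepts the same mp3, wav, and mp4 extensions listed for the AI folder.

```python
from pathlib import Path

# Extensions listed for datasets/ai/; assumed to apply to datasets/human/ too.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".mp4"}
CLASS_FOLDERS = ("ai", "human")


def count_audio_files(dataset_path):
    """Count audio files in the ai/ and human/ subfolders of dataset_path."""
    counts = {}
    for name in CLASS_FOLDERS:
        folder = Path(dataset_path) / name
        if not folder.is_dir():
            # A missing folder is reported as zero files rather than an error.
            counts[name] = 0
            continue
        counts[name] = sum(
            1
            for p in folder.iterdir()
            if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
        )
    return counts
```

Running `count_audio_files("datasets")` returns a dictionary with one count per class. A zero, or a large imbalance between the two counts, is worth fixing before you regenerate model.pkl.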