Curator

Automated normalization and curation of media collections. Written in Python 3.x.

Curator is a collection of stateless CLI tools, following the Unix philosophy, to organize large collections of heterogeneous media. Each tool creates a plan made of tasks with clearly defined input and output files, which the user can optionally review before applying.
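The plan/task model described above can be illustrated with a minimal sketch. The class names `Task`, `RenameTask` and `Plan` and their methods are assumptions made for illustration only; they are not taken from Curator's code and its actual classes may differ.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Task:
    """One unit of work with explicit input and output files (hypothetical)."""
    inputs: list[Path]
    outputs: list[Path]

    def apply(self) -> None:
        raise NotImplementedError


@dataclass
class RenameTask(Task):
    """Hypothetical task that moves its single input to its single output."""

    def apply(self) -> None:
        self.inputs[0].rename(self.outputs[0])


@dataclass
class Plan:
    """Ordered list of tasks that can be printed for review before running."""
    tasks: list[Task] = field(default_factory=list)

    def review(self) -> str:
        # One line per task: "inputs -> outputs". Nothing is modified here.
        return "\n".join(
            f"{', '.join(map(str, t.inputs))} -> {', '.join(map(str, t.outputs))}"
            for t in self.tasks
        )

    def apply(self) -> None:
        # Files are only touched once the user decides to apply the plan.
        for task in self.tasks:
            task.apply()
```

Keeping `review()` free of side effects is what would let a user inspect the plan and abort without any file having changed.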

Install the package via:

pip install git+https://github.com/AlexAltea/curator.git

Credits

Acknowledgements to people who contributed code/ideas to the project:

Victor Garcia Herrero: Mathematician, Machine Learning expert and tamer of scoring functions.

Features

Curator can automatically rename and link media files, edit container metadata, and remux and merge streams. To reduce manual labor and achieve reliable results across media from potentially different sources, some tools rely on signal processing and machine learning (e.g. Whisper, LangID).

Highlighted use cases (current and planned):

Filter media by container and stream metadata (all).
Rename files based on existing filenames (curator-rename).
Merge streams from multiple related containers (curator-merge).
Detect audio/subtitle language from sound and text data (curator-tag).
Rename files based on existing metadata and databases (curator-rename).
Synchronize audio/subtitle streams (curator-merge and curator-sync).
Remove scene banners from subtitles (curator-clean).
Detect watermarks in video streams (curator-clean and curator-merge).
Select highest quality audio/video streams (curator-merge).

Below you can find a description and examples of all tools provided by Curator:

Auto

flowchart LR
    Convert --> Merge --> Sync --> Tag --> Rename

Merge

Merges all streams with identical names into a single container, except for:

Video streams, if one already exists.
Audio streams, if one with the same language tag already exists.

Requires all video containers to be MKV.
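The two exceptions above can be sketched as a small filter. This is an illustrative sketch only: the `Stream` type, its fields and the `select_streams` function are hypothetical names, not Curator's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    kind: str               # "video", "audio" or "subtitle" (hypothetical field)
    language: str = "und"   # language tag; "und" stands for undetermined


def select_streams(target: list[Stream], candidates: list[Stream]) -> list[Stream]:
    """Return the streams of `target` plus every candidate the rules allow."""
    merged = list(target)
    for stream in candidates:
        # Rule 1: skip video streams if the container already has one.
        if stream.kind == "video" and any(m.kind == "video" for m in merged):
            continue
        # Rule 2: skip audio streams whose language tag is already present.
        if stream.kind == "audio" and any(
            m.kind == "audio" and m.language == stream.language for m in merged
        ):
            continue
        merged.append(stream)
    return merged


target = [Stream("video"), Stream("audio", "eng")]
candidates = [
    Stream("video"),
    Stream("audio", "eng"),
    Stream("audio", "spa"),
    Stream("subtitle", "eng"),
]
result = select_streams(target, candidates)
```

In this example only the Spanish audio and the English subtitle are added; the second video stream and the duplicate English audio are skipped.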

Rename

Update filenames according to a pattern made of the following variables:

Key	Description
`@ext`	File extension of the input media.
`@dbid`	When using a database, the ID of the match, e.g. `imdbid-tt12345678`.
`@name`	Localized name of the media.
`@oname`	Original name of the media (needs database).
`@tags`	Tags present in the input media filename enclosed by square brackets, if any.
`@year`	Year the media was released.
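As a rough illustration of how such a pattern could be expanded, the sketch below substitutes the variables from the table with values from a dictionary. The `render` function, the example pattern and the assumption that `@ext` holds the extension without a leading dot are all hypothetical; Curator's own pattern syntax and defaults may differ.

```python
import re

# The variables documented in the table above.
KEYS = {"ext", "dbid", "name", "oname", "tags", "year"}


def render(pattern: str, values: dict) -> str:
    """Replace each documented @variable in `pattern` with its value."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in KEYS:
            return match.group(0)  # leave unknown tokens untouched
        return str(values[key])    # raises KeyError if the value is missing
    return re.sub(r"@([a-z]+)", substitute, pattern)


filename = render(
    "@name (@year).@ext",
    {"name": "Metropolis", "year": 1927, "ext": "mkv"},
)
# filename == "Metropolis (1927).mkv"
```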

Sync

Synchronize streams via data cross-correlation.

Every synchronization task involves (A) a reference stream, and (B) the stream we want to synchronize. We denote this relationship as A ← B. Curator can only handle the following types of synchronization tasks:

Video ← Audio: Comparing lip movement timestamps with ASR timestamps.
Audio ← Audio: Comparing sound data.
Audio ← Subtitle: Comparing ASR timestamps with uniquely matching text timestamps.
Subtitle ← Subtitle: Comparing text timestamps.
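The cross-correlation idea behind these comparisons can be sketched in a few lines of pure Python. This is a naive illustration on short numeric sequences, not Curator's implementation; the function name `best_lag` and its parameters are hypothetical, and a real tool would more likely use an FFT-based correlation on much longer signals.

```python
def best_lag(reference: list[float], target: list[float], max_lag: int) -> int:
    """Return the shift (in samples) that best aligns `target` with `reference`.

    A negative result means `target` is delayed and must be moved earlier;
    a positive result means it must be moved later.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation at this lag: sum of reference[i] * target[i - lag].
        score = 0.0
        for i, value in enumerate(reference):
            j = i - lag
            if 0 <= j < len(target):
                score += value * target[j]
        if score > best_score:
            best, best_score = lag, score
    return best


reference = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
delayed = [0, 0, 0, 0, 1, 3, 1, 0, 0, 0]  # same signal, two samples late
shift = best_lag(reference, delayed, max_lag=4)
# shift == -2: move the target two samples earlier
```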

The synchronization plan (SyncPlan) will create a tree of synchronization tasks (SyncTask) for every media file it processes. For example, given an input Media("movie.mkv") with the streams #0 (video), #1 (audio:eng), #2 (audio:spa), #3 (subtitle:eng), #4 (subtitle:spa), it will generate the following sync proposals:

#0 ← #1
#1 ← #2
#1 ← #3
#3 ← #4
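One simple rule that reproduces the proposals above is sketched below. It is only a guess that happens to match this example, not a description of how SyncPlan actually selects references: the first stream of each type is synchronized against the first stream of the "parent" type (audio against video, subtitle against audio), and every later stream is synchronized against the first stream of its own type. The function name `propose_sync` and the tuple representation are hypothetical, and the sketch assumes streams are listed with video before audio before subtitles.

```python
def propose_sync(streams: list[tuple[int, str]]) -> list[tuple[int, int]]:
    """Map (index, kind) streams to (reference, target) index pairs."""
    first: dict[str, int] = {}  # kind -> index of the first stream of that kind
    parent_kind = {"audio": "video", "subtitle": "audio"}
    proposals = []
    for index, kind in streams:
        if kind in first:
            # Later streams sync against the first stream of the same kind.
            proposals.append((first[kind], index))
        else:
            first[kind] = index
            # The first stream of a kind syncs against its parent kind, if any.
            ref_kind = parent_kind.get(kind)
            if ref_kind in first:
                proposals.append((first[ref_kind], index))
    return proposals


streams = [(0, "video"), (1, "audio"), (2, "audio"), (3, "subtitle"), (4, "subtitle")]
proposals = propose_sync(streams)
# proposals == [(0, 1), (1, 2), (1, 3), (3, 4)]
```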
