
KoroKoro 👀

KoroKoro is an automated pipeline that converts short 2D videos of an object into detailed, interactive 3D models.

Introduction

KoroKoro uses a mix of deep learning techniques to convert a 30-second video captured around an object into a fully interactive 3D model.

View live demo here

How does it work?

(Diagram: the KoroKoro process)

Video Ingestion & Processing

Given an input video, 40 frames are extracted (the default; configurable in extract_frames). These frames are processed by the process_data method of the DataProcessing class to generate a NeRF-compatible dataset that includes a transforms.json file.
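The sampling step above amounts to picking evenly spaced frame indices across the clip. A minimal sketch of that idea (the function name and signature are illustrative, not KoroKoro's actual extract_frames API):

```python
import numpy as np

def evenly_spaced_frame_indices(total_frames: int, n_frames: int = 40) -> list:
    """Pick n_frames indices spread evenly across a video's frames.

    Illustrative of the default 40-frame sampling; not the project's
    actual implementation.
    """
    n = min(n_frames, total_frames)
    # linspace gives evenly spaced positions; cast down to valid integer indices
    return sorted(set(np.linspace(0, total_frames - 1, n).astype(int).tolist()))

# e.g. a 900-frame clip (30 s at 30 fps) sampled down to 40 frames
indices = evenly_spaced_frame_indices(900, 40)
```

Each index would then be read from the video (e.g. with OpenCV) and written out as an image for the dataset.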

Image Transformation

Given a frame, if the object of interest is among the 80 COCO classes, YOLOv8 predicts the bounding-box coordinates of the object; otherwise GroundingDINO handles the bounding-box prediction from a natural-language prompt: the description/title of the object. This title is set in config/config.yaml.

Given a frame and the xy coordinates of the bounding box around the object, Segment Anything v2 creates an accurate mask of the object. The mask is then used to extract only the object, leaving the other areas/background empty.
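The masking step boils down to keeping only the pixels the segmenter marked and zeroing out the rest. A minimal NumPy sketch (not KoroKoro's exact code; the frame and mask here are synthetic):

```python
import numpy as np

def cut_out_object(frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep only masked pixels; leave the background empty (zeros).

    frame: HxWx3 uint8 image; mask: HxW boolean array from the segmenter.
    """
    out = np.zeros_like(frame)
    out[mask] = frame[mask]
    return out

# toy 4x4 "frame" with a 2x2 object mask in the middle
frame = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
cut = cut_out_object(frame, mask)
```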

See algorithm below

if object in coco_classes:
  detect_with_yolov8()
  if successful():
    segment_with_sam2()
  else:
    detect_with_groundingdino()
    if successful():
      segment_with_sam2()
else:
  detect_with_groundingdino()
  if successful():
    segment_with_sam2()
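The fallback logic above can be expressed as a small function. In this sketch the detectors are injected as plain callables returning a box or None; the real YOLOv8 and GroundingDINO APIs look different, and on a successful detection SAM 2 segmentation would follow:

```python
from typing import Callable, Optional, Tuple

Box = Tuple[int, int, int, int]  # x1, y1, x2, y2

def detect_object(
    category: str,
    coco_classes: set,
    yolo_detect: Callable[[], Optional[Box]],
    dino_detect: Callable[[], Optional[Box]],
) -> Optional[Box]:
    """Try YOLOv8 for COCO-class objects; fall back to GroundingDINO.

    Detectors are stand-in callables here, not the real model APIs.
    """
    if category in coco_classes:
        box = yolo_detect()
        if box is not None:
            return box
    # non-COCO object, or YOLO missed: prompt GroundingDINO instead
    return dino_detect()

# stub detectors for illustration: YOLO misses, GroundingDINO succeeds
box = detect_object("blue backpack", {"book", "chair"},
                    yolo_detect=lambda: None,
                    dino_detect=lambda: (10, 10, 50, 50))
```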

3D Reconstruction

Processed inputs from the previous steps are fed to Nerfstudio's Gaussian Splatting implementation, splatfacto.

The resulting splats are finally exported to a .ply file.

Prerequisites

  • Conda / Miniconda

NB: Tested on GPU compute with A10 (24GB) & Google Colab T4 (16GB)

Installation

Install conda if not installed

sudo apt-get install libgl1-mesa-glx libegl1-mesa libxrandr2 libxss1 libxcursor1 libxcomposite1 libasound2 libxi6 libxtst6

curl -sL \
  "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh" > \
  "Miniconda3.sh"

bash Miniconda3.sh
# ⚠️ Might be different on your computer, please take note of the miniconda installation directory

source /root/miniconda3/bin/activate

General setup

git clone https://github.com/Daheer/KoroKoro.git
cd KoroKoro

# This will setup the environment
bash setup.sh

# Activate the environment
conda activate korokoro

Run Locally

Configure the category, title & video_output fields in config/config.yaml

  • category: MS COCO class name if available (e.g. book); otherwise set to others
  • title: natural-language description of the object (e.g. blue backpack)
  • video_output: path to the input video
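Putting the three fields together, a config/config.yaml for a non-COCO object might look like this (the values, and the example path, are illustrative; the file may contain other keys as well):

```yaml
category: others                       # not one of the 80 COCO classes
title: blue backpack                   # natural-language prompt for GroundingDINO
video_output: data/blue_backpack.mp4   # path to the input video
```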

Run local pipeline

python KoroKoro/pipeline/local.py

Run with Supabase Database Connection

Simply run the commands below; they will fetch products from the queue in Supabase and generate 3D models

python KoroKoro/pipeline/stage_01.py
python KoroKoro/pipeline/stage_02.py

Google Colab

Install xterm

!pip install colab-xterm

Load xterm extension

%load_ext colabxterm

Launch xterm terminal

%xterm

Continue from the start of the Installation instructions

Project Structure

📦KoroKoro
├─ .gitignore
├─ Dockerfile
├─ KoroKoro
│  ├─ __init__.py
│  ├─ components
│  │  ├─ __init__.py
│  │  ├─ data_ingestion.py
│  │  ├─ data_processing.py
│  │  ├─ data_transformation.py
│  │  ├─ initialization.py
│  │  ├─ model_trainer.py
│  │  └─ post_processing.py
│  ├─ config
│  │  ├─ __init__.py
│  │  └─ configuration.py
│  ├─ entity
│  │  └─ __init__.py
│  ├─ logger.py
│  ├─ pipeline
│  │  ├─ __init__.py
│  │  ├─ local.py
│  │  ├─ stage_01.py
│  │  └─ stage_02.py
│  └─ utils
│     ├─ __init__.py
│     └─ constants.py
├─ GroundingDINO
│  ├─ groundingdino
│  │  ├─ __init__.py
│  │  ├─ config
│  │  ├─ datasets
│  │  ├─ models
│  │  └─ util
│  └─ LICENSE
├─ config
│  └─ config.yaml
├─ docker-compose.yml
├─ README.md
├─ requirements.txt
├─ setup.py
└─ setup.sh

Improvements - v1 to v2

| Input | KoroKoro Version 1 | KoroKoro Version 2 |
| --- | --- | --- |
| chair | (result-v1 image) | (result-v2 image) |
| Setup Time | 45 minutes | 15 minutes |
| Processing Time | 25 minutes | 5 minutes |
| Training Time | 5 minutes | 5 minutes |

Contributing

There are areas where this project can be improved, including:

  • Incorporate Trellis

  • Lighter-weight .obj files: right now, the resulting obj models are heavy (> 100 MB), and I have to use sharding to save them in Supabase's storage bucket, which limits file uploads to 50 MB

  • Use Segment Anything to improve segmentation
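The sharding workaround mentioned above (splitting an exported model into chunks that fit under the 50 MB upload limit) can be sketched like this; the function name and chunking scheme are illustrative, not KoroKoro's actual code:

```python
def shard_file_bytes(data: bytes, max_bytes: int = 50 * 1024 * 1024) -> list:
    """Split a blob (e.g. an exported .ply/.obj) into chunks no larger
    than max_bytes, so each shard fits under a storage upload limit.
    Shards can be concatenated in order to reconstruct the file.
    """
    return [data[i:i + max_bytes] for i in range(0, len(data), max_bytes)]

# toy example: a 120-byte blob with a 50-byte limit yields 3 shards
shards = shard_file_bytes(b"x" * 120, max_bytes=50)
```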

Please reach out to me at suhayrid6@gmail.com; I'd be happy to walk you through the project, including the Supabase database configuration.