Rghuang Dev Branch Merge #2

Open · wants to merge 59 commits into main from rghuang_dev

Commits (59)

cf0a3c2  New: image_registration_engine - ransac_flow, LoFTR (jinhoyi, Dec 16, 2023)
2cfbd40  Frontend Update: basic file metadata, object detection, autoplay, ove… (jinhoyi, Dec 21, 2023)
2b2586d  added GroundingDino repo (jinhoyi, Dec 21, 2023)
8cca33a  git ignore update (jinhoyi, Dec 21, 2023)
daf84cd  Updated Instructions (jinhoyi, Dec 22, 2023)
51d47a8  Bug Fix; Run button deactivating issue fixed (jinhoyi, Dec 22, 2023)
a8d6a49  Update: align mode (jinhoyi, Dec 22, 2023)
ab1fd17  Minor Layout Update (jinhoyi, Dec 22, 2023)
ef28ac6  Bug Fix: file explorer not showing all routes (jinhoyi, Dec 22, 2023)
43f3478  added Metadata for delete (jinhoyi, Dec 22, 2023)
312c1e5  Bug Fix: Resets video selection after changing the routes (jinhoyi, Dec 25, 2023)
5805b6e  Published Dev (jinhoyi, Jan 5, 2024)
f638654  Streamlit Migration: Streamlit-resultpage v0.1.0 (jinhoyi, Jan 17, 2024)
3a4ae44  file_UX half commit (jinhoyi, Jan 25, 2024)
1ded0b8  Info Remaining (jinhoyi, Jan 26, 2024)
0c8dce7  Modified column managing (jinhoyi, Jan 26, 2024)
1f0f061  working Streamlit app (jinhoyi, Jan 31, 2024)
b9438ae  Deep Analysis Error fixed (jinhoyi, Jan 31, 2024)
3f2c3df  removed image_comparison import (jinhoyi, Feb 1, 2024)
1714908  setup_conda updated for python3.11 (jinhoyi, Feb 1, 2024)
6fad696  error_change (jinhoyi, Feb 14, 2024)
b203e43  clean up streamlit-float (jinhoyi, Feb 14, 2024)
ac6cc58  Added streamlit-float (jinhoyi, Feb 14, 2024)
e76ee8a  Merged Main (jinhoyi, Feb 14, 2024)
98836a4  Merge branch 'main' into dev (jinhoyi, Feb 14, 2024)
a990f2b  Real-time algorithm ready - Next: implement backend-driven rerun + pi… (jinhoyi, Mar 5, 2024)
0b702bf  Mock Real-time working - Bug: User _get_websocket_headers() returns None (jinhoyi, Mar 6, 2024)
a8db0da  Merged comparison frame into single element, beginning object annotat… (May 21, 2024)
f79ab4a  Basic annotation tab using streamlit-image-annotation (May 21, 2024)
ef2191e  Remove images for testing (May 22, 2024)
285bd62  update dependencies in setup_conda (May 22, 2024)
bd40c68  Update dependencies in setup_conda (May 22, 2024)
763f3f1  Basic object detection with moveable bounding boxes in annotation tab (May 23, 2024)
72de8e8  Simple UI for annotating, generates moveable box plots around detecte… (May 24, 2024)
f1f6282  migrated annotation UI to streamlit_label_kit (May 28, 2024)
14de4d0  Quetzal_realtime (jinhoyi, Jun 7, 2024)
ebe8363  Mask generation for change detection model (Jun 7, 2024)
0abdd06  Working on segmentation annotation overlay (Jun 17, 2024)
4063279  Fixing segmentation buffer bug (Jun 17, 2024)
98c03ed  Initial labeling platform with detection and segmentation completed (Jun 20, 2024)
5a08118  Initial labeling platform with detection and segmentation completed (rgyhuang, Jun 20, 2024)
77fa1e7  Merge branch 'rghuang_dev' of https://github.com/cmusatyalab/quetzal … (rgyhuang, Jun 20, 2024)
ad99955  Streamlit upgrade, fragment scoped refresh for annotation windows (rgyhuang, Jul 29, 2024)
81cc1d9  Code cleanup, label list fix (rgyhuang, Jul 30, 2024)
9b7e705  Setup update, code cleanup (rgyhuang, Jul 31, 2024)
2ddee42  Code cleanup (rgyhuang, Jul 31, 2024)
495d817  Remove test files (rgyhuang, Jul 31, 2024)
1d4867c  Remove unused imports and dead code (rgyhuang, Jul 31, 2024)
62c8ccc  Update readme (rgyhuang, Aug 1, 2024)
b2117e1  Update readme (rgyhuang, Aug 1, 2024)
cb6a100  Update readme (rgyhuang, Aug 1, 2024)
1834f97  Update readme (rgyhuang, Aug 1, 2024)
1cd4725  Update readme (rgyhuang, Aug 1, 2024)
9d0257e  Update README.md (rgyhuang, Aug 1, 2024)
8a696b4  Update README.md (rgyhuang, Aug 1, 2024)
b38005c  Remove data.json (rgyhuang, Aug 1, 2024)
d78428f  Merge branch 'main' into rghuang_dev (rgyhuang, Aug 1, 2024)
72ba9a7  Update readme (rgyhuang, Aug 1, 2024)
e1c471a  Merge branch 'rghuang_dev' of https://github.com/cmusatyalab/quetzal … (rgyhuang, Aug 1, 2024)
14 changes: 13 additions & 1 deletion .gitignore
@@ -1,6 +1,8 @@
**/__pycache__/

compare_aligned_idx_copy.py
*.ipynb
*.pkl
quetzal/external/RANSAC_flow/model/pretrained/
# src/external/RANSAC_flow/model/pretrained/
weights/
@@ -12,4 +14,14 @@ gradio_UX.py
# Builds
**/*.egg-info/
**/build/
**/dist/

# dataset
**/data/
data.tar

# annotations
**/annotations/

# secrets
.env
48 changes: 33 additions & 15 deletions README.md
@@ -2,21 +2,24 @@

Quetzal provides automatic frame alignment between two drone footages taken along similar routes.

-Quetzal offers automated frame alignment for drone footage captured along similar routes. The Quetzal app features a file-system GUI designed for organizing and sharing various projects and videos, enabling users to compare two videos from the database. Additionally, it incorporates zero-shot object detection ([GroundingSAM](https://github.com/IDEA-Research/Grounded-Segment-Anything)), allowing users to search for objects within video frames based on text prompts provided.
+Quetzal offers automated frame alignment for drone footage captured along similar routes. The Quetzal app features a file-system GUI designed for organizing and sharing projects and videos, enabling users to compare two videos from the database. Additionally, it incorporates zero-shot object detection ([GroundingSAM](https://github.com/IDEA-Research/Grounded-Segment-Anything)), allowing users to search for objects within video frames using text prompts and to annotate frames with interesting observations or changes.


**File-explorer view**

-<img src="./demo/demo_file_explorer.png" height="400" />
+<img src="./demo/demo_file_explorer.png" height="400" width="650" />

**Video Comparison View**

-<img src="./demo/demo_frame_alignment.png" height="400" />
+<img src="./demo/demo_frame_alignment.png" height="400" width="650" />

**Object Detection View**

-<img src="./demo/demo_object_detection.png" height="400" />
+<img src="./demo/demo_object_detection.png" height="400" width="650"/>

**Object Annotation View**

<img src="./demo/demo_object_annotation.png" height="400" width="650">


**Future Plans:** Support change detection between two drone footages.
@@ -77,15 +80,15 @@ run ./mkdocs.sh on root dir
To run the Quetzal app on a GPU server, use the following command:

```bash
-python3 -m quetzal_app -l "127.0.0.1" -p PORT_NUM --dataset-root /data/root --metadata-root /meta_data/root --cuda --cuda-device 0 -u USER_NAME
+python3 -m quetzal_app -l "127.0.0.1" -p PORT_NUM --dataset-root ./data/root --metadata-root ./meta_data/root --cuda --cuda-device 0 -u USER_NAME
```

### CPU Only:

For systems without a GPU, execute the following command:

```bash
-python3 -m quetzal_app -l "127.0.0.1" -p PORT_NUM --dataset-root /data/root --metadata-root /meta_data/root -u USER_NAME
+python3 -m quetzal_app -l "127.0.0.1" -p PORT_NUM --dataset-root ./data/root --metadata-root ./meta_data/root -u USER_NAME
```

The default user is "default_user".
@@ -94,24 +97,24 @@ The default user is "default_user".
Quetzal provides a CLI for setting up the dataset directory. You can use the following commands to initialize the directory, add a user, or import/convert an existing directory to create a Quetzal-compatible meta-dataset directory.

### Initialize Dataset and Meta-dataset Directories
-To initialize the dataset folder at "/data/root" and meta-dataset at "/meta_data/root":
+To initialize the dataset folder at "./data/root" and meta-dataset at "./meta_data/root":

```bash
-python3 -m quetzal init -d /data/root -m /meta_data/root
+python3 -m quetzal init -d ./data/root -m ./meta_data/root
```

### Add a User Directory for the Dataset
-To add a user directory for the dataset at "/data/root":
+To add a user directory for the dataset at "./data/root":

```bash
-python3 -m quetzal user -d /data/root -m /meta_data/root -u USER_NAME
+python3 -m quetzal user -d ./data/root -m ./meta_data/root -u USER_NAME
```

### Import Dataset and Create Meta-dataset
-To import a dataset from "/data/root" and create a meta-dataset at "/meta_data/root":
+To import a dataset from "./data/root" and create a meta-dataset at "./meta_data/root":

```bash
-python3 -m quetzal import -d /data/root -m /meta_data/root
+python3 -m quetzal import -d ./data/root -m ./meta_data/root
```

The default root dataset directory is "./data/home/root", and the default meta-dataset directory is "./data/meta_data/root".
@@ -183,14 +186,29 @@ The File-system GUI of Quetzal mirrors the user-friendly design found in many po

To enter a directory you've previously selected, simply click on it again. To navigate back or to a different parent directory, use the breadcrumb navigation at the top.

-For video files, the information section includes an "Analyze" option. This feature allows you to register the video for comparison. Once analyzed, you have the choice to mark the video as a "Use as Query" or "Use as Database" for comparison purposes. After selecting one video for each category, proceed by clicking "RUN COMPARISON" on the left side of the screen to move to the comparison page
+For video files, the information section includes an "Analyze" option. This feature allows you to register the video for comparison. Once analyzed, you can mark the video as "Use as Query" or "Use as Database" for comparison purposes. After selecting one video for each category, proceed by clicking "RUN COMPARISON" on the left side of the screen to move to the comparison page. To run the real-time matching algorithm, fill in the connection details for your Redis server in `./quetzal_app/page/page_video_comparison_stream.py` (line 485), select your query and database videos, and click "STREAM MATCHING".
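
The exact connection fields live in that file; as a minimal sketch, assuming the app connects through a standard redis-py client, the values you supply take a form like the following (host, port, and password are placeholders, and the actual variable names in `page_video_comparison_stream.py` may differ):

```python
import redis

# Placeholder values: substitute the details of your own Redis server.
# The actual variable names in page_video_comparison_stream.py may differ.
redis_client = redis.Redis(
    host="127.0.0.1",         # address of your Redis server
    port=6379,                # default Redis port
    password="YOUR_PASSWORD",
    decode_responses=True,
)
```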

The comparison page showcases the frame from the query video on the left and the aligned frame from the database video on the right, arranged side by side. Additionally, for the image on the right, a slider is provided, enabling you to overlay and compare the two frames directly.

To delve into zero-shot object detection on the current frame, click on the "object-detection" tab.

To annotate images, click on the "object-annotation" tab. Here, users can run zero-shot object detection to generate editable bounding boxes. After editing the bounding boxes, users can also segment the image. Saving the annotation creates a .json file with the following format:
```json
{
  "image_query": "original query image (base64)",
  "image_db": "original database image (base64)",
  "bboxes_query": "bounding box annotations for query image",
  "bboxes_db": "bounding box annotations for database image",
  "mask_query": "2D list with query masks",
  "mask_db": "2D list with database masks",
  "annotated_query": "query image with masks and bounding boxes printed onto the image",
  "annotated_db": "database image with masks and bounding boxes printed onto the image",
  "mask_combined": "2D array with both query and database masks combined"
}
```
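
For reference, a minimal sketch of reading a saved annotation back into Python (the file name is a placeholder; the base64 image fields decode with the standard library):

```python
import base64
import json

# Load a saved annotation file (file name is a placeholder).
with open("annotation.json") as f:
    annotation = json.load(f)

# The image fields are base64-encoded; decode one back to raw bytes.
query_image_bytes = base64.b64decode(annotation["image_query"])
with open("query_frame.jpg", "wb") as out:
    out.write(query_image_bytes)

# Bounding boxes and masks are plain JSON structures.
bboxes_query = annotation["bboxes_query"]
mask_query = annotation["mask_query"]  # 2D list of mask values
```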

## Developing Extensions
To ensure backend algorithms are compatible with Quetzal_app, modifications to files/directories should utilize `quetzal.dtos.dtos.QuetzalFile` and `quetzal.dtos.video.Video` objects. For guidance on generating cached/meta data associated with specific files, please refer to the modules `quetzal.engines.vpr_engine.anyloc_engine` and `quetzal.engines.image_registration_engine.loftr_engine`. These references will provide insights into how and where to manage cached/meta data effectively.


To expand the selection of Object Detection models, please refer to the `quetzal.engines.detection_engine.grounding_sam_engine` and its implementation within `quetzal_app.page.video_comparison_controller`. To integrate your custom object detection model, follow the structure of the abstract class defined in `quetzal.engines.engine`. Then, register your model by adding it to the `detector_dict` within `quetzal_app.page.video_comparison_controller`.
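
As a rough sketch, a custom detector might be structured as follows; the class and method names below are illustrative assumptions, not the actual interface, which is defined in `quetzal.engines.engine`:

```python
from typing import List

# Hypothetical sketch only: mirror the actual abstract class in
# quetzal.engines.engine when implementing a real detector.
class MyDetectorEngine:
    """Custom zero-shot object detector (names and signature assumed)."""

    name = "my_detector"

    def __init__(self, device: str = "cuda:0"):
        self.device = device
        # Load model weights onto the chosen device here.

    def detect(self, image, class_prompts: List[str]):
        """Return bounding boxes and scores for the prompted classes."""
        raise NotImplementedError

# Registration (illustrative): inside quetzal_app.page.video_comparison_controller,
# add the new engine to the detector dictionary, e.g.
# detector_dict["My Detector"] = MyDetectorEngine
```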
127 changes: 127 additions & 0 deletions compute_vlad.py
@@ -0,0 +1,127 @@
from quetzal.dtos.video import *
import logging
from quetzal.engines.vpr_engine.anyloc_engine import AnyLocEngine
import torch
import sys

# Compute_vlad - For "Detection of Tunable and Explainable Salient Changes".
logging.basicConfig()
logger = logging.getLogger("compute_vlad")
logger.setLevel(logging.DEBUG)

import argparse

dataset_layout_help = """
Your dataset directory will be structured as follows.
Place your desired video files in dataset_root/route_name/raw_videos/

Dataset structure:
dataset_root/
|
├── route_name/
| ├── raw_video/
| | ├── video_name.mp4
| | └── ...
| |
| ├── database/
| | ├── video_name/
| | | ├── frames_{fps}_{resolution}/
| | | | ├── frame_%05d.jpg
| | | | └── ...
| | | └── ...
| | └── ...
| |
| ├── query/
| | ├── video_name/
| | | ├── frames_{fps}_{resolution}/
| | | | ├── frame_%05d.jpg
| | | | └── ...
| | | └── ...
| | └── ...
| └── ...
└── ...
"""

def generate_VLAD(database_video: Video, query_video: Video, torch_device):
    """
    Generates VLAD descriptors for the given database and query videos.

    Args:
        database_video (Video): The video object representing the database video.
        query_video (Video): The video object representing the query video.
        torch_device (torch.device): The PyTorch device to use for computations.

    Returns:
        Tuple[np.ndarray, np.ndarray]: A tuple containing the VLAD descriptors
            for the database and query videos.
    """

    logger.info("Loading Videos")
    anylocEngine = AnyLocEngine(
        database_video=database_video,
        query_video=query_video,
        device=torch_device,
        mode="lazy",
    )

    db_vlad = anylocEngine.get_database_vlad()
    query_vlad = anylocEngine.get_query_vlad()
    del anylocEngine

    return db_vlad, query_vlad


def main():
    parser = argparse.ArgumentParser(
        formatter_class=argparse.RawTextHelpFormatter,
        description="This program computes VLAD descriptors for the database and query videos.",
        epilog=dataset_layout_help,
    )
    # ... add arguments to parser ...
    parser.add_argument(
        "--dataset-root", default="../data", help="Root directory of datasets"
    )
    parser.add_argument("--route-name", required=True, help="Name of the route")
    parser.add_argument("--database-video", help="Database video file name")
    parser.add_argument("--query-video", help="Query video file name")
    parser.add_argument(
        "--cuda", action="store_true", help="Enable cuda", default=False
    )
    parser.add_argument("--cuda_device", help="Select cuda device", default=0, type=int)

    args = parser.parse_args()

    if not (args.database_video or args.query_video):
        parser.print_usage()
        print("Error: Either --database-video or --query-video must be provided.")
        sys.exit(1)

    device = torch.device("cpu")
    available_gpus = torch.cuda.device_count()
    if args.cuda and available_gpus > 0:
        cuda_device = args.cuda_device if args.cuda_device < available_gpus else 0
        device = torch.device("cuda:" + str(cuda_device))

    ## Initialize System

    # Load Video frames
    logger.info("Loading Videos")
    database_video, query_video = None, None

    if args.database_video:
        database_video = DatabaseVideo(
            datasets_dir=args.dataset_root,
            route_name=args.route_name,
            video_name=args.database_video,
        )
    if args.query_video:
        query_video = QueryVideo(
            datasets_dir=args.dataset_root,
            route_name=args.route_name,
            video_name=args.query_video,
        )

    generate_VLAD(database_video, query_video, device)


if __name__ == "__main__":
    main()
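
For reference, a typical invocation of the script above, with placeholder route and file names, might be:

```bash
python3 compute_vlad.py --dataset-root ../data --route-name my_route \
    --database-video database.mp4 --query-video query.mp4 --cuda --cuda_device 0
```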
Binary file modified demo/demo_file_explorer.png
Binary file modified demo/demo_frame_alignment.png
Binary file added demo/demo_object_annotation.png
Binary file modified demo/demo_object_detection.png