
Add Prompt Depth Anything Model #35401

Open · haotongl wants to merge 24 commits into main

Conversation

haotongl (Author)

What does this PR do?

This PR adds the Prompt Depth Anything Model. Prompt Depth Anything builds upon Depth Anything V2 and incorporates metric prompt depth to enable accurate and high-resolution metric depth estimation.

The implementation leverages Modular Transformers. The main file can be found here.
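
For reviewers' context, a minimal usage sketch (the checkpoint name is taken from the tests in this PR; loading goes through the Auto classes once the model is registered, and the exact output attribute names may differ):

```python
import numpy as np
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForDepthEstimation

checkpoint = "depth-anything/prompt-depth-anything-vits-hf"
image_processor = AutoImageProcessor.from_pretrained(checkpoint)
model = AutoModelForDepthEstimation.from_pretrained(checkpoint)

image = Image.open("example.jpg")
# Placeholder for a real low-resolution LiDAR depth map of shape (height, width);
# the processor scales it to meters via its prompt_scale_to_meter attribute.
prompt_depth = np.random.uniform(0.0, 3.0, size=(192, 256)).astype(np.float32)

inputs = image_processor(images=image, prompt_depth=prompt_depth, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
predicted_depth = outputs.predicted_depth  # metric depth, aligned with the prompt
```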

Before submitting

haotongl (Author) commented Dec 24, 2024

@NielsRogge @qubvel @pcuenca Could you help review this PR when you have some time? Thanks so much in advance! Let me know if you have any questions or suggestions. 😊

qubvel (Member) commented Dec 24, 2024

Hi @haotongl! Thanks for working on the model integration into transformers 🤗 I'm on holiday until Jan 3rd, and I'll do a review after that if it's still needed.

haotongl requested review from NielsRogge and xenova · January 2, 2025 16:57
haotongl (Author) commented Jan 3, 2025

Hi @xenova, @NielsRogge! All suggestions have been addressed. Could you take another look and share any further suggestions, or go ahead and merge this PR? Thanks!

NielsRogge requested a review from qubvel · January 6, 2025 08:24
haotongl requested a review from NielsRogge · January 6, 2025 09:19
qubvel (Member) left a comment


Thanks for working on the model addition 🤗 Great work, both on the model itself and on porting it to transformers! Please see the comments below.

Comment on lines +434 to +446
```python
if prompt_depth is not None:
    # prompt_depth is a list of images with shape (height, width);
    # we need to convert it to a list of images with shape (1, height, width)
    prompt_depths = make_list_of_images(prompt_depth)
    prompt_depths = [to_numpy_array(depth) for depth in prompt_depths]
    prompt_depths = [depth * self.prompt_scale_to_meter for depth in prompt_depths]
    prompt_depths = [depth[..., None].astype(np.float32) for depth in prompt_depths]
    prompt_depths = [
        to_channel_dimension_format(depth, data_format, input_channel_dim=input_data_format)
        for depth in prompt_depths
    ]
    data["prompt_depth"] = prompt_depths
return BatchFeature(data=data, tensor_type=return_tensors)
```
qubvel (Member)

Should we resize/pad the prompt depth?

qubvel (Member) commented Jan 6, 2025

Also from the paper:

> Depth normalization. The irregular range of input depth data can hinder network convergence. To address this, we normalize the LiDAR data using linear scaling to the range [0, 1], based on its minimum and maximum values. The network output is also normalized with the same scaling factor from LiDAR data, ensuring consistent scales and facilitating easier convergence during training.

Should this normalization be added to the preprocessing? We would also need the offset/scale to invert the transformation on the predicted depth in the post-processing method.
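
For illustration, a minimal sketch of what the paper's normalization plus its inverse could look like (function names here are hypothetical, not part of this PR):

```python
import numpy as np

def normalize_prompt_depth(depth: np.ndarray):
    # Linearly scale the LiDAR depth to [0, 1] based on its min/max,
    # as described in the paper; assumes a non-constant depth map.
    offset = depth.min()
    scale = depth.max() - offset
    return (depth - offset) / scale, (offset, scale)

def denormalize_depth(predicted: np.ndarray, offset: float, scale: float):
    # Backward transformation for post-processing the network output.
    return predicted * scale + offset
```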

haotongl (Author) commented Jan 7, 2025

@qubvel The prompt depth should keep its original size so that its information is preserved and the model can make full use of it. It should also not be normalized, since the original metric depth range is essential at runtime.

qubvel (Member)

OK, for the current checkpoint the size is {'height': 756, 'width': 756}, which is a multiple of 14, so the image is not padded. However, if one runs the model on an image size that is not a multiple of 14, the image will be padded while the depth map will not. Consider the following:

image size: 512x512
padded image size: 518x518 (6 empty pixels on the bottom and right borders)
prompt_depth size: 256x256 (no empty pixels on the borders to align with the padded image)

This is probably not a big deal, since we merge features rather than the image and prompt depth directly, but it might cause a slight shift of the image features relative to the prompt depth features.
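
To make the arithmetic above concrete, a standalone sketch (not code from this PR):

```python
import math

def pad_to_multiple(side: int, multiple: int = 14) -> int:
    # Pad a spatial dimension up to the next multiple of the patch size.
    return math.ceil(side / multiple) * multiple

image_side = 512
padded_side = pad_to_multiple(image_side)   # 518
padding = padded_side - image_side          # 6 empty pixels on the bottom/right
prompt_depth_side = 256                     # stays unpadded -> slight feature misalignment
```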

haotongl requested a review from qubvel · January 7, 2025 16:29
haotongl (Author) commented Jan 7, 2025

@qubvel @NielsRogge All suggestions have been addressed. Could you please take another look and provide any further suggestions, or go ahead and merge this PR? Thanks!

qubvel (Member) left a comment


Thanks for the update, great work! We are getting close to merging it 🤗 Please see the comments below.

Comment on lines +286 to +298
```python
prompt_depth: ImageInput = None,
do_resize: bool = None,
size: int = None,
keep_aspect_ratio: bool = None,
ensure_multiple_of: int = None,
resample: PILImageResampling = None,
do_rescale: bool = None,
rescale_factor: float = None,
do_normalize: bool = None,
image_mean: Optional[Union[float, List[float]]] = None,
image_std: Optional[Union[float, List[float]]] = None,
do_pad: bool = None,
size_divisor: int = None,
```
qubvel (Member)

Please add `Optional[...]` for args with a `None` default.
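
e.g., for the first few:

```diff
-    prompt_depth: ImageInput = None,
-    do_resize: bool = None,
-    size: int = None,
+    prompt_depth: Optional[ImageInput] = None,
+    do_resize: Optional[bool] = None,
+    size: Optional[int] = None,
```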

Comment on lines +67 to +76
```python
def constrain_to_multiple_of(val, multiple, min_val=0, max_val=None):
    x = round(val / multiple) * multiple

    if max_val is not None and x > max_val:
        x = math.floor(val / multiple) * multiple

    if x < min_val:
        x = math.ceil(val / multiple) * multiple

    return x
```
qubvel (Member)

Let's move this out of function scope and make it private as `_constrain_to_multiple_of`.
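
As a quick sanity check of the behavior, assuming the rename to `_constrain_to_multiple_of` (the 512 → 518 case matches the padding discussion above):

```python
assert _constrain_to_multiple_of(512, 14) == 518  # rounds to the nearest multiple of 14
assert _constrain_to_multiple_of(756, 14) == 756  # already a multiple of 14
assert _constrain_to_multiple_of(512, 14, max_val=516) == 504  # 518 > max_val, so floor: 36 * 14
```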

```python
logger = logging.get_logger(__name__)


def get_resize_output_image_size(
```
qubvel (Member) commented Jan 9, 2025

Please add a short docstring and make it private:

Suggested change:

```diff
-def get_resize_output_image_size(
+def _get_resize_output_image_size(
```


```python
def pad_image(
    self,
    image: np.array,
```
qubvel (Member)

Suggested change:

```diff
-    image: np.array,
+    image: np.ndarray,
```

```python
    return_tensors: Optional[Union[str, TensorType]] = None,
    data_format: ChannelDimension = ChannelDimension.FIRST,
    input_data_format: Optional[Union[str, ChannelDimension]] = None,
) -> PIL.Image.Image:
```
qubvel (Member)

Suggested change:

```diff
-) -> PIL.Image.Image:
+) -> BatchFeature:
```

```python
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)

with torch.no_grad():
    outputs = model(pixel_values=inputs.pixel_values, prompt_depth=prompt_depth)
```
qubvel (Member)

Use `outputs = model(**inputs)` here.

```python
@require_vision
@slow
class PromptDepthAnythingModelIntegrationTest(unittest.TestCase):
    def test_inference(self):
```
qubvel (Member)

Let's add tests for both cases: with and without prompt depth.
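
For instance, something like this (a sketch only; `model`, `image_processor`, `image`, and the expected values would mirror `test_inference` above):

```python
def test_inference_without_prompt_depth(self):
    # Without a prompt the model should still run, falling back to
    # plain (relative) depth estimation as in Depth Anything V2.
    inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
    with torch.no_grad():
        outputs = model(**inputs)
    self.assertEqual(outputs.predicted_depth.shape, expected_shape)
```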

Comment on lines +287 to +291
```python
exported_program = torch.export.export(
    model,
    args=(inputs["pixel_values"],),
    strict=strict,
)
```
qubvel (Member)

This should also work for `prompt_depth` if we move the `invalid_mask.any()` check to the processor.
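
i.e., once that data-dependent check lives in the processor, something like this should export too (a sketch, assuming the processor puts `prompt_depth` into `inputs`):

```python
exported_program = torch.export.export(
    model,
    args=(inputs["pixel_values"],),
    kwargs={"prompt_depth": inputs["prompt_depth"]},
    strict=strict,
)
```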

```python
    .to(torch_device)
    .eval()
)
image_processor = DPTImageProcessor.from_pretrained("depth-anything/prompt-depth-anything-vits-hf")
```
qubvel (Member)

Wrong image processor?

qubvel (Member)

We need tests for the image processor as well (in a separate file).
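
For example, following the usual transformers test layout (file, class, and attribute names here are assumptions, not code from this PR):

```python
import unittest

from transformers.testing_utils import require_vision
from transformers.utils import is_vision_available

if is_vision_available():
    from transformers import PromptDepthAnythingImageProcessor


@require_vision
class PromptDepthAnythingImageProcessingTest(unittest.TestCase):
    def test_prompt_scale_to_meter(self):
        # prompt_scale_to_meter converts raw prompt units (e.g. mm) to meters.
        image_processor = PromptDepthAnythingImageProcessor(prompt_scale_to_meter=0.001)
        self.assertEqual(image_processor.prompt_scale_to_meter, 0.001)
```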
