Add Prompt Depth Anything Model #35401
base: main
Conversation
@NielsRogge @qubvel @pcuenca Could you help review this PR when you have some time? Thanks so much in advance! Let me know if you have any questions or suggestions. 😊
tests/models/prompt_depth_anything/test_modeling_prompt_depth_anything.py
Hi @haotongl! Thanks for working on the model integration into transformers 🤗 I'm on holidays until Jan 3rd, and I'll do a review after that if it's still necessary.
src/transformers/models/prompt_depth_anything/modeling_prompt_depth_anything.py
Co-authored-by: Joshua Lochner <admin@xenova.com>
Hi @xenova @NielsRogge! All suggestions have been addressed. Could you please take another look and provide any further suggestions, or go ahead and merge this PR? Thanks!
Thanks for working on the model addition 🤗 Great work in terms of the model and in terms of porting to transformers! Please see the comments below
src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything.py
if prompt_depth is not None:
    # prompt_depth is a list of images with shape (height, width)
    # we need to convert it to a list of images with shape (1, height, width)
    prompt_depths = make_list_of_images(prompt_depth)
    prompt_depths = [to_numpy_array(depth) for depth in prompt_depths]
    prompt_depths = [depth * self.prompt_scale_to_meter for depth in prompt_depths]
    prompt_depths = [prompt_depth[..., None].astype(np.float32) for prompt_depth in prompt_depths]
    prompt_depths = [
        to_channel_dimension_format(depth, data_format, input_channel_dim=input_data_format)
        for depth in prompt_depths
    ]
    data["prompt_depth"] = prompt_depths
return BatchFeature(data=data, tensor_type=return_tensors)
should we resize/pad prompt depth?
Also from the paper:
"Depth normalization. The irregular range of input depth data can hinder network convergence. To address this, we normalize the LiDAR data using linear scaling to the range [0, 1], based on its minimum and maximum values. The network output is also normalized with the same scaling factor from LiDAR data, ensuring consistent scales and facilitating easier convergence during training."
Should this normalization be added to the preprocessing?
Plus, we will need the offset/scale to apply the backward transformation to the predicted depth in the post-processing method.
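For illustration only (this is not in the PR, and the helper name is made up), a minimal NumPy sketch of the scheme described in the quoted paragraph, keeping the offset/scale so post-processing could map the prediction back to metric depth:

import numpy as np

def normalize_prompt_depth(prompt_depth: np.ndarray):
    """Scale a LiDAR depth map linearly to [0, 1] and return the factors needed to undo it."""
    d_min, d_max = float(prompt_depth.min()), float(prompt_depth.max())
    scale = max(d_max - d_min, 1e-6)  # guard against a constant depth map
    normalized = (prompt_depth - d_min) / scale
    # post-processing would invert this with: depth = prediction * scale + d_min
    return normalized, d_min, scale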
@qubvel The size of the prompt depth should remain unchanged to preserve its original information, allowing the model to make full use of the prompt depth data. Additionally, the prompt depth should not be normalized, as maintaining the original depth range is essential at runtime.
Ok, for the current checkpoint the size is {'height': 756, 'width': 756}, which is a multiple of 14, so the image is not padded. However, if one runs the model on an image size that is not a multiple of 14, the image will be padded while the depth map will not. Let's assume the following:
image size: 512x512
padded image size: 518x518 (6 empty pixels on the bottom and right borders)
prompt_depth size: 256x256 (no corresponding empty pixels on its borders, so it does not align with the padded image)
Probably not a big deal, because we are not merging the image and prompt depth directly but merging features; however, it might lead to a slight shift of the image features relative to the prompt depth features.
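Just to make the arithmetic above concrete (hypothetical helper, not from the PR), padding up to the next multiple of 14 turns 512 into 518:

import math

def pad_to_multiple(size: int, multiple: int = 14) -> int:
    # smallest multiple of `multiple` that is >= size
    return math.ceil(size / multiple) * multiple

print(pad_to_multiple(512))  # 518, i.e. 6 extra pixels per spatial dimension (bottom/right borders)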
src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything.py
src/transformers/models/prompt_depth_anything/modular_prompt_depth_anything.py
tests/models/prompt_depth_anything/test_modeling_prompt_depth_anything.py
…and related explanations for better clarity and functionality.
…ular prompt depth anything
@qubvel @NielsRogge All suggestions have been addressed. Could you please take another look and provide any further suggestions, or go ahead and merge this PR? Thanks!
Thanks for the update, great work! We are getting close to merging it 🤗 Please see the comments below.
prompt_depth: ImageInput = None,
do_resize: bool = None,
size: int = None,
keep_aspect_ratio: bool = None,
ensure_multiple_of: int = None,
resample: PILImageResampling = None,
do_rescale: bool = None,
rescale_factor: float = None,
do_normalize: bool = None,
image_mean: Optional[Union[float, List[float]]] = None,
image_std: Optional[Union[float, List[float]]] = None,
do_pad: bool = None,
size_divisor: int = None,
Please add Optional[...] for args with a None default.
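A sketch of what the annotated signature could look like (argument names and defaults are taken from the diff above; ImageInput and PILImageResampling are transformers types referenced here as forward-reference strings since the surrounding imports are omitted):

from typing import List, Optional, Union

def preprocess(
    # preceding arguments (self, images, ...) omitted for brevity
    prompt_depth: Optional["ImageInput"] = None,
    do_resize: Optional[bool] = None,
    size: Optional[int] = None,
    keep_aspect_ratio: Optional[bool] = None,
    ensure_multiple_of: Optional[int] = None,
    resample: Optional["PILImageResampling"] = None,
    do_rescale: Optional[bool] = None,
    rescale_factor: Optional[float] = None,
    do_normalize: Optional[bool] = None,
    image_mean: Optional[Union[float, List[float]]] = None,
    image_std: Optional[Union[float, List[float]]] = None,
    do_pad: Optional[bool] = None,
    size_divisor: Optional[int] = None,
):
    ...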
def constrain_to_multiple_of(val, multiple, min_val=0, max_val=None):
    x = round(val / multiple) * multiple

    if max_val is not None and x > max_val:
        x = math.floor(val / multiple) * multiple

    if x < min_val:
        x = math.ceil(val / multiple) * multiple

    return x
Let's move this out of function scope and make it private as _constrain_to_multiple_of.
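That is, roughly (same body as the snippet above, just renamed, moved to module level, and given a docstring):

import math

def _constrain_to_multiple_of(val, multiple, min_val=0, max_val=None):
    """Round `val` to a multiple of `multiple`, preferring a value within [min_val, max_val]."""
    x = round(val / multiple) * multiple
    if max_val is not None and x > max_val:
        x = math.floor(val / multiple) * multiple
    if x < min_val:
        x = math.ceil(val / multiple) * multiple
    return x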
logger = logging.get_logger(__name__)


def get_resize_output_image_size(
Please add a short docstring and make it private.
- def get_resize_output_image_size(
+ def _get_resize_output_image_size(
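For example, a private stub with a short docstring along these lines (the argument names here are an assumption for illustration; the actual signature lives in the PR's image processor):

def _get_resize_output_image_size(
    input_height: int,
    input_width: int,
    output_height: int,
    output_width: int,
    keep_aspect_ratio: bool,
    multiple: int,
) -> tuple:
    """Compute the (height, width) to resize to, optionally preserving the aspect
    ratio and constraining both dimensions to a multiple of `multiple`."""
    ...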
def pad_image(
    self,
    image: np.array,
- image: np.array,
+ image: np.ndarray,
return_tensors: Optional[Union[str, TensorType]] = None,
data_format: ChannelDimension = ChannelDimension.FIRST,
input_data_format: Optional[Union[str, ChannelDimension]] = None,
) -> PIL.Image.Image:
- ) -> PIL.Image.Image:
+ ) -> BatchFeature:
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)

with torch.no_grad():
    outputs = model(pixel_values=inputs.pixel_values, prompt_depth=prompt_depth)
outputs = model(**inputs)
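In other words, continuing the test snippet above and letting the processor build all model inputs (this assumes the processor also accepts prompt_depth, as its preprocess signature earlier in this thread suggests):

inputs = image_processor(images=image, prompt_depth=prompt_depth, return_tensors="pt").to(torch_device)

with torch.no_grad():
    outputs = model(**inputs)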
@require_vision
@slow
class PromptDepthAnythingModelIntegrationTest(unittest.TestCase):
    def test_inference(self):
Let's add tests for both cases: with and without prompt depth.
exported_program = torch.export.export(
    model,
    args=(inputs["pixel_values"],),
    strict=strict,
)
This should also work for prompt_depth if we move the invalid_mask.any() call to the processor.
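For instance, the export call above could then be extended along these lines (assuming the model's forward accepts prompt_depth as a keyword argument and the processor output uses that key):

exported_program = torch.export.export(
    model,
    args=(inputs["pixel_values"],),
    kwargs={"prompt_depth": inputs["prompt_depth"]},
    strict=strict,
)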
    .to(torch_device)
    .eval()
)
image_processor = DPTImageProcessor.from_pretrained("depth-anything/prompt-depth-anything-vits-hf")
wrong image processor?
We need tests for image processor as well (in a separate file)
What does this PR do?
This PR adds the Prompt Depth Anything Model. Prompt Depth Anything builds upon Depth Anything V2 and incorporates metric prompt depth to enable accurate and high-resolution metric depth estimation.
The implementation leverages Modular Transformers. The main file is src/transformers/models/prompt_depth_anything/modular_prompt_depth_anything.py.
Before submitting
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.