
Camera parameters #24

Closed
Ainaz99 opened this issue May 14, 2021 · 27 comments

@Ainaz99

Ainaz99 commented May 14, 2021

Hi,

I'm trying to render surface normals from the meshes in Blender, using the camera parameters provided. I scale the mesh using meters_per_asset_unit. I use camera_keyframe_positions.hdf5 for the camera location and get the camera Euler angles from camera_keyframe_orientations.hdf5, with fov = pi/3 = 1.0471975511965976.
But my renderings do not match the color images from the dataset for only some specific scenes. Is there any other camera parameter I'm missing?
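For reference, here is a minimal sketch of how I'm loading these parameters (h5py + numpy; the paths and the "dataset" HDF5 key are placeholders from my local setup):

```python
# Minimal sketch (not the official pipeline): load the camera pose for one frame.
# Paths and the HDF5 dataset key follow my local copy of the dataset and may differ.
import h5py
import numpy as np

scene_dir = "ai_037_002/_detail"
cam = "cam_00"
frame = 0

with h5py.File(f"{scene_dir}/{cam}/camera_keyframe_positions.hdf5", "r") as f:
    camera_positions = f["dataset"][:]       # (num_frames, 3), in asset units
with h5py.File(f"{scene_dir}/{cam}/camera_keyframe_orientations.hdf5", "r") as f:
    camera_orientations = f["dataset"][:]    # (num_frames, 3, 3), world-from-camera rotations

t_world_from_cam = camera_positions[frame]   # asset units; I scale the mesh by meters_per_asset_unit instead
R_world_from_cam = camera_orientations[frame]
fov_x = np.pi / 3.0                          # horizontal field of view used when rendering
```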
Here's an example for ai_037_002, cam_00, frame.0000:
[image]

Thanks for your help.

@mikeroberts3000
Collaborator

mikeroberts3000 commented May 14, 2021

UPDATE (Jan 9 2022): This issue has been resolved. See contrib/mikeroberts3000 for details.

Hi! Great question.

It sounds like you're doing everything right. I have noticed this problem occasionally also. It appears to affect a small handful of our scenes, but I don't have a complete list. The problem occurs because there is occasionally an additional offset in the V-Ray scene file that gets applied to the camera parameters during photorealistic rendering. I believe the relevant parameters are the "Horizontal Shift" and "Vertical Shift" parameters described in the V-Ray documentation. If I had known about these unpleasant parameters prior to rendering our images, I certainly would have explicitly hard-coded them to 0 for each scene.

Perhaps you can help with this issue. (You are especially well-positioned to help because your independent rendering infrastructure with Blender is already set up.) I'm assuming that you have access to the source assets, and you have run the pre-processing phase of our pipeline. Otherwise, how could you do your own rendering in Blender? Anyway, there are several ways to proceed.

First, can you manually confirm that the scene you're experimenting with has non-zero values for these shift parameters in its exported vrscene file?

Second, given some non-zero shift parameters from a vrscene file, as well as our camera positions and camera orientations, can you compute the correct camera parameters such that your Blender rendering matches our ground truth rendering? I don't know exactly how to do this, because I don't know exactly what the shift parameters mean. Are they describing some kind of off-center perspective projection? What changes are required in your Blender setup to match our pre-rendered ground truth images pixel-perfectly?

Third, do you have a complete list of scenes that are affected? It should be straightforward to parse each vrscene file and search for any non-zero shift parameters. See this code for an example of how to programmatically access the cameras in a vrscene file. In addition to these shift parameters, there are also tilt parameters. Are any scenes affected by non-zero tilt parameters?
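For that third item, a rough text-based scan of the exported vrscene files might be enough to produce a candidate list. Here is a sketch; the directory layout is a placeholder, and the parameter names are the CameraPhysical parameters I expect to matter, so treat both as assumptions:

```python
# Rough sketch: flag scenes whose exported .vrscene files contain non-zero
# shift/tilt/offset parameters. The glob pattern is a placeholder, and the
# parameter names are assumptions based on the CameraPhysical plugin.
import glob
import re

params_to_check = ["horizontal_shift", "vertical_shift", "horizontal_offset", "vertical_offset", "lens_shift"]
pattern = re.compile(r"^\s*(" + "|".join(params_to_check) + r")\s*=\s*([-0-9.eE]+)", re.MULTILINE)

for vrscene_file in sorted(glob.glob("scenes/*/*.vrscene")):
    with open(vrscene_file, "r", errors="ignore") as f:
        text = f.read()
    nonzero = {name: float(value) for name, value in pattern.findall(text) if float(value) != 0.0}
    if nonzero:
        print(vrscene_file, nonzero)
```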

Haha sorry for hijacking your thread with all of these to-do items 😅 I wish I had more answers and fewer questions. But I figured this is the right place to document my incomplete understanding of the problem, and to highlight possible next steps.

@rpautrat

rpautrat commented Aug 3, 2021

Hi @mikeroberts3000,

I am also interested in multi-view applications based on Hypersim, and I stumbled upon the same issue as Ainaz99, Frank-Mic and rikba. It seems that a considerable number of scenes are affected by the shifted camera parameters and are thus unusable for multi-view applications.

From the reprojection tests that I have run, I think that non-zero tilt offsets are also involved, in addition to the shift offsets. Unfortunately, I do not have access to the source assets, so I cannot run the steps that you suggested to investigate this issue. Did you by any chance have time to investigate it, and is there any fresh news?

If not, would it be possible to publicly expose the V-Ray tilt and shift parameters for each scene and camera? I think that, based on this information, I would be able to retrieve the correct camera position and orientation.

Overall, I would be happy to contribute to resolving this problem, but I feel that some information from the source assets is necessary to achieve it, which I unfortunately do not have...

@mikeroberts3000
Collaborator

mikeroberts3000 commented Aug 3, 2021

Hi @rpautrat, I agree with your assessment that it is necessary to expose some additional information from the source assets. Once this information is exposed, it should be possible to derive the correct camera intrinsics for the scenes that are affected by this issue. I'll follow up with you offline and maybe we can work on this together.

@Ainaz99
Author

Ainaz99 commented Sep 7, 2021

Hi @mikeroberts3000 @rpautrat ! Are there any updates on the correct camera parameters?

@mikeroberts3000
Collaborator

mikeroberts3000 commented Sep 7, 2021

Hi @Ainaz99, I'm happy to say that we're making solid progress. @rpautrat has been doing a bunch of great experiments to figure all of this out.

We identified four V-Ray parameters that can affect the camera intrinsics, and we have an accurate mathematical model of what each of these parameters does in isolation. Roughly speaking, each parameter translates or rotates the image plane in camera space. But we don't have a model of how the parameters interact with each other, and there are many possible conventions (translation then rotation, rotation then translation, different Euler angle conventions, etc). We reached out to Chaos Group, and we're waiting for them to tell us the exact order of operations.

Suppose we knew exactly how to transform the image plane as a function of our V-Ray parameters. Let's call this transformation T. We have sketched out a derivation for the non-standard projection matrix P, computed as a function of T, that correctly projects points from world space to image space. This projection matrix P can then be used as a drop-in replacement for the usual pinhole camera perspective projection matrix in graphics and vision applications.

To summarize, we think we have a solid understanding of this issue, but we're waiting for Chaos Group to tell us exactly how to compute T based on the V-Ray parameters. If you're super motivated to get this issue resolved, and you don't want to wait for Chaos Group to get back to us, I'd be happy to send you the Jupyter notebooks that we've been using in our experiments. You don't need to install V-Ray to run our notebooks, and you can try to compute T from the V-Ray parameters with a brute force guess-and-check strategy.

@Ainaz99
Author

Ainaz99 commented Sep 13, 2021

Thanks @mikeroberts3000, happy to hear you've made progress in solving the issue. Do you have any estimate of how long it will take Chaos Group to answer?

@mikeroberts3000
Collaborator

mikeroberts3000 commented Sep 14, 2021

@Ainaz99 no estimate from Chaos Group. I'll definitely post here when we hear back from them. In the meantime, the invitation is still open for you to experiment with our notebooks and attempt to compute the correct transformation 🤓

An alternative to the guess-and-check strategy would be to set up a simple scene, and infer the correct transformation from rendered observations. It would then be possible to collect a training set of (camera parameter, transformation) pairs, and fit a function to the training set.

@liuzhy71

liuzhy71 commented Oct 15, 2021

> @Ainaz99 no estimate from Chaos Group. I'll definitely post here when we hear back from them. In the meantime, the invitation is still open for you to experiment with our notebooks and attempt to compute the correct transformation 🤓
>
> An alternative to the guess-and-check strategy would be to set up a simple scene, and infer the correct transformation from rendered observations. It would then be possible to collect a training set of (camera parameter, transformation) pairs, and fit a function to the training set.

How can I get the notebook for the camera operations? Could you send me a copy of the notebook you mentioned? (My email is liuzhy71@gmail.com.)

@mikeroberts3000
Collaborator

Hi @liuzhy71, you're a total legend for having a look at this. I'll send you an email with all of our debugging notebooks.

@mikeroberts3000
Collaborator

mikeroberts3000 commented Oct 15, 2021

In case anyone else is interested, here is a diagram explaining what we think is going on.

[diagram]

We think the image plane is being transformed somehow, and there seem to be four camera parameters that control the transformation. So, our goal is to determine the function f that maps from the scalar camera parameters (p1,p2,p3,p4) to a transformation matrix T.

In our debugging notebook, we try to guess this function f based on a code snippet that we got from Chaos Group. This code snippet is correct for some combinations of parameters, but not others. So we must be getting something wrong. In order to test the correctness of f in our notebook, we compute a depth image using my own reference raycaster. In this test, we can control the outbound ray at each pixel. Using my reference raycaster, we want to obtain depth images that perfectly match the ones we obtain from V-Ray for all combinations of camera parameters. If we can do this, then we will know that we are implementing f correctly.

To make progress, I think a promising approach would be the following. First, compute the correct transformation T, given a depth image rendered by V-Ray with a known set of camera parameters (p1,p2,p3,p4). The transformation can be recovered from the depth image by solving a convex optimization problem. Anyway, once this is working, it is straightforward to render a large collection of depth images with randomly chosen camera parameters, solving for T for each rendered image. After doing so, we will have a large collection of (camera parameters, transformation) pairs. Finally, it is straightforward to fit some kind of simple function (e.g., a neural network) to all of the example pairs. This learned function can then be queried later to output the correct transformation T for a new set of parameters (p1,p2,p3,p4). Once the transformation T is known, it is straightforward to derive a modified projection matrix P (in terms of T) that projects world-space points to the correct image-space coordinates. This projection matrix can be used as a drop-in replacement for the usual projection matrix in graphics and vision applications (e.g., multi-view stereo applications, rendering additional images that exactly match our pre-rendered Hypersim images, etc.).
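To make the last step concrete, here is a minimal sketch of fitting a simple model to (camera parameters, transformation) pairs, assuming T has already been recovered for a collection of rendered images. Plain least squares on flattened matrices is used as a stand-in; a small neural network could be swapped in if the mapping turns out to be nonlinear. The data arrays below are placeholders.

```python
# Minimal sketch: fit a linear-plus-bias model from camera parameters (p1..p4)
# to a flattened 3x3 image-plane transformation T, given example pairs recovered
# from rendered depth images. The example data here is a placeholder.
import numpy as np

num_examples = 1000
params = np.random.uniform(-1.0, 1.0, size=(num_examples, 4))   # (p1, p2, p3, p4) per rendered image
transforms = np.zeros((num_examples, 3, 3))                      # recovered T per rendered image (placeholder)

A = np.hstack([params, np.ones((num_examples, 1))])              # append a bias column
X, _, _, _ = np.linalg.lstsq(A, transforms.reshape(num_examples, -1), rcond=None)

def predict_T(p):
    """Predict the image-plane transformation for new parameters p = (p1, p2, p3, p4)."""
    return (np.append(p, 1.0) @ X).reshape(3, 3)
```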

Of course doing all of this is an unpleasant hack. But so far, Chaos Group has not mathematically characterized these camera parameters, so we must resort to reverse-engineering their meaning. If anyone else is interested in having a look at this issue, comment here and I will send you our debugging notebooks.

@mikeroberts3000
Collaborator

mikeroberts3000 commented Oct 27, 2021

Great news. I obtained some very useful code from Chaos Group, and this has enabled me to make some exciting progress. I now have a working implementation that computes the transformation T from the parameters (p1,p2,p3,p4). Using my own reference raycaster, I can now generate images that match V-Ray images exactly, even in the presence of non-standard camera parameters.

I have not yet derived the modified projection matrix P that projects world-space points to the correct pixel locations, but I believe this is relatively straightforward. I'll post any relevant updates here. Thanks again to everyone that is helping out with this issue. Post a comment here if you want to have a look at my latest notebook.

@Ainaz99 @rpautrat @liuzhy71

@liuzhy71

liuzhy71 commented Oct 27, 2021

Wow, I was just able to compute the correct depth using the previous notebook, but I am still working on adjusting the projection matrix for OpenGL rendering.
Attached: camera_shift_and_tilt.csv. This file contains the camera parameters (p1-p4) for each scene. Are there any updates to the notebook? @mikeroberts3000

@mikeroberts3000
Collaborator

mikeroberts3000 commented Oct 27, 2021

@liuzhy71 that spreadsheet looks great! I'm thinking about how we can adjust it slightly to make it a bit cleaner, and more suitable for inclusion in the actual dataset. Can you update your spreadsheet with the following information?

  • Sort rows by scene name.
  • Include a column called "includes_camera_physical_plugin" which is True or False depending on whether or not the vrscene has the CameraPhysical plugin.
  • Include columns for all parameters in the CameraPhysical plugin. The plgparams executable that ships with V-Ray lists 46 parameters, but only 45 are accessible through the V-Ray AppSDK, so it is fine to only include 45 columns. If includes_camera_physical_plugin is False for a particular row, you can leave all of these columns blank.
  • Use the V-Ray AppSDK when extracting parameter values (rather than your own parser) because the V-Ray AppSDK will populate the CameraPhysical plugin with the correct default values.
  • I'm not sure exactly which of these parameters will prove to be useful in computer vision applications, and I don't want to try to guess, so I think it is better to export them all. My implementation of the transformation T depends on a few more parameters than the 4 described so far in this thread, so we will need to include more parameters in the spreadsheet no matter what.

The function for computing depth in the previous version of our notebook (i.e., the one I sent you) works for some combinations of parameters, but not others. An easy way to break it is to set horizontal_shift=1.0. I'll send you my latest notebook over email, which works correctly for all parameter combinations.

@liuzhy71

OK, I'll try to update the spreadsheet with as many parameters as possible. I'm not very familiar with the V-Ray SDK; all the data were parsed directly from the .vrscene files. Also, I do not have all the files for the ai_055_xxx scenes, so some data may be missing.

@mikeroberts3000
Collaborator

@liuzhy71 I think there is a 30-day free trial available for the V-Ray AppSDK. If you prepare the code, I'll run it on all of the scenes.

@mikeroberts3000
Collaborator

mikeroberts3000 commented Oct 27, 2021

I have some more good news. I derived a modified projection matrix P (computed in terms of V-Ray's camera parameters) that correctly accounts for this issue. I have verified that my projection matrix P projects world-space points to the correct screen-space locations, even when V-Ray's non-standard camera parameters have a drastic effect on the rendered image. My projection matrix can be used as a drop-in replacement for the usual OpenGL perspective projection matrix. So I think the main technical challenge here has been resolved.

For example, here is a depth image for a scene that has been rendered with non-standard camera parameters.

horizontal_offset=0.2; vertical_offset=0.3; horizontal_shift=0.0; lens_shift=1.0;

[image]

The left image is generated by V-Ray, the middle image is generated by my own reference raycaster, and the right image is a difference image. We see here that the images are nearly identical. So we are generating the correct ray at each pixel.

Here is the same depth image, but I have projected the sink mesh (i.e., the world-space vertex positions belonging to the sink) onto the image.

[image]

The red dots are mesh vertices. We see that the projected vertices align very accurately with the sink in the image. So my modified projection matrix appears to be correct.
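For anyone who wants to reproduce this kind of overlay, the projection step is the usual OpenGL-style one with the modified matrix swapped in. Here is a minimal numpy sketch; M_proj_modified, M_cam_from_world, and the image size are placeholders.

```python
# Minimal sketch: project world-space vertices to pixel coordinates using the
# modified projection matrix as a drop-in replacement for the usual OpenGL
# perspective projection matrix.
import numpy as np

def project_to_pixels(points_world, M_proj_modified, M_cam_from_world, width, height):
    n = points_world.shape[0]
    p_world = np.hstack([points_world, np.ones((n, 1))]).T      # 4 x n homogeneous world-space points
    p_clip = M_proj_modified @ M_cam_from_world @ p_world       # homogeneous clip space
    p_ndc = p_clip[:3] / p_clip[3]                              # perspective divide -> NDC in [-1, 1]
    u = (p_ndc[0] * 0.5 + 0.5) * width                          # NDC -> pixel coordinates
    v = (1.0 - (p_ndc[1] * 0.5 + 0.5)) * height                 # flip y so v increases downward
    return np.stack([u, v], axis=1)
```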

I have also tried other combinations of camera parameters, and my solution works correctly for those parameters too. Here are the resulting images with different camera parameters.

horizontal_offset=0.0; vertical_offset=0.0; horizontal_shift=0.5; lens_shift=0.0;

[image]

[image]

So we're nearly finished. The only task that remains is to expose the relevant camera parameters for each scene. I will try to do this over the next couple of weeks, and I will post my progress here.

@liuzhy71

liuzhy71 commented Oct 29, 2021

I have manually tested all the camera tilt and shift parameters. Scenes 009_003, 038_009, and 039_004 are still incorrect.

@alexsax

alexsax commented Nov 29, 2021

@mikeroberts3000 @liuzhy71 @Ainaz99 @rpautrat thank you all for the hard work on getting accurate camera parameters for these scenes.

Any update on when these will be released and we can use them? 😄

@jatentaki

I'd also be curious if the fixes have been released somewhere :)

@mikeroberts3000
Collaborator

Hi @Ainaz99 @alexsax @jatentaki @liuzhy71 @rpautrat, I have some good news. I just checked in some data and example code that resolves this issue.

In the contrib/mikeroberts3000 directory, I provide a modified perspective projection matrix for each scene that can be used as a drop-in replacement for the usual OpenGL perspective projection matrix, as well as example code for projecting world-space points into Hypersim images.

I apologize that it took so long to address this issue. It was especially challenging to debug because V-Ray's behavior wasn't well-documented, and I've been busy with other projects and holiday travel.

@Ainaz99
Author

Ainaz99 commented Jan 18, 2022

Hi @mikeroberts3000 ,

Thanks for releasing the modified version of the camera parameters.
I'm wondering if there is a way to directly calculate new camera positions and orientations (originally saved as camera_keyframe_positions.hdf5 and camera_keyframe_orientations.hdf5) using only the released perspective projection matrices for the scenes. If yes, how is the calculation done? Thank you!

@mikeroberts3000
Collaborator

mikeroberts3000 commented Jan 18, 2022

Hi @Ainaz99, this is possible, but there are some important technical details to be aware of. In particular, the modified "camera orientation" will have some extra transformations encoded in it, and it will not be a rotation matrix. As a result, any downstream code that intends to invert this matrix must take care to actually invert it, rather than merely transposing it.

To derive our modified camera orientation, consider the following equation that transforms a point in world-space p_world (expressed in homogeneous coordinates) into a point in homogeneous clip-space p_clip,

p_clip = M_proj_modified * M_cam_from_world * p_world

where M_proj_modified is the modified projection matrix from our CSV file; and M_cam_from_world is a 4x4 matrix that encodes the camera position and orientation. We can express this transformation in terms of the standard OpenGL projection matrix M_proj_canonical as follows,

p_clip = M_proj_canonical * block_diag(M_cam_from_uv_canonical * M_cam_from_uv.I, 1) * M_cam_from_world * p_world

where M_cam_from_uv is a 3x3 matrix defined in our CSV file for each scene; and M_cam_from_uv_canonical = diag([tan(fov_x/2.0), tan(fov_y/2.0), -1.0]). By collecting terms in the matrix block_diag(...) * M_cam_from_world, we see that we can define a modified camera orientation R_world_from_cam_modified that completely accounts for our non-standard camera parameters.

R_world_from_cam_modified = camera_orientation * (M_cam_from_uv_canonical * M_cam_from_uv.I).I

where camera_orientation is the camera orientation stored in our HDF5 files.

R_world_from_cam_modified can be used as a drop-in replacement for the camera orientation stored in our HDF5 files, and completely accounts for all non-standard camera parameters when used in conjunction with a standard OpenGL projection matrix. We do not need to make any modifications to the camera positions stored in our HDF5 files. But remember that R_world_from_cam_modified is not a rotation matrix.
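In code, the equation above looks roughly like this (a minimal numpy sketch; M_cam_from_uv and camera_orientation are placeholders that should be loaded from our CSV and HDF5 files, and the field-of-view computation assumes our standard 1024x768 images):

```python
# Minimal sketch of the equation above. camera_orientation comes from
# camera_keyframe_orientations.hdf5, and M_cam_from_uv comes from our per-scene
# CSV file; both are placeholders here.
import numpy as np

width_pixels, height_pixels = 1024.0, 768.0
fov_x = np.pi / 3.0
fov_y = 2.0 * np.arctan(height_pixels * np.tan(fov_x / 2.0) / width_pixels)

M_cam_from_uv_canonical = np.diag([np.tan(fov_x / 2.0), np.tan(fov_y / 2.0), -1.0])
M_cam_from_uv = np.eye(3)          # placeholder; load from the CSV file for this scene
camera_orientation = np.eye(3)     # placeholder; load from the HDF5 file for this frame

R_world_from_cam_modified = camera_orientation @ np.linalg.inv(M_cam_from_uv_canonical @ np.linalg.inv(M_cam_from_uv))
```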

@Ainaz99
Author

Ainaz99 commented Jan 19, 2022

Thank you @mikeroberts3000 for your explanation!
So if I want to calculate the worldspace transformation matrix for the camera as follows,

R_world_from_cam    =  camera_orientation
t_world_from_cam     =  camera_position.T
M_world_from_cam   =  matrix(block([[R_world_from_cam, t_world_from_cam], [matrix(zeros(3)), 1.0]]))

can I use the new R_world_from_cam_modified?
Thank you.

@mikeroberts3000
Collaborator

mikeroberts3000 commented Jan 19, 2022

@Ainaz99 that looks correct to me, assuming that you want M_world_from_cam to transform points to world-space from camera-space, and that you are expressing points in homogeneous coordinates as 4D column vectors. Remember to proceed with caution because your modified M_world_from_cam has extra transformations encoded into it, so it no longer encodes pure rotation and translation.
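In numpy terms, your assembly would look roughly like this (a minimal sketch; the position and orientation values are placeholders):

```python
# Minimal sketch: assemble the 4x4 world-from-camera matrix from the modified
# orientation and the unmodified camera position (column-vector convention,
# homogeneous coordinates). The values below are placeholders.
import numpy as np

camera_position = np.zeros(3)              # placeholder; from camera_keyframe_positions.hdf5
R_world_from_cam_modified = np.eye(3)      # placeholder; computed as described above

M_world_from_cam = np.eye(4)
M_world_from_cam[:3, :3] = R_world_from_cam_modified
M_world_from_cam[:3, 3] = camera_position
```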

@Ainaz99
Author

Ainaz99 commented Jan 26, 2022

Thank you @mikeroberts3000. I want to use the new camera transformation matrix in Blender.
I'm attaching some images which show, from left to right: the RGB image, the old rendering (using the old M_world_from_cam), the new rendering (using R_world_from_cam_modified), and the RGB and new rendering overlaid. Although the new rendering looks better, there is still some difference, which I assume is due to the extra transformations that you mentioned.
Do you know how I can take care of these extra transformations? Are there any other parameters I have to change manually for the camera?
[screenshots attached]

@mikeroberts3000
Collaborator

mikeroberts3000 commented Jan 26, 2022

I haven't spent much time with Blender, so I'm not sure what exactly it is doing with the matrices you're specifying. Is Blender rendering images via a rasterization approach or a raycasting approach? How exactly are you specifying these matrices to Blender? Are you specifying position and orientation in a combined 4x4 M_world_from_cam_modified matrix? Or are you specifying R_world_from_cam_modified and t_world_from_cam separately?

In these notebooks, I show how to reproduce pre-rendered Hypersim images pixel-perfectly using a rasterization approach and a raycasting approach for this specific scene. It should be straightforward to figure out what Blender is doing differently from these notebooks by digging into the Blender documentation or source code. For what it's worth, in my local repository, I computed R_world_from_cam_modified according to the equation I posted above, and I verified that it behaves as expected, both for rasterization and raycasting.

@mikeroberts3000
Collaborator

Hi @Ainaz99, I'm just double-checking if you ever got this rendering functionality figured out in Blender.
