
How to determine the parameters in the plane.txt? #26

Open
tding1 opened this issue Aug 11, 2021 · 7 comments

tding1 commented Aug 11, 2021

Hi,

I am trying to reproduce the COLMAP results on your Shiny datasets. I ran the exact same command for the scene "food". It generates 'hwf_cxcy.npy' and 'poses_bounds.npy', which are similar to your results.

However, how do you determine the 4 parameters in 'plane.txt'? For example, in your 'plane.txt', the four numbers for this scene are:

2.6 100 1 300

Where do these numbers come from? My colmap result shows:

Post-colmap
Images # 49
Points (50725, 3) Visibility (50725, 49)
Depth stats -0.06422890217140988 273.68360285004354 25.033925606886992

How do you determine the first two numbers in 'plane.txt' from this information? And what about the last two numbers?

Moreover, I observe that for some other scenes 'plane.txt' contains only 3 numbers. For example, for the scene 'lab', the numbers in the file are

46 206 1

How should the missing 4th value be handled? What does that mean?

Thank you very much!


pureexe commented Aug 11, 2021

Here is the code that reads planes.txt as an input:

```python
elif os.path.exists(self.dataset + "/planes.txt"):
    with open(self.dataset + "/planes.txt", "r") as fi:
        data = [float(x) for x in fi.readline().split(" ")]
    if len(data) == 3:
        self.dmin, self.dmax, self.invz = data
    elif len(data) == 2:
        self.dmin, self.dmax = data
    elif len(data) == 4:
        self.dmin, self.dmax, self.invz, self.offset = data
        self.offset = int(self.offset)
        print(f'Read offset from planes.txt: {self.offset}')
    else:
        raise Exception("Malform planes.txt")
else:
    print("no planes.txt or bounds.txt found")
```

If the last number is missing, it falls back to the default value of 200:

```python
self.offset = 200
```
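Given the parser above, a minimal sketch of writing a matching planes.txt (the values are the 'food' scene's from this thread; the field order dmin dmax invz offset follows the 4-number branch of the parser):

```python
# Sketch: write a planes.txt that the parser above accepts.
# Values are the 'food' scene's from this thread: dmin dmax invz offset.
values = [2.6, 100, 1, 300]
with open("planes.txt", "w") as fo:
    fo.write(" ".join(str(v) for v in values))

# Round-trip check: read it back the same way the loader does.
with open("planes.txt", "r") as fi:
    data = [float(x) for x in fi.readline().split(" ")]
```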

Please take a look at the explanation of the first 3 numbers.
https://github.com/nex-mpi/nex-code/wiki/planes.txt---near---far---inverse


Here is more explanation that might help you.

First and Second number - near / far

To find near and far (the first and second numbers), the easiest way is to take percentiles of the point-cloud depths.

COLMAP also generates a point cloud in addition to the camera parameters. You can take a look at LLFF's code for how to compute the percentiles.

close_depth in LLFF code is near / dmin in NeX code
inf_depth in LLFF code is far / dmax in NeX code

https://github.com/Fyusion/LLFF/blob/c6e27b1ee59cb18f054ccb0f87a90214dbe70482/llff/poses/pose_utils.py#L56-L88
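As a rough sketch of that percentile approach (the 0.1/99.9 percentiles mirror LLFF's pose_utils.py; the helper name and the synthetic depths are illustrative, not NeX's code):

```python
import numpy as np

def estimate_bounds(z_depths, near_pct=0.1, far_pct=99.9):
    """Percentile-based near/far (close_depth/inf_depth in LLFF,
    dmin/dmax in NeX) from the depths of points seen by a view."""
    return np.percentile(z_depths, near_pct), np.percentile(z_depths, far_pct)

# Synthetic depths for illustration only:
depths = np.linspace(1.0, 300.0, 1000)
dmin, dmax = estimate_bounds(depths)
```

The percentiles (rather than min/max) make the bounds robust to stray outlier points in the COLMAP reconstruction.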

Third number - inverse depth

The inverse-depth flag is either 0 or 1. If the scene contains things placed far away, set it to 1.

"Far away" means that when the camera moves, the object shows little (or even no) parallax.
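A sketch of what the flag changes when the planes are placed: uniform in disparity (1/z) versus uniform in depth. The helper name and plane count are made up for illustration:

```python
import numpy as np

def plane_depths(dmin, dmax, n_planes, invz):
    if invz:
        # Uniform in 1/z: planes bunch up near the camera, so only a
        # few planes are spent on the far-away, low-parallax background.
        return 1.0 / np.linspace(1.0 / dmin, 1.0 / dmax, n_planes)
    # Uniform in z: planes are spread evenly between near and far.
    return np.linspace(dmin, dmax, n_planes)

d = plane_depths(2.6, 100.0, 16, invz=True)  # the 'food' scene's dmin/dmax
```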

Fourth number - offset

When you look at the images/2_mpic section in Tensorboard, you can see a gray border spacing between the black edge and the object.

If the offset is set to 0, the object will fall off the MPI_C and the reconstruction will look bad around the edges.

But if the offset is set too high, it will take a lot of memory during training, and total variation regularization might wipe out high-frequency detail.

If you wish to use a large offset, you have to reduce the -tvc parameter (default is 0.03).

[image: gray border around the object in images/2_mpic]

Feel free to ask if things are still not clear enough 😄😄😄


tding1 commented Aug 11, 2021

Hi Pakkapon,

Thank you so much for the extremely helpful answer! After carefully checking your wiki and code, I still have some questions regarding those parameters:

  1. First of all, a very naïve question: what is the purpose of putting these 4 parameters in a separate 'plane.txt' file rather than together with the other parameters in the args configuration? I know this might be a design question, but I am just wondering if there is any intuition behind this practice, e.g., to highlight the importance of these 4 parameters? Or maybe it is just a practice adopted from LLFF?

  2. Next, for the third parameter, does that mean we are free to choose it to be 0 or 1 depending on the scene? For example, the wiki page says that if the scene contains objects lying far away from the camera, we can choose inverse depth (set the third parameter to 1). However, I found that all the Shiny datasets use inverse depth. For scenes like food or pasta, is it the case that the scene contains objects lying far away from the camera? I am a little bit confused about this point.

  3. For the first two parameters, it is said in the wiki that:

    NeX-MPI needs to select 1 camera to be a "Reference camera". The set of planes will forward-facing to this camera. NeX-MPI will automatically select the most centered camera. However, you can pick your own reference camera by providing the argument -ref_img

    My question is: must the "reference camera" be one of the cameras that took the pictures? My understanding of MPI is that we only need to select a virtual camera at the centroid of all the cameras. In the example on the wiki page, what if we do not have the middle camera? Does that mean NeX-MPI will pick either the left or the right camera as the reference camera? My impression is that we need to construct a virtual middle camera facing the scene, whose field of view is chosen so that it covers the region of space visible to the union of these cameras. Correct me if I am wrong :)

  4. If the "Reference camera" is one of the existing cameras, and suppose we find it automatically with the code in load_llff. The README file for the Shiny dataset says that

    The first two numbers are the distances from a reference camera to the first and last planes (near, far).

    Since we already have the reference camera, we can also read its close_depth and inf_depth, can't we? In fact, I see the LLFF dmin/dmax is defined in this way (reference_depth is obtained from the bds extracted from the last 2 columns of poses_bounds.npy):

    nex-code/utils/load_llff.py, lines 354 to 361 at eeff38c:

```python
train_ids = np.logical_not(validation_ids)
train_poses = poses[train_ids]
train_bds = bds[train_ids]
c2w = poses_avg(train_poses)
dists = np.sum(np.square(c2w[:3,3] - train_poses[:,:3,3]), -1)
reference_view_id = np.argmin(dists)
reference_depth = train_bds[reference_view_id]
```

    Then what is the purpose of specifying the first two numbers manually, instead of directly using the bounds dmin/dmax estimated by LLFF for that reference camera? I think your answer above also points to the estimated bounds stored in poses_bounds.npy:

    COLMAP also generates a point cloud besides of camera parameter. You can take a look at LLFF's code on how to do a percentile.

    close_depth in LLFF code is near / dmin in NeX code
    inf_depth in LLFF code is far/ dmax in NeX code

    https://github.com/Fyusion/LLFF/blob/c6e27b1ee59cb18f054ccb0f87a90214dbe70482/llff/poses/pose_utils.py#L56-L88

    Then, my impression is that we don't need to set dmin/dmax but can use the values from the LLFF bounds. Why do we need to manually specify the two numbers in the plane.txt file?

  5. For a specific example, the first 2 parameters in the plane.txt file of the dataset 'crest' are 4.7 200; where do they come from? They do not seem to be the ones extracted from the estimated bounds in poses_bounds.npy. Does this mean we are also free to choose the first 2 parameters as long as they cover the real range of scene depth? For example, I could take the minimum dmin over all the cameras as the 1st parameter and the maximum dmax over all the cameras as the 2nd, though they may not come from the same camera. Is this doable?

Thank you in advance for your effort in answering these questions!


pureexe commented Aug 13, 2021

1. what is the purpose of separating these 4 parameters in a 'plane.txt' file rather than putting them altogether with other parameters in a configuration of args?

Each scene has different near/far parameters, while the parameters listed in args are shared across every scene.

Actually, you can provide the arguments -dmin, -dmax, -invz, and -offset manually (useful for debugging):

nex-code/train.py, lines 70 to 72 at eeff38c:

```python
parser.add_argument('-dmin', type=float, default=-1, help='first plane depth')
parser.add_argument('-dmax', type=float, default=-1, help='last plane depth')
parser.add_argument('-invz', action='store_true', help='place MPI with inverse depth')
```

2.1 Does that mean that we are free to choose the inverse depth parameter to be 0 or 1 depending on the scene property?

Yes, it depends on the scene. Choose 1 if the scene contains things placed far away that are not important objects. That way, we use denser planes for the objects near the camera.

See 2.2 for an example: we want denser planes on the pasta and food, and the background is not so important, so we use only a few planes to represent it. That is why we set it to 1.

2.2 For scenes like food or pasta, is it the case that the scene contains objects lying far away from the camera?

The green highlight shows the far-away parts that we want to represent in the scene, but with only a few planes.

[image: pasta_far]

[image: food]

3. must the "reference camera" be one of those cameras that takes the pictures

You can use a virtual camera as the reference camera. Set dataset.sfm.ref_rT (the transposed rotation matrix of the reference camera) and dataset.sfm.ref_t (the translation vector of the reference camera) on the dataset variable:

```python
dataset = loadDataset(dpath)
```

However, be careful when picking a reference camera. Some rays (pixels in the MPI factum) might never be seen by any image in the training set. We avoid this problem by selecting one of the training images as the reference camera, which guarantees that every ray is seen in at least one image.
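If you do want a virtual reference, building one at the centroid of the training cameras could be sketched as follows. The (N, 3, 4) camera-to-world pose format and the idea of borrowing the nearest camera's rotation are assumptions for illustration, not NeX's exact code:

```python
import numpy as np

def centroid_reference(poses):
    """poses: (N, 3, 4) camera-to-world matrices.
    Returns (ref_rT, ref_t) in the spirit of dataset.sfm.ref_rT / ref_t:
    a virtual camera at the centroid of the training cameras, reusing the
    rotation of the training camera nearest to that centroid (to keep the
    reference rays inside the region the training images actually see)."""
    t_mean = poses[:, :3, 3].mean(axis=0)
    dists = np.sum((poses[:, :3, 3] - t_mean) ** 2, axis=-1)
    R = poses[np.argmin(dists), :3, :3]
    return R.T, t_mean

# Two axis-aligned cameras at x = -1 and x = +1:
poses = np.tile(np.eye(3, 4), (2, 1, 1))
poses[0, 0, 3], poses[1, 0, 3] = -1.0, 1.0
ref_rT, ref_t = centroid_reference(poses)
```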

4. Can we automatically find dmin/dmax by the code of load_llff?

Yes, we can. However, this requires the COLMAP point cloud to be good. Sometimes points are missing in some parts of an object (this usually affects dmin), so we need to re-adjust the near plane (dmin) to be closer to the camera.

5.1 scene 'crest' dmin/dmax are 4.7 200, where do they come from?

Red: requires the near plane to be very close to the camera.
Green: requires the far plane to be very far from the camera.

[image: crest_near_far]

5.2 Does this mean we are also free to choose the first 2 parameters as long as they cover the real range of scene depth

We want to select a range that covers the entire scene. The near and far planes are measured from the reference camera. You can set them freely; the homography will warp each plane from the reference view into the training/testing views.

However, you still have to be careful when picking dmin/dmax freely.

If dmin/dmax doesn't cover the entire scene:

    1. Objects go missing, because they can't be represented on the MPI.
    2. Objects look blurry, because the MPI tries to place them at the wrong depth.

If dmin/dmax is set much too large, some planes might have nothing on them; those planes in images/2_mpic become entirely gray. This makes NeX use fewer planes to represent the scene, leading to this kind of artifact.

[image: artifact from an oversized dmin/dmax range]
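For reference, the standard plane-induced homography behind that warp can be sketched as follows. K, R, t here are illustrative intrinsics and relative pose, not values from NeX:

```python
import numpy as np

def plane_homography(K_ref, K_tgt, R, t, d):
    """Map pixels on the fronto-parallel plane z = d in the reference
    camera into a target view with relative pose (R, t):
    H = K_tgt (R - t n^T / d) K_ref^{-1}, with plane normal n = (0, 0, 1)."""
    n = np.array([0.0, 0.0, 1.0])
    return K_tgt @ (R - np.outer(t, n) / d) @ np.linalg.inv(K_ref)

K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
# With an identity relative pose the homography reduces to the identity.
H = plane_homography(K, K, np.eye(3), np.zeros(3), d=2.6)
```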

@JiuTongBro commented

Hi,

I read the depth-generation code in LLFF and checked the point cloud file. So the source point cloud itself misses some parts of the scene, right? Does that mean we can't generate the correct d_min and d_max strictly from the provided data, and we can only re-adjust the depth bounds manually?

Thanks a lot!


pureexe commented Aug 26, 2021

I think so. We manually adjust them to maximize MPI utilization. However, the depth generation from LLFF produces reasonable dmin/dmax in most cases.

@shamafiras commented

Hey @pureexe ,
Can you please elaborate on your earlier answer about the impact of total variation with large offsets:
"But if set the offset too much, it will take a lot of memory while training. and total variation regularization might wipe out high-frequency detail."
How can a large offset cause total variation regularization to wipe out high-frequency detail?
Since TV is a per-pixel loss, how can adding more pixels (a larger offset) affect the quality of the original pixels?
Thanks for your incredible quality paper and high support.
Firas


pureexe commented Sep 9, 2021

Total variation is used to smooth mpi_c.

With a larger offset, there are more pixels to smooth (and more noise among them), which leads to a higher total variation loss.

The network will then try to reduce the loss by smoothing mpi_c instead of reducing the reconstruction error. This can lead to poor results.
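To make that concrete, here is a minimal anisotropic TV penalty. The weighting by -tvc (default 0.03 per the earlier answer) is omitted, and the helper is illustrative, not NeX's implementation:

```python
import numpy as np

def total_variation(img):
    """Sum of absolute differences between horizontal and vertical
    neighbors; flat regions contribute zero, noisy borders a lot."""
    return (np.abs(np.diff(img, axis=-1)).sum()
            + np.abs(np.diff(img, axis=-2)).sum())

flat = np.ones((8, 8))
noisy = flat + np.random.default_rng(0).normal(0.0, 0.1, (8, 8))
# Enlarging the offset adds more (noisy) border pixels like `noisy`,
# raising the TV term, which the network then minimizes by over-smoothing.
```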
