Custom dataset config options #3
Nice to see that you got it running on your data! It seems that the bounding box is a bit too small, as the upper part of the model is cut off. You can increase the mesh scale parameter in the .json config. For the dataset question, I would just create a custom dataloader by modifying the class DatasetNERF (https://github.com/NVlabs/nvdiffrec/blob/main/dataset/dataset_nerf.py). That gives you full freedom w.r.t. how to apply the masks, etc. There is currently no option exposed in the config to remove validation data; the train and validation data for NeRF are loaded in train.py.
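To make the dataloader suggestion concrete, here is a minimal, hedged sketch of one way to do it, assuming a separate masks/ folder whose files mirror the image names; the helper name and folder layout are illustrative, not actual nvdiffrec internals:

```python
# Hypothetical helper: bake an external mask into the image's alpha channel
# so a DatasetNERF-style loader can consume RGBA frames directly.
import os

import imageio
import numpy as np

def load_rgba_with_mask(img_path, mask_dir):
    img = imageio.imread(img_path).astype(np.float32) / 255.0   # (H, W, 3)
    mask_path = os.path.join(mask_dir, os.path.basename(img_path))
    mask = imageio.imread(mask_path).astype(np.float32) / 255.0
    if mask.ndim == 3:          # collapse an RGB mask to a single channel
        mask = mask[..., 0]
    # Concatenate color and mask into the RGBA array the loader returns.
    return np.concatenate([img[..., :3], mask[..., None]], axis=-1)
```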
Unfortunately the code wasn't written with that in mind, so there's no configuration flag. You can find the setup code for the nerf_synthetic dataset here: https://github.com/NVlabs/nvdiffrec/blob/main/train.py#L564. You should be able to just change the dataset construction there.
For NeRD datasets (a few lines up) we always use the same train & test set.
I'll look into how imgs2poses could be adapted to produce rembg/U^2-Net masks, and make it a single script like colmap2nerf.
Issue solved: I know where to look to automatically produce custom datasets from COLMAP data, plus some config options to improve the mesh scale and load poses from a single file.
@jmunkberg Just wanted to cross-check the steps for custom data.
@eyebies Yes, that is correct. The same procedure as when generating data for NeRF/NeRD applies. If you want to use synthetic data, you can follow the Blender setup used in the synthetic NeRF data: https://drive.google.com/drive/folders/128yBriW1IG_3NJ5Rp7APSTZsJqdJdfc1, in which case you get exact masks and perfect poses. For masks, the methods you cite above would work. We tried a bunch of other methods (Detectron2, Photoshop); see e.g. Fig. 23 in the paper: https://nvlabs.github.io/nvdiffrec/assets/paper.pdf. In general, high-quality masks improve results, particularly the geometry reconstruction. For poses, COLMAP works well in our tests. We currently do not optimize over poses or correct for noise in the pose estimation, but that could likely be added.
@eyebies img2poses would produce a .npy file since it's using a different dataset format (with the benefit of not having to make test and val files), but both are loading the same data from COLMAP. Other than that, yeah. I'll be sure to link it in this issue; I want it to be a single gist.
https://gist.github.com/mjc619/e20835d652c51f305ce328342af7fefd
Thanks a lot guys! I really appreciate the gesture! Just for the sake of completeness, here are a couple of algorithms that could be used in place of rembg. These are mostly class-agnostic. Thanks again!
Depending on how the masking is handled, those are probably far better. My rembg step included applying alpha mattes to the PNG result, which gave a simple solution for masking NeRF datasets.
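For anyone replicating that workflow, a minimal sketch using rembg's Python API; the folder names are placeholders:

```python
# Remove backgrounds and bake the matte into the PNG alpha channel,
# mirroring the rembg step described above.
from pathlib import Path

from PIL import Image
from rembg import remove

src_dir = Path("images")         # raw frames
dst_dir = Path("images_masked")  # RGBA cutouts
dst_dir.mkdir(exist_ok=True)

for img_path in sorted(src_dir.glob("*.png")):
    # alpha_matting refines the foreground/background boundary
    cutout = remove(Image.open(img_path), alpha_matting=True)
    cutout.save(dst_dir / img_path.name)
```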
@mjc619 Hey, I would also like to train with custom data. However, I don't know my way around that well and couldn't quite follow the thread here... I have also trained some scenes with instant-ngp as you described.
@c6s0 colmap2poses is still a work in progress. I can now work directly from video off my phone, but the results are lackluster. I also need to optimize some losses in the training and find an alternative to ffmpeg that finds clear frames to use when rendering handheld video (COLMAP seems to just accept a lot of the blurry frames in sequential mode). I'd recommend just using NeRF datasets made for instant-ngp with a manual alpha mask. Rendering video -> 3D model on a consumer GPU is possible, just not that friendly or high quality yet.
@mjc619 Thank you for your answer! A little tutorial in the repo would be super cool and helpful! GPU power shouldn't be a problem (at least in my case) since I have access to 3090s and A100s. But yes, it probably just needs some more time.
Motion blur and keeping the model in the center of frame for mapping still seem to be the major reasons for models turning into swiss cheese 5k iters in. Unlike the DSLR results, which look pretty good after training, I can't change the exposure time with phone video. If you're trying to set up your own dataset right now, use camera photos and filter by quality and centering. cv2 is already called for the NeRF datasets, so I'm thinking of ways to prefilter images before COLMAP based on sharpness, the best recording settings to get sharp images (60 fps, ffmpeg to resize down to 520p), and possibly using the mask to find out how close to center the object is.
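As a sketch of that pre-filtering idea, one common heuristic is the variance of the Laplacian; the threshold below is a made-up starting point, not a tested value:

```python
# Keep only the sharpest frames before running COLMAP.
import shutil
from pathlib import Path

import cv2

SHARPNESS_THRESHOLD = 100.0  # tune per camera/scene; higher = stricter

def laplacian_sharpness(path: Path) -> float:
    # Variance of the Laplacian: low values indicate motion blur.
    gray = cv2.imread(str(path), cv2.IMREAD_GRAYSCALE)
    return cv2.Laplacian(gray, cv2.CV_64F).var()

src, dst = Path("frames"), Path("frames_sharp")
dst.mkdir(exist_ok=True)
for p in sorted(src.glob("*.jpg")):
    if laplacian_sharpness(p) >= SHARPNESS_THRESHOLD:
        shutil.copy(p, dst / p.name)
```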
@mjc619 No, VRAM should not increase over time! That said, when we switch from volumetric texturing (using the MLP) to traditional 2D textures, memory consumption changes, depending on the texture resolution, MLP parameters, etc. We also have the option to run with depth peeling in the second pass (used in NeRF mic, ficus, drums, ship); this also increases memory drastically, and it happens halfway into the training. Could you share some details? Is this a slow increase over time or a sudden change?
Log showing 2.6 GB free turning to 0 in under an hour.

It fails to reach the halfway point of training during this test, and without changing other settings, it will continue training below the number of iters at which it OOMs. In this case, at 6k iters the config will complete the model (with the same issue I posted in #14), but 10k iters cannot. This is the main limit on getting a batch size of 2 or more to improve training. (In addition to using 512-square images and fewer images overall with some filtering, a larger batch size is possible on 8 GB, if this OOM didn't limit the maximum iters at batch sizes higher than 1.) Output when running nvidia-smi inside the iter update loop:

```python
import subprocess as sp

def get_gpu_memory():
    # Ask nvidia-smi for free memory; returns a list of MiB values, one per GPU.
    command = "nvidia-smi --query-gpu=memory.free --format=csv"
    memory_free_info = sp.check_output(command.split()).decode('ascii').split('\n')[:-1][1:]
    memory_free_values = [int(x.split()[0]) for x in memory_free_info]
    return memory_free_values
```
Thanks @mjc619. I can confirm I see the same on my side. I'm not sure how long it will take to fix, but after some experimenting it seems that the memory leak happens every time we dump a validation frame. As a short-term workaround, you could increase the interval between validation frame dumps.
Thanks for reporting this issue @mjc619! We have hopefully fixed the issue now (itertools.cycle leaked memory). Pushed in CL 3e7007c.
Hello, would it be correct to adjust the mesh scale to fix this? I am getting poor results on my own data, and I'm trying to establish whether the cause is blurred images, poor masks, incorrect camera poses, or an incorrect nvdiffrec config.
Hello @not-william Yes, as stated earlier in the thread:
We usually do this by visual inspection. The mesh scale only affects the size of the tetrahedral grid. You can tell if it is too small by looking at the images early in training. If the object is too large and does not fit on screen, you need to scale your transform matrix in the dataset loader; I would recommend scaling it so that the entire object is visible on screen. We are fairly sensitive to blurred images, poor masks, and incorrect camera poses. We studied robustness to mask corruption in the supplemental material of the paper, and it seems to work fairly well. We do not correct for noisy/incorrect poses, so if they are incorrect, it won't train.
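A hedged sketch of that transform scaling, assuming a nerf_synthetic-style transforms_train.json; the 0.5 factor is a placeholder to tune by inspecting early training images:

```python
# Scale camera positions toward the origin so the whole object fits
# inside the tetrahedral grid.
import json

import numpy as np

SCENE_SCALE = 0.5  # <1 shrinks the scene relative to the tet grid

with open("transforms_train.json") as f:
    meta = json.load(f)

for frame in meta["frames"]:
    m = np.array(frame["transform_matrix"], dtype=np.float32)
    m[:3, 3] *= SCENE_SCALE  # translation column of the camera-to-world matrix
    frame["transform_matrix"] = m.tolist()

with open("transforms_train_scaled.json", "w") as f:
    json.dump(meta, f, indent=2)
```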
Hi @Sazoji! Is this script still available?
Oh, I'm currently looking at the ways normal NeRF datasets handle multiple resolutions, but the links to the scripts have changed since I swapped my username. tl;dr:
@Sazoji These were the steps I have taken thus far:
I guess I'm stuck on what I should do next, now that I have the dataset structured and the poses generated from LLFF. Do I need to create my own JSON file inside the config folder with params similar to what is in the NeRD JSON files?
@mexicantexan there's a --json option for the blender/NeRF format, but you don't need that for nvdiffrec.
The --json format is useful for other repos like instant-ngp and other NeRF repos, but for object capture in nvdiffrec you'd have to apply an alpha layer with the mask to all your images.
@Sazoji Context:
ORIGINAL:
CHANGED TO:
Specific Questions:
If you need any more info from me, feel free to ask; I'm happy to share whatever is needed to get this working and hopefully help others along the way.
1. Image size correlates to more VRAM cost to load everything. I've been able to use 800p, for instance, but if you had a high-VRAM GPU you'd just want to change the batch size and make sure it trains in enough time.

4. Oh yeah, there is another .json file you need to make, but it's not really part of the dataset, since it holds config options for nvdiffrec. I copied the ehead config and adjusted the res to the image size and the batch settings to fit my VRAM. I added laplace_scale and a couple of other settings to see what affected the training for the salt shaker, but the biggest contribution I saw was adjusting the learning rate to the batch size and making sure the whole model was within the mesh size (a hypothetical config along these lines is sketched below).
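For reference, a hypothetical my_dataset.json along those lines; the keys mirror the nerf_*.json configs shipped with the repo, but every value is a placeholder to tune for your own capture:

```json
{
    "ref_mesh": "data/my_dataset",
    "random_textures": true,
    "iter": 5000,
    "texture_res": [1024, 1024],
    "train_res": [512, 512],
    "batch": 2,
    "learning_rate": [0.03, 0.01],
    "dmtet_grid": 128,
    "mesh_scale": 2.5,
    "laplace_scale": 3000,
    "background": "white",
    "out_dir": "my_dataset"
}
```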
💀
Alright, so for anyone trying to run this on their own datasets, here are my final step-by-step instructions for getting this to run on Windows. With scripts!!! Assumptions:
Steps:
If you want to pass in your own masks, change that to:
Hi, did you get any solution for it? Please share, I got stuck at the same place.
@neerajtiwari360
How did you do that? Can you explain how to make your own data model?
@Sazoji @mexicantexan When I try to run colmap2poses on a dataset (350 images), no error appears, but if I look at the view_imgs.txt file only 4 images are shown. Am I doing something wrong? Thanks in advance.
COLMAP takes what images it can map together; try exhaustive mapping or reduce the image resolution to avoid some motion blur. Under absolutely perfect conditions, COLMAP shouldn't reject any images, but if you're feeding it a large unfiltered dataset, it's better to spend the extra time in exhaustive mapping to check each image against every other. Changing the res or manually using only images with as little movement of the subject/background as possible should help. I should also note that interpolating video to increase the number of images in a dataset doesn't improve the quality of the model, even when the interpolated images are added to the set; I tried that for NeRFs a while ago.
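If it helps, a sketch of the exhaustive route via COLMAP's CLI, wrapped in Python; it assumes colmap is on PATH, and the paths are placeholders:

```python
# Exhaustive matching compares every image pair (O(n^2)) instead of only
# neighboring frames, which can recover images that sequential mode rejects.
import subprocess

db, imgs = "database.db", "images"
subprocess.run(["colmap", "feature_extractor",
                "--database_path", db, "--image_path", imgs], check=True)
subprocess.run(["colmap", "exhaustive_matcher",
                "--database_path", db], check=True)
subprocess.run(["colmap", "mapper", "--database_path", db,
                "--image_path", imgs, "--output_path", "sparse"], check=True)
```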
I recorded a video, so the images are sequentially arranged. The camera is held at a fixed point while the object slowly rotates on a platform.
Hmm, try using rembg before COLMAP, so it won't try to map the camera's position based on the background. I don't know if nvdiffrec would like it, since it's trying to reconstruct HDRI lighting based on camera position. I know that when creating NeRFs back last February, I had success that way in removing some noise and mapping more images. Exhaustive mode changes the big-O computation time depending on the number of images (but might help to get more images accepted), so try masking before messing with that.
Perfect, thank you very much for the advice, I'll try it.
Thank you for the detailed steps. Can you please add steps to handle the assumptions you mentioned, like how to create the masks (you assume correlating masks for each of the images above, with the background black and the foreground/subject white)? Also, why does COLMAP fail to calculate poses for all the images in the folder?
Hello @Sazoji @mexicantexan, I got masks from the script https://github.com/Sazoji/nvdiffrec/blob/main/colmap2poses.py (e.g. primaries_14_aug_2/masks/0228.jpg), but scaling them fails with:

```
Traceback (most recent call last):
  File "/HPS/ColorNeRF/work/nvdiffrec/data/scale_images.py", line 33, in <module>
    img = img[None, ...].permute(0, 3, 1, 2)
RuntimeError: permute(sparse_coo): number of dimensions in the tensor input does not match the length of the desired ordering of dimensions i.e. input.dim() = 3 is not equal to len(dims) = 4
```

Please help.
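A hedged guess at what's going wrong, based only on the traceback: the mask is loaded as a 2-D grayscale array, so after img[None, ...] the tensor is 3-D, while permute(0, 3, 1, 2) expects 4-D. Adding an explicit channel dimension first avoids it:

```python
# Reproduce and fix the dimensionality mismatch from the traceback above.
import torch

img = torch.rand(512, 512)                 # stand-in for a grayscale mask (H, W)
if img.dim() == 2:
    img = img[..., None]                   # (H, W) -> (H, W, 1)
img = img[None, ...].permute(0, 3, 1, 2)   # (1, C, H, W) as the script expects
print(img.shape)                           # torch.Size([1, 1, 512, 512])
```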
I've been able to run this on my own data (slowly, on a 3070) using U^2-Net matting and the colmap2nerf code from instant-ngp (if others are following this, remove the image extensions and it should run on Windows).
The issue I'm having is val and test data, or the lack of it. Are there config options to remove the requirement for those datasets? Copying and renaming the JSON lists to val and train works, but is rather cumbersome, and I was wondering if I was missing a better option with the NeRD dataset preparation, which uses manual masks not applied to the image itself (which is possible with U^2-Net, and would help with COLMAP mapping, since the background can still be used for feature tracking).
I haven't looked into what data is required to be presented in the config, nor the resolution/training options, but I'm just wondering what the generally intended arguments and method are for presenting your own data when no val set is available.
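For what it's worth, the copy-and-rename workaround mentioned above is a two-liner; the file names follow the nerf_synthetic convention:

```python
# Reuse the training split when no separate val/test captures exist.
import shutil

for split in ("val", "test"):
    shutil.copy("transforms_train.json", f"transforms_{split}.json")
```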