Colab demo? / Headless server version? #6

INF800 · 2022-01-14T15:51:17Z

If there isn't a colab demo already, will send a PR. Please let me know if there will be any OOM issues or any technical issues that I many face.

Tom94 · 2022-01-14T17:17:25Z

No, we haven't looked into colab at all, actually. It would be great to have, thank you!

The CPU memory usage is relatively tame. GPU memory usage is on the order of the size of the dataset (plus a few GB for temporary training, inference, and render buffers).

Unfortunately, the codebase is not particularly optimized for being memory-economical. We've been spoiled by the 3090's 24GB.

Example GPU RAM usages:

NeRF (bundled fox dataset): 7.56 GB
SDF (bundled armadillo mesh): 1.73 GB

You can read out the GPU RAM usage at the top of the UI

pwais · 2022-01-14T19:58:04Z

with Colab, you might need to get lucky and get a V100 to get anywhere (might be Colab Pro only?)... the P100s and K80s don't have Tensor cores, and somebody else found you can't seem to build tiny-cuda-nn with Pascal or Maxwell: NVlabs/tiny-cuda-nn#10

Tensor cores were introduced in Volta? So you'd need a V100, Titan V, or RTX 20xx or better to try this project.
Edit: sounds like tiny-cuda-nn might require Turing tensor cores, so no V100 support :( #13

What would be really cool is if tiny-cuda-nn and/or this project could provide a fused ops / network that does not require tensor cores and can work for the older GPU architectures-- it will be slower but would still probably be faster than altneratives (pytorch / tensorflow etc). TensorRT has fused ops for the older architectures and these might provide easy drop-ins (at least, likely for inference).

myagues · 2022-01-17T19:40:52Z

It should be possible to run on Colab now that lower compute capabilities are allowed, but I'm stuck at compilation with the following error:

[ 98%] Linking CXX executable testbed
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/libGL.so: undefined reference to `_glapi_tls_Current'
collect2: error: ld returned 1 exit status
CMakeFiles/testbed.dir/build.make:115: recipe for target 'testbed' failed
make[2]: *** [testbed] Error 1
CMakeFiles/Makefile2:199: recipe for target 'CMakeFiles/testbed.dir/all' failed
make[1]: *** [CMakeFiles/testbed.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[100%] Linking CXX shared library pyngp.cpython-37m-x86_64-linux-gnu.so
[100%] Built target pyngp
Makefile:90: recipe for target 'all' failed
make: *** [all] Error 2

Here is a link for reproducing it.

Tom94 · 2022-01-17T19:53:51Z

Progress! Thanks for reporting!

Does python3 scripts/run.py --scene data/nerf/fox run?
There's currently a half-working CMake option NGP_BUILD_WITH_GUI=OFF that's supposed to build the project without GUI (and thus without linking to OpenGL and GLFW). I'm saying "half-working", because I haven't removed all references to GL-related symbols yet -- in light of your problem I'll prioritize this tomorrow and report back.

Edit: you can now run cmake -DNGP_BUILD_WITH_GUI=OFF <remaining params> to build instant-ngp without linking GLFW, ImGUI, and OpenGL for headless operation. Hopefully this will work around the linker error you encountered.

pwais · 2022-01-17T22:12:43Z

FWIW, at least EGL works in Colab, see e.g. the pyrender demo notebook: https://colab.research.google.com/drive/1pcndwqeY8vker3bLKQNJKr3B-7-SYenE?usp=sharing

There's no X11 though. It would be pretty nice to have imgui over websocket for Colab / Jupyter (e.g. via https://github.com/ggerganov/imgui-ws -- see in-browser demos ) but I don't see anybody has tried that yet.

Tom94 · 2022-01-18T10:37:25Z

If someone with access to a K80 machine could check whether it runs now, that'd be appreciated. :)

pwais · 2022-01-18T22:04:10Z

🔥 🔥 🔥 Thanks @Tom94 !! 🔥 🔥 🔥

I had to remove transforms_val.json from the lego scene to avoid an OOM during train, and I made some small mods for test time. I think the testbed just loads all images / rays into GPU memory (even temp memory) for all dataset transform_*.json meta files no matter if they get used or not, so skipping that might help avoid some OOMs.

Overall: The K80 is about 60x slower than a 30-series GPU, but also about 60x cheaper at the time of writing (YMMV but check ebay). I've seen 100x slowdown for pytorch stuff so 60x is pretty good.

(What about K40? Note that K40 seems to be compute 35, while K80 is compute 37. A K80 is basically two K40s on the same card. At the time of writing, an AWS p2.xlarge with a single K80 (two separate devices, 11gb memory each) is ~$0.90/hr or $0.25/hr spot price. In Google Colab free version or Kaggle, you're likely to get a K80 or slightly better).

Other than that one training change, here's what I see for Nerf lego train out of the box on a K80 :

python3 scripts/run.py --scene=data/nerf_synthetic/lego/ --mode=nerf --screenshot_transforms=data/nerf_synthetic/lego/transforms_test.json --screenshot_w=800 --screenshot_h=800 --screenshot_dir=data/nerf_synthetic/lego/screenshots --save_snapshot=data/nerf_synthetic/lego/snapshot.msgpack --n_steps=1000
21:38:39 INFO     Loading NeRF dataset from
21:38:39 INFO       data/nerf_synthetic/lego/transforms_test.json
21:38:39 INFO       data/nerf_synthetic/lego/transforms_train.json
21:38:39 SUCCESS  Loaded 300 images of size 800x800 after 0s
21:38:39 INFO       cam_aabb=[min=[0.5,0.5,0.5], max=[0.5,0.5,0.5]]
21:38:39 INFO     Loading network config from: /opt/instant-ngp/configs/nerf/base.json
21:38:39 INFO     GridEncoding:  Nmin=16 b=1.38191 F=2 T=2^19 L=16
Warning: FullyFusedMLP is not supported for the selected architecture 37. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
Warning: FullyFusedMLP is not supported for the selected architecture 37. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
21:38:39 INFO     Density model: 3--[HashGrid]-->32--[FullyFusedMLP(neurons=64,layers=3)]-->1
21:38:39 INFO     Color model:   3--[SphericalHarmonics]-->16+16--[FullyFusedMLP(neurons=64,layers=4)]-->3
21:38:39 INFO       total_encoding_params=12196240 total_network_params=9728
Screenshot transforms from  data/nerf_synthetic/lego/transforms_test.json
Training:  34%|███████████████████████████████▉                                                               | 336/1000 [01:44<03:45,  2.94step/s, loss=0.00474]

Final train time (as reported) was 05:54 with loss=0.00275.

nvidia-smi during training:

|   1  Tesla K80           Off  | 00000000:04:00.0 Off |                    0 |
| N/A   45C    P0    92W / 149W |  11271MiB / 11441MiB |    100%      Default |
|                               |                      |                  N/A |

So the K80 is about 60x slower than a 30-series GPU (6 seconds -> 360 seconds). In my experience, pytorch stuff (high i/o) is a 50x-100x lag, so this is pretty nice! Clearly the implementation helps a ton.

Once the model finishes training, I do get an OOM when rendering tries to start:
RuntimeError: Could not allocate memory: CUDA Error: cudaMalloc(&rawptr, n_bytes+DEBUG_GUARD_SIZE*2) failed with error out of memory

For rendering, I did this:

swapped the above command's save_snapshot for load_snapshot
commented out the train loop in run.py:

instant-ngp/scripts/run.py

Line 167 in 9ad813b

if n_steps > 0:
added a tqdm to monitor rendering progress:

instant-ngp/scripts/run.py

Line 288 in 9ad813b

for idx in args.screenshot_frames:

I see moderate GPU memory usage:

|   1  Tesla K80           Off  | 00000000:04:00.0 Off |                    0 |
| N/A   43C    P0    93W / 149W |   3319MiB / 11441MiB |    100%      Default |
|                               |                      |                  N/A |

Rendering is about 11/sec per frame: 5/200 [00:55<36:09, 11.12s/it]

Most importantly, the render looks good, no different for 1000 iters as other GPU:

Tom94 · 2022-01-19T06:39:09Z

Awesome, thank you so much for testing!

You don’t actually need to delete transforms_val.json et al. You can directly pass a path to the training transforms to testbed — then it will train from just that one .json file rather than all it finds in the folder.

In the above, I believe you ended up training using also the testing transforms, so there’s more memory to be saved by not loading their respective images.

pwais · 2022-01-19T07:51:59Z

@Tom94 oh my bad! I have not been able to use the GUI yet so I didn't know --scene would concatenate everything (perhaps that doesn't happen in the GUI?). The README just provides the lego dir. So the command should be:

python3 scripts/run.py --scene=data/nerf_synthetic/lego/transforms_train.json ...

That does save memory, and, erm, results in more correct training too :) 🎉

myagues · 2022-01-19T11:58:18Z

Can confirm it works in Colab (link) (with a T4), only downside is it takes some 5-10 min of compile time given that Colab allocates only 2 CPUs.

Maybe an approach could be copying the compiled folder to the user's GDrive, so it could be reusable in next runs, and avoid recompilation, hoping you get the same GPU in the Colab lottery.

pwais · 2022-01-19T20:53:08Z

the repo builds & works in docker with nvidia/cuda base image(s) ( #20 ) so it's likely that building a binary in a CUDA 11.2 base image (seems that's what Colab was using there) could work in Colab.

5-10 mins isn't that bad tho, there are many Colab notebooks like Nerfies ( https://colab.research.google.com/github/google/nerfies/blob/main/notebooks/Nerfies_Capture_Processing.ipynb ) that can take 30 mins or more to set up or run.

Huggingface spaces wouldn't offer the notebook environment, but since this project has its own nice GUI, it might be a better match https://huggingface.co/spaces/launch

tlightsky · 2022-01-30T03:56:44Z

It should be possible to run on Colab now that lower compute capabilities are allowed, but I'm stuck at compilation with the following error:

[ 98%] Linking CXX executable testbed
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/libGL.so: undefined reference to `_glapi_tls_Current'
collect2: error: ld returned 1 exit status
CMakeFiles/testbed.dir/build.make:115: recipe for target 'testbed' failed
make[2]: *** [testbed] Error 1
CMakeFiles/Makefile2:199: recipe for target 'CMakeFiles/testbed.dir/all' failed
make[1]: *** [CMakeFiles/testbed.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[100%] Linking CXX shared library pyngp.cpython-37m-x86_64-linux-gnu.so
[100%] Built target pyngp
Makefile:90: recipe for target 'all' failed
make: *** [all] Error 2

Here is a link for reproducing it.

met exact issue in colab

Tom94 · 2022-01-30T07:16:11Z

Hi there, you can avoid this error by compiling testbed without GUI support

cmake -DNGP_BUILD_WITH_GUI=OFF <remaining params>

This way, it won't try to link to OpenGL, which you presumably don't need when running in colab. (You can still render out images as numpy arrays.)

loboere · 2022-02-01T22:58:04Z

how to see the rendering in colab?

pwais · 2022-02-01T23:56:54Z

how to see the rendering in colab?

It's gonna be really hard to do that :( There might be a path thru websockets (e.g. https://github.com/ggerganov/imgui-ws ) or perhaps some way of standing up an X server / VNC on colab. The GUI is pretty killer though, it could be worth the hassle.

Tom94 · 2022-02-02T07:42:40Z

If rendering only a single image (or a handful) is desired, you can call testbed.render(width, height, spp=8, linear=False) to get a numpy array that you can imshow or similar.

spp refers to "samples per pixel" and it'll mostly wind up doing anti-aliasing for you, aside from getting rid of a little bit of raymarching noise. If performance is a concern, you can set it to 1 for fastest rendering.

Note that the returned colors will be sRGB if linear == False, which is likely what you want if you'd like to directly display the image or save it as png/jpg. Use linear colors only if you want to tonemap the image yourself.

loboere · 2022-02-05T05:48:07Z

If rendering only a single image (or a handful) is desired, you can call testbed.render(width, height, spp=8, linear=False) to get a numpy array that you can imshow or similar.

spp refers to "samples per pixel" and it'll mostly wind up doing anti-aliasing for you, aside from getting rid of a little bit of raymarching noise. If performance is a concern, you can set it to 1 for fastest rendering.

Note that the returned colors will be sRGB if linear == False, which is likely what you want if you'd like to directly display the image or save it as png/jpg. Use linear colors only if you want to tonemap the image yourself.

how run testbed.render(width, height, spp=8, linear=False) ?
i get this error

NameError Traceback (most recent call last)
in ()
----> 1 testbed.render(width, height, spp=8, linear=False)

NameError: name 'testbed' is not defined

Tom94 · 2022-02-05T08:36:48Z

You'll have to first instantiate a testbed object and train it (or load a snapshot) before rendering makes sense. I recommend consulting scripts/run.py for an example of how it can be used. Its screenshot functionality uses the .render method.

Tom94 mentioned this issue Jan 15, 2022

Is 16GB memory not enough? #14

Closed

This was referenced Jan 18, 2022

Allow build without GUI/OpenGL by cmake -DNGP_BUILD_WITH_GUI=OFF #46

Merged

Add support for sm_37 architecture #50

Merged

Tom94 changed the title ~~Colab demo?~~ Colab demo? / Headless server version? Jan 18, 2022

Tom94 mentioned this issue Jan 18, 2022

Headless server version？ #10

Closed

NVlabs locked and limited conversation to collaborators Feb 16, 2022

Tom94 converted this issue into discussion #147 Feb 16, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

This issue was moved to a discussion.

Colab demo? / Headless server version? #6

Colab demo? / Headless server version? #6

INF800 commented Jan 14, 2022

Tom94 commented Jan 14, 2022

pwais commented Jan 14, 2022 •

edited

Loading

myagues commented Jan 17, 2022

Tom94 commented Jan 17, 2022 •

edited

Loading

pwais commented Jan 17, 2022

Tom94 commented Jan 18, 2022

pwais commented Jan 18, 2022 •

edited

Loading

Tom94 commented Jan 19, 2022 •

edited

Loading

pwais commented Jan 19, 2022 •

edited

Loading

myagues commented Jan 19, 2022

pwais commented Jan 19, 2022

tlightsky commented Jan 30, 2022

Tom94 commented Jan 30, 2022

loboere commented Feb 1, 2022

pwais commented Feb 1, 2022

Tom94 commented Feb 2, 2022 •

edited

Loading

loboere commented Feb 5, 2022

Tom94 commented Feb 5, 2022

This issue was moved to a discussion.

This issue was moved to a discussion.

Colab demo? / Headless server version? #6

Colab demo? / Headless server version? #6

Comments

INF800 commented Jan 14, 2022

Tom94 commented Jan 14, 2022

pwais commented Jan 14, 2022 • edited Loading

myagues commented Jan 17, 2022

Tom94 commented Jan 17, 2022 • edited Loading

pwais commented Jan 17, 2022

Tom94 commented Jan 18, 2022

pwais commented Jan 18, 2022 • edited Loading

Tom94 commented Jan 19, 2022 • edited Loading

pwais commented Jan 19, 2022 • edited Loading

myagues commented Jan 19, 2022

pwais commented Jan 19, 2022

tlightsky commented Jan 30, 2022

Tom94 commented Jan 30, 2022

loboere commented Feb 1, 2022

pwais commented Feb 1, 2022

Tom94 commented Feb 2, 2022 • edited Loading

loboere commented Feb 5, 2022

how run testbed.render(width, height, spp=8, linear=False) ? i get this error

Tom94 commented Feb 5, 2022

This issue was moved to a discussion.

pwais commented Jan 14, 2022 •

edited

Loading

Tom94 commented Jan 17, 2022 •

edited

Loading

pwais commented Jan 18, 2022 •

edited

Loading

Tom94 commented Jan 19, 2022 •

edited

Loading

pwais commented Jan 19, 2022 •

edited

Loading

Tom94 commented Feb 2, 2022 •

edited

Loading

how run testbed.render(width, height, spp=8, linear=False) ?
i get this error