University of Pennsylvania, CIS 565: GPU Programming and Architecture, Final Project
- Guanlin Huang, Shutong Wu, Rhuta Joshi
- Tested on: Samsung Tablet S8, Qualcomm SM8450 Snapdragon 8 Gen 1;CPU: Octa-core (1x3.00 GHz Cortex-X2 & 3x2.50 GHz Cortex-A710 & 4x1.80 GHz Cortex-A510), GPU: Adreno 730
Project ARC is proposed to implement an application for adding procedural elements to the real-world view. One hidden problem of procedural generation is its performance with a large number of generations, and we intend to solve this with GPU. We have implemented grass and customized the L-system in Unity and show it in AR with Unity AR Foundation.
Basic design of application |
---|
The App has not been released to the public on a mobile app store, but the code can be downloaded and built in Unity. Download and install the .APK file from this repository root to your android device. If it does not work, you may have to build and deploy from Unity. AR experiences on Android devices are driven by ARCore, available only on ARCore supported devices. Ensure that your development device is compatible with AR. Alternatively, you can use a correctly configured AR-compatible Android Emulator instance.
The App allows you to select a type of grass or procedurally generated trees you would like to place in the AR view. It works best in a larger area with some feature points where ARCore plane detection works well.
- Click on the
Show Panel
switch to view a panel with items to place. - Scan the room till you see the light white dots on the plane indicating surface detection.
- Once the surface is detected, you can tap anywhere on the plane to see the selected procedural item in AR view.
- You can also turn on the
Presets
where we provide options to view a preset arrangement of trees, grass, and plants arranged as a garden. - An option to view some additional
Debug Info
is also provided - At any time, you can use the clear screen option to reset and start over.
- Developed using Unity 2021.3.13f1. We recommend using Unity Hub.
- Clone this repo to your machine (optionally, fork this repository if you plan on expanding on it).
- Open the project in Unity 2021.3.13f1, and open the 'ARCScene' scene (if it doesn't open automatically).
- Deploy to an ARCore compatible device.
- Version control system used during development - Plastic SCM
- Project board, bug tracker, and resources
- Project Pitch
- Milestone 1 Presentation
- Milestone 2 Presentation
- Milestone 3 Presentation
- Final Presentation
The grass is rendered through Unity Universal Render Pipeline compute shaders, which can take in a single mesh shape as input and generate other mesh as outputs. In this project, we decided to put a plane mesh as input, and for each vertex of that plane, we output a single triangle to represent a grass blade. To render those grass blades with decent color, we also need a fragment shader and interpolate between a base grass color and a tip grass color. The output should look like the following:
To make the grass more realistic, we want to change the height of the grass blades so that they do not look too triangle-like. Next, we need randomly twist the grass blades so that they will not be facing the same direction; we also need to bend the grass to mimic gravity. To accomplish the aforementioned transformational goals, we define the blade in space close to the vertex emitting it, after which it will be transformed to be local to the mesh. We implemented tangent space to cope with this need.
In order to have grass curvature and convincing grass movement, each blade of grass is divided into a number of segments. Compared to tessellation, this method saves more memory and is more efficient for constructing and compute.
Tangent Space (Image source) | Grass construction (Image source) |
---|---|
The wind is implemented by sampling from a noise texture. The UV coordinate is constructed using the grass blades' input points; this will ensure that they will behave the same with multiple grass instances. The wind is then applied using a scaled rotation matrix to each segment of the grass blade. Combining with some variance in blade shapes, the final result looks like this:
The grass also interacts with the device camera. This means that when a user moves the device camera closer or further away from the grass, the grass bends interactively. Similar to how the wind is applied, the interaction bending is applied with the scaled rotation matrix with respect to the distance between the device and the grass.
Grass under noise based wind forces | Grass interaction with device camera |
---|---|
We implement our L-system generation process based on the paper Parallel Generation of L-Systems. Unlike the paper, which uses CUDA to implement the L-System, we use Unity with its compute shader for the generation. The implementation can be broken down into three parts:
- L-System Derivation: Turn the axiom string into a derived string based on selected rulesets
- L-System Interpretation: Transform the string to a list of position/orientation/material arrays that we use to draw for each symbol
- Rendering: render all the items in the array to the scene
Parallel L-system Generation Workflow used |
---|
Before the derivation starts, there will be a preparation step where every customized rulesets will be loaded into the script. The compute shader will know what each character will derive into.
Consider the following L-System Example:
AXIOM: FAA
Rules:
F -> F[+F][-F]
A -> FF
The above rules get passed to the GPU in 2 buffers - the predecessor buffer and the successor buffer, which are obtained by serially appending all grammar rules.
Predecessor Buffer: FA // Concatenating the left side of all rules
Successor Buffer: F[+F][-F]FF // Concatenating the right side of all rules
In the derivation process, each thread will take care of each character in the string, and a prefix sum scan function is used to calculate the total length of the newly derived string. Because Compute Shader does not accept characters, we are converting characters to ASCII code to ensure the compute shader's dispatch goes smoothly.
In each iteration of derivation(L-System might go through many iterations), we will first use the scan function to examine the total string length of the next iteration's derived string; then we use the prefix sum array to identify the derived characters' indices in the new string. A new string will be generated after this step.
Our scan function can deal with 512*512 = 262144 elements for each L-System, and it is sufficient for this project and the most complex L-system generation.
Derivation example from base paper | Understanding our example |
---|---|
This is the paper's approach to the interpretation, but we decided to use a 1D linked list to help finish the interpretation more simply.
After we get all the strings from derivation, we will then set data for:
- SymbolBuffer: A compute buffer that identifies if each character is a symbol(Something to draw) or not
- DepthBuffer: A compute buffer that identifies each character's depth
- PosBuffer/OriBuffer: A kernel that identifies the position/orientation vector that the character might change to the next character
Interpretation example from base paper | Our Linked List based approach |
---|---|
Then after a prefix sum scan of the Depth buffer and Pos/Ori Buffer, we will set data for LinkedBuffer, where each index will store its parent's index. This solves the hardest problem when constructing an L-system: having every symbol have its predecessor's data.
Finally, we iterate through the array, fetching all the symbols that need to be a draw and their local coordinates. The interpretation marks are complete up to this point.
L-System Example 1 | L-System Example 2 | L-System Example 3 |
---|---|---|
We have implemented rendering in two ways, one in CPU and one in GPU, to render L-System based on different needs.
For CPU, we instantiate the gameobject we have for each symbol, and their performance is relatively acceptive under not complex scenes. The point of keeping this is for debugging and comparing its performance against GPU rendering in simple to not very complicated (below 100 L-system) scenes.
For GPU, we initially passed the mesh/vertex/triangles information to the vertex shader and ran a shader file to render this at runtime. Yet, in our research, we find Unity has a DrawProcedural and DrawMesh function, where we can let GPU render mesh at giving world coordinates at runtime, so we also implement this rendering procedure.
We also implemented a procedural noise generation where you can generate these L-system trees with a certain probability. We can scale each tree's likelihood of appearance, its distances from others, and the boundary of the generation.
Arrangement based on noise function |
---|
We have also provided an option for the users to view a pre-arranged set of procedural L-system trees and grass to look like a garden.
Garden preset in Unity | Garden preset in AR |
---|---|
Setting up an AR environment included many steps elaborately explained in Google documentation here. Considering the future scope of the project, we have also set up ARCore extensions as documented here.
ARCore is Google's framework for building augmented reality experiences on smartphones. We are using Unity's AR Foundation to build this application. We need to take care of two things - detecting surfaces/planes and placing objects in the real-world scene based on user interaction, like tapping or dragging on the screen. This is mainly accomplished using the following:
- ARPlaneManager - An ARPlaneManager detects ARPlanes and creates, updates, and removes game objects when the device's understanding of the environment changes.
- ARRaycastManager - An ARRaycastManager exposes raycasting functionality in which we shoot a ray from the screen coordinates of our tap into the real-world detected, and we store the intersection points as hit points. Refer to the image shown below taken from Google Cdoelabs.
- ARAnchorManager - ARAnchorManager is used to track elements (gameobjects) in the real world. In our application, we are tracking each L-system that the user places so that we can interact with them better and because we want to anchor and orient them correctly along the plane detected. The number of anchors can be reduced since they are resource-intensive.
ARRayCast Hit from touch point on screen |
---|
- Tested on: Samsung Tablet S8, Qualcomm SM8450 Snapdragon 8 Gen 1;CPU: Octa-core (1x3.00 GHz Cortex-X2 & 3x2.50 GHz Cortex-A710 & 4x1.80 GHz Cortex-A510), GPU: Adreno 730
For the performance analysis, we will first analyze each of the components we have implemented. Then we will examine the overall performance to show that using GPU for L-system generation is efficient.
The following graph shows how the number of grass blades will impact the framerate on the mobile device; the maximum framerate on our tester device is 31 FPS. With 800 blades covering approximately 1 square meter (M2) of the real-world area, our App can cover up to 46M2 of space, or 37600 blades of grass, without sacrificing framerate.
The pressure on GPU caused by increasing the number of grass blades shows why we would have a 37600 grass blade limit. After reaching 32000 grass blades, or approximately covering 40M2 of area, the GPU usage reaches a throttle at 95%. From 37600 blades onwards, the device will sacrifice framerate to render the grass blades properly.
We have compared two resources using different methods to generate L-System. One uses CPU to generate L-System entirely; the other is our current work, which uses GPU to analyze the grammar and only uses CPU to instantiate the gameobject in the last step.
For the sample to be compared with, we choose one of the L-System Bush written by Paul Bourke as a test case to pay tribute to his work on L-System. We will measure the time that take for them to generate different L-Systems.
Paul Bourke Bush |
---|
The performance analysis below shows the different times each method uses to generate the tree. At early iterations with only 100 or fewer symbols to draw, CPU/GPU takes a similar time to generate, and the CPU is even faster in milliseconds. But at later iterations, where thousands of symbols will be drawn to the screen, the time taken for the CPU version is exponentially increasing and become slower than the GPU version.
The reason for the time difference is that GPU's parallelism may not be fully utilized until the grammar is complex enough. But one problem with GPU and Unity's compute shader is that if the grammar is too complicated and the whole derived string exceeds the maximum length of string, then the GPU version would not work correctly. It will be a future upgrade if possible, but for now the current version could handle 262144 elements which is sufficient enough for most L-system trees.
Finally, we placed multiple trees of a single grammar at a different level of generations in AR while also having interactive grass blades in the scene to determine the overall performance. We can observe that as the generations increase, there is an expected drop in framerate because of the increased number of characters in each generation, especially when the number of items placed is high. Another possible reason for this drop, despite the parallel generation of these L-systems, is that this testing is performed where instances are rendered sequentially on the CPU.
- Ned Makes Games: Blade Grass
- Parallel Generation of L-Systems
- Roystan: Grass Shader Tutorial
- LSystem Ruleset
- Stylized Hand Painted Nature Pack by BigRookGames
- Hand Painted Flowers by Infinity3DGame
- Adam Mally - For brainstorming ideas on how L-systems can be parallelized & used
- Adam Norris - For discussing issues around surface detection & lighting in ARCore
- Akhilesh Joshi - For discussing problems faced by existing mesh based AR applications & suggesting user interaction ideas for AR
- Erik Minarini - For discussing their CPU based AR Procedural Ivy project and giving lead points
May the force be with you |
---|
Grammatical mistakes | Copy paste | Reaching for the skies |
---|---|---|