A.04 Spaces
The magic of computer graphics lies in the ability to transform and manipulate objects within Cartesian coordinate systems, also known as spaces for brevity. Indeed, a fundamental concept that forms the backbone of many graphics pipelines is the use of various spaces that bring virtual worlds onto our screens.
In appendix 02 on matrices, we showed that to transform a vector we can transform the starting frame so that we can express its coordinates with respect to a new space. Building upon this foundation, it’s now interesting to look at the common spaces employed in the graphics pipeline, and at how to move from one space to another in order to project 3D scenes onto a 2D surface before showing the result on the screen. From the initial local space where objects originate, to the all-encompassing world space, and the camera space that offers a unique perspective, each space plays a crucial role in the intricate process of rendering.
The object space, also known as local space, is the frame in which 3D meshes are defined. When creating these meshes, 3D graphic artists often work in a convenient space that simplifies vertex modeling and provides symmetry with respect to the origin of the coordinate system.
For instance, consider the modeling of a sphere. It is much easier to place all the vertices at an equal distance from the origin, rather than using a random point as the sphere's center. This intuitive choice is not only practical but can also be mathematically justified.
Equation of the sphere with center at the origin: $x^2 + y^2 + z^2 = r^2$
Equation of the sphere with center at a generic point $(x_0, y_0, z_0)$: $(x - x_0)^2 + (y - y_0)^2 + (z - z_0)^2 = r^2$
It's important to note that the local space is the frame where the vertices of a mesh are defined in the first place. These vertices are often stored in a file on disk and can be loaded into memory to create the vertex buffer, which is subsequently sent to the input assembler. Within this buffer, the vertices remain in their local space representation until the graphics pipeline performs the necessary transformations to convert the 3D objects they represent into a 2D representation to show on the screen.
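To make this concrete, below is a minimal sketch of what such local-space vertex data could look like on the CPU side (the struct layout and the values are illustrative, not taken from the repository; `<vector>` and `<glm/glm.hpp>` are assumed to be included):

```cpp
// Vertex attributes as stored on disk and copied into the vertex buffer:
// positions (and normals) are expressed in local/object space.
struct Vertex {
    glm::vec3 position;   // local space coordinates
    glm::vec3 normal;     // local space as well
};

// For example, the four corners of one face of a unit cube centered at the origin.
std::vector<Vertex> vertices = {
    { { -0.5f, -0.5f, -0.5f }, { 0.0f, 0.0f, -1.0f } },
    { {  0.5f, -0.5f, -0.5f }, { 0.0f, 0.0f, -1.0f } },
    { {  0.5f,  0.5f, -0.5f }, { 0.0f, 0.0f, -1.0f } },
    { { -0.5f,  0.5f, -0.5f }, { 0.0f, 0.0f, -1.0f } },
};
// This data stays in local space until the graphics pipeline transforms it,
// starting with the world transformation discussed below.
```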
Throughout this tutorial series, we’ll use a right-handed coordinate system where the z-axis points upwards to represent the object space. This is in line with the default convention used by Blender and 3ds Max, two well-known 3D modeling programs. However, keep in mind that this choice is completely arbitrary, and you can pick any configuration (z-up or y-up) and handedness that suits your needs.
When the input assembler sends its output to the next stage (the vertex shader), we have vertices in local space that we want to place in a 3D global scene shared by all meshes. The space of the global scene is called world space, and the transformation to go from local to world space is called world transformation. To represent the world space, we will use the same convention as the object space: right-handed system with the z-axis pointing upwards.
As we know, to go from one frame to another, we need to express the basis vectors of the starting frame with respect to the new frame. So, we can build a matrix
Then, we can define
where the first three columns of
Example:
Given a cube in local space, suppose you want to double its size, rotate it by
As you can see in the following illustration, the first three columns of
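A hedged GLM sketch of how such a world matrix could be assembled is shown below. Since the exact angle and translation of the example are not reproduced here, the values are purely illustrative (glm::scale, glm::rotate and glm::translate come from `<glm/gtc/matrix_transform.hpp>`):

```cpp
glm::mat4 S = glm::scale(glm::mat4(1.0f), glm::vec3(2.0f));                  // double the size
glm::mat4 R = glm::rotate(glm::mat4(1.0f), glm::radians(45.0f),
                          glm::vec3(0.0f, 0.0f, 1.0f));                      // rotate about the z-axis (z-up convention)
glm::mat4 T = glm::translate(glm::mat4(1.0f), glm::vec3(4.0f, 0.0f, 0.0f));  // place the cube in the scene

// Scale first, then rotate, then translate (read right to left with column vectors).
glm::mat4 worldMatrix = T * R * S;
```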
Once we have applied the world transformation, all of our meshes are in world space, but we need a specific point of view to observe the 3D scene. This new space is called view space, or camera space. Again, we must perform another transformation (known as the view transformation) on all the vertices of our meshes to go from world space to view space. To facilitate this transition, we employ a matrix
It is worth noting that for the camera space, we’ll use a right-handed system where the y-axis points downwards. This will be very helpful as we move through the following spaces, which we’ll discuss in upcoming sections. By following the convention of having the y-axis point downwards at some point during the rendering process, we can eliminate the awkward flip instruction we have used in the vertex shader so far.
Unlike the world transformation, where each mesh has its own unique transformation, in the case of the view transformation, we typically use the same view matrix to transform all the vertices of our meshes. This is because we usually want a consistent viewpoint to observe the entire scene. In other words, we desire a single point of view that encompasses the entire scene. It's as if we can consider the entire scene, comprising all the meshes, as a single large mesh that needs to be transformed from world space to view space.
Now, to build the view matrix, we can start considering the camera as an ordinary mesh we can place in world space. So, we can use a world matrix
Indeed, remember that the inverse of a rotation matrix is equal to its transpose (see appendix 03). Then, the view matrix
It’s interesting to note that, since
because both
Now, we need to calculate
To compute
Observe that to compute $\mathbf{f}$ we can use $-\mathbf{k}$ because we typically limit the vertical rotation of the camera to less than $90°$ around the x-axis (the reasons behind this will be explained in another tutorial). As a result, the angle between $-\mathbf{k}$ and $\mathbf{g}$ will be less than $90°$. Similarly, the angle between $-\mathbf{k}$ and $\mathbf{h}$ will be less than $180°$, as $\mathbf{g}$ and $\mathbf{h}$ must be orthogonal to each other.
Therefore, we can calculate
Observe that the vector
Finally, to compute
Both
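Equivalently, in code, the view matrix can always be obtained by inverting the camera’s world matrix. A minimal sketch, where cameraWorldMatrix is an assumed variable holding the camera’s (rigid) world transformation:

```cpp
// Generic (but more expensive) way: invert the camera's world matrix.
glm::mat4 viewMatrix = glm::inverse(cameraWorldMatrix);
// For a rigid transform (rotation + translation) the same result can be computed
// cheaply by transposing the rotation part and adjusting the translation,
// as discussed above.
```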
GLM provides the helper function lookAtLH to build a view matrix similar to the one we discussed in this section. You pass the camera position, the target point, and a unit vector indicating the camera’s vertical orientation as arguments to this function, which returns the related view matrix.
Observe that we use the LH (left-handed) version of the more general lookAt function. Indeed, the RH (right-handed) version provided by GLM is implemented with the assumption that the y-axis points upwards and the z-axis points backwards with respect to the camera. This results in different calculations for computing
// c_pos: position (in world coordinates) of the (origin of the) view/camera space.
// c_at: position (in world coordinates) the camera is aimed at.
// c_down: -k (unit basis vector which points downwards).
glm::vec3 c_pos = { 0.0f, -10.0f, 3.0f };
glm::vec3 c_at = { 0.0f, 0.0f, 1.0f };
glm::vec3 c_down = { 0.0f, 0.0f, -1.0f };
// Compute the View matrix.
glm::mat4 viewMatrix = glm::lookAtLH(c_pos, c_at, c_down);
The implementation of the lookAtLH function should be relatively straightforward, given the concepts we have discussed in this section. Observe how the columns and rows are swapped compared to the view matrix
/// Build a left handed look at view matrix.
///
/// @param eye Position of the camera
/// @param center Position where the camera is looking at
/// @param up Normalized up vector, how the camera is oriented. Note: We will use it as down vector
///
/// @tparam T A floating-point scalar type
/// @tparam Q A value from qualifier enum
template<typename T, qualifier Q>
GLM_FUNC_DECL mat<4, 4, T, Q> lookAtLH(
vec<3, T, Q> const& eye, vec<3, T, Q> const& center, vec<3, T, Q> const& up);
template<typename T, qualifier Q>
GLM_FUNC_QUALIFIER mat<4, 4, T, Q> lookAtLH(vec<3, T, Q> const& eye, vec<3, T, Q> const& center, vec<3, T, Q> const& up)
{
    vec<3, T, Q> const f(normalize(center - eye)); // h = target - camPos
    vec<3, T, Q> const s(normalize(cross(up, f))); // f = -k x h
    vec<3, T, Q> const u(cross(f, s));             // g = h x f

    mat<4, 4, T, Q> Result(1);
    Result[0][0] = s.x;
    Result[1][0] = s.y;
    Result[2][0] = s.z;
    Result[0][1] = u.x;
    Result[1][1] = u.y;
    Result[2][1] = u.z;
    Result[0][2] = f.x;
    Result[1][2] = f.y;
    Result[2][2] = f.z;
    Result[3][0] = -dot(s, eye);
    Result[3][1] = -dot(u, eye);
    Result[3][2] = -dot(f, eye);
    return Result;
}
Once we have the entire scene in camera space, the next step is to project it onto a plane to obtain a 2D representation of the 3D scene. To achieve this, we can ideally place a plane in front of the camera and trace rays from the camera to each vertex of the mesh. The intersection between these rays and the plane gives us the 2D representation of the corresponding 3D vertices. Note that if the projection rays are parallel to each other and orthogonal to the projection plane, the camera's position becomes irrelevant.
In the first case, where the projection rays converge towards a focal point, distant objects appear smaller. This replicates the way human vision works in real life and we commonly refer to this type of projection as perspective.
On the other hand, if the projection rays are parallel to each other, the perspective effect is lost, and the size of objects becomes independent of their distance from the camera. This type of projection is known as orthographic.
To better understand the difference, consider the illustration provided below. It depicts two segments of equal size placed at different distances from the camera. In the perspective projection, the closer segment appears longer when projected onto the projection plane, emphasizing the depth perception effect.
Fortunately, the intricacies of the projection process are almost transparent to the programmer, who is primarily responsible for defining the portion of the 3D scene to be projected onto the projection plane. Indeed, in most cases, capturing the entire scene is not necessary or desired. Depending on the type of projection being used, different geometric shapes define the region of interest.
For orthographic projections, the region is represented by a box. This box encapsulates the portion of the scene that will be projected onto the 2D plane.
In the case of perspective projections, the region of interest is defined by a frustum. A frustum is the volume enclosed between two parallel planes that intersect a pyramid. The apex of the pyramid corresponds to the camera position. The plane closer to the camera is referred to as the near plane, while the farther plane is called the far plane. By intersecting the pyramid with a plane located between the camera and the near plane, we obtain a projection window. Alternatively, the face of the frustum lying on the near plane can be used as the projection window. In computer graphics literature, the terms "near plane" and "far plane" are commonly used to refer to the corresponding windows as well.
The illustration clearly demonstrates the differences between perspective and orthographic projections. In both projections, the green ball lies outside the defined region of interest and therefore is not projected onto the projection window.
In the case of the orthographic projection, the red and yellow balls appear the same size, regardless of their distance from the camera. This is because the projection rays are parallel and do not converge towards a focal point, resulting in a lack of perspective distortion.
On the other hand, in the perspective projection, the red ball appears smaller compared to the yellow ball. This is due to the converging projection rays that mimic the behavior of human vision in real life. As objects move further away from the camera, they appear smaller, resulting in the size difference observed in the perspective projection.
To define a frustum or a box, we need to specify the distances of the near and far planes from the camera. Therefore, it is convenient to define the frustum in view space, where the camera position is located at the origin. Additionally, we need to determine the dimensions of the projection window. With this information we can build a projection matrix that transforms 3D vertices from view space to another space, called NDC (normalized device coordinates) space. The frustum defined in view space becomes a parallelepiped in NDC space, whose origin is located at the center of the front face (corresponding to the transformation of the near plane).
One significant aspect of NDC space is that the meshes contained within the parallelepiped (previously within the frustum) will have vertex coordinates falling within the following ranges:

$$-1 \le x \le 1, \quad -1 \le y \le 1, \quad 0 \le z \le 1$$
The illustration below depicts the frustum in view space (left) and the corresponding parallelepiped in NDC space (right). In Vulkan, the y-axis of NDC space points downwards. This means we made a wise choice by setting up a y-down configuration for the view space: it allows us to align with the Vulkan coordinate system and ensures consistency throughout the rendering process. In other words, there’s no need to flip the y-coordinate in the vertex shader anymore.
The z-axis is always perpendicular to both the front and back faces of the parallelepiped in NDC space and passes through their centers. While this arrangement also holds in view space, it is not an absolute requirement. Indeed, the z-axis in view space can be non-perpendicular to both the near and far planes, and it may pass through a point other than their centers.
Now, you may wonder what’s the point of this transformation. The following illustration shows a 2D representation from the top that explains what happens if you transform a frustum to a parallelepiped. The meshes inside the frustum are transformed accordingly, and the projection rays become parallel to each other. That way, we can orthographically project the mesh vertices onto a projection window (for example, the front face of the parallelepiped in NDC space) to mimic the perspective vision we are used to in real life, where the sides of a long road (or a railway) seem to meet at infinity, for example, and where near objects appear bigger than distant ones.
Interestingly, once we are in NDC space, there is no actual need to project the 3D vertices onto the projection window, as we already have a 2D representation of them. Indeed, as mentioned earlier, in NDC space the projection rays are parallel, and the z-axis is orthogonal to the front face of the NDC parallelepiped, passing through its center (the origin of the NDC space). This means that the x- and y-coordinates of vertices in NDC space remain constant along the projection rays, with only the z-coordinate varying. Consequently, the x- and y-coordinates of a vertex in NDC space are identical both inside the NDC parallelepiped and when projected onto the front face (which lies in the $z = 0$ plane).
Most of the time, that’s all we need to know in order to write applications that render 3D objects on the screen. However, as graphics programmers, we are expected to know how things work under the hood. In particular, knowing how to build a projection matrix might prove useful in the future.
As stated earlier, once we go from view space to NDC space, we implicitly get a 2D representation of the 3D mesh vertex positions. So, this transformation is definitely related to the concept of projection. Indeed, the associated matrix is called the projection matrix, and it varies depending on the type of projection we are interested in. We will start with a couple of matrices associated with the perspective projection, and then we will show the matrix associated with the orthographic projection.
While GLM offers convenient helper functions for constructing some projection matrices, in this section we will explore the process of manually creating a couple of projection matrices based on frustum information. Our first objective is to derive NDC coordinates from view coordinates. Then, we will attempt to express the resulting equations in matrix form, with the goal of finding a projection matrix to go from the view space to the NDC space. Consider the following illustration.
To begin constructing a projection matrix, we must define a frustum that provides the necessary information. Regarding the projection window, we can intersect a pyramid in view space with any plane positioned between the camera (located at the origin
The angle
However, we usually don’t use the horizontal FOV
Furthermore, the frustum includes near and far planes positioned at distances
Since the z-axis is orthogonal to the projection window and passes through its center, any 3D vertex projected onto its surface will have the y-coordinate already in NDC space (i.e., within the range
On the other hand, the perspective projection
Let's begin by examining
Also, we know that
If you want to compute the horizontal FOV
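For reference, denoting the horizontal and vertical FOV by $\theta_x$ and $\theta_y$, and the aspect ratio of the projection window by $r = w/h$, the horizontal FOV can be recovered as:

$$\theta_x = 2\,\arctan\!\left( r \tan\frac{\theta_y}{2} \right)$$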
As for
Observe that a vertex in view space
where
As we know, a vertex position is a point, so the w-coordinate is always 1 regardless of the coordinate space. As for
However, before deriving
Observe that if we multiply the NDC coordinates by
The rasterizer expects to receive primitives with vertices in clip coordinates as input. Therefore, the last stage before the rasterizer must output vertices in clip space. Typically, if no optional stage is enabled, the last stage before the rasterizer is the vertex shader. Otherwise, it can be either the geometry shader or the tessellation evaluation shader.
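To make the chain of spaces concrete, here is a minimal CPU-side sketch (the matrix and vertex names are illustrative); the final division by the w-component is exactly the perspective division that the rasterizer performs automatically on the clip coordinates it receives:

```cpp
glm::vec4 posLocal = glm::vec4(position, 1.0f);                        // local space, w = 1
glm::vec4 posClip  = projMatrix * viewMatrix * worldMatrix * posLocal; // what the vertex shader outputs (clip space)
glm::vec3 posNdc   = glm::vec3(posClip) / posClip.w;                   // perspective division -> NDC space
```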
With the perspective division automatically performed by the rasterizer, we are able to transform the coordinates of a vertex from clip to NDC space. Now, we need to find a matrix form to go from view space to clip space. To do this, we must first multiply equations
Observe that we still need to derive
Then, to get the NDC coordinates, we simply need to divide all the components of
We can now focus on deriving a formula for
As mentioned before, the projection window in camera space can be obtained by intersecting any plane between the camera and the near plane. The result remains the same because, once in NDC space, they represent the same projection window at different distances. This difference in distance does not affect the x- and y-coordinates, as previously explained. However, it does impact the z-coordinate, which is why the z-coordinate needs to be handled separately, as we are doing here.
Observe that
Consequently, the matrix
because the last two entries in the third row are the only ones that can scale and translate the third coordinate of
The coordinates of
However, in this case we know that
We also know that for a vertex in view space that lies in the far plane we have
where we used
However, in this case we know that
Substituting this into equation
So, we just found the values of
Admittedly, that’s not what we set out to find at the start of this section (the matrix to go from view to NDC space). However, since we get the perspective division for free during the rasterizer stage, we can actually consider
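For reference, below is a minimal GLM sketch of the symmetric perspective projection matrix derived in this section, using the Vulkan/Direct3D depth range $[0, 1]$ (column-major indexing; the fovY, aspect, n and f values are illustrative, and std::tan comes from `<cmath>`). It builds the same matrix as glm::perspectiveLH_ZO, which we will meet again later:

```cpp
float fovY   = glm::radians(60.0f);      // vertical FOV
float aspect = 16.0f / 9.0f;             // width / height of the projection window
float n = 0.1f, f = 100.0f;              // near and far plane distances

float t = std::tan(fovY * 0.5f);         // half-height of the projection window at distance 1

glm::mat4 proj(0.0f);                    // start from the zero matrix
proj[0][0] = 1.0f / (aspect * t);        // scale x
proj[1][1] = 1.0f / t;                   // scale y
proj[2][2] = f / (f - n);                // scale z so that [n, f] maps to [0, 1] ...
proj[3][2] = -(f * n) / (f - n);         // ... together with this translation term
proj[2][3] = 1.0f;                       // copy the view-space z into w (enables the perspective division)
```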
We built the perspective projection matrix
It won’t be too difficult to derive a perspective projection matrix for this general case, since we have already examined and solved a specific case. Indeed, after projecting the 3D vertices onto the projection window, we just need to translate the projection window so that the z-axis goes through its center again. But first, we need to make some initial observations.
In the general case, the frustum is not necessarily symmetrical with respect to the z-axis, so we can’t use the vertical FOV and aspect ratio to define its size. Instead, we need to set the width and height of the projection window by specifying the view coordinates of its top, bottom, left, and right sides. Also, we will project 3D vertices onto the projection window that lies on the near plane (meaning
In the general case, a vertex
where
Therefore, we need to translate the first two coordinates of
Observe that we used the mid-point formula to subtract the corresponding coordinate of the center (of the projection window) from
Now that we are back to the specific case, we can substitute equation
Similarly, we can substitute equation
With equations
If we omit the perspective division
After the perspective division by the w-component, the vertices inside the NDC parallelepiped are the ones with NDC coordinates falling within the following ranges

$$-1 \le x \le 1, \quad -1 \le y \le 1, \quad 0 \le z \le 1$$
This means that the vertices in clip space inside the frustum were the ones with homogeneous coordinates falling within the following ranges

$$-w \le x \le w, \quad -w \le y \le w, \quad 0 \le z \le w$$
That is, the vertices inside the frustum are the ones bounded by the following homogeneous planes (that is, 4D planes expressed in homogeneous coordinates).
Left: $x = -w$ (i.e., $w + x = 0$)
Right: $x = w$ (i.e., $w - x = 0$)
Bottom: $y = w$ (i.e., $w - y = 0$)
Top: $y = -w$ (i.e., $w + y = 0$)
Near: $z = 0$
Far: $z = w$ (i.e., $w - z = 0$)
The following illustration shows a 2D representation of the frustum in the homogeneous xw-plane.
If
We have
As you can see in the image above, a clipped primitive might no longer be a triangle. Therefore, the rasterizer also needs to triangulate clipped primitives and re-insert them into the pipeline.
Whatever perspective projection matrix you decide to use (either
If you set
The following graph shows what happens if you set
This can represent a big problem because if a far mesh A is in front of another mesh B, but A is rendered after B, then A could be considered at the same distance as B with respect to the camera, and discarded from the pipeline if the depth test is enabled. We will delve into depth testing in a subsequent tutorial.
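To see this non-linearity concretely, here is a small sketch (with illustrative near and far distances) that evaluates the NDC depth produced by a perspective projection matrix of the kind derived above for a few view-space depths; note how most of the $[n, f]$ range collapses into a thin slice of NDC depth close to 1:

```cpp
float n = 0.1f, f = 1000.0f;
for (float zView : { 0.1f, 1.0f, 10.0f, 100.0f, 1000.0f })
{
    // NDC depth after the perspective division: z_ndc = f * (z_view - n) / (z_view * (f - n))
    float zNdc = f * (zView - n) / (zView * (f - n));
    std::printf("z_view = %7.1f  ->  z_ndc = %f\n", zView, zNdc);  // std::printf requires <cstdio>
}
// Approximate output: 0, 0.90, 0.990, 0.999, 1 — depth precision is concentrated near the near plane.
```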
To mitigate the problem, we can set
In an orthographic projection, we also want the z-axis to pass through the center of the projection window, just like in the general case of a perspective projection. However, in an orthographic projection, we can move the projection window anywhere along the z-axis as its location doesn’t really matter. This is an interesting property that we will use to derive an equation for
Indeed, we can reuse equations
Also, with an orthographic projection, we can’t substitute
This means the matrix above allows us to go straight from view space to NDC space, without passing through the homogeneous clip space. However, the rasterizer still expects vertices in clip coordinates, so we need a way to make the rasterizer believe we are passing clip coordinates while also avoiding the perspective division. As you can see in the fourth row of the orthographic projection matrix, the unitary value has moved to the last element. This means that if you multiply a vertex by an orthographic projection matrix, you will get 1 in the last component of the resulting vector. That way, the rasterizer will divide the remaining components by 1, which nullifies the effect of the perspective division.
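A minimal GLM sketch of such an orthographic projection matrix is shown below (the viewing-box bounds l, r, b, t and the plane distances n, f are assumed variables; these are the same entries glm::orthoLH_ZO fills in, as we will see shortly):

```cpp
glm::mat4 ortho(1.0f);                 // start from the identity, so ortho[3][3] = 1
ortho[0][0] = 2.0f / (r - l);          // scale x to [-1, 1]
ortho[1][1] = 2.0f / (t - b);          // scale y to [-1, 1]
ortho[2][2] = 1.0f / (f - n);          // scale z to [0, 1]
ortho[3][0] = -(r + l) / (r - l);      // center the box on the x-axis
ortho[3][1] = -(t + b) / (t - b);      // center the box on the y-axis
ortho[3][2] = -n / (f - n);            // map the near plane to z = 0
// The last row is (0, 0, 0, 1): the w-component of a transformed vertex stays 1,
// so the perspective division performed by the rasterizer has no effect.
```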
GLM provides many useful functions for building different types of projection matrices, depending on the type of projection and the handedness of the frame. However, for the same reasons discussed in the section on view space, the fact that we are using a right-handed system with the y-axis pointing downwards and the z-axis pointing towards the frustum affects the implementation of the projection matrices as well. As a result, we need to use the left-handed versions provided by GLM. So, for example, to build a perspective projection matrix we can use the helper function perspectiveLH_ZO.
/// Creates a matrix for a left handed, symmetric perspective-view frustum.
/// The near and far clip planes correspond to z normalized device coordinates of 0 and +1 respectively. (Direct3D and Vulkan clip volume definition)
///
/// @param fovy Specifies the field of view angle in the y direction. Expressed in radians.
/// @param aspect Specifies the aspect ratio that determines the field of view in the x direction. The aspect ratio is the ratio of x (width) to y (height).
/// @param near Specifies the distance from the viewer to the near clipping plane (always positive).
/// @param far Specifies the distance from the viewer to the far clipping plane (always positive).
///
/// @tparam T A floating-point scalar type
template<typename T>
GLM_FUNC_DECL mat<4, 4, T, defaultp> perspectiveLH_ZO(
T fovy, T aspect, T near, T far);
As you can see, we only need to pass the vertical FOV, the aspect ratio, and the distances of the near and far planes. This means that with this function we can build the matrix
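A hedged usage example (the framebuffer dimensions and the FOV/near/far values are illustrative):

```cpp
float aspectRatio = static_cast<float>(framebufferWidth) / static_cast<float>(framebufferHeight);

// Vertical FOV of 60 degrees (passed in radians), near plane at 0.1, far plane at 100.
glm::mat4 projMatrix = glm::perspectiveLH_ZO(glm::radians(60.0f), aspectRatio, 0.1f, 100.0f);
```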
As for the general case of a perspective projection, we can use the helper function frustumLH_ZO.
/// Creates a left handed frustum matrix.
/// The near and far clip planes correspond to z normalized device coordinates of 0 and +1 respectively. (Direct3D and Vulkan clip volume definition)
///
/// @param left Specifies the view x-coordinate of the left side of the projection window (on the near plane).
/// @param right Specifies the view x-coordinate of the right side of the projection window (on the near plane).
/// @param top Specifies the view y-coordinate of the top side of the projection window (on the near plane).
/// @param bottom Specifies the view y-coordinate of the bottom side of the projection window (on the near plane).
/// @param near Specifies the distance from the viewer to the near clipping plane (always positive).
/// @param far Specifies the distance from the viewer to the far clipping plane (always positive).
///
/// @tparam T A floating-point scalar type
template<typename T>
GLM_FUNC_DECL mat<4, 4, T, defaultp> frustumLH_ZO(
T left, T right, T bottom, T top, T near, T far);
As for the orthographic projection, we can use the helper function orthoLH_ZO.
/// Creates a matrix for an orthographic parallel viewing volume, using left-handed coordinates.
/// The near and far clip planes correspond to z normalized device coordinates of 0 and +1 respectively. (Direct3D and Vulkan clip volume definition)
///
/// @param left Specifies the view x-coordinate of the left side of the projection window (on the near plane).
/// @param right Specifies the view x-coordinate of the right side of the projection window (on the near plane).
/// @param top Specifies the view y-coordinate of the top side of the projection window (on the near plane).
/// @param bottom Specifies the view y-coordinate of the bottom side of the projection window (on the near plane).
/// @param near Specifies the distance from the viewer to the near clipping plane (always positive).
/// @param far Specifies the distance from the viewer to the far clipping plane (always positive).
///
/// @tparam T A floating-point scalar type
template<typename T>
GLM_FUNC_DECL mat<4, 4, T, defaultp> orthoLH_ZO(
T left, T right, T bottom, T top, T zNear, T zFar);
Refer to the GLM library’s source code to verify that these projection matrices are implemented according to the definitions presented in this tutorial. And remember that GLM follows the column-major order convention for storing matrix data, which means that columns are stored contiguously in memory rather than rows. Consequently, when inspecting the GLM source code, you should expect to see a reversal of columns and rows compared to the projection matrices presented in this tutorial.
After the perspective division, all vertices are in NDC space, and if we only consider the first two NDC coordinates, we also have their 2D representations. However, we are still in a normalized 2D space (the
The framebuffer space is the coordinate system used by pipeline stages that operate on, or with respect to, framebuffer attachments (including the color attachment). Framebuffer coordinates are used to specify texel/pixel positions in framebuffer space, where adjacent pixels’ coordinates differ by 1 in x and/or y, with (0,0) in the upper left corner and pixel centers at half-integers. Framebuffer space also has size information (determined by the dimensions specified during the creation of the framebuffer, as discussed in the tutorial 01.A - Hello Window) that specifies where rendering operations will be restricted in framebuffer space. In the image below,
The rasterizer automatically transforms the vertices from NDC space to framebuffer space by using the viewport information we set with vkCmdSetViewport. Once in framebuffer space, it can generate fragments covered by primitives. However, if the framebuffer coordinates of a fragment fall outside the specified framebuffer size, the fragment will be discarded and won't be processed by any subsequent stage of the pipeline.
In the tutorial 01.A - Hello Window, we briefly mentioned that a viewport can be seen as a rectangular region within the framebuffer space where rendering operations take place. Now, we can be more specific in stating that a viewport is a structure that holds the necessary information for the rasterizer to construct a matrix that transforms vertices from NDC space to a specific rectangle within the framebuffer space. In other words, it defines the mapping of the projection window onto a chosen area of the color attachment within the framebuffer.
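For example, a viewport covering the whole framebuffer could be set as follows (framebufferWidth, framebufferHeight and commandBuffer are assumed to be defined elsewhere):

```cpp
VkViewport viewport{};
viewport.x        = 0.0f;                                  // top-left corner of the rectangle ...
viewport.y        = 0.0f;                                  // ... in framebuffer coordinates
viewport.width    = static_cast<float>(framebufferWidth);
viewport.height   = static_cast<float>(framebufferHeight);
viewport.minDepth = 0.0f;                                  // range the NDC z-coordinate is mapped to
viewport.maxDepth = 1.0f;
vkCmdSetViewport(commandBuffer, 0, 1, &viewport);          // firstViewport = 0, viewportCount = 1
```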
Since we might find it useful in the future, let’s see how we can manually build the matrix to go from NDC space to framebuffer space from the viewport information. Suppose we want to draw on a selected
to the following framebuffer ranges
Starting with the x-coordinate, we need to map
A similar calculation applies to calculate the y-coordinate by mapping
As for the z-coordinate, we only need to scale
At this point, we only need to translate the resulting coordinates to shift the origin of the
Due to this translation operation, if the resulting framebuffer coordinates fall outside the framebuffer size, then the corresponding fragments generated by the rasterizer will be discarded. That is, they won't be processed by subsequent stages in the pipeline.
Now, we can derive our framebuffer coordinates
In matrix form this becomes
However, most of the time we don’t want to rescale the NDC z-coordinate, so we have
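Putting the mapping described above into code, a minimal GLM sketch of the NDC-to-framebuffer (viewport) matrix might look like this, with x, y, w, h, minZ and maxZ taken from the viewport information (thanks to Vulkan’s y-down NDC space, no flip of the y-coordinate is needed):

```cpp
glm::mat4 viewportMatrix(1.0f);
viewportMatrix[0][0] = w * 0.5f;        // scale x from [-1, 1] to [0, w]
viewportMatrix[1][1] = h * 0.5f;        // scale y from [-1, 1] to [0, h]
viewportMatrix[2][2] = maxZ - minZ;     // scale z from [0, 1] to [minZ, maxZ] (usually 1)
viewportMatrix[3][0] = x + w * 0.5f;    // translate to the selected rectangle ...
viewportMatrix[3][1] = y + h * 0.5f;    // ... within the framebuffer
viewportMatrix[3][2] = minZ;
```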
To prevent stretching in the final image on the screen, it’s recommended to set $w$ and $h$ so that the aspect ratio of the projection window matches the aspect ratio of the color attachment, and that of the window’s client area as well.
Once mesh vertices are in framebuffer space, the rasterizer can identify the texels covered by the primitives, and emit fragments at the corresponding positions to be consumed by the fragment shader.
Source code: LearnVulkan
If you found the content of this tutorial somewhat useful or interesting, please consider supporting this project by clicking on the Sponsor button. Whether a small tip, a one time donation, or a recurring payment, it's all welcome! Thank you!