The Forge Shading Language (FSL)

ForgeShadingLanguage (FSL)

The purpose of FSL is to provide a single shader syntax from which hlsl/pssl/vk-glsl/metal shader code can be generated. The syntax is largely identical to hlsl, with differences in the shader entry and resource declarations. Whenever possible we make use of simple macros. For more complex modifications, a Python script is used (Common_3/Tools/ForgeShadingLanguage/fsl.py).
Python 3.6 is therefore necessary to generate the shaders. We include a no-install Python 3.6 in Tools/python-3.6.0-embed-amd64.
In the Visual Studio fsl.target custom target we prepend that path to PATH, so that the build system uses that binary.

The syntax is generally similar to hlsl, with some modifications intended to make it simpler to expand the code as necessary. For development we recommend setting up and using as many target compilers as possible.

FSL supports vertex, pixel, compute and tessellation shaders (control and evaluation stages). Entry functions are declared using the

VS_MAIN, PS_MAIN, CS_MAIN, TC_MAIN, TE_MAIN

keywords and should span a single line.
The first statement in the main function body should be:

INIT_MAIN;

This statement is expanded differently for each target language. To return from the main function, the keyword

RETURN(); // for void main function
float4 Out = (...);
RETURN(Out); // for main function with return type

is used.

Here is a sample fsl pixel shader using shader IO and global resources:

STRUCT(VSOutput)
{
    DATA(float4, Position, SV_Position);
    DATA(float2, TexCoord, TEXCOORD);
};

float4 PS_MAIN( VSOutput In )
{
    INIT_MAIN;
    float4 color = SampleTex2D(Get(uTexture0), Get(uSampler0), In.TexCoord);
    RETURN(color);
}

Shader Reload

Shaders are built offline and only the binaries are loaded by the application.
The unit tests have a "Reload Shaders" button which, when triggered, reloads the shader binaries.
See the ReloadServer section for more details.

Binary Declarations

Syntax

Binary declarations map to individual output binaries and use the following syntax:

#frag myshader.frag
// src here
#end

Binary declarations need to be declared in the top-level fsl src file, but can be located anywhere in that file.
The generator will collect these and preprocess them roughly as follows:

#ifdef myshader_frag
// src here
#endif

This makes it possible to keep all binary declarations in a single file using includes, or to write, for example, the following:

float4 PS_MAIN()
{
    INIT_MAIN;
#frag myshader_red.frag
    float4 color = float4(1,0,0,1);
#end
#frag myshader_green.frag
    float4 color = float4(0,1,0,1);
#end
    RETURN(color);
}
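The rewrite from binary declarations to preprocessor guards can be pictured with a small Python sketch. This is not the code in fsl.py: the function name rewrite_binary_declarations is hypothetical, only the #frag keyword shown above is handled, and any feature flags on the line are skipped by taking the last token as the binary name.

```python
def rewrite_binary_declarations(source: str) -> str:
    """Turn '#frag name.frag' / '#end' pairs into '#ifdef name_frag' / '#endif'.

    Illustrative only; the real generator may differ in every detail.
    """
    out = []
    for line in source.splitlines():
        tokens = line.split()
        if len(tokens) >= 2 and tokens[0] == "#frag":
            # The binary name is the last token; '.' becomes '_' in the macro.
            macro = tokens[-1].replace(".", "_")
            out.append(f"#ifdef {macro}")
        elif tokens and tokens[0] == "#end":
            out.append("#endif")
        else:
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    print(rewrite_binary_declarations("#frag myshader.frag\n// src here\n#end"))
```

Compiling the rewritten source once per binary, with only that binary's macro defined, would then select the matching block.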

Feature Flags

Feature flags can be added to binary declarations to enable the following features:

FT_PRIM_ID     // necessary for use of SV_PrimitiveID
FT_RAYTRACING  // enables ray query extensions/headers and necessary msl/spirv/hlsl targets
FT_VRS         // Variable Rate Shading
FT_MULTIVIEW   // necessary for multiview rendering for VR

They are inserted into the declaration as follows:

#frag FT_PRIM_ID FT_MULTIVIEW myshader.frag
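As a rough illustration of the declaration layout (stage keyword, optional feature flags, then the binary name), the following Python sketch splits such a line into its parts. The function parse_declaration is hypothetical and not taken from fsl.py; rejecting unknown flags is a choice made for this sketch, not documented behavior of the generator.

```python
# The four feature flags listed in this section.
KNOWN_FLAGS = {"FT_PRIM_ID", "FT_RAYTRACING", "FT_VRS", "FT_MULTIVIEW"}


def parse_declaration(line: str):
    """Split '#frag FT_A FT_B name.frag' into (stage, flags, filename)."""
    tokens = line.split()
    if len(tokens) < 2 or not tokens[0].startswith("#"):
        raise ValueError(f"not a binary declaration: {line!r}")
    stage = tokens[0][1:]
    # Everything between the stage keyword and the last token is a flag.
    *flags, filename = tokens[1:]
    unknown = [flag for flag in flags if flag not in KNOWN_FLAGS]
    if unknown:
        raise ValueError(f"unknown feature flags: {unknown}")
    return stage, flags, filename


if __name__ == "__main__":
    print(parse_declaration("#frag FT_PRIM_ID FT_MULTIVIEW myshader.frag"))
```

A check of this kind would surface a misspelled flag name early instead of silently building a binary without the feature.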

Specialization/Function constants (Vulkan and Metal only)

These constants are baked into the micro-code at pipeline creation time, so the performance is identical to using a macro without any of the downsides of macros (too many shader variations increasing the size of the build).

A good read on specialization constants (the same considerations apply to function constants on Metal): https://arm-software.github.io/vulkan_best_practice_for_mobile_developers/samples/performance/specialization_constants/specialization_constants_tutorial.html

Constants are declared at global scope using the SHADER_CONSTANT macro and can be used like any regular variable after declaration.

Macro arguments:

#define SHADER_CONSTANT(INDEX, TYPE, NAME, VALUE)

Example usage:

SHADER_CONSTANT(0, uint, gRenderMode, 0);
// Vulkan - layout (constant_id = 0) const uint gRenderMode = 0;
// Metal  - constant uint gRenderMode [[function_constant(0)]];
// Others - const uint gRenderMode = 0;

void main()
{
    // Can be used like regular variables in shader code
    if (gRenderMode == 1)
    {
        // 
    }
}

NOTE: Unlike Vulkan, Metal does not provide a way to initialize function constants to a default value, so all required function constants need to be passed through ShaderLoadDesc/BinaryShaderDesc when creating the shader.

Parameter Modifiers

Function parameters can be annotated using in/out/inout:

void fn( 
    in(float) param_in,
    out(float) param_out,
    inout(float) param_inout
    ) {}

Matrices

FSL matrices are column major. Matrices declared inside cbuffers, push constants, or structured buffers are initialized from memory in column-major order.
Explicit constructors and accessors are provided:

// this initializes a matrix with 3 rows and 2 columns from three 2-component rows.
f3x2 M = f3x2Rows(r0, r1, r2);
setElem(M, 0, 1, 42.0f); // sets the element at col 0, row 1 to 42
float3 col0 = getCol0(M);
float2 row1 = getRow1(M);

// create a matrix from scalars, provided in row-major order
f2x2 M = f2x2RowElems(
    0,1,
    2,3);
float2 col1 = getCol1(M); // (1,3)

We also provide overloaded Identity constructors and helpers to initialize vectors with identical components:

f4x4 id = Identity();
float4 ones = f4(1); // float4(1,1,1,1)
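The relationship between row-major construction, column access and column-major memory order can be checked with a few lines of Python. This is a plain-list model written for illustration; the helper names from_row_elems, get_col, get_row and column_major are hypothetical and do not correspond to FSL functions.

```python
def from_row_elems(num_rows, num_cols, elems):
    """Build a matrix (a list of rows) from scalars given in row-major order."""
    if len(elems) != num_rows * num_cols:
        raise ValueError("expected num_rows * num_cols scalars")
    return [list(elems[r * num_cols:(r + 1) * num_cols]) for r in range(num_rows)]


def get_col(m, c):
    """Return column c as a tuple, one component per row."""
    return tuple(row[c] for row in m)


def get_row(m, r):
    """Return row r as a tuple, one component per column."""
    return tuple(m[r])


def column_major(m):
    """Flatten column by column, i.e. the column-major memory order."""
    return [m[r][c] for c in range(len(m[0])) for r in range(len(m))]


if __name__ == "__main__":
    m = from_row_elems(2, 2, [0, 1, 2, 3])
    print(get_col(m, 1))    # (1, 3)
    print(get_row(m, 1))    # (2, 3)
    print(column_major(m))  # [0, 2, 1, 3]
```

The same scalars therefore appear in a different order in a buffer (0, 2, 1, 3) than in the row-major constructor call (0, 1, 2, 3).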

Shader Resources

The Forge resources are grouped into four update frequencies:

UPDATE_FREQ_NONE
UPDATE_FREQ_PER_FRAME
UPDATE_FREQ_PER_BATCH
UPDATE_FREQ_PER_DRAW

These generally map to resource tables in the root signature.

Since a range of platforms require identical resource declarations per stage, we recommend placing these into resource headers which get included by each stage source file (the declarations are not necessary if the stage uses no resources).

Resources are declared using the CBUFFER(...), PUSH_CONSTANT(...) and RES(...) syntax.
Resources, CBuffer and push constant elements are made available in a global resource namespace which can be accessed from any function.
For explicit resource placement, hlsl registers and glsl bindings need to be declared.

To access a resource, the syntax Get(resource) is used. Texture and Buffer resources can be declared as arrays by appending the dimension to the identifier. For Metal, argument buffers are generated for an update frequency whenever a single resource is declared as an array:

RES(Buffer(uint), myBuffers[2], UPDATE_FREQ_NONE, b0, binding=0);

If any such resource declaration is active in a shader, all resources declared with the same update frequency are placed inside the argument buffer.

CBuffers

The following syntax declares a CBuffer:

CBUFFER(Uniforms, UPDATE_FREQ_NONE, b0, binding=0)
{
    DATA(f4x4, mvp, None);
};

Push Constants

The following syntax declares a PushConstant:

PUSH_CONSTANT(PushConstants, b0)
{
    DATA(uint, index, None);
};

Buffers

The following types of buffers are supported:
Buffer, WBuffer, RWBuffer, ByteBuffer, and RWByteBuffer

RES(RWBuffer(MyType), myArray, UPDATE_FREQ_NONE, b0, binding=0);

The following atomic functions are supported:

// atomic add of value 42 at location 0, previous value is written to last argument
AtomicAdd(Get(uRWBuffer)[0], 0, 42, pre_val);

// atomic load & store of value 42 at location 0
val = AtomicLoad(Get(uRWBuffer)[0]);
AtomicStore(Get(uRWBuffer)[0], 42);

AtomicMin(Get(uRWBuffer)[0], 42);
AtomicMax(Get(uRWBuffer)[0], 42);

Textures

FSL textures are fundamentally split between read-only types for sampling:

Tex#D, Tex#DArray, Tex2DMS, TexCube, Depth2D, Depth2DMS

and read-write types:

RTex#D (read-only), WTex#D (write-only), RWTex#D (read-write)

Sampling types map to hlsl Texture#D types, glsl texture#D and metal texture#d<T, access::sample> types.
Read-Write types map to hlsl RWTexture#D types, glsl image#D types and metal texture#d<T, access::read_write> types.

Sampling is performed using SampleTex# functions.
Load access is performed using LoadTex# functions for sampling types, and LoadRWTex# for read-write types.
Writing is performed using Write#D functions.

An example, sampling from a cube texture and writing the result to an RW texture2D array:

RES(TexCube(float4), srcTexture, UPDATE_FREQ_NONE, t0, binding = 0);
RES(RWTex2DArray(float4), dstTexture, UPDATE_FREQ_NONE, u2, binding = 2);
RES(SamplerState, skyboxSampler, UPDATE_FREQ_NONE, s3, binding = 3);
(...)
float4 value = SampleLvlTexCube(Get(srcTexture), Get(skyboxSampler), float3(1,0,0), 0);
Write3D(Get(dstTexture), int3(0,0,0), value); // write to texel (0,0) of slice 0.

For loading functions, the sampler argument can also be NO_SAMPLER, though for Vulkan GL_EXT_samplerless_texture_functions is necessary (it gets enabled automatically).

Texture dimensions can be retrieved using:

int2 size = GetDimensions(Get(uTexture), Get(uSampler));

// samplerless alternative
int2 size2 = GetDimensions(Get(uTexture), NO_SAMPLER);

Shader IO

Shader input and output structs are declared using the following syntax:

STRUCT(VSInput)
{
    DATA(float4, position, SV_Position);
};

Data types declared this way are then normally passed to the main function:

VSOutput(Out) VS_MAIN(VSInput In)

The shader return variable is created automatically in the INIT_MAIN expansion and is returned automatically on a call to RETURN. The following semantics are supported:

SV_Position
SV_VertexID
SV_InstanceID
SV_GroupID
SV_DispatchThreadID
SV_GroupThreadID
SV_GroupIndex
SV_SampleIndex
SV_PrimitiveID
SV_DomainLocation

For regular main inputs, the semantic is used as a case-insensitive decoration around the variable type:

void CS_MAIN(SV_GroupIndex(uint) groupIndex)
{...}

Non Uniform Resource Index

For accessing elements of resource arrays, special syntax is necessary when the index is divergent:

uint index = (...);
float4 texColor = f4(0);
BeginNonUniformResourceIndex(index, 256); // 256 is the max possible index
    texColor = SampleLvlTex2D(Get(textureMaps)[index], Get(smp), uv, 0);
EndNonUniformResourceIndex();

For Vulkan, the enclosed block gets replaced based on the availability of the following extensions:

VK_EXT_DESCRIPTOR_INDEXING_EXTENSION: wraps the index inside the block with nonuniformEXT(...)
VK_FEATURE_TEXTURE_ARRAY_DYNAMIC_INDEXING: code inside the block is left untouched
if no extension is available, a switch construct is used

For other platforms, a loop with lane masking is used as necessary.
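The idea behind a loop with lane masking can be simulated on the CPU. The Python sketch below is a conceptual model only, assuming the common pattern in which the wave repeatedly picks the index of its first still-active lane, processes every lane that shares that index, and then masks those lanes off. The function name waterfall and its arguments are hypothetical; the code FSL actually generates is not shown in this document and may differ.

```python
def waterfall(lane_indices, sample):
    """Process divergent indices one uniform value at a time.

    lane_indices[i] is the resource index requested by lane i.
    sample(index, lane) stands in for the resource access.
    Returns the per-lane results and the number of loop iterations.
    """
    results = [None] * len(lane_indices)
    active = list(range(len(lane_indices)))
    iterations = 0
    while active:
        # Index of the first active lane, uniform across the whole wave.
        uniform_index = lane_indices[active[0]]
        for lane in active:
            if lane_indices[lane] == uniform_index:
                results[lane] = sample(uniform_index, lane)
        # Mask off the lanes that were just served.
        active = [lane for lane in active if lane_indices[lane] != uniform_index]
        iterations += 1
    return results, iterations


if __name__ == "__main__":
    print(waterfall([3, 1, 3, 0], lambda index, lane: index * 10))
```

The loop runs once per distinct index in the wave, so the cost grows with divergence rather than with the size of the resource array.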

Tessellation

For Tessellation, the following syntax is provided:

TESS_VS_SHADER("shader.vert.fsl") // the vs which will be part of the pipeline
PATCH_CONSTANT_FUNC("ConstantHS") // name of the pcf

// declare domain, partitioning and output topology
// required in TC and TE stages
TESS_LAYOUT("quad", "integer", "triangle_ccw")

OUTPUT_CONTROL_POINTS(1)
MAX_TESS_FACTOR(10.0f)

For Metal, each TC shader gets transformed into a compute shader which:

calls the VS main function
runs the TC main code
calls the pcf function
writes the results to a buffer

Wave Intrinsics

To enable Wave Intrinsics, the keyword

ENABLE_WAVEOPS

needs to be inserted into the shader code; its location is not relevant.
The following intrinsics are supported:

ballot_t vote = WaveActiveBallot(expr);
uint numActiveLanes = CountBallot(activeLaneMask);
if (WaveIsFirstLane())
    {...}
if (WaveGetLaneIndex() == WaveGetMaxActiveIndex())
    {...}

val = WaveReadLaneFirst(val);
val = WaveActiveSum(val);
val = QuadReadAcrossX(i);
val = QuadReadAcrossY(j);

Integration and python tool

FSL is integrated into our Visual Studio, XCode and CodeLite projects. The generator tool can also be called directly:

usage: fsl.py [-h] -d DESTINATION -b BINARYDESTINATION [-i INTERMEDIATEDESTINATION]
              [-l {DIRECT3D11,DIRECT3D12,METAL,ORBIS,PROSPERO,SCARLETT,VULKAN,XBOX,GLES} [{DIRECT3D11,DIRECT3D12,METAL,ORBIS,PROSPERO,SCARLETT,VULKAN,XBOX,GLES} ...]]
              [--verbose] [--compile] [--rootSignature ROOTSIGNATURE] [--cache=args] [--shaderServerPort PORT]
              fsl_input
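When driving the tool from another script, the argument list can be assembled as in the Python sketch below. The helper build_fsl_command is hypothetical; only flags that appear in the usage text above are used. Placing fsl_input before the options is a precaution that assumes argparse-style parsing, where the variable-length -l list could otherwise absorb the input path.

```python
import sys

# Target languages accepted by -l, copied from the usage text.
LANGUAGES = ("DIRECT3D11", "DIRECT3D12", "METAL", "ORBIS", "PROSPERO",
             "SCARLETT", "VULKAN", "XBOX", "GLES")


def build_fsl_command(fsl_py, fsl_input, destination, binary_destination,
                      languages=(), compile_shaders=False, verbose=False):
    """Return an argv list for fsl.py; nothing is executed here."""
    for language in languages:
        if language not in LANGUAGES:
            raise ValueError(f"unsupported language: {language}")
    cmd = [sys.executable, fsl_py, fsl_input,
           "-d", destination, "-b", binary_destination]
    if languages:
        cmd += ["-l", *languages]
    if verbose:
        cmd.append("--verbose")
    if compile_shaders:
        cmd.append("--compile")
    return cmd


if __name__ == "__main__":
    print(build_fsl_command("Common_3/Tools/ForgeShadingLanguage/fsl.py",
                            "shaders.fsl", "out/src", "out/bin",
                            languages=["VULKAN", "METAL"], compile_shaders=True))
```

The returned list can be handed to subprocess.run; the paths in the example call are placeholders.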

If compilation is requested, the tool will attempt to locate appropriate compilers using environment variables:

DIRECT3D11: $(FSL_COMPILER_FXC)
(if not set, will default to "C:/Program Files (x86)/Windows Kits/8.1/bin/x64/fxc.exe")
DIRECT3D12: $(FSL_COMPILER_DXC)
(if not set, will default to "The-Forge/ThirdParty/OpenSource/DirectXShaderCompiler/bin/x64/dxc.exe")
METAL:      $(FSL_COMPILER_METAL)
(if not set, will default to "'C:/Program Files/METAL Developer Tools/macos/bin/metal.exe'")
ORBIS:      $(SCE_ORBIS_SDK_DIR)/host_tools/bin/orbis-wave-psslc.exe
PROSPERO:   $(SCE_PROSPERO_SDK_DIR)/host_tools/bin/prospero-wave-psslc.exe
VULKAN:     $(VULKAN_SDK)/Bin/glslangValidator.exe
XBOX:       $(GXDKLATEST)/bin/XboxOne/dxc.exe
SCARLETT:   $(GXDKLATEST)/bin/Scarlett/dxc.exe
GLES:       (Can only be compiled during runtime)
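The lookup rules listed above can be summarized in a short Python sketch. It is an approximation written for illustration, not the logic in fsl.py: the function locate_compiler and both tables are hypothetical, only some of the targets are included, and returning None when an SDK variable is unset is an assumption.

```python
import os

# Variables that name the compiler directly, with the documented fallback.
OVERRIDABLE = {
    "DIRECT3D11": ("FSL_COMPILER_FXC",
                   "C:/Program Files (x86)/Windows Kits/8.1/bin/x64/fxc.exe"),
    "DIRECT3D12": ("FSL_COMPILER_DXC",
                   "The-Forge/ThirdParty/OpenSource/DirectXShaderCompiler/bin/x64/dxc.exe"),
}

# Variables that name an SDK root; the compiler sits at a fixed relative path.
SDK_RELATIVE = {
    "ORBIS": ("SCE_ORBIS_SDK_DIR", "host_tools/bin/orbis-wave-psslc.exe"),
    "PROSPERO": ("SCE_PROSPERO_SDK_DIR", "host_tools/bin/prospero-wave-psslc.exe"),
    "VULKAN": ("VULKAN_SDK", "Bin/glslangValidator.exe"),
}


def locate_compiler(language, env=None):
    """Return the compiler path for a target, or None if it cannot be resolved."""
    env = os.environ if env is None else env
    if language in OVERRIDABLE:
        variable, default = OVERRIDABLE[language]
        return env.get(variable) or default
    if language in SDK_RELATIVE:
        variable, relative = SDK_RELATIVE[language]
        root = env.get(variable)
        return f"{root.rstrip('/')}/{relative}" if root else None
    return None  # e.g. GLES, which is only compiled at runtime


if __name__ == "__main__":
    print(locate_compiler("DIRECT3D11", {}))
```

Passing the environment as a dictionary keeps the function easy to test without modifying the real process environment.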

Visual Studio

A custom build dependency is defined in Common_3/Tools/ForgeShadingLanguage/VS/fsl.target. Once added to a project, any added *.fsl is assigned the <FSLShader> item type. To add the build customization, right-click the project in VS and choose "Build Dependencies" -> "Build Customizations..." -> "Find Existing..." and choose the fsl.target file. The customization can then be enabled per-project from the same menu.

XCode

For XCode, we use a custom build rule for *.fsl resources and directly generate the metal shaders into the target package. You can find this in a shell script in the Build Phases section of the XCode project settings.

CodeLite

For CodeLite we use custom makefile additions. You can find this in the Customize -> Custom Makefile Rules section of the CodeLite project settings.

For further examples, please consult our Unit Test shader code.

Additional Notes

We aimed to handle includes on our own as much as possible to reduce the need for compiler include handlers. A notable case was dxc, where our generated shaders would compile and run just fine, but hlsl::Exceptions were being thrown from IDxcCompiler::Compile() which originated from the clang ast parse.

ReloadServer

ReloadServer allows you to dynamically recompile FSL shaders at runtime by clicking the Reload shaders button in the Debug UI. It works by running a socket server on the host PC that waits for the device to send a shader recompile request. Upon receiving this request, ReloadServer recompiles only the shaders that have been modified for the requested project and sends them back to the device, where they are reloaded after being received. In the case of a compilation/connection error, the message is printed to the device logs so that the issue can be inspected quickly.

How to use ReloadServer

ReloadServer is intended to work automatically, and will at most require the user to run a script once per session in order to use it. It is already integrated into all of our projects on all platforms, so no setup is required. See Manually running ReloadServer for details on how to integrate ReloadServer into a new project.

PC

ReloadServer is run automatically on PC during App init, and killed during App exit. There is no input required from the user, it works completely automatically.

Console/Mobile

For Console/Mobile projects, ReloadServer must be run manually on the host PC in order to allow dynamic recompilation of shaders on the device. See Manually running ReloadServer for details on how to run the server manually.

The basic workflow is as follows:

(Non-PC only) Run Common_3/Tools/ReloadServer/ReloadServer.sh or ReloadServer.bat in a terminal
Run App
Modify FSL file
Click Reload shaders button
(if success) Observe updated shaders in App
(if failure) Error is printed in App logs

Configuring ReloadServer

ReloadServer has only one configuration option - the server port. The default port is 6543. The port can also be configured in the following ways:

Visual Studio

You can configure ReloadServer port using the DevicePort option in the FSLShader IDE configuration panel of your project settings. If DevicePort is empty, then the default port is used.

XCode/CodeLite

On XCode/CodeLite, ReloadServer can be configured via the invocation of fsl.py in the build script located in the project settings. If no port is provided, then the default port is used. See fsl.py integration for details.

Platform details

Android

ReloadServer on Android uses adb reverse tcp:PORT tcp:PORT in order to forward recompile requests from the device to the host PC via the USB cable, which avoids requiring a network connection. This is done automatically and requires no user input.

iOS

ReloadServer on iOS requires the device to be connected to the same network as the host PC on which the ReloadServer daemon is running.

Switch

ReloadServer on Switch requires the device to be connected to the same network as the host PC on which the ReloadServer daemon is running.

PS4

See PS4/ReloadServer.md.

Xbox

See Xbox/ReloadServer.md.

Manually running ReloadServer

Batch/shell script

ReloadServer can easily be run manually by using the platform-specific batch/shell script located at Common_3/Tools/ReloadServer.

Windows

.\Common_3\Tools\ReloadServer\ReloadServer.bat

MacOS/Linux

./Common_3/Tools/ReloadServer/ReloadServer.sh

Python script

The ReloadServer python script can be run from any directory, and is located at Common_3/Tools/ReloadServer/ReloadServer.py.

usage: ReloadServer.py [--port PORT] [--daemon] [--kill]

--port PORT - Choose port used by ReloadServer
--kill - Kill currently running ReloadServer (--daemon is ignored if this is passed)
--daemon - Run ReloadServer as a daemon process instead of directly in the terminal

Only one server will ever be run on a given port (regardless of whether it runs as a daemon process or not). If there is already a server running on the given port, the server script will print a message and exit instead of running another server on that port. This can be useful when debugging potential issues.
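One simple way to tell whether something is already listening on a port is to try binding to it. The Python sketch below shows that approach using only the standard socket module. How ReloadServer.py actually detects a running server is not described here, so the function is_port_in_use is a hypothetical stand-in rather than the script's own check.

```python
import socket


def is_port_in_use(port, host="127.0.0.1"):
    """Return True if binding to (host, port) fails, i.e. the port is taken."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return True
    return False


if __name__ == "__main__":
    # 6543 is the default ReloadServer port mentioned in this document.
    print(is_port_in_use(6543))
```

A bind failure only shows that the port is occupied, not that the occupant is ReloadServer, so a real check would likely need an additional handshake.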

ReloadServer errors and debugging

When an error occurs during shader recompilation, the error message is sent to 3 different locations:

Printed to stdout in ReloadServer.py script - useful for debugging when running directly from terminal
Written to server-log.txt next to ReloadServer.py - useful for debugging the daemon process
Sent to App and printed to device/IDE logs - useful for debugging errors in shader code

This error message can be one of two things:

Generic error message returned by ReloadServer.py (i.e. path sent by device does not exist).
Shader compile error returned by fsl.py. In this case the entire stdout output is the error message.

ReloadServer is designed to be as fast and responsive as possible, so errors in shader compilation do not cause App to stop running. The reasoning is that compiling and running App can take a long time, whereas fixing ReloadServer issues can be done very quickly (possibly several times before App could restart even once). On every failure, ReloadServer prints a detailed message to the App logs about what might have gone wrong and how to fix it, which the developer can use to fix the issue (often in much less time than it takes to restart App). The most common issues, listed below, can all be fixed quickly by glancing at the device logs:

User programming error in shader code (most commonly a typo)
ReloadServer is not running when requesting shader recompile (usually only on XCode/Codelite where it needs to be run manually)
