Shaders

Urho3D uses an ubershader-like approach: permutations of each shader are built with different compilation defines, to produce e.g. static or skinned, deferred or forward, and shadowed or unshadowed rendering.

These permutations are built on demand: technique and renderpath definition files both refer to shaders and the compilation defines to use with them. In addition, the engine adds inbuilt defines related to geometry type and lighting. It is generally not possible to enumerate beforehand all the permutations that can be built out of a single shader.
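The on-demand behavior described above can be illustrated with a small standalone sketch. It is not the engine's code: PermutationCache and Permutation are hypothetical names, "DIFFMAP" merely stands in for a define taken from a technique file, and sorting the defines to form the cache key is a design choice of this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a compiled shader permutation.
struct Permutation
{
    std::string defines; // normalized define string, also used as the cache key
    int id;              // order in which the permutation was first requested
};

// Hypothetical cache that "builds" a permutation only when it is first needed.
class PermutationCache
{
public:
    const Permutation& Get(std::vector<std::string> fileDefines,
                           const std::vector<std::string>& engineDefines)
    {
        // Merge defines from the technique/renderpath file with engine-added ones.
        fileDefines.insert(fileDefines.end(), engineDefines.begin(), engineDefines.end());

        // Sort and deduplicate so that define order does not create duplicates.
        std::sort(fileDefines.begin(), fileDefines.end());
        fileDefines.erase(std::unique(fileDefines.begin(), fileDefines.end()), fileDefines.end());

        std::string key;
        for (const std::string& define : fileDefines)
        {
            if (!key.empty())
                key += ' ';
            key += define;
        }

        // Build the permutation on first use, otherwise reuse the existing one.
        auto it = cache_.find(key);
        if (it == cache_.end())
            it = cache_.emplace(key, Permutation{key, static_cast<int>(cache_.size())}).first;
        return it->second;
    }

    std::size_t Size() const { return cache_.size(); }

private:
    std::map<std::string, Permutation> cache_;
};
```

Requesting the same set of defines twice returns the same entry, while any new combination adds one, which is why the full set of permutations cannot be listed in advance.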

On Direct3D, compiled shader bytecode is saved to disk in a "Cache" subdirectory next to the shader source code, so that the possibly time-consuming compile can be skipped the next time the shader permutation is needed. On OpenGL no such mechanism is available.

Inbuilt compilation defines

When rendering scene objects, the engine expects certain shader permutations to exist for different geometry types and lighting conditions. These correspond to the following compilation defines:

Vertex shader:

  • NUMVERTEXLIGHTS=1,2,3 or 4: number of vertex lights influencing the object
  • DIRLIGHT, SPOTLIGHT, POINTLIGHT: a per-pixel forward light is being used. Accompanied by the define PERPIXEL
  • SHADOW: the per-pixel forward light has shadowing
  • NORMALOFFSET: shadow receiver UV coordinates should be adjusted according to normals
  • SKINNED, INSTANCED, BILLBOARD: choosing the geometry type

Pixel shader:

  • DIRLIGHT, SPOTLIGHT, POINTLIGHT: a per-pixel forward light is being used. Accompanied by the define PERPIXEL
  • CUBEMASK: the point light has a cube map mask
  • SPEC: the per-pixel forward light has specular calculations
  • SHADOW: the per-pixel forward light has shadowing
  • SIMPLE_SHADOW, PCF_SHADOW, VSM_SHADOW: the shadow sampling quality that is to be used
  • SHADOWCMP: use manual shadow depth compare; Direct3D9 only, for the DF16 & DF24 shadow map formats
  • HEIGHTFOG: object's zone has height fog mode
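As a rough illustration of how the vertex shader defines listed above combine, the following standalone sketch maps a hypothetical batch description to a define list. BatchState, its fields and InbuiltVertexDefines() are assumptions made for this example and do not correspond to engine classes; only the define names come from the lists above.

```cpp
#include <algorithm>
#include <string>
#include <vector>

enum class GeometryType { Static, Skinned, Instanced, Billboard };
enum class LightType { None, Directional, Spot, Point };

// Hypothetical description of what is being rendered.
struct BatchState
{
    GeometryType geometry = GeometryType::Static;
    LightType perPixelLight = LightType::None;
    bool shadowed = false;
    int numVertexLights = 0; // 0 means no vertex lights
};

// Returns the inbuilt vertex shader defines for the given state.
std::vector<std::string> InbuiltVertexDefines(const BatchState& state)
{
    std::vector<std::string> defines;

    // NUMVERTEXLIGHTS takes a value from 1 to 4.
    if (state.numVertexLights > 0)
        defines.push_back("NUMVERTEXLIGHTS=" + std::to_string(std::min(state.numVertexLights, 4)));

    // A per-pixel forward light is always accompanied by PERPIXEL.
    if (state.perPixelLight != LightType::None)
    {
        defines.push_back("PERPIXEL");
        switch (state.perPixelLight)
        {
        case LightType::Directional: defines.push_back("DIRLIGHT"); break;
        case LightType::Spot:        defines.push_back("SPOTLIGHT"); break;
        case LightType::Point:       defines.push_back("POINTLIGHT"); break;
        case LightType::None:        break;
        }
        if (state.shadowed)
            defines.push_back("SHADOW");
    }

    // Geometry type; static geometry needs no define in this sketch.
    switch (state.geometry)
    {
    case GeometryType::Skinned:   defines.push_back("SKINNED"); break;
    case GeometryType::Instanced: defines.push_back("INSTANCED"); break;
    case GeometryType::Billboard: defines.push_back("BILLBOARD"); break;
    case GeometryType::Static:    break;
    }
    return defines;
}
```

For a skinned object lit by a shadowed spot light, this yields PERPIXEL, SPOTLIGHT, SHADOW and SKINNED.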

Inbuilt shader uniforms

When objects or quad passes are being rendered, various engine inbuilt uniforms are set to assist with the rendering. Below is a partial list of the uniforms, given as HLSL data types. See the file Uniforms.glsl for the corresponding GLSL uniforms.

Vertex shader uniforms:

  • float3 cAmbientStartColor: the start color value for a zone's ambient gradient
  • float3 cAmbientEndColor: the end color value for a zone's ambient gradient
  • float3 cCameraPos: camera's world position
  • float cNearClip: camera's near clip distance
  • float cFarClip: camera's far clip distance
  • float cDeltaTime: the timestep of the current frame
  • float4 cDepthMode: parameters for calculating a linear depth value between 0-1 to pass to the pixel shader in an interpolator
  • float cElapsedTime: scene's elapsed time value. Can be used to implement animating materials
  • float4x3 cModel: the world transform matrix of the object being rendered
  • float4x3 cView: the camera's view matrix
  • float4x3 cViewInv: the inverse of the camera's view matrix (camera world transform)
  • float4x4 cViewProj: the camera's concatenated view and projection matrices
  • float4x3 cZone: zone's transform matrix; used for ambient gradient calculations

Pixel shader uniforms:

  • float3 cAmbientColor: ambient color for a zone with no ambient gradient
  • float3 cCameraPosPS: camera's world position
  • float4 cDepthReconstruct: parameters for reconstructing a linear depth value between 0-1 from a nonlinear hardware depth texture sample
  • float cDeltaTimePS: the timestep of the current frame
  • float cElapsedTimePS: scene's elapsed time value
  • float3 cFogColor: the zone's fog color
  • float4 cFogParams: fog calculation parameters (see Batch.cpp and Fog.hlsl for the exact meaning)
  • float cNearClipPS: camera's near clip distance
  • float cFarClipPS: camera's far clip distance

Writing shaders

Shaders must be written separately for HLSL (Direct3D) and GLSL (OpenGL). The built-in shaders try to implement the same functionality on both shader languages as closely as possible.

To get started with writing your own shaders, study the most basic examples first: the Basic, Shadow & Unlit shaders. Note the shader include files which bring in common functionality, for example Uniforms.hlsl, Samplers.hlsl & Transform.hlsl for HLSL shaders.

Transforming the vertex (which hides the actual skinning, instancing or billboarding process) is a slight hack which uses a combination of macros and functions. It is safest to copy the following piece of code verbatim:

For HLSL:

float4x3 modelMatrix = iModelMatrix;
float3 worldPos = GetWorldPos(modelMatrix);
oPos = GetClipPos(worldPos);

For GLSL:

mat4 modelMatrix = iModelMatrix;
vec3 worldPos = GetWorldPos(modelMatrix);
gl_Position = GetClipPos(worldPos);

On both Direct3D and OpenGL the vertex and pixel shaders are written into the same file, and the entrypoint functions must be called VS() and PS(). In OpenGL mode one of these is transformed behind the scenes to the main() function required by GLSL. When compiling a vertex shader, the compilation define "COMPILEVS" is always present, and likewise "COMPILEPS" when compiling a pixel shader. These are heavily used in the shader include files to prevent constructs that are illegal for the "wrong" type of shader, and to reduce compilation time.
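The entry point handling in OpenGL mode can be pictured with the following standalone sketch. It only shows the two steps named above (adding the COMPILEVS or COMPILEPS define and renaming the matching entry point to main()); PrepareGLSLSource() and ReplaceAll() are hypothetical helpers written for this example, and the sketch does not attempt to deal with the unused entry point the way a real preprocessing step would have to.

```cpp
#include <cstddef>
#include <string>

enum class ShaderStage { Vertex, Pixel };

// Replaces every occurrence of a non-empty `from` in `text` with `to`.
static std::string ReplaceAll(std::string text, const std::string& from, const std::string& to)
{
    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos)
    {
        text.replace(pos, from.size(), to);
        pos += to.size();
    }
    return text;
}

// Hypothetical preprocessing step for a combined VS()/PS() GLSL source file.
std::string PrepareGLSLSource(const std::string& source, ShaderStage stage)
{
    const bool vertex = stage == ShaderStage::Vertex;

    // The stage define lets include files exclude code meant for the other stage.
    std::string result = vertex ? "#define COMPILEVS\n" : "#define COMPILEPS\n";

    // Rename the entry point of the requested stage to main().
    result += ReplaceAll(source, vertex ? "void VS(" : "void PS(", "void main(");
    return result;
}
```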

Vertex shader inputs need to be matched to vertex element semantics to render properly. In HLSL, semantics for inputs are defined in each shader with uppercase words (POSITION, NORMAL, TEXCOORD0 etc.), while in GLSL the default attributes are defined in Transform.glsl and are matched to the vertex element semantics with a case-insensitive string "contains" operation, with an optional number postfix to define the semantic index. For example iTexCoord is the first (semantic index 0) texture coordinate, and iTexCoord1 is the second (semantic index 1).
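The GLSL matching rule can be sketched in standalone C++ as follows. The semantic name table is an illustrative subset assumed for this example, and MatchAttributeName() is a hypothetical helper, not an engine function.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

struct SemanticMatch
{
    bool found;
    std::string semantic; // matched semantic name, empty if not found
    unsigned index;       // semantic index from the number postfix, 0 if absent
};

static std::string ToUpper(std::string text)
{
    for (char& c : text)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return text;
}

// Matches a GLSL attribute name such as "iTexCoord1" to a semantic and index.
SemanticMatch MatchAttributeName(const std::string& attributeName)
{
    // Illustrative subset of semantic names (an assumption of this sketch).
    static const char* const semantics[] = {"NORMAL", "TEXCOORD", "COLOR"};

    const std::string upper = ToUpper(attributeName);

    // Parse the optional number postfix, e.g. the "1" in "iTexCoord1".
    std::size_t digitsStart = upper.size();
    while (digitsStart > 0 && std::isdigit(static_cast<unsigned char>(upper[digitsStart - 1])))
        --digitsStart;
    unsigned index = 0;
    for (std::size_t i = digitsStart; i < upper.size(); ++i)
        index = index * 10 + static_cast<unsigned>(upper[i] - '0');

    // Case-insensitive "contains" test against each semantic name.
    for (const char* semantic : semantics)
    {
        if (upper.find(semantic) != std::string::npos)
            return {true, semantic, index};
    }
    return {false, "", 0};
}
```

With this sketch, "iTexCoord" maps to TEXCOORD with index 0 and "iTexCoord1" to TEXCOORD with index 1, matching the example in the text.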

Uniforms must be prefixed in a certain way so that the engine understands them:

  • c for uniform constants, for example cMatDiffColor. The c is stripped when referred to inside the engine, so it would be called "MatDiffColor" in e.g. SetShaderParameter()
  • s for texture samplers, for example sDiffMap.
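A minimal standalone sketch of the prefix convention follows. ClassifyUniform() and its types are hypothetical; the sketch checks only the first character, and it leaves sampler names unchanged because the text above describes the stripping only for the c prefix.

```cpp
#include <string>

enum class UniformKind { Constant, Sampler, Unknown };

struct UniformInfo
{
    UniformKind kind;
    std::string engineName; // name as it would be referred to on the engine side
};

// Classifies a shader uniform by its one-letter prefix.
UniformInfo ClassifyUniform(const std::string& shaderName)
{
    // "c" prefix: uniform constant; the prefix is stripped for the engine-side name.
    if (shaderName.size() > 1 && shaderName[0] == 'c')
        return {UniformKind::Constant, shaderName.substr(1)};

    // "s" prefix: texture sampler; the name is kept as-is in this sketch.
    if (shaderName.size() > 1 && shaderName[0] == 's')
        return {UniformKind::Sampler, shaderName};

    return {UniformKind::Unknown, shaderName};
}
```

Here "cMatDiffColor" yields the engine-side name "MatDiffColor", which is the string to pass when setting the parameter from code.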

In GLSL shaders it is important that the samplers are assigned to the correct texture units. If you are using sampler names that are not predefined in the engine like sDiffMap, just make sure there is a number somewhere in the sampler's name and it will be interpreted as the texture unit. For example the terrain shader uses texture units 0-3 in the following way:

uniform sampler2D sWeightMap0;
uniform sampler2D sDetailMap1;
uniform sampler2D sDetailMap2;
uniform sampler2D sDetailMap3;
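The number-in-name rule for custom samplers can be sketched as below. TextureUnitFromSamplerName() is a hypothetical helper; taking the first run of digits found anywhere in the name is this sketch's reading of "a number somewhere in the sampler's name", and names without a number return -1 to indicate that they would have to be resolved some other way (for example as a predefined name).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns the texture unit encoded in a sampler name, or -1 if there is none.
int TextureUnitFromSamplerName(const std::string& name)
{
    // Skip ahead to the first digit in the name.
    std::size_t i = 0;
    while (i < name.size() && !std::isdigit(static_cast<unsigned char>(name[i])))
        ++i;
    if (i == name.size())
        return -1;

    // Accumulate the consecutive digits into the unit number.
    int unit = 0;
    while (i < name.size() && std::isdigit(static_cast<unsigned char>(name[i])))
    {
        unit = unit * 10 + (name[i] - '0');
        ++i;
    }
    return unit;
}
```

Applied to the terrain samplers above, sWeightMap0 maps to unit 0 and sDetailMap1 through sDetailMap3 map to units 1 through 3.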

The maximum number of bones supported for hardware skinning depends on the graphics API and is relayed to the shader code in the MAXBONES compilation define. Typically the maximum is 64, but it is reduced to 32 on the Raspberry Pi, and increased to 128 on Direct3D 11 & OpenGL 3. See also GetMaxBones().
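The limits quoted above can be summarized in a small standalone sketch. The GraphicsTarget enumeration and both functions are hypothetical; in the engine the value should be queried with GetMaxBones() rather than hardcoded, and treating Direct3D9 and OpenGL 2 as the "typical" case is an assumption of this example.

```cpp
#include <string>

// Hypothetical list of targets, used only for this illustration.
enum class GraphicsTarget { Direct3D9, Direct3D11, OpenGL2, OpenGL3, RaspberryPi };

// Returns the bone limit described in the text for each target.
unsigned MaxBonesFor(GraphicsTarget target)
{
    switch (target)
    {
    case GraphicsTarget::RaspberryPi:
        return 32;
    case GraphicsTarget::Direct3D11:
    case GraphicsTarget::OpenGL3:
        return 128;
    default:
        return 64; // the typical maximum
    }
}

// Formats the limit as a compilation define, e.g. "MAXBONES=64".
std::string MaxBonesDefine(GraphicsTarget target)
{
    return "MAXBONES=" + std::to_string(MaxBonesFor(target));
}
```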

API differences

Direct3D9 and Direct3D11 share the same HLSL shader code, and likewise OpenGL 2, OpenGL 3, OpenGL ES 2 and WebGL share the same GLSL code. Macros and some conditional code are used to hide the API differences where possible.

When HLSL shaders are compiled for Direct3D11, the define D3D11 is present, and the following details need to be observed:

  • Uniforms are organized into constant buffers. See the file Uniforms.hlsl for the built-in uniforms. See TerrainBlend.hlsl for an example of defining your own uniforms into the "custom" constant buffer slot.
  • Both textures and samplers are defined for each texture unit. The macros in Samplers.hlsl (Sample2D, SampleCube etc.) can be used to write code that works on both APIs. These take the texture unit name without the 's' prefix.
  • Vertex shader output position and pixel shader output color need to use the SV_POSITION and SV_TARGET semantics. The macros OUTPOSITION and OUTCOLOR0-3 can be used to select the correct semantic on both APIs. In the vertex shader, the output position should be specified last, as otherwise other output semantics may not function correctly. In general, the output semantics defined by the vertex shader must be defined as pixel shader inputs in the same order. Otherwise the Direct3D shader compiler may assign the semantics incorrectly.
  • On Direct3D11 the clip plane coordinate must be calculated manually. This is indicated by the CLIPPLANE compilation define, which is added automatically by the Graphics class. See for example the LitSolid.hlsl shader.
  • Direct3D11 does not support luminance and luminance-alpha texture formats, but rather uses the R and RG channels. Therefore be prepared to perform swizzling in the texture reads as appropriate.
  • Direct3D11 will fail to render if the vertex shader refers to vertex elements that don't exist in the vertex buffers.

For OpenGL, the define GL3 is present when GLSL shaders are being compiled for OpenGL 3+, the define GL_ES is present for OpenGL ES 2, the define WEBGL is present for WebGL and the define RPI is present for the Raspberry Pi. Observe the following differences:

  • On OpenGL 3, GLSL version 150 will be used if the shader source code does not define the version. The texture sampling functions are different but are worked around with defines in the file Samplers.glsl. Likewise the file Transform.glsl contains macros to hide the differences in declaring vertex attributes, interpolators and fragment outputs.
  • On OpenGL 3 luminance, alpha and luminance-alpha texture formats are deprecated, and are replaced with R and RG formats. Therefore be prepared to perform swizzling in the texture reads as appropriate.
  • On OpenGL ES 2 precision qualifiers need to be used.
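The default version rule for OpenGL 3 in the list above can be sketched as a simple source preprocessing step. WithDefaultGLSLVersion() is a hypothetical helper, and detecting an existing version by searching for the "#version" text is a simplification made for this example.

```cpp
#include <string>

// Prepends "#version 150" on OpenGL 3 when the source does not set a version.
std::string WithDefaultGLSLVersion(const std::string& source, bool gl3)
{
    // A source that already declares its version is left untouched.
    if (gl3 && source.find("#version") == std::string::npos)
        return "#version 150\n" + source;
    return source;
}
```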

Shader precaching

The shader variations that are potentially used by a material technique in different lighting conditions and rendering passes are enumerated at material load time, but because of their large number, they are not actually compiled or loaded from bytecode before being used in rendering. Especially on OpenGL, compiling shaders just before rendering can cause hitches in the framerate. To avoid this, the shader combinations in use can be dumped out to an XML file, then preloaded. See BeginDumpShaders(), EndDumpShaders() and PrecacheShaders() in the Graphics subsystem. The command line parameter -ds <file> can be used to instruct the Engine to begin dumping shaders automatically on startup.

Note that the used shader variations will vary with graphics settings, for example shadow quality simple/PCF/VSM or instancing on/off.