Contents |
There are many ways in which to increase the realism of a computer generated image. One such method is to calculate the effects of shadowing on an object when evaluating its lighting equations. There is a very wide variety of shadowing techniques available for use in real-time computer graphics, with each technique exhibiting both advantages and disadvantages. In general, shadowing techniques try to strike some balance between shadow quality and runtime performance.
One such technique has been termed Screen Space Ambient Occlusion (SSAO). This technique was originally discussed by Martin Mittring when presenting the paper “Finding Next Gen” at the 2007 SIGGRAPH conference. The basic concept behind the algorithm is to modify the ambient lighting term [see the lighting section of the book for details about the lighting equation :Foundation and theory] based on how occluded a particular point in a scene is. This was not the first technique to utilize the concept of ambient occlusion, but as you will see later in this chapter Screen Space Ambient Occlusion makes some clever assumptions and simplifications to give very convincing results while maintaining a high level of performance.

In this chapter, we will explore the theory behind the Screen Space Ambient Occlusion technique, provide an implementation of the technique that utilizes the Direct3D 10 pipeline, and discuss what parameters in the implementation can be used to provide scalable performance and quality. Finally, we will discuss an interactive demo which uses the given implementation and use it to give some indication of the performance of the algorithm.
Now that we have a background on where screen space ambient occlusion has come from, we can begin to explore the technique in more detail. SSAO differs from its predecessors by making one major simplification: it uses only a depth buffer to determine the amount of occlusion at any given point in the current scene view instead of using the scene information before it is rasterized. This is accomplished by either directly using the depth buffer or creating a special render target to render the depth information into. The depth information immediately surrounding a given point is then scrutinized and used to calculate how occluded that point is. The occlusion factor is then used to modify the amount of ambient light applied to that point.
The generation of an occlusion factor using the SSAO algorithm is logically a two step process. First we must generate a depth texture representation of the current view of the scene, and then use the depth texture to determine the level of occlusion for each pixel in the final view of the current scene. This process is shown below in Figure 2.

The simplification of using only a depth buffer has several effects with respect to performance and quality. Since the scene information is rasterized into a render target, the information used to represent the scene is reduced from three dimensional to two dimensional. The data reduction is performed with the standard GPU pipeline hardware which allows for a very efficient operation. In addition to reducing the scene by an entire dimension, we are also limiting the 2D scene representation to the visible portions of the scene as well which eliminates any unnecessary processing on objects that will not be seen in this frame.
The data reduction provides a significant performance gain when calculating the occlusion factor, but at the same time removes information from our calculation. This means that the occlusion factor that is calculated may not be an exactly correct factor, but as you will see later in this chapter the ambient lighting term can be an approximation and still add a great deal of realism to the scene.
Once we have the scene depth information in a texture, we must determine how occluded each point is based on it's immediate neighborhood of depth values. This calculation is actually quite similar to other ambient occlusion techniques. Consider the depth buffer location shown in Figure 3:

Ideally we would create a sphere surrounding that point at a given radius r, and then integrate around that sphere to calculate the volume of the sphere that is intersected by some object in the depth buffer. This concept is shown here in Figure 4.

Of course, performing a complex 3D integration at every pixel of a rendered image is completely impractical. Instead, we will approximate this sphere by using a 3D sampling kernel that resides within the sphere's volume. The sampling kernel is then used to look into the depth buffer at the specified locations and determine if each of it's virtual points are obscured by the depth buffer surface. An example sampling kernel is shown in Figure 5, as well as how it would be applied to our surface point.

The occlusion factor can then be calculated as the sum of the sampling kernel point's individual occlusions. As you will see in the implementation section, these individual calculations can be made more or less sensitive based on several different factors, such as distance from the camera, the distance between our point and the occluding point, and artist specified scaling factors.
With a clear understanding of how the algorithm functions, we can now discuss a working implementation. As discussed in the algorithm theory section, we will need a source of scene depth information. For simplicity's sake, we will use a separate floating point buffer to store the depth data. A more efficient technique, although slightly more complex, would utilize the z-buffer to acquire the scene depth. The following code snippet shows how to create the floating point buffer, as well as the render target and shader resource views that we will be binding it to the pipeline with:
// Create the depth buffer D3D10_TEXTURE2D_DESC desc; ZeroMemory( &desc, sizeof( desc ) ); desc.Width = pBufferSurfaceDesc->Width; desc.Height = pBufferSurfaceDesc->Height; desc.MipLevels = 1; desc.ArraySize = 1; desc.Format = DXGI_FORMAT_R32_FLOAT; desc.SampleDesc.Count = 1; desc.Usage = D3D10_USAGE_DEFAULT; desc.BindFlags = D3D10_BIND_RENDER_TARGET | D3D10_BIND_SHADER_RESOURCE; pd3dDevice->CreateTexture2D( &desc, NULL, &g_pDepthTex2D ); // Create the render target resource view D3D10_RENDER_TARGET_VIEW_DESC rtDesc; rtDesc.Format = desc.Format; rtDesc.ViewDimension = D3D10_RTV_DIMENSION_TEXTURE2D; rtDesc.Texture2D.MipSlice = 0; pd3dDevice->CreateRenderTargetView( g_pDepthTex2D, &rtDesc, &g_pDepthRTV ); // Create the shader-resource view D3D10_SHADER_RESOURCE_VIEW_DESC srDesc; srDesc.Format = desc.Format; srDesc.ViewDimension = D3D10_SRV_DIMENSION_TEXTURE2D; srDesc.Texture2D.MostDetailedMip = 0; srDesc.Texture2D.MipLevels = 1; pd3dDevice->CreateShaderResourceView( g_pDepthTex2D, &srDesc, &g_pDepthSRV );
Once the depth information has been generated, the next step is to calculate the occlusion factor at each pixel in the final scene. This information will be stored in a second single component floating point buffer. The code to create this buffer is very similar to the code used to create the floating point depth buffer, so it is omitted for brevity.
Now that both buffers have been created, we can describe the details of generating the information that will be stored in each of them. The depth buffer will essentially store the view space depth value. The linear view space depth is used instead of clip space depth to prevent any depth range distortions caused by the perspective projection. The view space depth is calculated by multiplying the incoming vertices by the worldview matrix. The view space depth is then passed to the pixel shader as a single component vertex attribute.
fragment VS( vertex IN ) { fragment OUT; // Output the clip space position OUT.position = mul( float4(IN.position, 1), WVP ); // Calculate the view space position float3 viewpos = mul( float4( IN.position, 1 ), WV ).xyz; // Save the view space depth value OUT.viewdepth = viewpos.z; return OUT; }
To store the depth values in the floating point buffer, we then scale the value by the distance from the near to the far clipping planes in the pixel shader. This produces a linear, normalized depth value in the range [0,1].
pixel PS( fragment IN ) { pixel OUT; // Scale depth by the view space range float normDepth = IN.viewdepth / 100.0f; // Output the scaled depth OUT.color = float4( normDepth, normDepth, normDepth, normDepth ); return OUT; }
With the normalized depth information stored, we then bind the depth buffer as a shader resource to generate the occlusion factor for each pixel and store it in the occlusion buffer. The occlusion buffer generation is initiated by rendering a single full screen quad. The vertices of the full screen quad only have a single two-component attribute which specifies the texture coordinates at each of its four corners corresponding to their locations in the depth buffer. The vertex shader trivially passes these parameters through to the pixel shader.
fragment VS( vertex IN ) { fragment OUT; OUT.position = float4( IN.position.x, IN.position.y, 0.0f, 1.0f ); OUT.tex = IN.tex; return OUT; }
The pixel shader starts out by defining the shape of the 3D sampling kernel in an array of three component vectors. The lengths of the vectors are varied in the range [0,1] to provide a small amount of variation in each of the occlusion tests.
const float3 avKernel[8] = { normalize( float3( 1, 1, 1 ) ) * 0.125f, normalize( float3( -1,-1,-1 ) ) * 0.250f, normalize( float3( -1,-1, 1 ) ) * 0.375f, normalize( float3( -1, 1,-1 ) ) * 0.500f, normalize( float3( -1, 1 ,1 ) ) * 0.625f, normalize( float3( 1,-1,-1 ) ) * 0.750f, normalize( float3( 1,-1, 1 ) ) * 0.875f, normalize( float3( 1, 1,-1 ) ) * 1.000f };
Next, the pixel shader looks up a random vector to reflect the sampling kernel around from a texture lookup. This provides a high degree of variation in the sampling kernel used, which will allow us to use a smaller number of occlusion tests to produce a high quality result. This effectively "jitters" the depths used to calculate the occlusion, which hides the fact that we are under-sampling the area around the current pixel.
float3 random = VectorTexture.Sample( VectorSampler, IN.tex.xy * 10.0f ).xyz; random = random * 2.0f - 1.0f;
Now the pixel shader will calculate a scaling to apply to the sampling kernel based on the depth of the current pixel. The pixel's depth value is read from the depth buffer and expanded back into view space by multiplying by the near to far clip plane distance. Then the scaling for the x and y component of the sampling kernel are calculated as the desired radius of the sampling kernel (in meters) divided by the pixel's depth (in meters). This will scale the texture coordinates used to look up individual samples. The z component scale is calculated by dividing the desired kernel radius by the near to far plane distance. This allows all depth comparisons to be performed in the normalized depth space that our depth buffer is stored in.
float fRadius = vSSAOParams.y; float fPixelDepth = DepthTexture.Sample( DepthSampler, IN.tex.xy ).r; float fDepth = fPixelDepth * vViewDimensions.z; float3 vKernelScale = float3( fRadius / fDepth, fRadius / fDepth, fRadius / vViewDimensions.z ) ;
With the kernel scaling calculated, the individual occlusion tests can now be carried out. This is performed in a for loop which iterates over each of the points in the sampling kernel. The current pixel's texture coordinates are offset by the randomly reflected kernel vector's x and y components and used to look up the depth at the new location. This depth value is then offset by the kernel vector's z-component and compared to the current pixel depth.
float fOcclusion = 0.0f; for ( int j = 1; j < 3; j++ ) { float3 random = VectorTexture.Sample( VectorSampler, IN.tex.xy * ( 7.0f + (float)j ) ).xyz; random = random * 2.0f - 1.0f; for ( int i = 0; i < 8; i++ ) { float3 vRotatedKernel = reflect( avKernel[i], random ) * vKernelScale; float fSampleDepth = DepthTexture.Sample( DepthSampler, vRotatedKernel.xy + IN.tex.xy ).r; float fDelta = max( fSampleDepth - fPixelDepth + vRotatedKernel.z, 0 ); float fRange = abs( fDelta ) / ( vKernelScale.z * vSSAOParams.z ); fOcclusion += lerp( fDelta * vSSAOParams.w, vSSAOParams.x, saturate( fRange ) ); } }
The delta depth value is then normalized by an application selectable range value. This is intended to produce a factor to determine the relative magnitude of the depth difference. This information can, in turn, be used to adjust the influence of this particular occlusion test. For example, if an object is in the foreground of a scene, and another object is partially obscured behind it then you don't want to count a failed occlusion test for the back object if the foreground object is the only occluder. Otherwise the background object will show a shadow-like halo around the edges of the foreground object. The implementation of this scaling is to linearly interpolate between a scaled version of the delta value and a default value, with the interpolation amount based on the range value.
fOcclusion += lerp( ( fDelta * vSSAOParams.w ), vSSAOParams.x, saturate( fRange ) );
The final step before emitting the final pixel color is to calculate an average occlusion value from all of the kernel samples, and then use that value to interpolate between a maximum and minimum occlusion value. This final interpolation compresses the dynamic range of the occlusion and provides a somewhat smoother output. It should also be noted that in this chapter’s demo program, only sixteen samples are used to calculate the current occlusion. If a larger number of samples are used, then the occlusion buffer output can be made smoother. This is a good place to scale the performance and quality of the algorithm for different hardware levels.
OUT.color = fOcclusion / ( 2.0f * 8.0f ); // Range remapping OUT.color = lerp( 0.1f, 0.6, saturate( OUT.color.x ) );
With the occlusion buffer generated, it is bound as a shader resource to be used during the final rendering pass. The rendered geometry simply has to calculate it's screen space texture coordinates, sample the occlusion buffer, and modulate the ambient term by that value. The sample file provided performs a simple five sample average, but more sophisticated filtering like a Gaussian blur could easily be used instead.
Demo Download: SSAO_Demo
The demo program developed for this chapter provides a simple rendering of a series of cube models that is shaded only with our screen space ambient occlusion. The adjustable parameters discussed for this technique can be changed in real-time with the onscreen slider controls. Figure 6 below shows the occlusion buffer and the resulting final output rendering. Notice that the occlusion parameters have been adjusted to exaggerate the occlusion effect.


In this chapter we developed an efficient, screen space technique for adding realism to the ambient lighting term of the standard phong lighting model. This technique provides one implementation of the SSAO algorithm, but it is certainly not the only one. The current method can be modified for a given type of scene, with more or less occlusion for distant geometry. In addition, the minimum and maximum amounts of occlusion, interpolation techniques, and sampling kernels are all potential areas for improvement or simplification. This chapter has attempted to provide you with an insight into the inner workings of the SSAO technique as well as a sample implementation to get you started.