Running Microsoft FXC in Docker

17 Sep 2018 in Pipeline on Docker, Shaders, Linux

Microsoft DXC is the new shader compiler stack, but the FXC compiler is still the dominant HLSL compiler for a number of reasons:

Performance and correctness regressions of DXIL shaders compared to DXBC
Many cross compilers and custom toolchains still rely on DXBC
IHV drivers are still being adapted to consume DXIL, which is lower level than DXBC
DXC is a complex codebase because it is based on LLVM: it is difficult to build and has many components
DXIL is Direct3D 12 only, which makes it Windows 10 only

Therefore, it is still important to support shader compilation with FXC in some situations.

The performance and correctness regressions are a point of ongoing effort, but this is less of a problem today than it was 6 months ago - at least in my opinion, based on my own shaders and tests. In fact, most reported issues are fixed within a couple of days (an example). The opposite is also true: some shaders hit massive performance or compile-time cliffs when compiled with FXC compared to DXC, especially when arrays are involved.

Halcyon (SEED’s R&D engine) currently has a mixture of FXC and DXC compiled shaders when running under Direct3D 12, whereas the Vulkan path exclusively uses shaders compiled with DXC.

Given this scene:

Let's compare the performance:

Name	Direct3D 12	Vulkan	Using DXBC
Depth Clear	0.003 ms	0.003 ms	No
GBuffer Meshes	3.126 ms	3.387 ms	No
Velocity Vector	0.035 ms	0.033 ms	No
GBuffer Sky	0.046 ms	0.048 ms	No
Reproject Meta	0.091 ms	0.089 ms	No
Temporal Reproject	0.163 ms	0.158 ms	No
DiffuseSh	0.012 ms	0.011 ms	No
Shadow Pass	1.084 ms	1.086 ms	No
Shadow Pass	1.091 ms	1.114 ms	No
Shadow Pass	1.080 ms	1.100 ms	No
Depth Pyramid	0.041 ms	0.032 ms	No
GTAO Pass	0.284 ms	0.181 ms	Yes
GTAO Bilateral	0.084 ms	0.083 ms	No
GTAO Bilateral	0.085 ms	0.086 ms	No
GTAO Temporal	0.099 ms	0.173 ms	Yes
Lighting	1.081 ms	3.048 ms	No
SSR Trace	0.595 ms	0.604 ms	No
IBL Reflection	0.021 ms	0.031 ms	Yes
Reflection Filter	0.831 ms	1.239 ms	Yes
Reflection Filter	0.486 ms	0.478 ms	Yes
Reflection Merge	0.049 ms	0.065 ms	No
Temporal AA	0.269 ms	0.219 ms	No
Velocity Reduce	0.019 ms	0.029 ms	No
Velocity Reduce	0.004 ms	0.004 ms	No
Velocity Dilate	0.011 ms	0.004 ms	No
Motion Blur	0.111 ms	0.113 ms	No
Bloom Extract	0.013 ms	0.045 ms	No
Bloom Downsample	0.004 ms	0.008 ms	No
Bloom Blur	0.004 ms	0.004 ms	No
Exposure Adaption	0.004 ms	0.003 ms	No
Bloom Upsample	0.005 ms	0.005 ms	No
Bloom Upsample	0.006 ms	0.004 ms	No
Bloom Upsample	0.009 ms	0.008 ms	No
Bloom Upsample	0.022 ms	0.023 ms	No
Bloom Apply	0.039 ms	0.255 ms	No
Final Output	0.041 ms	0.092 ms	Yes
Present	0.018 ms	0.017 ms	No
Totals	11.173 ms	13.928 ms	5 / 37

A bit hand-wavy, but if we assume that DXIL and SPIR-V are translated by backend compilers into comparable IL, then we can draw some conclusions about these performance metrics.

In cases where DXBC is used but the Direct3D 12 performance is worse than Vulkan, this typically indicates a case where DXIL is likely faster than DXBC, but correctness prevents us from using it.

In cases where DXBC is used and the Direct3D 12 performance is better than Vulkan, this typically indicates a case where DXIL is slower than DXBC, indicating a performance regression.
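The two rules above can be applied mechanically to the rows of the table that are marked as using DXBC. The sketch below does that with awk; the function names are hypothetical, the numbers are copied from the table, and the verdicts are only as reliable as the assumption that the DXIL and SPIR-V backends produce comparable code. Differences of a few microseconds are probably within measurement noise.

```shell
#!/bin/sh
# Rows from the table where "Using DXBC" is Yes, as tab-separated fields:
# pass name, Direct3D 12 time in ms (DXBC), Vulkan time in ms (DXC to SPIR-V).
dxbc_rows() {
  printf '%s\t%s\t%s\n' \
    "GTAO Pass"         0.284 0.181 \
    "GTAO Temporal"     0.099 0.173 \
    "IBL Reflection"    0.021 0.031 \
    "Reflection Filter" 0.831 1.239 \
    "Reflection Filter" 0.486 0.478 \
    "Final Output"      0.041 0.092
}

# Apply the two rules: a slower Direct3D 12 (DXBC) pass hints that DXIL would
# be faster, and a faster Direct3D 12 (DXBC) pass hints at a DXIL regression.
classify_dxbc_passes() {
  awk -F '\t' '{
    if ($2 > $3) {
      verdict = "DXIL likely faster than DXBC (blocked by correctness)"
    } else if ($2 < $3) {
      verdict = "DXIL likely slower than DXBC (regression)"
    } else {
      verdict = "no measurable difference"
    }
    printf "%-18s %s\n", $1, verdict
  }'
}

dxbc_rows | classify_dxbc_passes
```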

The most interesting case is the Lighting pass which uses DXIL, and Vulkan is ~3x more expensive. In the DXC stack, HLSL to SPIR-V uses the same AST as HLSL to DXIL, indicating this performance cliff exists in the translation from AST to SPIR-V.

NOTE: A fun data point is that the Lighting pass takes ~200ms to compile to SPIR-V, and ~10s to compile to DXIL - surely we can fix the compile time and performance cliffs in this instance? ;)

The performance issue with the Reflection passes is largely related to pow(x, 2) differences; FXC emits x * x whereas DXC emits exp2(log2(x) * 2). It’s of course easy to solve this app-side, but it’s important to track and fix these issues in the compiler itself (i.e. supporting power expansion up to 16). Aside from performance, there are numerical differences which cause corruption when DXIL is used for these passes instead of DXBC.
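As a rough illustration of the app-side workaround, the sketch below writes a small HLSL include file with an explicit square helper, so that shader code can call it instead of pow(x, 2) and get a multiply from either compiler. The file name pow_helpers.hlsli and the helper name sq are hypothetical and not taken from Halcyon.

```shell
#!/bin/sh
# Write a hypothetical HLSL include with explicit square helpers.
# The quoted 'EOF' keeps the shell from expanding anything inside the file.
cat > pow_helpers.hlsli <<'EOF'
// Illustrative helpers (assumed names): always a multiply, never exp2/log2.
float  sq(float  x) { return x * x; }
float3 sq(float3 x) { return x * x; }

// Before: float  r2 = pow(roughness, 2);
// After:  float  r2 = sq(roughness);
EOF
```

Shaders would then include pow_helpers.hlsli and replace the pow(x, 2) call sites by hand, which sidesteps the difference until the compiler handles the expansion itself.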

In general, DXIL is used for nearly all passes, and with good performance and compile times.

One of the components in the DXC compiler stack is dxbc2dxil, which could possibly help with transitioning existing DXBC toolchains over to DXIL. Source

HLSL   Other shading langs  DSL          DXBC IL
+      +                    +            +
|      |                    |            |
v      v                    v            v
Clang  Clang                Other Tools  dxbc2dxil
+      +                    +            +
|      |                    |            |
v      v                    v            |
+------+--------------------+---------+  |
|          High level IR (DXIR)       |  |
+-------------------------------------+  |
                  |                      |
                  |                      |
                  v                      |
              Optimizer <-----+ Linker   |
              +      ^             +     |
              |      |             |     |
              |      |             |     |
 +------------v------+-------------v-----v-------+
 |              Low level IR (DXIL)              |
 +------------+----------------------+-----------+
              |                      |
              v                      v
      Driver Compiler             Verifier

Regarding IHV driver stability, I definitely don’t envy the hard work the driver engineers have had to do in order to support DXIL. Previously, they only needed to support the higher-level DXBC specification, which gave them a lot more freedom to map these concepts to their internal IL, whereas DXIL is a lot lower level and more explicit around flow control, intrinsics, and overall behavior.

This is definitely a controversial topic, but I personally feel that the overall benefits of an open source compiler stack, proper support for features like wave intrinsics, and an actual specification are very advantageous. As one example, the open source nature of DXC has allowed Google to collaborate with Microsoft and add HLSL to SPIR-V support to the same codebase, making it less problematic to develop or maintain a complex engine that runs on Vulkan and Direct3D 12, using only HLSL as a source language.

Following my previous posts regarding shader compilation on Linux and scaling out in Kubernetes, I looked into running FXC in Docker. One major problem of FXC is that it is only a closed source Windows binary, which eliminates any ability to cross-compile it for Linux.

Without any source, the only alternative was to give Wine a shot, and it has no problem running fxc.exe correctly.

Repository

FROM ubuntu:18.04
ARG DEBIAN_FRONTEND="noninteractive"
RUN dpkg --add-architecture i386 \
  && apt-get update \
  && apt-get install -y \
    software-properties-common \
    winbind \
    cabextract \
    p7zip \
    unzip \
    wget \
    curl \
    zenity \
  && wget -O- https://dl.winehq.org/wine-builds/Release.key | apt-key add - \
  && apt-add-repository https://dl.winehq.org/wine-builds/ubuntu/ \
  && apt-get update \
  && apt-get install -y --install-recommends winehq-stable \
  && mkdir -p /home/wine/.cache/wine \
  && wget https://dl.winehq.org/wine/wine-mono/4.7.3/wine-mono-4.7.3.msi \
    -O /home/wine/.cache/wine/wine-mono-4.7.3.msi \
  && wget https://dl.winehq.org/wine/wine-gecko/2.47/wine_gecko-2.47-x86.msi \
    -O /home/wine/.cache/wine/wine_gecko-2.47-x86.msi \
  && wget https://dl.winehq.org/wine/wine-gecko/2.47/wine_gecko-2.47-x86_64.msi \
    -O /home/wine/.cache/wine/wine_gecko-2.47-x86_64.msi \
  && wget https://raw.githubusercontent.com/Winetricks/winetricks/master/src/winetricks \
    -O /usr/bin/winetricks \
  && chmod +rx /usr/bin/winetricks \
  && mkdir -p /home/wine/.cache/winetricks/win7sp1 \
  && wget https://download.microsoft.com/download/0/A/F/0AFB5316-3062-494A-AB78-7FB0D4461357/windows6.1-KB976932-X86.exe \
    -O /home/wine/.cache/winetricks/win7sp1/windows6.1-KB976932-X86.exe \
  && groupadd -g 1010 wine \
  && useradd -s /bin/bash -u 1010 -g 1010 wine \
  && chown -R wine:wine /home/wine \
  && apt-get autoremove -y \
    software-properties-common \
  && apt-get autoclean \
  && apt-get clean \
  && apt-get autoremove -y

VOLUME /home/wine
ENV WINEARCH=win64
ENV WINEDEBUG=fixme-all
RUN winecfg

WORKDIR /fxc
COPY d3dcompiler_47.dll .
COPY fxc.exe .

ENTRYPOINT ["wine", "fxc"]

The above Dockerfile has been published to Docker Hub as gwihlidal/fxc.
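To build the image locally instead of pulling it, the two COPY steps need fxc.exe and d3dcompiler_47.dll next to the Dockerfile; they are not downloaded by the build and typically come from a Windows SDK installation. The helper below is a hypothetical sketch that checks for those files before building. The function name, the default tag fxc-local, and the FXC_DOCKER override are all illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper: build the FXC image from the current directory.
# Usage: build_fxc_image [tag]   (the tag defaults to the assumed "fxc-local")
build_fxc_image() {
  tag="${1:-fxc-local}"

  # The Dockerfile copies these binaries from the build context, so fail
  # early with a clear message if any of them is missing.
  for f in Dockerfile fxc.exe d3dcompiler_47.dll; do
    if [ ! -f "$f" ]; then
      echo "missing $f in $(pwd)" >&2
      return 1
    fi
  done

  # FXC_DOCKER is an illustrative override: set it to "echo docker" to print
  # the command instead of running it.
  ${FXC_DOCKER:-docker} build -t "$tag" .
}
```

Calling build_fxc_image with no arguments from the directory that holds the Dockerfile and the two binaries would produce an image tagged fxc-local.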

The published image can be invoked with:

$ docker run --rm gwihlidal/fxc /help

The host machine file system can also be bind mounted into the container so that fxc can be used like a regular command line application on any machine:

$ docker run --rm -v $(pwd):$(pwd) -w $(pwd) gwihlidal/fxc /T <target> /E <entry-point-name> <input-hlsl-file>
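To avoid retyping the bind mount every time, the command above can be wrapped in a shell function. This is a minimal sketch: the function name fxc and the FXC_DOCKER override are assumptions for illustration, and the paths are quoted so that directories containing spaces still work.

```shell
#!/bin/sh
# Hypothetical wrapper so that `fxc` behaves like a local command.
# FXC_DOCKER is an illustrative override (not part of the image): set it to
# "echo docker" to print the command instead of running it.
fxc() {
  # Mount the current directory at the same path inside the container so that
  # file arguments resolve identically on the host and in the container.
  ${FXC_DOCKER:-docker} run --rm \
    -v "$(pwd):$(pwd)" -w "$(pwd)" \
    gwihlidal/fxc "$@"
}
```

With the function defined (for example in a shell profile), a compile looks like a native invocation: fxc /T ps_5_1 /E main simple.hlsl. Only files under the current directory are visible to the container.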

Example output (DXBC):

$ docker run --rm -v $(pwd):$(pwd) -w $(pwd) gwihlidal/fxc /T ps_5_1 /E main simple.hlsl

Microsoft (R) Direct3D Shader Compiler 10.1
Copyright (C) 2013 Microsoft. All rights reserved.

//
// Generated by Microsoft (R) HLSL Shader Compiler 10.1
//
//
//
// Input signature:
//
// Name                 Index   Mask Register SysValue  Format   Used
// -------------------- ----- ------ -------- -------- ------- ------
// no Input
//
// Output signature:
//
// Name                 Index   Mask Register SysValue  Format   Used
// -------------------- ----- ------ -------- -------- ------- ------
// SV_TARGET                0   xyzw        0   TARGET   float   xyzw
//
ps_5_1
dcl_globalFlags refactoringAllowed
dcl_output o0.xyzw
mov o0.xyzw, l(0,1.000000,0,1.000000)
ret
// Approximately 2 instruction slots used
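
The contents of simple.hlsl are not shown here, so the following is only a guess at a shader that should compile to something very close to the listing above: a pixel shader with no inputs that returns a constant (0, 1, 0, 1) color.

```shell
#!/bin/sh
# Assumed contents of simple.hlsl; the original file is not part of the post.
cat > simple.hlsl <<'EOF'
// Pixel shader with no inputs that writes constant green to SV_TARGET.
float4 main() : SV_TARGET
{
    return float4(0.0, 1.0, 0.0, 1.0);
}
EOF

# Compile it with the published image (needs Docker, so it is left commented):
# docker run --rm -v "$(pwd):$(pwd)" -w "$(pwd)" gwihlidal/fxc /T ps_5_1 /E main simple.hlsl
```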

Graham Wihlidal
