**Thallo -- Scheduling for High-Performance Large-scale Non-linear Least-Squares Solvers** Michael Mara, Felix Heide, Michael Zollhöfer, Matthias Nießner, Pat Hanrahan ![](./images/representative_image_thallo_siggraph_small.png) # Overview Large-scale optimization problems at the core of many graphics, vision, and imaging applications are often implemented by hand in tedious and error-prone processes in order to achieve high performance (in particular on GPUs), despite recent developments in libraries and DSLs. At the same time, these hand-crafted solver implementations reveal that the key for high performance is a problem-specific schedule that enables efficient usage of the underlying hardware. Thallo is a high-performance domain-specific language (DSL) that generates tailored high-performance GPU solvers and schedules from a concise, high-level energy description without the hassle of manually constructing and maintaining tedious and error-prone solvers. It is a direct successor to the [Opt DSL](http://optlang.org/), extended with compiler transforms and a scheduling languge inspired by [Halide](https://halide-lang.org/) that enable it to tackle a much wider class of problems (from BundleFusion to Blendshape Fitting) with state-of-the-art performance. # Resources - The original paper has been accepted to Transactions on Graphics and will be (remotely) presented August 2021 at [SIGGRAPH](https://s2021.siggraph.org). - The code will be released within the next few weeks on [github](https://github.com/). - Since this project is a direct descendant of Opt, [its website](http://optlang.org/) is a good resource as well. For more information, contact: mmara at cs dot stanford dot edu # Illustrative Example: 3D Blendshape Fitting For illustrative purposes, let's take a deep look at a simple example: ## Example Overview Non-rigid model-based registration is a fundamental building block for a large variety of tracking and reconstruction approaches. One very prominent example is fitting an affine parametric 3D mesh model $P$ to 2D observations, e.g., as done in the Face2Face [Thies et al. 2016] approach. In the following, we assume that the embedding of a mesh is represented by a vector $\mathbf{m}\in \mathbb{R}^{3N}$ that stacks its $N$ vertices. In the case of face tracking, the affine parametric model is then defined by a low-dimensional expression subspace of $M$ blendshapes spanned by a matrix $B\in\mathbb{R}^{3N\times M}$ that models the facial expressions relative to a neutral face template $\mathbf{m}\in \mathbb{R}^{3N}$; $m=\mathcal{P}(\mathbf{c})=\mathbf{a} + \mathbf{B} \mathbf{c}$. Each of the $M$ blendshapes represents a different facial expression as per-vertex offsets to the neutral face template. New faces $\mathbf{m}\in \mathbb{R}^{3N}$ are created by interpolation of these $M$ displacement vectors based on the $M$ blendshape weights $\mathbf{c}\in \mathbb{R}^{M}$. In the easiest scenario, fitting the model to 3D data means finding the best coefficients $c_*$, such that $\mathcal{P}(\mathbf{c}_*)$ best matches a set of target correspondences $\mathbf{t}$. Typically, the number of vertices is much larger than the number of blendshapes $N \gg M$, hence the resulting Jacobian matrix has a lot more rows than columns and the Jacobian matrix is fully dense. We regularize the model by adding a term penalizing weights with large magnitude. This optimization is run in alternation with a correspondence search to build a model-based non-rigid ICP. ![An example of parametric model fitting using optimization. The mesh is constrained to be a linear combination of basis meshes, called blendshapes, and the optimization seeks to minimize vertex distances to target correspondences, shown in green. ](./images/face_fitting.png) ## Mathematical Formulation \begin{equation} \min_{\mathbf{c}} \sum_{i=1}^M \big|\big| \mathbf{t}_{i}-\Pi\left(\mathcal{P}(\mathbf{c}_i)\right) \big|\big|_{2}^{2} + \sum_{i=1}^M \omega_r \mathbf{c}_i^2 \\ \mathcal{P}(\mathbf{c}) = \mathbf{m} = \mathbf{a} + \mathbf{B} \mathbf{c} \end{equation} where $\mathbf{m}$ are the vertex positions of the parametric mesh, calculated from the neutral template $\mathbf{a}$ with offsets calculated by multiplying the blendshape matrix $\mathbf{B}$ by the vector of blendshape weights $\mathbf{c}$, which are what we are optimizing for. $\mathbf{t}$ are the target (2D) correspondences, and $\Pi$ is the camera transformation and projection function, which transforms 3D points into 2D coordinates. $\omega_r$ is the regularization weight. ## Code The code is expressed per-element. This requires a straightforward translation from the slightly higher-level linear alegebra formulation above. The matrix multiply becomes a dot product (sum of products) along the blendshape weight dimension $M$. `BlendshapeWeights` is $\mathbf{c}$, `BlendshapeBasis` is $\mathbf{B}$, `AverageMesh` is $\mathbf{a}$, `Target` is $\mathbf{t}$, and `w_regSqrt` is $\sqrt{\omega_r}$. $\Pi$ is implemented by the line using `CameraParams` and `CameraXForm`. The intermediate expression `Mesh` corresponds to $\mathcal{P}(\mathbf{c}) = \mathbf{m}$. We also add some extra code to allow users to disable the fitting term for some vertices by setting the target position's $x$ coordinate to a large negative number. In the final line we use a single scheduling construct to direct the compiler to split the core $\mathbf{J}^T\mathbf{J}p$ computation into two separate (matrix-free) kernels, materializing $\mathbf{J}p$ in the first step. Both of these kernels are (by default) parallelized over the residual terms. ~~~~lua local N,M = Dims("N", "M") Inputs { BlendshapeWeights = Unknown(opt_float, {M}, 0), AverageMesh = Array(opt_float3, {N}, 1), BlendshapeBasis = Array(opt_float3, {N,M}, 2), Target = Array(opt_float3, {N}, 4), w_regSqrt = Param(float, 5), cameraParams = Param(float9, 6), cameraXform = Param(float6, 7) } local m,n = M(),N() local Mesh = AverageMesh(n) + Sum({m}, BlendshapeBasis(n,m)*BlendshapeWeights(m)) local Pos2D = project_point(rigid_trans(PoseToMatrix(cameraXForm), Mesh), cameraParams) local e_fit = Target(n) - Pos2D local valid = greatereq(Target(n,0), -999999.9) r = Residuals { reg = w_regSqrt*BlendshapeWeights(m), fit = Select(valid,e_fit,0) } r.fit.Jp:set_materialize(true) ~~~~ This code uses the following helper functions from `lib.t`: - `PoseToMatrix` - `rigid_trans` - `project_point` ## References - [Face2Face: Real-time Face Capture and Reenactment of RGB Videos](http://www.graphics.stanford.edu/~niessner/papers/2016/1facetoface/thies2016face.pdf) - [Deformable Model Fitting by Regularized Landmark Mean-Shift](https://ci2cv.net/media/papers/2011_IJCV_Saragih.pdf)