TexAvatars: Hybrid Texel-3D Representations for Stable Rigging of Photorealistic Gaussian Head Avatars

* equal contribution
¹KAIST AI, ²Hanyang University

3DV 2026

TexAvatars reconstructs high-quality, real-time animatable 3D head avatars from multi-view images. Our key insight is a mathematically simple yet well-bounded deformation strategy using Jacobian transformations, ensuring stable training and robust extrapolation to extreme expressions.

Abstract

Constructing drivable and photorealistic 3D head avatars has become a central task in AR/XR, enabling immersive and expressive user experiences. With the emergence of high-fidelity, efficient representations such as 3D Gaussians, recent works have pushed toward ultra-detailed head avatars. Existing approaches typically fall into two categories: rule-based analytic rigging or neural network-based deformation fields. While effective in constrained settings, both approaches often fail to generalize to unseen expressions and poses, particularly in extreme reenactment scenarios. Other methods constrain Gaussians to the global texel space of 3DMMs to reduce rendering complexity. However, these texel-based avatars tend to underutilize the underlying mesh structure: they apply minimal analytic deformation and rely heavily on neural regressors and heuristic regularization in UV space, which weakens geometric consistency and limits extrapolation to complex, out-of-distribution deformations. To address these limitations, we introduce TexAvatars, a hybrid avatar representation that combines the explicit geometric grounding of analytic rigging with the spatial continuity of texel space. Our approach predicts local geometric attributes in UV space via CNNs, but drives 3D deformation through mesh-aware Jacobians, enabling smooth and semantically meaningful transitions across triangle boundaries. This hybrid design separates semantic modeling from geometric control, resulting in improved generalization, interpretability, and stability. Furthermore, TexAvatars captures fine-grained expression effects, including muscle-induced wrinkles, glabellar lines, and realistic mouth cavity geometry, with high fidelity. Our method achieves state-of-the-art performance under extreme pose and expression variations, demonstrating strong generalization in challenging head reenactment settings.

Main idea 1: Bounded Gradient via Jacobian Transformation

Notation: \(G\) denotes Gaussian attributes; subscripts \(d\), \(c\), and \(\ell\) denote deformed, canonical, and local, respectively; \(\mathcal{L}\) is the image-space training loss; \(f_\theta\) is the Gaussian texel map decoder with parameters \(\theta\); \(\mathbf{T}\) is the analytic mesh-based transformation; and \(\mathbf{J}\) is its Jacobian.

Prior Methods

\( G_d = G_c + \Delta G \)

\( \nabla_\theta \mathcal{L} = \frac{\partial \mathcal{L}}{\partial G_d} \frac{\partial G_d}{\partial \theta} = \frac{\partial \mathcal{L}}{\partial G_d} \frac{\partial \Delta G}{\partial \theta} \)

The gradient scales directly with the magnitude of the global-space displacements \(\Delta G\), which can become arbitrarily large under extreme poses and expressions. This leads to unstable training and requires heuristic regularizers (e.g., penalizing scale or position offsets) that reduce expressiveness; see the sketch below.

Unbounded Gradients · Poor Extrapolation
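To make this failure mode concrete, here is a minimal PyTorch sketch of the delta-offset formulation; `OffsetDecoder`, the conditioning code, and the stand-in loss are illustrative assumptions, not any specific prior method's implementation.

```python
import torch
import torch.nn as nn

class OffsetDecoder(nn.Module):
    """Hypothetical decoder: predicts per-Gaussian global-space offsets."""
    def __init__(self, cond_dim, num_gaussians):
        super().__init__()
        self.num_gaussians = num_gaussians
        self.net = nn.Sequential(
            nn.Linear(cond_dim, 256), nn.ReLU(),
            nn.Linear(256, num_gaussians * 3),
        )

    def forward(self, cond):
        return self.net(cond).view(self.num_gaussians, 3)

num_gaussians, cond_dim = 10_000, 64
decoder = OffsetDecoder(cond_dim, num_gaussians)
mu_c = torch.randn(num_gaussians, 3)   # canonical Gaussian means G_c
cond = torch.randn(cond_dim)           # expression/pose conditioning code

delta = decoder(cond)                  # global-space offsets (Delta G)
mu_d = mu_c + delta                    # G_d = G_c + Delta G

# Stand-in for the image-space loss, plus the kind of heuristic offset
# penalty such pipelines add to keep Delta G (and its gradients) in check.
loss = mu_d.square().mean() + 1e-2 * delta.norm(dim=-1).mean()
loss.backward()
```

The only route from the loss to \(\theta\) runs through \(\partial \Delta G / \partial \theta\), so nothing in the formulation itself caps the update magnitude once extreme expressions demand large offsets.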

Ours (TexAvatars)

\( G_d = \mathbf{T}(G_\ell) \), where \( G_\ell = f_\theta(x) \)

\( \nabla_\theta \mathcal{L} = \frac{\partial \mathcal{L}}{\partial G_d} \underbrace{\frac{\partial G_d}{\partial G_\ell}}_{\mathbf{J}} \frac{\partial G_\ell}{\partial \theta} \)

Since face-level deformations are dominated by rotations with only mild shear or scaling, \(\mathbf{J}\) is near-isometric with \(\|\mathbf{J}\| \leq C\). This ensures bounded gradients:

\(\|\nabla_\theta \mathcal{L}\| \leq C \left\| \frac{\partial \mathcal{L}}{\partial G_d} \right\| \left\| \frac{\partial G_\ell}{\partial \theta} \right\|\)

Stable Training · Better Extrapolation
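Below is a hedged sketch of the Jacobian-driven alternative, \(G_d = \mathbf{T}(G_\ell)\). Building each face frame from two edge vectors plus the unit normal is one common construction assumed here, not necessarily the paper's exact choice; `face_frames` and `deform` are illustrative names.

```python
import torch
import torch.nn.functional as F

def face_frames(verts, faces):
    """Per-face 3x3 frame from two triangle edge vectors and the unit normal."""
    v0, v1, v2 = verts[faces[:, 0]], verts[faces[:, 1]], verts[faces[:, 2]]
    e1, e2 = v1 - v0, v2 - v0
    n = F.normalize(torch.cross(e1, e2, dim=-1), dim=-1)
    return torch.stack([e1, e2, n], dim=-1)            # (F, 3, 3)

def deform(mu_local, verts_c, verts_d, faces, face_idx):
    """Map decoder-predicted local means to world space: G_d = T(G_l).

    J carries each canonical face frame to its deformed counterpart; since
    face deformations are rotation-dominated, J stays near-isometric and
    the gradient reaching mu_local is bounded by ||J||.
    """
    J = face_frames(verts_d, faces) @ torch.linalg.inv(face_frames(verts_c, faces))
    t = verts_d[faces].mean(dim=1)                     # deformed face centroids
    return (J[face_idx] @ mu_local.unsqueeze(-1)).squeeze(-1) + t[face_idx]

# Usage with, e.g., one Gaussian anchored on each face:
# mu_d = deform(mu_local, verts_canonical, verts_deformed, faces,
#               face_idx=torch.arange(faces.shape[0]))
```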

Example: Comparison with RGCA

[Figure: comparison of TexAvatars vs. prior methods]

RGCA stabilizes training by scaling the tracked mesh up so that the predicted offsets remain small relative to it. Under strong stretching or out-of-distribution expressions, however, this tiny offset budget leads to dotted artifacts. In contrast, TexAvatars maintains stable, artifact-free rendering even under extreme deformations.

Main idea 2: Quasi-Phong Jacobian Field

Prior Methods

\( \boldsymbol{\mu} = \mathbf{J}_F \boldsymbol{\mu}^\ell + \mathbf{T}_F \)

Per-face Jacobians \(\mathbf{J}_F\) are piecewise-constant. Each triangle defines its frame independently, causing discontinuities at face boundaries when interpolating attributes across adjacent texels.

Discontinuous Deformations · Boundary Artifacts

Ours (TexAvatars)

\( \boldsymbol{\mu} = \text{GridSample}_{\text{lerp}}(\mathbf{J}_{uv} \boldsymbol{\mu}^\ell_{uv} + \mathbf{T}_{uv}) \)

We unwrap the Jacobians into UV space and apply bilinear interpolation, blending across adjacent faces. This Quasi-Phong Jacobian Field yields smooth, continuous deformations, analogous to how Phong shading interpolates normals; a minimal sampling sketch follows below.

Smooth Deformations · No Boundary Artifacts
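A minimal sketch of the texel-space sampling step, assuming the per-face Jacobians and translations have already been rasterized into UV maps; `jacobian_uv`, `trans_uv`, and the map layout are assumptions, while the bilinear blend itself uses the standard `torch.nn.functional.grid_sample`.

```python
import torch
import torch.nn.functional as F

def sample_jacobian_field(jacobian_uv, trans_uv, uv, mu_local):
    """Bilinearly sample the Jacobian field at each Gaussian's UV coordinate.

    jacobian_uv: (1, 9, H, W) texel map of flattened 3x3 Jacobians
    trans_uv:    (1, 3, H, W) texel map of translations
    uv:          (N, 2) UV coordinates in [-1, 1] (grid_sample convention)
    mu_local:    (N, 3) local Gaussian means
    """
    grid = uv.view(1, -1, 1, 2)                        # (1, N, 1, 2)
    J = F.grid_sample(jacobian_uv, grid, mode='bilinear',
                      align_corners=False).view(9, -1).T.reshape(-1, 3, 3)
    t = F.grid_sample(trans_uv, grid, mode='bilinear',
                      align_corners=False).view(3, -1).T
    # Because J and t are interpolated across texels spanning adjacent
    # faces, the resulting deformation varies smoothly over the surface.
    return (J @ mu_local.unsqueeze(-1)).squeeze(-1) + t
```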

Example: Effect of Global UV Sampling and Jacobian

[Figure: ablation on global UV sampling and the Jacobian field]

Remapping mesh Jacobians to texel space enables smooth blending of attributes across triangle boundaries, eliminating discontinuity artifacts.

Self-Reenactment

Cross-Reenactment

Related Links

For more work on similar tasks, please check out the following papers.

Mesh + Neural Deformation (Predicting additional delta offsets): GEM, RGBAvatar, SurFhead, RGCA

Mesh-Only Deformation (Deformation purely from mesh transformation): GaussianAvatars