This is an overview of Gaussian Splatting, condensing the key points of the paper and including some snippets based on the original authors' implementation:
Motivation / Overview
Radiance field methods require neural networks that are costly to train and render, while faster methods trade off speed for quality.
Three elements allow SOTA visual quality while maintaining competitive training times, and importantly enable high-quality real-time (≥ 30 fps) novel-view synthesis at 1080p resolution:
- Scene representation with 3D Gaussians
- Interleaved optimization/density control of the 3D Gaussians (optimizing anisotropic covariance)
- Fast GPU algorithms for rendering.
Input
The input to the Gaussian Splatting method is a set of images of a static scene, together with the corresponding cameras calibrated by SfM, which produces a sparse point cloud.
3D Gaussians
Gaussian Representation
From the SfM points, a set of 3D Gaussians is produced. Each Gaussian is defined by a position (mean) $\mu$, a covariance matrix $\Sigma$, and an opacity $\alpha$.
Specifically, each Gaussian is defined by a full 3D covariance matrix $\Sigma$ defined in world space, centered at the point (mean) $\mu$:

$$G(x) = e^{-\frac{1}{2} x^T \Sigma^{-1} x}$$
This Gaussian is multiplied by $\alpha$ in the blending process. It's implemented in their repo as part of the rasterization submodule.
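The actual CUDA snippet is not reproduced here; instead, the following is a minimal NumPy sketch of the same idea, evaluating the (unnormalized) Gaussian at a point and weighting it by the opacity. Names and structure are illustrative assumptions, not the repo's code.

```python
import numpy as np

def gaussian_contribution(x, mean, cov, alpha):
    """Evaluate an unnormalized 3D Gaussian at x and weight it by opacity alpha.

    Illustrative only: the repo performs this per pixel in CUDA, in 2D,
    after the covariance has been projected to screen space.
    """
    d = x - mean                                        # offset from the Gaussian center
    value = np.exp(-0.5 * d @ np.linalg.inv(cov) @ d)   # unnormalized Gaussian density
    return alpha * value                                # alpha-weighted contribution
```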
Covariance Matrix Representation
The covariance of 3D Gaussians must be represented in a way that can be optimized. (The purpose of this optimization is to adjust the shape, orientation, and size of the Gaussians so that they accurately represent the radiance field in 3D space.)
An obvious approach would be to directly optimize the covariance matrix to obtain 3D Gaussians that represent the radiance field. However, covariance matrices have physical meaning only when they are positive semi-definite. For optimization of parameters, gradient descent is used, which cannot be easily constrained to produce positive semi-definite matrices, and update steps and gradients can very easily create invalid covariance matrices.
Thus, a more intuitive representation is used for the covariance, such that it can be optimized. The covariance matrix of a 3D Gaussian is analogous to describing the configuration of an ellipsoid. Given a scaling matrix $S$ and a rotation matrix $R$, we can find the corresponding $\Sigma$:

$$\Sigma = R S S^T R^T$$
To allow independent optimization of both factors, we store them separately: a 3D vector $s$ for scaling and a quaternion $q$ to represent rotation. These can be trivially converted to their respective matrices and combined, making sure to normalize $q$ to obtain a valid unit quaternion.
The implementation of this is shown below, also as part of the rasterization submodule.
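Since the original snippet is omitted here, the following NumPy sketch (with assumed names, not the repo's) shows the construction: a rotation matrix $R$ from the normalized quaternion, a diagonal scaling matrix $S$ from the scale vector, and then $\Sigma = R S S^T R^T$, which is positive semi-definite by construction.

```python
import numpy as np

def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)     # normalize to a valid unit quaternion
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def build_covariance(scale, quaternion):
    """Sigma = R S S^T R^T, positive semi-definite by construction."""
    R = quat_to_rotmat(quaternion)
    S = np.diag(scale)                     # 3D scale vector -> diagonal scaling matrix
    return R @ S @ S.T @ R.T
```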
Projection to 2D
The 3D Gaussians must be projected to 2D for rendering. Given a viewing transformation $W$, the covariance matrix $\Sigma'$ in camera coordinates is given as follows:

$$\Sigma' = J W \Sigma W^T J^T$$

where $J$ is the Jacobian of the affine approximation of the projective transformation.
This is done in the `computeCov2D` function of `diff-gaussian-rasterization/cuda-rasterizer/forward.cu`.
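As a hedged illustration of the same math (the repo's `computeCov2D` is CUDA and also applies a small low-pass filter to the 2D covariance, omitted here), a Python sketch might look like this; `W` is the rotation part of the world-to-camera transform and `J` is the Jacobian of the perspective projection evaluated at the Gaussian's camera-space mean:

```python
import numpy as np

def project_covariance(cov3d, mean_cam, W, fx, fy):
    """Project a world-space 3D covariance to a 2D screen-space covariance.

    cov3d    : 3x3 covariance in world space
    mean_cam : Gaussian mean in camera coordinates (x, y, z)
    W        : 3x3 rotation part of the world-to-camera (viewing) transform
    fx, fy   : focal lengths in pixels
    """
    x, y, z = mean_cam
    # Jacobian of the affine approximation of the perspective projection.
    J = np.array([
        [fx / z, 0.0,    -fx * x / z ** 2],
        [0.0,    fy / z, -fy * y / z ** 2],
        [0.0,    0.0,     0.0],
    ])
    cov_cam = W @ cov3d @ W.T      # rotate the covariance into camera space
    cov2d = J @ cov_cam @ J.T      # Sigma' = J W Sigma W^T J^T
    return cov2d[:2, :2]           # keep the top-left 2x2 block for 2D splatting
```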
Anisotropy
A key point made throughout the paper is that this covariance is anisotropic: it allows a Gaussian to have different spreads in different directions, resulting in an ellipsoid shape rather than a sphere. The orientation and lengths of the ellipsoid's axes are determined by the covariance matrix. This allows for better scene representation, as we are not restricted to representing radially symmetric features, which would be the isotropic case.
Optimization and Density Control of Gaussians
The optimization step creates a dense set of 3D Gaussians that accurately represents the scene for free-view synthesis. In addition to the positions $p$, opacities $\alpha$, and covariances $\Sigma$, spherical harmonic (SH) coefficients are also optimized, representing the color of each Gaussian, to correctly capture the view-dependent appearance of the scene. The optimization of these parameters is interleaved with steps that control the density of the Gaussians to better represent the scene.
The optimization is based on successive iterations of rendering and comparing the resulting image to the training views in the captured dataset. Inevitably, geometry may be incorrectly placed due to the ambiguities of 3D to 2D projection. Thus, the optimization needs to be able to create, destroy, and move geometry if it has been incorrectly positioned.
Stochastic gradient descent (the Adam optimizer in practice) is used for optimization.
Setup
- A sigmoid activation function is used for opacity to constrain it to the (0, 1) range and obtain smooth gradients.
- An exponential activation function is used for the scale of the covariance, for similar reasons.
- The initial covariance matrix is estimated as an isotropic Gaussian with axes equal to the mean of the distance to the closest three points (see the sketch below).
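A minimal PyTorch sketch of this setup, under the assumption that opacity is stored as a raw logit and scale in log-space (variable names are illustrative, not the repo's):

```python
import torch

num_points = 1000
points = torch.rand(num_points, 3)                 # stand-in for the SfM point cloud

# Mean distance to the 3 closest other points (skip the zero self-distance).
dists = torch.cdist(points, points)                # (N, N) pairwise distances
knn = dists.topk(4, largest=False).values[:, 1:]   # 3 nearest neighbours per point
init_scale = knn.mean(dim=1, keepdim=True).repeat(1, 3)

# Unconstrained parameters that the optimizer actually updates.
raw_opacity = torch.zeros(num_points, 1, requires_grad=True)    # logit of opacity
raw_scale = torch.log(init_scale).clone().requires_grad_(True)  # log of scale

# Activations applied whenever the Gaussians are rasterized.
opacity = torch.sigmoid(raw_opacity)   # constrained to (0, 1) with smooth gradients
scale = torch.exp(raw_scale)           # strictly positive, unbounded above
```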
Loss
A standard exponential decay schedule is used for the position learning rate. The loss function is an $\mathcal{L}_1$ loss combined with a D-SSIM term:

$$\mathcal{L} = (1 - \lambda)\mathcal{L}_1 + \lambda \mathcal{L}_{\text{D-SSIM}}$$

with $\lambda = 0.2$ used in the paper.
Implementation: the loss is assembled in `train.py`, using the helpers defined in `loss_utils.py`.
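Since the snippets are not reproduced here, a minimal PyTorch sketch of the combined loss follows; it assumes an `ssim_fn(img1, img2)` helper returning an SSIM similarity in [0, 1], analogous to the SSIM utility in `loss_utils.py`:

```python
import torch

def l1_loss(rendered, gt):
    """Mean absolute error between the rendered image and the ground-truth view."""
    return torch.abs(rendered - gt).mean()

def combined_loss(rendered, gt, ssim_fn, lam=0.2):
    """L = (1 - lambda) * L1 + lambda * D-SSIM, with lambda = 0.2 in the paper.

    `ssim_fn` is assumed to return an SSIM similarity in [0, 1];
    the dissimilarity term D-SSIM is then (1 - SSIM).
    """
    return (1.0 - lam) * l1_loss(rendered, gt) + lam * (1.0 - ssim_fn(rendered, gt))
```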
Densification
Initially, the method starts with a set of sparse points from SfM, but densification is done to adaptively control the number of Gaussians and their density over unit volume. This allows us to go from an initial sparse set of Gaussians to a denser set that better represents the scene, and with correct parameters.
During training, densification is done every 100 iterations (after optimization warm-up).
There are two cases where densification needs to be done:
- Under-Reconstruction: Regions with missing geometric features. In this case, small Gaussians in the space are cloned and moved in the direction of the positional gradient.
- Over-Reconstruction: Regions where Gaussians cover large areas in a scene. Here, large Gaussians in regions with high variance need to be split into smaller Gaussians.
Also, Gaussians that are essentially transparent, i.e., with $\alpha$ less than a threshold $\epsilon_\alpha$, are removed.
Implementation:
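The repo's densification code is not reproduced here; the following is a heavily simplified sketch of the logic (tensor names and thresholds are assumptions, with values loosely following the paper): Gaussians whose accumulated view-space positional gradient exceeds a threshold are cloned if small or split if large, and near-transparent Gaussians are pruned.

```python
import torch

def densify_and_prune(means, scales, opacities, grad_norms,
                      grad_thresh=0.0002, size_thresh=0.01, min_opacity=0.005):
    """Simplified clone/split/prune step on flat tensors of Gaussian parameters.

    means      : (N, 3) Gaussian centers
    scales     : (N, 3) per-axis scales
    opacities  : (N,)   opacities in (0, 1)
    grad_norms : (N,)   accumulated view-space positional gradient magnitudes
    """
    needs_densify = grad_norms > grad_thresh
    is_small = scales.max(dim=1).values <= size_thresh

    # Under-reconstruction: clone small Gaussians (the clone is then nudged
    # along the positional gradient by the optimizer).
    clone = needs_densify & is_small
    # Over-reconstruction: split large Gaussians into smaller ones
    # (the paper divides their scale by a factor of 1.6).
    split = needs_densify & ~is_small

    new_means = torch.cat([means, means[clone], means[split]])
    new_scales = torch.cat([scales, scales[clone], scales[split] / 1.6])
    new_opacities = torch.cat([opacities, opacities[clone], opacities[split]])

    # Prune Gaussians that are essentially transparent.
    keep = new_opacities > min_opacity
    return new_means[keep], new_scales[keep], new_opacities[keep]
```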
An effective way to moderate the increase in the number of Gaussians is to set the $\alpha$ value close to zero every $N = 3000$ iterations. The optimization then increases $\alpha$ for the Gaussians where this is needed, while allowing the culling approach to remove Gaussians with $\alpha$ less than $\epsilon_\alpha$, as described above.
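A sketch of this reset, assuming the sigmoid-activated opacity parameterization from the setup above (not the repo's exact call):

```python
import torch

def reset_opacity(raw_opacity, max_opacity=0.01):
    """Clamp opacities to at most `max_opacity` by rewriting the raw logits.

    Sketch only: assumes opacity = sigmoid(raw_opacity), as in the setup above.
    """
    target = torch.clamp(torch.sigmoid(raw_opacity), max=max_opacity)
    raw_opacity.data = torch.logit(target)   # write back into the raw parameter
    return raw_opacity
```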
Fast Differentiable Rasterization
Essentially, the authors create a fast tile-based rasterizer for Gaussian splats. The exact methods behind this don't seem very relevant for our use-case. The code is contained in the `diff-gaussian-rasterization` submodule.