Dealing with native spec failing GPUs #5607
I think we've probably answered this before as "don't care" but I thought I'd raise it again.
Do we need any more feature flags or warnings for non-compliant GPUs on things we're currently letting slide?
Concrete examples:
- devices failing at texture offsets - this includes all modern Macs with `textureSampleBias`. In other words, it includes modern devices, not just old ones.
- devices failing to wrap texture coordinates correctly - devices returning significantly off values.
I bring this up because, as it is, these are silent failures. The developer has no idea they are using a feature that will be broken on some users' devices. The usual thinking is that it will just be a visual glitch, so not a big deal. But I asked an LLM whether we need to care. Here are some answers that may or may not be hallucinated (I don't have the expertise to know):
When GPUs take shortcuts on this math or fail to meet Vulkan/DirectX/OpenGL specifications for edge filtering and precision, it absolutely breaks non-visual algorithms.
Here are several well-known algorithms and domains where developers use cubemaps for pure mathematics, and where hardware inaccuracies will produce fundamentally incorrect data:
- Scientific Computing & CFD: The "Cubed Sphere"
In computational fluid dynamics (CFD) and planetary sciences (like weather modeling), developers need to simulate systems on the surface of a sphere. Standard spherical coordinates (latitude/longitude) suffer from severe mathematical singularities at the poles, which causes simulations to break down or require massive performance overhead to stabilize.
To solve this, scientists use a "Cubed Sphere" grid—mapping the sphere onto six faces of a cube.
The Exploit: GPU-accelerated simulations often store simulation state (pressure, velocity) in cubemaps to take advantage of the GPU's free, hardware-accelerated bilinear filtering and neighbor-fetching.
The Failure: If the GPU's seamless cubemap filtering calculates edge or corner weights incorrectly, the simulation will introduce artificial energy or mass at the seams of the cube. Over thousands of iterations, this "visual glitch" becomes a mathematical divergence, causing the fluid simulation to destabilize or crash.
- Robotics and Autonomous Vision: Spherical CNNs
Autonomous vehicles and robots frequently use 360-degree cameras or LIDAR arrays. Processing this omnidirectional data with standard Convolutional Neural Networks (CNNs) causes immense distortion.
The Exploit: A common technique is to project the 360-degree sensor data onto a cubemap. Neural networks (specifically Cubemap CNNs) are then trained to run convolutions across these six faces, relying on hardware texture samplers to fetch and pool data accurately across the boundaries.
The Failure: If the API spec for coordinate projection or edge filtering is violated, the neural network receives mathematically warped input data at the boundaries. This isn't a visual artifact; it translates directly to a drop in the AI's confidence score, potentially causing a robot to miscalculate the trajectory of an object passing between two faces of its spatial map.
- Acoustic Simulation (Ray-Traced Audio)
Modern physical audio engines calculate sound propagation (reverb, occlusion, diffraction) by ray-tracing sound waves from a source.
The Exploit: The resulting directional energy impulse responses are often captured and encoded into spherical harmonics, which are then evaluated or stored using cubemaps to determine how sound bounces around a listener.
The Failure: If the math dictating how a 3D vector maps to the cubemap's texels is slightly off, the spatialization of the audio breaks. Frequencies will artificially spike or drop off when the listener turns their head, resulting in mathematically incorrect audio waveforms.
- Omnidirectional Shadow Mapping (Depth Mathematics)
While shadows are visual, the algorithm is a pure mathematical boolean test.
The Exploit: Point lights generate depth data in all directions, storing the distance from the light to the geometry in a depth-formatted cubemap.
The Failure: During the lighting pass, the GPU compares the distance of a pixel to the value stored in the cubemap. If the GPU's coordinate math for fetching the cubemap depth is imprecise—particularly at the corners of the cube—the depth comparison yields the wrong boolean result. The algorithm mathematically concludes an object is in shadow when it isn't (or vice versa), which can entirely break game logic if compute shaders are using that shadow data for AI visibility checks (e.g., "Can the guard see the player, or is the player in the dark?").
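To make the edge problem concrete, here is a CPU sketch of the direction-to-face/uv mapping a cubemap fetch performs, following the standard major-axis convention from the GL/Vulkan specs (`cube_face_uv` is a hypothetical helper name, not an API). Two nearly identical directions can land on different faces, so correct filtering near an edge has to blend texels across faces - exactly the math that non-compliant hardware gets wrong:

```python
def cube_face_uv(x, y, z):
    """Map a direction vector to (face, u, v) using the standard
    major-axis cubemap convention (faces 0..5 = +X,-X,+Y,-Y,+Z,-Z,
    as laid out in the GL/Vulkan specs)."""
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:  # X-major
        face, (sc, tc, ma) = (0, (-z, -y, ax)) if x > 0 else (1, (z, -y, ax))
    elif ay >= az:             # Y-major
        face, (sc, tc, ma) = (2, (x, z, ay)) if y > 0 else (3, (x, -z, ay))
    else:                      # Z-major
        face, (sc, tc, ma) = (4, (x, -y, az)) if z > 0 else (5, (-x, -y, az))
    return face, 0.5 * (sc / ma + 1.0), 0.5 * (tc / ma + 1.0)

# Two nearly identical directions straddle the +X/+Z edge; a correct
# sampler must blend texels from BOTH faces when filtering here:
print(cube_face_uv(1.0, 0.0, 0.999)[0])  # 0  (+X face, u near 0)
print(cube_face_uv(0.999, 0.0, 1.0)[0])  # 4  (+Z face, u near 1)
```

If a sampler's edge-weight math is off, a depth fetch for a direction sitting on that edge can jump between faces, flipping the shadow comparison's boolean result.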
Here are three strict, mathematically rigorous domains where 2D texture offsets combined with wrapping (Periodic Boundary Conditions) are mandatory, and where a failure results in catastrophic data corruption.
- Electromagnetics: FDTD Simulations for Metamaterials
The Finite-Difference Time-Domain (FDTD) method is the industry standard for simulating how electromagnetic waves (light, radio, microwaves) interact with structures.
- The Use Case: When engineers design metamaterials, photonic crystals, or solar cell coatings, they simulate a single microscopic "unit cell." To figure out how the material behaves at scale, they apply strict Periodic Boundary Conditions (wrapping) so a wave exiting the right side of the cell enters the left side, simulating an infinite lattice.
- The Math: FDTD works by calculating the curl of electric and magnetic fields using spatial derivatives. This is done via 3D stencils using texture offsets (e.g., fetching x+1, x-1).
- The Failure: If the offset fails to wrap correctly at the boundary of the unit cell, the simulation introduces an artificial physical wall. Instead of propagating through the lattice, the electromagnetic wave will reflect off the boundary or lose energy. The resulting transmission and reflection coefficients—data used to manufacture real-world optical lenses or antennas—will be entirely false.
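A minimal 1D sketch of this failure mode in plain NumPy (not FDTD itself: a leapfrog wave-equation update stands in for the field update, `np.roll` stands in for a correctly wrapping texture offset, and clamp-to-edge models the broken wrap):

```python
import numpy as np

def step(u, v, periodic=True):
    """One step of a 1D wave equation u_tt = c^2 u_xx (c^2 = 0.25).
    Neighbor fetches stand in for texture offsets: np.roll is a
    correct wrapping offset; the else branch models a broken wrap
    that clamps to the edge texel instead."""
    if periodic:
        left, right = np.roll(u, 1), np.roll(u, -1)
    else:
        left = np.concatenate(([u[0]], u[:-1]))
        right = np.concatenate((u[1:], [u[-1]]))
    v = v + 0.25 * (left - 2.0 * u + right)
    return u + v, v

n = 64
x = np.arange(n, dtype=float)
u0 = np.exp(-0.05 * (x - 16.0) ** 2)   # localized pulse
up, vp = u0.copy(), np.zeros(n)        # correct wrap
uc, vc = u0.copy(), np.zeros(n)        # broken (clamped) wrap
for _ in range(200):
    up, vp = step(up, vp, periodic=True)
    uc, vc = step(uc, vc, periodic=False)

print(abs(up.sum() - u0.sum()) < 1e-6)  # True: periodic run conserves the field
print(np.allclose(up, uc))              # False: the clamped run diverged
```

With wrapping, the pulse passes through the boundary and the total field is conserved; with the clamped fetch, the wave reflects off an artificial wall and the two runs diverge.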
- Materials Science: Phase-Field Modeling
Phase-field models are used in metallurgy and materials science to simulate how microstructures form—for example, predicting how a new aerospace alloy will crystallize as it cools, or how lithium dendrites grow inside a battery.
- The Use Case: These simulations are run on periodic grids to represent a small sample of "bulk" material without artificial edge effects.
- The Math: The simulation relies on solving equations like the Cahn-Hilliard equation, which requires computing the Laplacian (the divergence of the gradient) of the material's concentration. This is a heavy stencil operation relying entirely on fetching neighboring cells via offsets.
- The Failure: Phase-field equations are strictly governed by the laws of thermodynamics; they must conserve mass and minimize free energy. If a broken offset wrap returns a zero, clamps, or fetches garbage at the boundary, mass is instantly created or destroyed. The thermodynamic model breaks, and the simulation predicts impossible material structures.
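The conservation argument can be shown with a toy diffusion step (a stand-in for the full Cahn-Hilliard update; plain NumPy, with `np.roll` as the correct wrapping offset and zero-padding modeling a broken wrap that reads zeros outside the domain):

```python
import numpy as np

def laplacian(c, wrap=True):
    """5-point Laplacian stencil; neighbor fetches via offsets.
    wrap=True is correct periodic addressing; wrap=False models a
    broken wrap that returns zero for out-of-range fetches."""
    if wrap:
        return (np.roll(c, 1, 0) + np.roll(c, -1, 0) +
                np.roll(c, 1, 1) + np.roll(c, -1, 1) - 4.0 * c)
    p = np.pad(c, 1)  # out-of-range fetches read 0
    return (p[:-2, 1:-1] + p[2:, 1:-1] +
            p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * c)

rng = np.random.default_rng(0)
c_good = 0.5 + 0.1 * rng.standard_normal((32, 32))
c_bad = c_good.copy()
m0 = c_good.sum()                    # total "mass" at t = 0
for _ in range(100):                 # explicit diffusion, dt = 0.1
    c_good = c_good + 0.1 * laplacian(c_good, wrap=True)
    c_bad = c_bad + 0.1 * laplacian(c_bad, wrap=False)

print(abs(c_good.sum() - m0) < 1e-6)  # True: mass conserved under wrap
print(abs(c_bad.sum() - m0) > 1.0)    # True: mass destroyed at the seam
```

A periodic Laplacian sums to zero by construction, so the conserved quantity stays fixed; the broken fetch silently leaks mass out of the domain every step.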
- Medical Image Processing: Cylindrical/Tubular Convolutions
In medical imaging, scans of tubular structures—like blood vessels, intestines, or unwrapped retinal maps—are often mathematically "unrolled" into 2D maps for algorithmic analysis or neural network processing.
- The Use Case: To run feature extraction, edge detection, or AI analysis on these unrolled maps, the convolution kernels must wrap horizontally so that the algorithm understands the left edge and right edge are physically connected in real space.
- The Math: The convolution applies a kernel (e.g., a $3 \times 3$ or $5 \times 5$ grid of weights) across the texture. Texture offsets are the standard way to grab these neighboring pixels in a single pass.
- The Failure: If the wrap fails on the offset, the algorithm reads a sudden, massive discontinuity in the tissue at the edge of the map. In an automated diagnostic pipeline, this mathematical seam can easily be misclassified as a tissue rupture, a lesion, or an anomaly.
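A small NumPy sketch of the seam (a `[-1, 0, +1]` central-difference filter stands in for the convolution kernel; `np.roll` is the correct wrapping offset, and edge-padding models a clamp-to-edge failure):

```python
import numpy as np

def horizontal_gradient(img, wrap=True):
    """Apply a [-1, 0, +1] horizontal filter to an unrolled
    cylindrical map. wrap=True fetches neighbors periodically
    (left and right edges are physically connected); wrap=False
    models a broken offset that clamps at the edge instead."""
    if wrap:
        return np.roll(img, -1, axis=1) - np.roll(img, 1, axis=1)
    p = np.pad(img, ((0, 0), (1, 1)), mode="edge")
    return p[:, 2:] - p[:, :-2]

# smooth, periodic intensity pattern around the unrolled tube wall
h, w = 4, 360
theta = 2.0 * np.pi * np.arange(w) / w
img = np.tile(np.sin(theta), (h, 1))

good = horizontal_gradient(img, wrap=True)
bad = horizontal_gradient(img, wrap=False)
err = np.abs(good - bad)
print(err[:, 1:-1].max())         # 0.0: interior fetches are identical
print(err[:, [0, -1]].max() > 0)  # True: only the seam columns are wrong
```

The error is confined to the seam columns, which is exactly why it is so easy to miss in testing and so easy for a downstream classifier to mistake for a real feature.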
The Takeaway:
Developers absolutely assume that if a feature is in the API spec, the hardware implements the math correctly. When hardware vendors silently fail these specs, they are introducing silent, accumulating math errors into compute pipelines.
Examples of possible solutions
- make some of these optional features, and only enable them on devices that implement them correctly
This would be a breaking change, and I know people don't want feature soup. Still, telling people `a[4 + offset(6)]` accesses `a[10]` when it actually doesn't seems problematic. If people aren't using these features, there's no harm in making them optional. If they are using these features, maybe we should tell them they're broken?
- emit warnings
This would probably be annoying, but emit a warning that a feature is not portable - which might be fine if it's only used for visuals and not math. This would at least make devs aware that the feature doesn't work. Maybe include a link telling them where it breaks, with test code so they can feature-detect.
- polyfill
Many of these issues are difficult to polyfill because they require sampler state.
- add documentation about which things don't work
Docs are only good if they are read. So while this is a possible solution, I'd expect lots of developers not to notice.
WebGPU is being used in more and more places; it's not just the browser. WebGPU runs in Node, Deno, Python, Rust, and many other places where it is - possibly even more than in the browser - likely to be used for portable GPU-accelerated math.
If WebGPU is trying to be the portable solution, should we be doing anything more here?