Info about Float precision on GPU, bugs/features

13 comments, last by Acosix 2 years, 8 months ago

Full blog post with more info: https://arugl.medium.com/float-precision-on-gpu-bugs-features-178ddd030f

Interesting results comparing AMD/Nvidia GPUs and ANGLE DX11/OpenGL/Vulkan:

A small uint bit pattern reinterpreted as a float can be treated as a denormal and flushed to 0 on the CPU and/or the GPU.

Shader https://www.shadertoy.com/view/tlfBRB
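
A minimal GLSL sketch of the effect (the function name is mine, not from the shader above):

    uint denormalSurvives() {
        uint  bits = 1u;                     // smallest positive denormal bit pattern
        float f    = uintBitsToFloat(bits);  // about 1.4e-45 if denormals are kept
        return floatBitsToUint(f);           // may come back as 0u if flushed to zero
    }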

The shader compiler may pre-evaluate static code at compile time using either 32-bit or 64-bit floats.

Shader https://www.shadertoy.com/view/sllXW8
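
A hypothetical sketch of the kind of difference that shader probes (uX and the function name are mine, not part of the linked shader):

    uniform float uX;  // assume the app sets this to exactly 2111.0

    float compileTimeVsRuntime() {
        float folded  = sin(2111.0);  // literal: the compiler may fold this,
                                      // possibly using 64-bit math
        float runtime = sin(uX);      // uniform: evaluated at runtime in 32-bit
        return folded == runtime ? 0.0 : 1.0;  // 1.0 when the two disagree
    }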

[Screenshot: Nvidia result]

[Screenshot: AMD result]

More info in the linked blog post.


This reads to me as finding out first-hand what the “What Every Programmer Should Know About Floating Point” article means.

Most floating point values are an approximation of the real (mathematical) value. The exact result differs between hardware, the operations performed, the order of operations, and so on.

Assume that float_a == float_b is false by definition, and you'll be fine.
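
If you do need to compare, here is a minimal GLSL sketch of comparing with a tolerance instead of ==; the 1e-6 threshold is an arbitrary example, pick one that fits your value range:

    bool nearlyEqual(float a, float b) {
        // Scale the tolerance with the magnitude of the inputs.
        return abs(a - b) <= 1e-6 * max(1.0, max(abs(a), abs(b)));
    }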

This reads to me as finding out first-hand what the “What Every Programmer Should Know About Floating Point” article means.

GPU floating point is close to, but not quite, IEEE 754 floating point. You don't get floating point exceptions. Rounding options are different. Denormal handling is different. NAN and INF handling may be different. In games, this is rarely an issue. People trying to climb gradients for ML may care about this kind of thing. I used to do physics engines, which did some gradient climbing and sometimes had small differences between large numbers with the expected loss of significance problem. That sort of thing might be different on a GPU.

NVidia has a document on this.

In practice, most floating point problems involve:

  • Subtracting two values that are very close
  • Going out of range because of a near divide by zero
  • Expecting mathematical identities such as sin(x)^2 + cos(x)^2 = 1 to hold precisely.
  • Not having enough precision when far from the origin. If your game world is a kilometer across, 32-bit floating point is not enough. You may have to re-origin now and then.
  • Using double precision too much and losing performance.
  • Doing something iteratively for too many cycles, like repeatedly multiplying rotations without re-normalizing, or trying to go in a circle by moving along a vector and rotating the vector slightly (see the sketch after this list).
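
A minimal GLSL sketch of that last bullet (the function name is mine):

    vec2 goInCircle() {
        vec2  dir = vec2(1.0, 0.0);
        float a   = 0.001;
        mat2  rot = mat2(cos(a), sin(a), -sin(a), cos(a));  // rotate by a small angle
        for (int i = 0; i < 10000; ++i) {
            dir = rot * dir;
            dir = normalize(dir);  // without this, length(dir) slowly drifts from 1.0
        }
        return dir;
    }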

Beyond that, it's time to talk to a numerical analyst.

Nagle said:
Rounding options are different. Denormal handling is different.

The GPU also has:

RelaxedPrecision - allows 32-bit integer and 32-bit floating-point operations to execute with a relaxed precision of somewhere between 16 and 32 bits.

That means you cannot even guarantee that your operation will be executed as a 32-bit operation.
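
For illustration, a sketch assuming GLSL compiled to SPIR-V for Vulkan, where mediump typically becomes the RelaxedPrecision decoration (the function name is mine):

    float relaxedExample(float x) {
        mediump float a = x * 0.1;  // may execute anywhere between 16 and 32 bits
        highp   float b = x * 0.1;  // requests full 32-bit precision
        return a - b;               // can be non-zero on some GPUs/drivers
    }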

Working with uint bit patterns on the GPU can also be tripped up by this part of the spec (not just by denormals, as in my screenshot):

intBitsToFloat and uintBitsToFloat return the encoding passed in parameter x as a floating-point value. If the encoding of a NaN is passed in x, it will not signal and the resulting value will be undefined. If the encoding of a floating point infinity is passed in parameter x, the resulting floating-point value is the corresponding (positive or negative) floating point infinity.
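
A small sketch illustrating the quoted spec text (bit patterns per IEEE 754; the function name is mine):

    float bitCastCornerCases() {
        uint nanBits = 0x7FC00000u;          // a quiet-NaN bit pattern
        uint infBits = 0x7F800000u;          // the +infinity bit pattern
        float a = uintBitsToFloat(nanBits);  // per the spec: result is undefined
        float b = uintBitsToFloat(infBits);  // per the spec: result is +infinity
        return a + b;
    }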

Try using a smaller amount of floating-point precision (fewer decimal digits)?

Acosix said:
Try using a smaller amount of floating-point precision (fewer decimal digits)?

This is a common misconception.

A floating point number has a fixed number of bits. You cannot use less or more bits other than by switching type (i.e. float versus double). A double always computes in double precision; you cannot drop some bits. The hardware doesn't support that.

Note this is normal behavior: a 32 bit integer performs 32 bit integer operations even if you only “use” a few bits of it (e.g. computing 1 + 2 with 32 bit values does perform a 32 bit addition at all times).

The number of digits printed when you convert to text (so you can read it) has no influence at all on the precision of computing.

Alberth said:

Acosix said:
Try using a smaller amount of floating-point precision (fewer decimal digits)?

This is a common misconception.

A floating point number has a fixed number of bits. You cannot use less or more bits other than by switching type (i.e. float versus double). A double always computes in double precision; you cannot drop some bits. The hardware doesn't support that.

Note this is normal behavior: a 32 bit integer performs 32 bit integer operations even if you only “use” a few bits of it (e.g. computing 1 + 2 with 32 bit values does perform a 32 bit addition at all times).

The number of digits printed when you convert to text (so you can read it) has no influence at all on the precision of computing.

I was talking from a design point of view, not about coding.

Alberth said:
You cannot use less or more bits other than by switching type

Maybe you can, using unpackHalf2x16 for 16-bit floats.

Or you can emulate floats with integers even on the GPU, and it will work on most modern top-tier GPUs, using unpackUnorm2x16, unpackSnorm2x16, unpackUnorm4x8, and unpackSnorm4x8.
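
For example, a sketch of pushing a value through a 16-bit half with packHalf2x16/unpackHalf2x16 to see the precision loss (the function name is mine):

    float halfPrecisionLoss(float x) {
        uint  packedBits = packHalf2x16(vec2(x, 0.0));   // store x as a 16-bit half
        float asHalf     = unpackHalf2x16(packedBits).x; // expand it back to 32-bit
        return asHalf - x;  // generally non-zero; a half keeps roughly 3 decimal digits
    }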

Alberth said:
The number of digits printed when you convert to text (so you can read it) has no influence at all on the precision of computing.

I meant to not overload the calculation.

@Acosix What do you mean by “from a design point of view” and “not overload the calculation”?

Current CPU hardware implements the IEEE float standards. I don't know what GPUs have, but I would guess you cannot configure the number of mantissa and exponent bits, so at best there is a fixed set of available formats.

Now, from a design point of view, you can say “I want just a 3-bit mantissa” (for example, assuming such a mantissa length isn't provided as a format above). But if there is no hardware to compute at that precision (which is quite likely) and you don't want to use slow software floating point code, then what does your idea mean? How do you compute anything with floats then?

This topic is closed to new replies.
