This is an unofficial snapshot of the ISO/IEC JTC1 SC22 WG21 Core Issues List revision 112e. See for the official list.


2752. Excess-precision floating-point literals

Section: 5.13.4  [lex.fcon]     Status: open     Submitter: Peter Dimov     Date: 2023-06-29     Liaison: EWG


  #include <cassert>
  int main() {
    constexpr auto x = 3.14f;
    assert( x == 3.14f );         // can fail?
    static_assert( x == 3.14f );  // can fail?
  }

Can a conforming implementation represent a floating-point literal with excess precision, causing the comparisons to fail?

Subclause 5.13.4 [lex.fcon] paragraph 3 specifies:

If the scaled value is not in the range of representable values for its type, the program is ill-formed. Otherwise, the value of a floating-point-literal is the scaled value if representable, else the larger or smaller representable value nearest the scaled value, chosen in an implementation-defined manner.
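To make the quoted rule concrete, the sketch below checks whether a float value is one of the two representable values bracketing a given scaled value (or equal to it). The helper name brackets_scaled_value is hypothetical, and a double stands in for the exact scaled value, which is itself only an approximation.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper, not part of the issue: returns true if `scaled`
// lies strictly between the float neighbors of `f`, so that `f` is the
// scaled value itself or the nearest representable value above or below it.
inline bool brackets_scaled_value(float f, double scaled) {
    const float inf = std::numeric_limits<float>::infinity();
    const float below = std::nextafter(f, -inf);  // next float toward -infinity
    const float above = std::nextafter(f, inf);   // next float toward +infinity
    return static_cast<double>(below) < scaled &&
           scaled < static_cast<double>(above);
}
```

Under the current wording, brackets_scaled_value(3.14f, 3.14) would be expected to hold on any implementation, since the literal has to be one of the two float values adjacent to the scaled value.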

This phrasing leaves little leeway for excess precision. In contrast, C23 (WG14 N3096) specifies in section paragraph 6:

The values of floating constants may be represented in greater range and precision than that required by the type (determined by the suffix); the types are not changed thereby. ...

Subclause 7.1 [expr.pre] paragraph 6 allows excess precision for floating-point computations (including their operands):

The values of the floating-point operands and the results of floating-point expressions may be represented in greater precision and range than that required by the type; the types are not changed thereby. [ Footnote: The cast and assignment operators must still perform their specific conversions as described in [expr.type.conv], 7.6.3 [expr.cast], [expr.static.cast] and 7.6.19 [expr.ass]. -- end footnote ]

Taken together, that means that 314.f / 100.f can be computed and represented more precisely than 3.14f, which is hard to justify. The footnote appears to imply that (float)3.14f is required to yield a value with float precision, but that conversion (eventually) ends up at 9.4.1 [dcl.init.general] bullet 16.9:

This phrasing leaves no permission to discard excess precision when converting from a float value to type float ("... is the value...").
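The contrast between the computed quotient and the literal can be sketched as follows. This is only an illustration: the function names are hypothetical, the operands are volatile purely to discourage compile-time folding, and the result of the first function depends on whether the implementation actually uses excess precision.

```cpp
#include <cassert>
#include <cfloat>

// Hypothetical helper: compares the run-time quotient with the literal.
// Under [expr.pre] paragraph 6 the quotient may carry excess precision,
// so this may return false on an implementation that uses it.
inline bool quotient_equals_literal() {
    volatile float num = 314.f;
    volatile float den = 100.f;
    return num / den == 3.14f;
}

// Same comparison, but with the explicit cast that the footnote says
// must still perform its conversion to float.
inline bool rounded_quotient_equals_literal() {
    volatile float num = 314.f;
    volatile float den = 100.f;
    return static_cast<float>(num / den) == 3.14f;
}
```

On an implementation where FLT_EVAL_METHOD is 0, both functions would be expected to return true; where operations are evaluated in a wider format, only the second one plausibly does.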

However, if initialization is intended to drop excess precision, then an overloaded operator returning float can never behave like a built-in operation with excess precision, because returning a value means initializing the return value.
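A minimal sketch of the situation described above, assuming a hypothetical wrapper type Real that exists only for this illustration: the overloaded operator has to initialize its float return value from the quotient, so if initialization drops excess precision, the overload cannot match the built-in operator.

```cpp
#include <cassert>

// Hypothetical wrapper type, introduced only for illustration.
struct Real {
    float value;
};

// The built-in a.value / b.value may be evaluated with excess precision,
// but the return statement initializes a float result object from it.
// Whether that initialization discards the excess precision is the open
// question raised in this issue.
inline float operator/(Real a, Real b) {
    return a.value / b.value;
}
```

For exactly representable results such as Real{1.f} / Real{2.f}, the overload and the built-in operator agree regardless of how this is resolved; the difference can only show up for inexact quotients such as 314.f / 100.f.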

The C++ standard library inherits the FLT_EVAL_METHOD macro from the C standard library. C23 (WG14 N3096) specifies it as follows in section

0 evaluate all operations and constants just to the range and precision of the type;
1 evaluate operations and constants of type float and double to the range and precision of the double type, evaluate long double operations and constants to the range and precision of the long double type;
2 evaluate all operations and constants to the range and precision of the long double type.

Taken together, a conforming C++ implementation cannot define FLT_EVAL_METHOD to 1 or 2, because literals (= "constants") cannot be represented with excess precision in C++.
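The value an implementation reports can be inspected through <cfloat>. The sketch below maps the values listed above to short descriptions; the function name describe_eval_method is hypothetical, and the handling of -1 reflects the C standard's "indeterminable" value, which is not quoted in this issue.

```cpp
#include <cassert>
#include <cfloat>
#include <string>

// Hypothetical helper: describes a FLT_EVAL_METHOD value using the
// wording of the list quoted above.
inline std::string describe_eval_method(int method) {
    switch (method) {
        case 0:  return "operations and constants use the range and precision of their type";
        case 1:  return "float and double use double; long double uses long double";
        case 2:  return "all operations and constants use long double";
        case -1: return "indeterminable";
        default: return "other value, not covered by the list above";
    }
}
```

Calling describe_eval_method(FLT_EVAL_METHOD) shows which mode the implementation claims; by the argument above, any answer of 1 or 2 would conflict with the current wording of [lex.fcon].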

Additional notes (June, 2023)

Forwarded to EWG via cplusplus/papers#1584, by decision of the CWG chair.