This is an unofficial snapshot of the ISO/IEC JTC1 SC22 WG21 Core Issues List revision 110. See the WG21 website for the official list.


238. Precision and accuracy constraints on floating point

Section: Clause 7  [expr]     Status: CD4     Submitter: Christophe de Dinechin     Date: 31 Jul 2000

[Adopted at the February, 2016 meeting.]

It is not clear what constraints are placed on a floating point implementation by the wording of the Standard. For instance, is an implementation permitted to generate a "fused multiply-add" instruction if the result would be different from what would be obtained by performing the operations separately? To what extent does the "as-if" rule allow the kinds of optimizations (e.g., loop unrolling) performed by FORTRAN compilers?
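The fused multiply-add question can be made concrete with a small sketch. It is only an illustration, not part of the issue text, and it assumes IEEE 754 binary64 double with the default round-to-nearest mode. The names MulAddResults, compare_mul_add, and check_mul_add are hypothetical. The inputs are chosen so that the exact product a * b lies halfway between two doubles: rounding the product before the addition gives 0, while a single fused rounding gives -2^-54.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Assumption: double is IEEE 754 binary64 and the default rounding mode
// (round to nearest, ties to even) is in effect.
static_assert(std::numeric_limits<double>::is_iec559,
              "this sketch assumes IEEE 754 double");

// Hypothetical helper type holding both ways of computing a * b + c.
struct MulAddResults {
    double separate;  // product rounded to double, then added
    double fused;     // one rounding for the whole expression
};

inline MulAddResults compare_mul_add() {
    const double eps = std::ldexp(1.0, -27);  // 2^-27, exactly representable
    const double a = 1.0 + eps;
    const double b = 1.0 - eps;
    const double c = -1.0;

    // The exact product is 1 - 2^-54, a tie between 1 - 2^-53 and 1.0.
    // Storing it in a volatile double forces it to be rounded to double
    // (to 1.0) and keeps the compiler from contracting the two operations.
    volatile double product = a * b;
    const double separate = product + c;

    // std::fma computes a * b + c as if to infinite precision and rounds once.
    const double fused = std::fma(a, b, c);

    return {separate, fused};
}

// Hypothetical self-check of the two results under the assumptions above.
inline void check_mul_add() {
    const MulAddResults r = compare_mul_add();
    assert(r.separate == 0.0);
    assert(r.fused == -std::ldexp(1.0, -54));
}
```

Without the volatile intermediate, a compiler that contracts a * b + c into one instruction could make both members equal, which is exactly the latitude the issue asks about.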

Proposed resolution (September, 2015):

Change 6.8.2 [basic.fundamental] paragraph 8 as follows:

There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double. The value representation of floating-point types is implementation-defined. [Note: This International Standard imposes no requirements on the accuracy of floating-point operations; see also 17.3 [support.limits]. —end note] Integral and floating types are collectively called arithmetic types. Specializations of the standard library template std::numeric_limits (17.3 [support.limits]) shall specify the maximum and minimum values of each arithmetic type for an implementation.
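The relationships stated in this paragraph can be probed through std::numeric_limits. The following is a minimal sketch, not normative wording. It treats numeric_limits<T>::digits as a stand-in for "precision", which is an assumption made for illustration and presumes the three types share a radix. The function names are hypothetical.

```cpp
#include <cassert>
#include <limits>

// Assumption: the number of significand digits is a reasonable proxy for
// "precision" when float, double, and long double use the same radix.
constexpr bool precision_is_ordered() {
    return std::numeric_limits<float>::digits <=
               std::numeric_limits<double>::digits &&
           std::numeric_limits<double>::digits <=
               std::numeric_limits<long double>::digits;
}
static_assert(precision_is_ordered(),
              "double and long double should not lose digits");

// Because the values of float are a subset of the values of double,
// widening a float and narrowing it back should reproduce the value.
inline bool round_trips_through_double(float f) {
    const double widened = f;
    return static_cast<float>(widened) == f;
}

// Hypothetical spot check at the extremes reported by numeric_limits.
inline void check_limits() {
    assert(round_trips_through_double(std::numeric_limits<float>::max()));
    assert(round_trips_through_double(std::numeric_limits<float>::lowest()));
    assert(round_trips_through_double(std::numeric_limits<float>::min()));

    // For floating-point types, min() is the smallest positive normalized
    // value; lowest() is the most negative finite value.
    assert(std::numeric_limits<double>::min() > 0.0);
    assert(std::numeric_limits<double>::lowest() < 0.0);
}
```

These checks cover representation and range only. In line with the note, they say nothing about how accurately any individual operation is computed.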