This is an unofficial snapshot of the ISO/IEC JTC1 SC22 WG21 Core Issues List revision 115e. See http://www.open-std.org/jtc1/sc22/wg21/ for the official list.

2024-11-11


350. signed char underlying representation for objects

Section: 6.8  [basic.types]     Status: open     Submitter: Noah Stein     Date: 16 April 2002     Liaison: WG14

Sent in by David Abrahams:

Yes, and to add to this tangent, 6.8.2 [basic.fundamental] paragraph 1 states "Plain char, signed char, and unsigned char are three distinct types." Strangely, 6.8 [basic.types] paragraph 2 talks about how "... the underlying bytes making up the object can be copied into an array of char or unsigned char. If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value." I guess there's no requirement that this copying work properly with signed chars!

Notes from October 2002 meeting:

We should do whatever C99 does. 6.5p6 of the C99 standard says "array of character type", and "character type" includes signed char (6.2.5p15), and 6.5p7 says "character type". But see also 6.2.6.1p4, which mentions (only) an array of unsigned char.

Proposed resolution (April 2003):

Change 6.7.3 [basic.life] bullet 5.3 from

to

Change 6.7.3 [basic.life] bullet 6.3 from

to

Change the beginning of 6.8 [basic.types] paragraph 2 from

For any object (other than a base-class subobject) of POD type T, whether or not the object holds a valid value of type T, the underlying bytes (6.7.1 [intro.memory]) making up the object can be copied into an array of char or unsigned char.

to

For any object (other than a base-class subobject) of POD type T, whether or not the object holds a valid value of type T, the underlying bytes (6.7.1 [intro.memory]) making up the object can be copied into an array of byte-character type.

Add the indicated text to 6.8.2 [basic.fundamental] paragraph 1:

Objects declared as characters (char) shall be large enough to store any member of the implementation's basic character set. If a character from this set is stored in a character object, the integral value of that character object is equal to the value of the single character literal form of that character. It is implementation-defined whether a char object can hold negative values. Characters can be explicitly declared unsigned or signed. Plain char, signed char, and unsigned char are three distinct types, called the byte-character types. A char, a signed char, and an unsigned char occupy the same amount of storage and have the same alignment requirements (6.8 [basic.types]); that is, they have the same object representation. For byte-character types, all bits of the object representation participate in the value representation. For unsigned byte-character types, all possible bit patterns of the value representation represent numbers. These requirements do not hold for other types. In any particular implementation, a plain char object can take on either the same values as a signed char or an unsigned char; which one is implementation-defined.

Change 7.2.1 [basic.lval] paragraph 15 last bullet from

to

Notes from October 2003 meeting:

It appears that in C99 signed char may have padding bits but no trap representation, whereas in C++ signed char has no padding bits but may have -0. A memcpy in C++ would have to copy the array preserving the actual representation and not just the value.

March 2004: The liaisons to the C committee have been asked to tell us whether this change would introduce any unnecessary incompatibilities with C.

Notes from October 2004 meeting:

The C99 Standard appears to be inconsistent in its requirements. For example, 6.2.6.1 paragraph 4 says:

The value may be copied into an object of type unsigned char [n] (e.g., by memcpy); the resulting set of bytes is called the object representation of the value.

On the other hand, 6.2 paragraph 6 says,

If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one.

Mike Miller will investigate further.

Proposed resolution (February, 2010):

  1. Change 6.7.3 [basic.life] bullet 5.4 as follows:

  2. ...The program has undefined behavior if:

  3. Change 6.7.3 [basic.life] bullet 6.4 as follows:

  4. ...The program has undefined behavior if:

  5. Change 6.8 [basic.types] paragraph 2 as follows:

  6. For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes (6.7.1 [intro.memory]) making up the object can be copied into an array of char or unsigned char a byte-character type (6.8.2 [basic.fundamental]).39 If the content of the that array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value. [Example:...
  7. Change 6.8.2 [basic.fundamental] paragraph 1 as follows:

  8. ...Characters can be explicitly declared unsigned or signed. Plain char, signed char, and unsigned char are three distinct types, called the byte-character types. A char, a signed char, and an unsigned char occupy the same amount of storage and have the same alignment requirements (6.7.6 [basic.align]); that is, they have the same object representation. For byte-character types, all bits of the object representation participate in the value representation. For unsigned character types unsigned char, all possible bit patterns of the value representation represent numbers...
  9. Change 7.2.1 [basic.lval] paragraph 15 final bullet as follows:

  10. If a program attempts to access the stored value of an object through an lvalue of other than one of the following types the behavior is undefined 52

  11. Change 6.7.6 [basic.align] paragraph 6 as follows:

  12. The alignment requirement of a complete type can be queried using an alignof expression (7.6.2.6 [expr.alignof]). Furthermore, the byte-character types (6.8.2 [basic.fundamental]) char, signed char, and unsigned char shall have the weakest alignment requirement. [Note: this enables the byte-character types to be used as the underlying type for an aligned memory area (9.12.2 [dcl.align]). —end note]
  13. Change 7.6.2.8 [expr.new] paragraph 10 as follows:

  14. ...For arrays of char and unsigned char a byte-character type (6.8.2 [basic.fundamental]), the difference between the result of the new-expression and the address returned by the allocation function shall be an integral multiple of the strictest fundamental alignment requirement (6.7.6 [basic.align]) of any object type whose size is no greater than the size of the array being created. [Note: Because allocation functions are assumed to return pointers to storage that is appropriately aligned for objects of any type with fundamental alignment, this constraint on array allocation overhead permits the common idiom of allocating byte-character arrays into which objects of other types will later be placed. —end note]

Notes from the March, 2010 meeting:

The CWG was not convinced that there was a need to change the existing specification at this time. Some were concerned that there might be implementation difficulties with giving signed char the requisite semantics; implementations for which that is true can currently make char equivalent to unsigned char and avoid those problems, but the suggested change would undermine that strategy.

Additional note, November, 2014:

There is now the term “narrow character type” that should be used instead of “byte-character type”.