CWG Issue 2779

This is an unofficial snapshot of the ISO/IEC JTC1 SC22 WG21 Core Issues List revision 117b. See http://www.open-std.org/jtc1/sc22/wg21/ for the official list.

2025-08-11

2779. Restrictions on the ordinary literal encoding

Section: 5.3.1 [lex.charset] Status: open Submitter: Jim X Date: 2023-03-28

(From submission #285.)

There are no restrictions on the implementation's choice of ordinary literal encoding. However, there is an implicit assumption that a code unit value must fit into a char.

Tangentially related to that, "cannot be encoded as a single code unit" could be interpreted as referring to the values of the code units as opposed to the fact that multiple code units might be needed.
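
The two readings can be illustrated with a small sketch. Because the ordinary literal encoding is implementation-defined, the example uses u8 literals, whose encoding is always UTF-8, as a stand-in for an implementation whose ordinary literal encoding happens to be UTF-8 (an assumption made only for illustration). The constant names are hypothetical.

```cpp
#include <cstddef>

// Assumption: u8 literals (always UTF-8) stand in for an ordinary literal
// encoding that is UTF-8. The names below are illustrative only.

// U+00E9 needs two UTF-8 code units. sizeof counts the terminating null,
// and each element has size 1, so subtract one.
constexpr std::size_t e_acute_code_units = sizeof(u8"\u00e9") - 1;

// Each code unit value exceeds 0x7F. Converting to unsigned char gives the
// same value whether the element type is char (possibly signed) or char8_t.
constexpr unsigned first_unit  = static_cast<unsigned char>(u8"\u00e9"[0]);
constexpr unsigned second_unit = static_cast<unsigned char>(u8"\u00e9"[1]);

static_assert(e_acute_code_units == 2, "U+00E9 is encoded with two code units");
static_assert(first_unit == 0xC3, "first UTF-8 code unit of U+00E9");
static_assert(second_unit == 0xA9, "second UTF-8 code unit of U+00E9");
```

Here "multiple code units are needed" (two of them) is a different statement from "a code unit value does not fit" (0xC3 exceeds the range of a signed 8-bit char), which is the ambiguity noted above.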

Possible resolution:

Change in 5.3.1 [lex.charset] paragraph 8 as follows and add to the index of implementation-defined behavior:

A code unit is an integer value of character type (6.9.2 [basic.fundamental]). Characters in a character-literal other than a multicharacter ~~or non-encodable character~~ literal or in a string-literal are encoded as a sequence of one or more code units, as determined by the encoding-prefix (5.13.3 [lex.ccon], 5.13.5 [lex.string]); this is termed the respective literal encoding. The ordinary literal encoding is the implementation-defined encoding applied to an ordinary character or string literal; its code units are of type unsigned char. The wide literal encoding is the implementation-defined encoding applied to a wide character or string literal; its code units are of type wchar_t.

Change in 5.13.3 [lex.ccon] bullet 3.1 as follows:

- A character-literal with a c-char-sequence consisting of a single basic-c-char, simple-escape-sequence, or universal-character-name is the code unit value of the specified character as encoded in the literal's associated character encoding. If the specified character lacks representation in the literal's associated character encoding or if it ~~cannot be encoded as a single code unit~~ is encoded with multiple code units, then the program is ill-formed.
- ...
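
As a rough sketch of how the reworded bullet 3.1 separates the two failure cases, the following models the check for a hypothetical implementation whose literal encoding is UTF-8. The enumeration and function names are invented for illustration and do not come from the standard or from any compiler.

```cpp
#include <cstdint>

// Hypothetical outcome of checking a single-character character-literal.
enum class LiteralCheck { Ok, NotRepresentable, MultipleCodeUnits };

// Number of UTF-8 code units for a code point, or 0 if the value is not a
// Unicode scalar value (surrogate or beyond U+10FFFF).
// Assumption: the literal encoding is UTF-8.
constexpr int utf8_code_units(std::uint32_t cp) {
    return ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) ? 0
         : cp < 0x80    ? 1
         : cp < 0x800   ? 2
         : cp < 0x10000 ? 3
         : 4;
}

// Mirrors the two conditions in the proposed wording: the character lacks
// representation, or it is encoded with multiple code units.
constexpr LiteralCheck check_char_literal(std::uint32_t cp) {
    return utf8_code_units(cp) == 0 ? LiteralCheck::NotRepresentable
         : utf8_code_units(cp) == 1 ? LiteralCheck::Ok
         : LiteralCheck::MultipleCodeUnits;
}

static_assert(check_char_literal(0x41) == LiteralCheck::Ok, "U+0041 is one code unit");
static_assert(check_char_literal(0xE9) == LiteralCheck::MultipleCodeUnits,
              "U+00E9 is two code units");
```

Under this model the second condition depends only on the count of code units, not on their values, which matches the intent of replacing "cannot be encoded as a single code unit" with "is encoded with multiple code units".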