This is an unofficial snapshot of the ISO/IEC JTC1 SC22 WG21 Core Issues List revision 113d. See http://www.open-std.org/jtc1/sc22/wg21/ for the official list.

2024-03-20


1656. Encoding of numerically-escaped characters

Section: 5.13.3  [lex.ccon]     Status: CD6     Submitter: Mike Miller     Date: 2013-04-30

[Accepted at the November, 2020 meeting as part of paper P2029R4.]

According to 5.13.3 [lex.ccon] paragraph 4,

The escape \ooo consists of the backslash followed by one, two, or three octal digits that are taken to specify the value of the desired character. The escape \xhhh consists of the backslash followed by x followed by one or more hexadecimal digits that are taken to specify the value of the desired character. There is no limit to the number of digits in a hexadecimal sequence. A sequence of octal or hexadecimal digits is terminated by the first character that is not an octal digit or a hexadecimal digit, respectively. The value of a character literal is implementation-defined if it falls outside of the implementation-defined range defined for char (for literals with no prefix), char16_t (for literals prefixed by 'u'), char32_t (for literals prefixed by 'U'), or wchar_t (for literals prefixed by 'L').

It is not clearly stated whether the “desired character” being specified reflects the source or the target encoding. This particularly affects UTF-8 string literals (5.13.5 [lex.string] paragraph 7) :

A string literal that begins with u8, such as u8"asdf", is a UTF-8 string literal and is initialized with the given characters as encoded in UTF-8.

For example, assuming the source encoding is Latin-1, is u8"\xff" supposed to specify a three-byte string whose first two bytes are 0xc3 0xbf (the UTF-8 encoding of \u00ff) or a two-byte string whose first byte has the value 0xff? (At least some current implementations assume the latter interpretation.)

Notes from the September, 2013 meeting:

The second interpretation (that the escape sequence specifies the execution-time code unit) is intended.