This is an unofficial snapshot of the ISO/IEC JTC1 SC22 WG21 Core Issues List revision 116a. See http://www.open-std.org/jtc1/sc22/wg21/ for the official list.

2024-12-19


1999. Representation of source characters as universal-character-names

Section: 5.2  [lex.phases]     Status: CD4     Submitter: Richard Smith     Date: 2014-09-09

[Moved to DR at the May, 2015 meeting.]

According to 5.2 [lex.phases] paragraph 1, first phase,

Any source file character not in the basic source character set (5.3.1 [lex.charset]) is replaced by the universal-character-name that designates that character. (An implementation may use any internal encoding, so long as an actual extended character encountered in the source file, and the same extended character expressed in the source file as a universal-character-name (i.e., using the \uXXXX notation), are handled equivalently except where this replacement is reverted in a raw string literal.)

This wording is clearly not intended to exclude characters with code points larger than 0xFFFF, but the reference to “the \uXXXX notation” might suggest that the \UXXXXXXXX form is not allowed.
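As an illustration of the concern, the following minimal sketch (not part of the issue text) shows the two universal-character-name spellings side by side. The constant names kShortForm, kLongForm, and kBeyondBmp and the helper check_at_runtime are hypothetical, chosen only for this example. It assumes a C++11 or later compiler and deliberately avoids raw extended characters in the source, since how those are decoded depends on the implementation's source encoding.

```cpp
#include <cassert>

// U+00E9 spelled with the four-digit form and with the eight-digit form.
// Both name the same code point, so the literals compare equal.
constexpr char32_t kShortForm = U'\u00E9';
constexpr char32_t kLongForm  = U'\U000000E9';
static_assert(kShortForm == kLongForm,
              "\\uXXXX and \\UXXXXXXXX designate the same character");

// U+1F600 lies above 0xFFFF, so only the eight-digit form can name it.
constexpr char32_t kBeyondBmp = U'\U0001F600';
static_assert(kBeyondBmp > 0xFFFF, "code point is outside the \\uXXXX range");

// Hypothetical helper repeating the checks at run time.
void check_at_runtime() {
    assert(kShortForm == kLongForm);
    assert(kBeyondBmp == 0x1F600);
}
```

A character such as U+1F600 appearing directly in a source file could only be replaced by a \UXXXXXXXX name, which is why reading “the \uXXXX notation” as exhaustive would be too narrow.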

Proposed resolution (April, 2015):

Change 5.2 [lex.phases] paragraph 1 number 1 as follows, replacing “i.e.” with “e.g.”:

...(An implementation may use any internal encoding, so long as an actual extended character encountered in the source file, and the same extended character expressed in the source file as a universal-character-name (e.g., using the \uXXXX notation), are handled equivalently except where this replacement is reverted in a raw string literal.)