This is an unofficial snapshot of the ISO/IEC JTC1 SC22 WG21 Core Issues List revision 115e. See http://www.open-std.org/jtc1/sc22/wg21/ for the official list.

2024-11-11


558. Excluded characters in universal character names

Section: 5.3  [lex.charset]     Status: CD1     Submitter: Daveed Vandevoorde     Date: 8 February 2006

[Moved to DR at October 2007 meeting.]

C99 and C++ differ in their approach to universal character names (UCNs).

Issue 248 already covers the differences in UCNs allowed for identifiers, but a more fundamental issue is that of UCNs that correspond to codes reserved by ISO 10676 for surrogate pair forms.

Specifically, C99 does not allow UCNs whose short names are in the range 0xD800 to 0xDFFF. I think C++ should have the same constraint. If someone really wants to place such a code in a character or string literal, they should use a hexadecimal escape sequence instead, for example:

    wchar_t  w1 = L'\xD900'; // Okay.
    wchar_t  w2 = L'\uD900'; // Error, not a valid character.

(Compare 6.4.3 paragraph 2 in ISO/IEC 9899/1999 with 5.3 [lex.charset] paragraph 2 in the C++ standard.)

Proposed resolution (October, 2007):

This issue is resolved by the adoption of paper J16/07-0030 = WG21 N2170.