This is an unofficial snapshot of the ISO/IEC JTC1 SC22 WG21 Core Issues List revision 114b. See http://www.open-std.org/jtc1/sc22/wg21/ for the official list.

2024-05-06


2640. Allow more characters in an n-char sequence

Section: 5.3  [lex.charset]     Status: C++23     Submitter: US     Date: 2022-11-03

P2720R0 comment US 1-028

[Accepted at the November, 2022 meeting.]

The n-char grammar term is defined to match only Latin uppercase letters, digits, hyphen-minus, and space. As a result, \N{ABC} matches named-universal-character while \N{abc} does not. Programs like the following are therefore unexpectedly well-formed, because the \N{abc} sequence is lexed as the preprocessing token sequence \, N, {, abc, }. The expansion of macro a then causes that token sequence to be passed as an argument to macro z, where it is discarded.

  #define z(x) 0
  #define a z(
  int x = a\N{abc});

Changes to make the above program ill-formed would provide two benefits:

Proposed resolution (approved by CWG 2022-11-07):

Change the grammar in 5.3 [lex.charset] paragraph 3 as follows (the first four alternatives of n-char are removed and the final alternative is added):

n-char:
     A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
     0 1 2 3 4 5 6 7 8 9
     U+002d hyphen-minus
     U+0020 space
     any member of the translation character set except the U+007D RIGHT CURLY BRACKET or new-line character
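
The difference between the two definitions of n-char can be sketched as a pair of predicates. This is only an illustration under stated assumptions: the names is_old_n_char, is_new_n_char, and has_named_escape_shape are hypothetical, the code works on plain char and assumes an ASCII-compatible execution character set, and it checks only the lexical shape \N{ n-char-sequence }, not whether the name designates an actual Unicode character.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical predicate for the n-char set before this resolution:
// Latin uppercase letters, digits, hyphen-minus, and space.
// Assumes 'A'..'Z' are contiguous, as in ASCII.
constexpr bool is_old_n_char(char c) {
    return (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') ||
           c == '-' || c == ' ';
}

// Hypothetical predicate for the n-char set after this resolution:
// anything except right curly bracket or new-line.
constexpr bool is_new_n_char(char c) {
    return c != '}' && c != '\n';
}

// Returns true if `s` is exactly \N{ followed by one or more n-chars
// (as decided by `is_n_char`) followed by }.
template <typename Pred>
constexpr bool has_named_escape_shape(std::string_view s, Pred is_n_char) {
    // Shortest possible match is \N{X}, which is five characters.
    if (s.size() < 5 || s.substr(0, 3) != "\\N{" || s.back() != '}') {
        return false;
    }
    // The name occupies everything between "\N{" and the final "}".
    for (char c : s.substr(3, s.size() - 4)) {
        if (!is_n_char(c)) {
            return false;
        }
    }
    return true;
}

// Under the old set, only the uppercase spelling has the escape's shape.
static_assert(has_named_escape_shape("\\N{ABC}", is_old_n_char));
static_assert(!has_named_escape_shape("\\N{abc}", is_old_n_char));
// Under the new set, the lowercase spelling has it as well.
static_assert(has_named_escape_shape("\\N{abc}", is_new_n_char));
```

With the wider set, \N{abc} is lexed as a named-universal-character rather than as five separate preprocessing tokens. Since abc is not the name of a character, the example program above then becomes ill-formed.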