This is an unofficial snapshot of the ISO/IEC JTC1 SC22 WG21 Core Issues List revision 118d. See http://www.open-std.org/jtc1/sc22/wg21/ for the official list.

2025-10-28


3094. Rework phases for string literal concatenation and token formation

Section: 5.2  [lex.phases]     Status: review     Submitter: US     Date: 2025-10-01

N5028 comment US 6-020
N5028 comment US 7-019

Merge phases 5 and 6, because both deal with the same contiguous sequences of string literals. Then, move the conversion of pp-tokens to tokens into a new phase 6.
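The behavior that phases 5 and 6 currently split between them can be illustrated with a short sketch. It assumes a C++11 or later compiler; the `_len` suffix and the helper `check_concatenation` are hypothetical names introduced only for this illustration and do not appear in the issue.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical user-defined literal, used only to observe the result of
// concatenation: it returns the length of the string it is applied to.
constexpr std::size_t operator""_len(const char*, std::size_t n) { return n; }

// Only the first literal carries an encoding-prefix, so u becomes the
// common encoding-prefix and the sequence forms one UTF-16 string literal.
constexpr char16_t joined[] = u"abc" "def";
static_assert(sizeof(joined) == 7 * sizeof(char16_t),
              "six code units plus the terminator");

// The grouping of each individual literal is retained: "\x4" "1" is the
// two characters '\x04' and '1', not the single character '\x41'.
constexpr char grouped[] = "\x4" "1";
static_assert(sizeof(grouped) == 3, "two characters plus the terminator");
static_assert(grouped[0] == '\x04' && grouped[1] == '1',
              "escape sequence does not extend into the next literal");

// The ud-suffix is removed during concatenation and applied once to the
// concatenated result.
static_assert("ab"_len "cd"_len == 4, "suffix applies to \"abcd\"");
static_assert("ab" "cd"_len == 4, "plain literals may participate");

// The same facts as runtime checks.
inline void check_concatenation() {
    assert(grouped[1] == '1');
    assert(("ab"_len "cd"_len) == 4);
}
```

All of these checks concern one contiguous sequence of string-literal preprocessing tokens, which is the reason given above for handling prefix determination and concatenation in a single phase.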

Possible resolution:

  1. Change in 5.2 [lex.phases] paragraphs 5 through 7 as follows:

    5. For a sequence of two or more adjacent string-literal preprocessing tokens, a common encoding-prefix is determined as specified in 5.13.5 [lex.string]. Each such string-literal preprocessing token is then considered to have that common encoding-prefix. 6. Adjacent Then, adjacent string-literal preprocessing tokens are concatenated (5.13.5 [lex.string]).

    7. 6. Each preprocessing token is converted into a token (5.10 [lex.token]).

    7. The resulting tokens constitute a translation unit and are syntactically and semantically analyzed as a translation-unit (6.7 [basic.link]) and translated. ...

  2. Change in 5.5 [lex.pptoken] paragraph 1 as follows:

    A preprocessing token is the minimal lexical element of the language in translation phases 3 through 6 5.

  3. Change in 5.8 [lex.operators] paragraph 1 as follows:

    ... Each operator-or-punctuator is converted to a single token in translation phase 7 6 (5.2 [lex.phases]).

  4. Change in 5.13.5 [lex.string] paragraph 8 as follows:

    In translation phase 6 5 (5.2 [lex.phases]), adjacent string-literals are concatenated. The lexical structure and grouping of the contents of the individual string-literals is retained.

  5. Change in 5.13.9 [lex.ext] paragraph 8 as follows:

    In translation phase 6 5 (5.2 [lex.phases]), adjacent string-literals are concatenated and user-defined-string-literals are considered string-literals for that purpose. During concatenation, ud-suffixes are removed and ignored and the concatenation process occurs as described in 5.13.5 [lex.string]. At the end of phase 6 5, if a string-literal is the result of a concatenation involving at least one user-defined-string-literal, all the participating user-defined-string-literals shall have the same ud-suffix and that suffix is applied to the result of the concatenation.

  6. Change in 21.4.16 [meta.reflection.define.aggregate] bullet 5.2 as follows (addresses alternative tokens (e.g. xor) and exceptions instead of evaluation failure):

    Throws: meta::exception unless the following conditions are met:
    • ...
    • if options.name contains a value, then:
      • holds_alternative<u8string>(options.name->contents) is true and get<u8string>(options.name->contents) contains the spelling of a valid token that is an identifier identifier (5.11 [lex.name]) that is not a keyword (5.12 [lex.key]) when interpreted with UTF-8, or
      • holds_alternative<string>(options.name->contents) is true and get<string>(options.name->contents) contains the spelling of a valid token that is an identifier identifier (5.11 [lex.name]) that is not a keyword (5.12 [lex.key]) when interpreted with the ordinary literal encoding;
      [Note 3: The name corresponds to the spelling of an identifier token after phase 6 of translation (5.2 [lex.phases]). Lexical constructs like universal-character-names (5.3.2 [lex.universal.char]) are not processed and will cause evaluation to fail. For example, R"(\u03B1)" is an invalid identifier and is not interpreted as "α". —end note]
    • ...
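
The effect of requiring "the spelling of a valid token that is an identifier" can be approximated by the following sketch. It is not the specified algorithm: it assumes C++17, handles ASCII spellings only, and uses an abbreviated keyword list. The names `spells_identifier_token`, `in_list`, `kSomeKeywords`, and `kAlternativeTokens` are hypothetical and introduced only for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

// Abbreviated, illustrative subset of the keywords in 5.12 [lex.key].
constexpr std::array<std::string_view, 6> kSomeKeywords = {
    "class", "int", "return", "struct", "template", "while"};

// Alternative tokens: when preprocessing tokens are converted to tokens,
// these spellings yield operator tokens, not identifier tokens.
constexpr std::array<std::string_view, 11> kAlternativeTokens = {
    "and",    "and_eq", "bitand", "bitor",  "compl", "not",
    "not_eq", "or",     "or_eq",  "xor",    "xor_eq"};

template <std::size_t N>
bool in_list(const std::array<std::string_view, N>& list, std::string_view s) {
    return std::find(list.begin(), list.end(), s) != list.end();
}

// Hypothetical helper: true if `name` is an ASCII spelling that would form
// an identifier token. Non-ASCII identifier characters are NOT handled
// here (a simplification), and no lexical processing is performed, so a
// backslash, as in a universal-character-name, causes rejection.
bool spells_identifier_token(std::string_view name) {
    auto is_start = [](char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    };
    auto is_continue = [&](char c) {
        return is_start(c) || (c >= '0' && c <= '9');
    };
    if (name.empty() || !is_start(name.front())) return false;
    if (!std::all_of(name.begin() + 1, name.end(), is_continue)) return false;
    return !in_list(kSomeKeywords, name) && !in_list(kAlternativeTokens, name);
}
```

Under these assumptions, `spells_identifier_token("member_1")` is true, while `"xor"`, `"class"`, and `R"(\u03B1)"` are all rejected: the first two because they do not yield an identifier token, the last because the universal-character-name is never processed.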