CWG Issue 3094

This is an unofficial snapshot of the ISO/IEC JTC1 SC22 WG21 Core Issues List revision 119a. See http://www.open-std.org/jtc1/sc22/wg21/ for the official list.

2025-12-20

3094. Rework phases for string literal concatenation and token formation

Section: 5.2 [lex.phases] Status: accepted Submitter: US Date: 2025-10-01

[Accepted at the November, 2025 meeting.]

N5028 comment US 6-020
N5028 comment US 7-019

Merge phases 5 and 6, because both deal with the same contiguous sequences of string literals. Then, move the conversion of pp-tokens to tokens into a new phase 6.

Proposed resolution (approved by CWG 2025-11-04):

Change in 5.2 [lex.phases] paragraphs 5 through 7 as follows:

5. For a sequence of two or more adjacent string-literal preprocessing tokens, a common encoding-prefix is determined as specified in 5.13.5 [lex.string]. Each such string-literal preprocessing token is then considered to have that common encoding-prefix. ~~6. Adjacent~~ Then, adjacent string-literal preprocessing tokens are concatenated (5.13.5 [lex.string]).

~~7.~~ 6. Each preprocessing token is converted into a token (5.10 [lex.token]).

7. The ~~resulting~~ tokens constitute a translation unit and are syntactically and semantically analyzed as a translation-unit (6.7 [basic.link]) and translated. ...
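
As an informal illustration of the merged phase 5 (not part of the proposed wording), the following sketch places an unprefixed string-literal next to a u-prefixed one. The name `joined` is hypothetical and chosen only for this example; the assertions assume a C++17 compiler.

```cpp
#include <string_view>

// Two adjacent string-literals; only the first carries an encoding-prefix.
// Phase 5 determines the common encoding-prefix (u) and then concatenates,
// so the result has the same type and contents as u"phase five".
constexpr char16_t joined[] = u"phase " "five";

static_assert(sizeof(joined) / sizeof(joined[0]) == 11,
              "10 code units plus the terminating null");
static_assert(std::u16string_view(joined) == std::u16string_view(u"phase five"),
              "same contents as the single literal");
```

Both steps operate on the same contiguous sequence of string-literal preprocessing tokens, which is the reason the issue gives for handling them in a single phase.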
Change in 5.5 [lex.pptoken] paragraph 1 as follows:

A preprocessing token is the minimal lexical element of the language in translation phases 3 through ~~6~~ 5.
Change in 5.8 [lex.operators] paragraph 1 as follows:

... Each operator-or-punctuator is converted to a single token in translation phase ~~7~~ 6 (5.2 [lex.phases]).
Change in 5.13.5 [lex.string] paragraph 8 as follows:

In translation phase ~~6~~ 5 (5.2 [lex.phases]), adjacent string-literals are concatenated. The lexical structure and grouping of the contents of the individual string-literals is retained.
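
The retention of grouping can be seen in a small sketch (illustrative only; `grouped` is a hypothetical name). A hexadecimal escape sequence does not extend across the boundary between two concatenated literals.

```cpp
// "\x12" and "3" are separate string-literals, so the hexadecimal escape
// ends with the first literal: the result holds the two characters
// '\x12' and '3', not a single escape sequence \x123.
constexpr char grouped[] = "\x12" "3";

static_assert(sizeof(grouped) == 3,
              "two characters plus the terminating null");
static_assert(grouped[0] == '\x12' && grouped[1] == '3',
              "the escape sequence kept its original extent");
```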
Change in 5.13.9 [lex.ext] paragraph 8 as follows:

In translation phase ~~6~~ 5 (5.2 [lex.phases]), adjacent string-literals are concatenated and user-defined-string-literals are considered string-literals for that purpose. During concatenation, ud-suffixes are removed and ignored and the concatenation process occurs as described in 5.13.5 [lex.string]. At the end of phase ~~6~~ 5, if a string-literal is the result of a concatenation involving at least one user-defined-string-literal, all the participating user-defined-string-literals shall have the same ud-suffix and that suffix is applied to the result of the concatenation.
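
The rule for user-defined-string-literals might be exercised as follows. This is a sketch under stated assumptions: the ud-suffix `_len` and the helper functions are hypothetical and exist only to make the concatenated length observable.

```cpp
#include <cstddef>

// Hypothetical literal operator: yields the length of the literal it is
// applied to, so the effect of concatenation can be checked.
constexpr std::size_t operator""_len(const char*, std::size_t n) { return n; }

// Both literals carry the same ud-suffix. The suffixes are removed, the
// contents are concatenated, and _len is applied once to the result.
constexpr std::size_t same_suffix() { return "abc"_len "de"_len; }

// A plain string-literal may take part; the single ud-suffix still
// applies to the whole concatenation.
constexpr std::size_t mixed() { return "abc" "de"_len; }

static_assert(same_suffix() == 5, "_len sees the concatenated literal abcde");
static_assert(mixed() == 5, "_len sees the concatenated literal abcde");
```

Combining two different ud-suffixes in one such sequence is ill-formed, so that case is deliberately left out of the sketch.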
Change in 21.4.16 [meta.reflection.define.aggregate] bullet 5.2 as follows (addresses alternative tokens (e.g. xor) and exceptions instead of evaluation failure):
Throws: meta::exception unless the following conditions are met:
- ...
- if options.name contains a value, then:
  - holds_alternative<u8string>(options.name->contents) is true and get<u8string>(options.name->contents) contains the spelling of a valid token that is an identifier ~~identifier~~ (5.11 [lex.name]) ~~that is not a keyword (5.12 [lex.key])~~ when interpreted with UTF-8, or
  - holds_alternative<string>(options.name->contents) is true and get<string>(options.name->contents) contains the spelling of a valid token that is an identifier ~~identifier~~ (5.11 [lex.name]) ~~that is not a keyword (5.12 [lex.key])~~ when interpreted with the ordinary literal encoding;
  [Note 3: ~~The name corresponds to the spelling of an identifier token after phase 6 of translation (5.2 [lex.phases]).~~ Lexical constructs like universal-character-names (5.3.2 [lex.universal.char]) are not processed ~~and will cause evaluation to fail~~. For example, R"(\u03B1)" is an invalid identifier and is not interpreted as "α". —end note]
- ...
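
The two points in this change can be checked without any reflection facility. The sketch below is illustrative only and assumes a C++17 compiler; `raw_name` is a hypothetical name, and no call into the reflection library is shown.

```cpp
#include <string_view>

// A raw string literal does not process universal-character-names, so the
// contents are the six characters \ u 0 3 B 1 rather than the single
// character U+03B1. A backslash cannot appear in an identifier, which is
// why this spelling is not a valid identifier.
constexpr std::string_view raw_name = R"(\u03B1)";

static_assert(raw_name.size() == 6, "six characters, no UCN processing");
static_assert(raw_name[0] == '\\' && raw_name[1] == 'u',
              "the backslash and the letter u are kept literally");

// The alternative token xor is an operator token (equivalent to ^), not an
// identifier token.
static_assert((0b101 xor 0b011) == 0b110, "xor behaves as the ^ operator");
```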