This is an unofficial snapshot of the ISO/IEC JTC1 SC22 WG21 Core Issues List revision 115f. See http://www.open-std.org/jtc1/sc22/wg21/ for the official list.
2024-12-06
[Voted into WP at March, 2010 meeting as document N3077.]
The specification of raw string literals interacts poorly with the specification of preprocessing tokens. The grammar in 5.4 [lex.pptoken] has a production reading
This is echoed in the max-munch rule in paragraph 3:
If the input stream has been parsed into preprocessing tokens up to a given character, the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token, even if that would cause further lexical analysis to fail.
This raises questions about the handling of raw string literals. Consider, for instance,
#define R "x"
const char* s = R"y";
The character sequence R"y" does not satisfy the syntactic requirements for a raw string. Should it be diagnosed as an ill-formed attempt at a raw string, or should it be well-formed, interpreting R as a preprocessor token that is a macro name and thus initializing s with a pointer to the string "xy"?
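For contrast, the following minimal sketch (not part of the issue text) shows the uncontroversial neighbouring case: when white space separates R from the quote, R is lexed as an ordinary identifier, macro-expanded, and the two adjacent string literals are concatenated. The function name spaced_macro is hypothetical and exists only to make the result observable.

```cpp
#include <cstring>

#define R "x"

// With a space after R, the lexer sees the identifier R followed by an
// ordinary string literal. R expands to "x", and the adjacent literals
// "x" "y" are concatenated in a later translation phase.
inline const char* spaced_macro() { return R "y"; }

#undef R  // keep the one-letter macro from leaking into later code
```

The question raised above is whether removing that single space should change the program from well-formed to ill-formed.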
For another example, consider:
#define R "]"
const char* x = R"foo[";
Presumably this means that the entire rest of the file must be scanned for the characters ]foo" and, if they are not found, macro-expand R and initialize x with a pointer to the string "]foo[". Is this the intended result?
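The scan described in the preceding paragraph can be sketched as follows. This is an illustration only, assuming the bracket-delimited draft syntax used in this issue's examples; find_raw_terminator is a hypothetical helper, not anything specified by the Standard.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: `rest` is the input following the opening R"delim[ .
// Returns the offset of the closing sequence ]delim" within `rest`, or
// std::string::npos if the remainder of the input never contains it.
inline std::size_t find_raw_terminator(const std::string& rest,
                                       const std::string& delim) {
    const std::string closing = "]" + delim + "\"";
    return rest.find(closing);
}
```

For the example above, the text after R"foo[ is just ";, so the search returns npos only after examining everything that remains in the file.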
Finally, does the requirement in 5.13.5 [lex.string] that
A d-char-sequence shall consist of at most 16 characters.
mean that
#define R "x"
const char* y = R"12345678901234567[y]12345678901234567";
is ill-formed, or a valid initialization of y with a pointer to the string "x12345678901234567[y]12345678901234567"?
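The length requirement at stake can be written as a one-line check. The sketch below is illustrative; the names kMaxDelimiterLength and delimiter_length_ok are hypothetical, and the limit of 16 is taken from the wording quoted above.

```cpp
#include <cstddef>
#include <string>

// Limit quoted from [lex.string]: a d-char-sequence has at most 16 characters.
const std::size_t kMaxDelimiterLength = 16;

// Hypothetical helper: true if `delim` is short enough to be a delimiter.
inline bool delimiter_length_ok(const std::string& delim) {
    return delim.size() <= kMaxDelimiterLength;
}
```

The delimiter in the example, 12345678901234567, has 17 characters and so fails this check; the open question is whether that failure is a diagnosable error or simply means the token is not a raw string at all.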
Additional note, June, 2009:
The translation of characters that are not in the basic source character set into universal-character-names in translation phase 1 raises an additional problem: each such character will occupy at least six of the 16 r-chars that are permitted. Thus, for example, R"@@@[]@@@" is ill-formed because @@@ becomes \u0040\u0040\u0040, which is 18 characters.
One possibility for addressing this might be to disallow the \ character completely as a d-char, which would have the effect of restricting r-chars to the basic source character set.
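The arithmetic behind the 18-character figure in the June, 2009 note can be sketched as below. The function phase1_length is hypothetical, and the sketch assumes that every character outside the basic source character set is replaced by the six-character form \uXXXX, which is the minimum (the \UXXXXXXXX form takes ten).

```cpp
#include <cstddef>

// Hypothetical helper: length of a character sequence after phase 1, assuming
// each non-basic character becomes a six-character \uXXXX name and each basic
// character is left unchanged.
inline std::size_t phase1_length(std::size_t basic_chars,
                                 std::size_t non_basic_chars) {
    return basic_chars + 6 * non_basic_chars;
}
```

Under that assumption the three characters @@@ occupy 6 * 3 = 18 characters, which exceeds the limit of 16.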
Proposed resolution (October, 2009):
Change the grammar in 5.13.5 [lex.string] as follows:
Change 5.13.5 [lex.string] paragraph 2 as follows:
A string literal that has an R in the prefix is a raw string literal. The d-char-sequence serves as a delimiter. The terminating d-char-sequence of a raw-string is the same sequence of characters as the initial d-char-sequence. A d-char-sequence shall consist of at most 16 characters. If the input stream contains a sequence of characters that could be the prefix and initial double quote of a raw string literal, such as R", those characters are considered to begin a raw string literal even if that literal is not well-formed. [Example:
#define R "x"
const char* s = R"y"; // ill-formed raw string, not "x" "y"
—end example]
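One way a lexer might apply the proposed rule is sketched below. This is an assumption-laden illustration, not the wording of the resolution: begins_raw_string is a hypothetical helper, pos is assumed to be the start of a new preprocessing token, and the list of prefixes (R, u8R, uR, UR, LR) is the set of raw string prefixes as adopted in C++11.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: true if the characters at `pos` could be the prefix and
// initial double quote of a raw string literal. Under the proposed rule the
// lexer then commits to a raw string, even if the literal proves ill-formed.
inline bool begins_raw_string(const std::string& input, std::size_t pos = 0) {
    static const char* const kOpeners[] = {"R\"", "u8R\"", "uR\"", "UR\"", "LR\""};
    if (pos > input.size()) {
        return false;  // nothing left to examine
    }
    for (const char* opener : kOpeners) {
        if (input.compare(pos, std::strlen(opener), opener) == 0) {
            return true;
        }
    }
    return false;
}
```

With this check, R"y"; is committed to the raw string path and diagnosed, whereas R "y"; (with a space) is not, so R remains available as a macro name.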