This page is a snapshot from the LWG issues list. See the Library Active Issues List for more information and the meaning of New status.

3603. Matching of null characters by regular expressions is underspecified

Section: 32.7.2 [re.regex.construct], 32.10 [re.alg]

Status: New

Submitter: Jonathan Wakely

Opened: 2021-09-27

Last modified: 2021-10-14

Priority: 3

Discussion:

ECMAScript says that \0 is an ordinary character and can be matched. POSIX says the opposite:

"The interfaces specified in POSIX.1-2017 do not permit the inclusion of a NUL character in an RE or in the string to be matched. If during the operation of a standard utility a NUL is included in the text designated to be matched, that NUL may designate the end of the text string for the purposes of matching."

So does that mean std::regex{"", 1, regex::basic}, whose pattern is a single NUL character, should throw an exception?

And should std::regex_match(string{"a\0b", 3}, regex{"a.b", regex::basic}) fail?
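One way to see what a given implementation does with these two cases is a small probe. The following is a minimal sketch and not part of the issue: the Outcome enum and the probe helper are hypothetical names introduced for illustration. The result for a pattern or subject containing NUL is exactly what this issue describes as underspecified, so it may differ between standard libraries.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical result type for this sketch: what happened when we tried to match.
enum class Outcome { Matched, NotMatched, Threw };

// Hypothetical helper: build a regex from the counted range [pat, pat + len)
// and try to match the whole subject. The explicit length lets the pattern
// contain embedded NUL characters.
inline Outcome probe(const char* pat, std::size_t len,
                     const std::string& subject,
                     std::regex::flag_type flags) {
    try {
        std::regex re(pat, len, flags);  // may throw std::regex_error
        return std::regex_match(subject, re) ? Outcome::Matched
                                             : Outcome::NotMatched;
    } catch (const std::regex_error&) {
        return Outcome::Threw;
    }
}
```

Calling probe("", 1, std::string(1, '\0'), std::regex::basic) exercises the first question, and probe("a.b", 3, std::string("a\0b", 3), std::regex::basic) exercises the second.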

The POSIX rule exists because those interfaces take NTBS (null-terminated byte string) arguments, so there is no way to distinguish "a\0b" from "a". The C++ interfaces could allow embedded NUL characters, but we never specify any divergence from POSIX, so presumably the rule still applies. Is that what was intended, and is it what we want?
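The NTBS limitation can be illustrated with a short sketch. The helper names ntbs_length and counted_length are hypothetical and exist only for this example. They contrast what an interface sees when it receives only a pointer with what it sees when it also receives an explicit count.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Length as seen by an interface that receives only a null-terminated
// byte string: scanning stops at the first NUL.
inline std::size_t ntbs_length(const char* s) {
    return std::strlen(s);
}

// Length as seen by an interface that also receives an explicit count:
// embedded NUL characters are preserved.
inline std::size_t counted_length(const char* s, std::size_t n) {
    return std::string(s, n).size();
}
```

Under these assumptions, ntbs_length("a\0b") and ntbs_length("a") both yield 1, while counted_length("a\0b", 3) yields 3. A pointer-only interface therefore cannot tell the two inputs apart, whereas the C++ constructors and algorithms that take a length or an iterator range can.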

[2021-10-14; Reflector poll]

Set priority to 3 after reflector poll.

Proposed resolution: