This page is a snapshot from the LWG issues list; see the Library Active Issues List for more information and the meaning of New status.
regex FSM is underspecified
Section: 28.6.12 [re.grammar] Status: New Submitter: Hubert Tong Opened: 2017-06-25 Last modified: 2017-07-12
Priority: 4
Discussion:
In N4660 subclause 31.13 [re.grammar] paragraph 5:
The productions ClassAtomExClass, ClassAtomCollatingElement and ClassAtomEquivalence provide functionality equivalent to that of the same features in regular expressions in POSIX.
The broadness of the above statement makes it sound like it is merely a statement of intent; however, this appears to
be a necessary normative statement insofar as identifying the general semantics to be associated with the syntactic
forms identified. In any case, if it is meant for ClassAtomCollatingElement to provide functionality equivalent to a collating symbol in a POSIX bracket expression, multi-character collating elements need to be considered.
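As a rough illustration of where multi-character collating elements could surface, the sketch below resolves the name written between "[." and ".]" through std::regex_traits<char>::lookup_collatename, which returns a string rather than a single character. The helpers collating_element, is_multi_character_element and check_single_character_name are hypothetical names introduced only for this sketch; which names resolve to more than one character depends on the implementation and the locale, so none is asserted here.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <type_traits>
#include <utility>

using Traits = std::regex_traits<char>;

// lookup_collatename() reports a collating element as a string_type, so the
// traits interface can in principle name an element longer than one character.
static_assert(
    std::is_same<decltype(std::declval<const Traits&>().lookup_collatename(
                     std::declval<const char*>(), std::declval<const char*>())),
                 std::string>::value,
    "lookup_collatename returns string_type");

// Hypothetical helper: resolve the name written between "[." and ".]".
// An empty result means the traits object does not recognize the name.
std::string collating_element(const std::string& name) {
    Traits traits;  // default-constructed, uses the global locale
    return traits.lookup_collatename(name.begin(), name.end());
}

// Hypothetical helper: true when the resolved element spans several
// characters, which is the case the issue says must be considered.
bool is_multi_character_element(const std::string& name) {
    return collating_element(name).size() > 1;
}

// Usage sketch: a single-character name resolves to at most one character.
void check_single_character_name() {
    assert(collating_element("a").size() <= 1);
}
```

Whether a digraph name such as "ch" resolves at all is left to the implementation and locale, which is part of what makes the required behavior hard to pin down.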
The behavior of the internal finite state machine representation when used to match a sequence of characters is as described in ECMA-262. The behavior is modified according to any match_flag_type flags specified when using the regular expression object in one of the regular expression algorithms. The behavior is also localized by interaction with the traits class template parameter as follows: [bullets 14.1 to 14.4]
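To make the role of match_flag_type concrete, the following minimal sketch runs the same regular expression with and without std::regex_constants::match_not_bol. The helper names found and check_match_flags are hypothetical and exist only for this illustration; the flag and the std::regex_search overload are standard.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: search `text` for `pattern` under the given match flags.
bool found(const std::string& text, const std::string& pattern,
           std::regex_constants::match_flag_type flags =
               std::regex_constants::match_default) {
    const std::regex re(pattern);  // default grammar: ECMAScript
    return std::regex_search(text, re, flags);
}

void check_match_flags() {
    // With default flags, "^" matches at the start of the target sequence.
    assert(found("ab", "^a"));
    // match_not_bol tells the algorithm that the first position is not a
    // beginning of line, so "^" can no longer match there.
    assert(!found("ab", "^a", std::regex_constants::match_not_bol));
}
```

The regular expression object is the same in both calls; only the flags passed to the algorithm differ.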
In none of the bullets does the wording handle multi-character collating elements in a clear manner:
14.1 deals in characters.
14.2 deals in characters (traits_inst.translate accepts only a single character).
14.3 might handle a multi-character collating element; however, there is no specification of how such a collating element is to be identified from the sequence of characters. Additionally, the definition of primary equivalence class specifies that it is a set of characters (not of collating elements).
14.4 deals in characters.
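The per-character nature of the traits interface noted above can be sketched as follows. This is only an illustration under the assumption of the default std::regex_traits<char>; equal_after_translate and sequences_equal_per_character are hypothetical helpers, not wording or code taken from the standard.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <type_traits>
#include <utility>

using CharTraits = std::regex_traits<char>;

// translate() takes one char_type and returns one char_type; there is no
// overload that accepts a sequence such as a two-character collating element.
static_assert(
    std::is_same<decltype(std::declval<const CharTraits&>().translate('a')),
                 char>::value,
    "translate maps a single character to a single character");

// Hypothetical helper: character equality expressed through translate().
bool equal_after_translate(const CharTraits& traits, char c, char d) {
    return traits.translate(c) == traits.translate(d);
}

// Hypothetical helper: with only translate() available, two sequences can be
// compared solely one character at a time, never as a single unit.
bool sequences_equal_per_character(const CharTraits& traits,
                                   const std::string& a,
                                   const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (!equal_after_translate(traits, a[i], b[i])) return false;
    }
    return true;
}
```

A sequence such as "ch" is therefore seen by this interface as 'c' followed by 'h', with nothing that marks it as one collating element.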
The ECMA-262 specification for ClassRanges also deals in characters.
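A small sketch of what "deals in characters" means in practice for a range: under the default ECMAScript grammar, a bracket expression such as [a-c] consumes exactly one character of the target sequence. The helpers whole_match and check_range_is_per_character are hypothetical names used only here.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: does the whole of `text` match `pattern`?
bool whole_match(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);  // default grammar: ECMAScript
    return std::regex_match(text, re);
}

void check_range_is_per_character() {
    // One character inside the range matches.
    assert(whole_match("b", "[a-c]"));
    // One character outside the range does not.
    assert(!whole_match("d", "[a-c]"));
    // Two characters never match a single bracket expression as a whole,
    // even when each of them lies inside the range.
    assert(!whole_match("bc", "[a-c]"));
}
```

A multi-character collating element would need the bracket expression to consume more than one character, which the range semantics sketched here do not provide for.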
[2017-07 Toronto Monday issue prioritization]
Priority 4
Proposed resolution: