2273. regex_match ambiguity

Section: 31.11.2 [re.alg.match] Status: C++17 Submitter: Howard Hinnant Opened: 2013-07-14 Last modified: 2017-07-30

Priority: 2

Discussion:

31.11.2 [re.alg.match] p2 in describing regex_match says:

-2- Effects: Determines whether there is a match between the regular expression e, and all of the character sequence [first,last). The parameter flags is used to control how the expression is matched against the character sequence. Returns true if such a match exists, false otherwise.

It has come to my attention that different people are interpreting the first sentence of p2 in different ways:

  1. If a search of the input string using the regular expression e matches the entire input string, regex_match should return true.

  2. Search the input string using the regular expression e. Reject all matches that do not match the entire input string. If such a match is found, return true.

The difference between these two subtly different interpretations is found using the following ECMAScript example:

std::regex re("Get|GetValue");

Using regex_search, this re can never match the whole of the input string "GetValue", because ECMAScript specifies that alternations are ordered, not greedy. As soon as "Get" is matched in the left alternation, the matching algorithm stops.

Using definition 1, regex_match would return false for an input string of "GetValue".

However, definition 2 alters the grammar and appears equivalent to augmenting the regex with a trailing '$', an anchor that rejects any match that does not end at the end of the input sequence. So, using definition 2, regex_match would return true for an input string of "GetValue".
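
The two readings can be sketched with the standard <regex> facilities. This is only an illustration: the helper names match_def1 and match_def2 are hypothetical, and emulating definition 2 by wrapping the pattern in ^(?:...)$ is an assumption based on the "appears equivalent" remark above, not wording taken from the standard.

```cpp
#include <regex>
#include <string>

// Definition 1 (hypothetical helper): run an ordinary search and accept only
// if the one match that the search reports happens to span the whole input.
bool match_def1(const std::string& input, const std::string& pattern) {
    const std::regex re(pattern);  // ECMAScript grammar is the default
    std::smatch m;
    if (!std::regex_search(input, m, re)) {
        return false;
    }
    return m.position(0) == 0 &&
           static_cast<std::string::size_type>(m.length(0)) == input.size();
}

// Definition 2 (hypothetical helper): only candidates that cover the whole
// input are considered. Emulated here (an assumption) by anchoring the
// pattern at both ends, so the matcher backtracks past shorter candidates.
bool match_def2(const std::string& input, const std::string& pattern) {
    const std::regex anchored("^(?:" + pattern + ")$");
    return std::regex_search(input, anchored);
}
```

With the pattern "Get|GetValue" and the input "GetValue", match_def1 returns false because the search stops at "Get", while match_def2 returns true because the "Get" candidate is rejected and the right alternative is tried.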

My opinion is that it would be strange to have regex_match return true for a string/regex pair that regex_search could never find. I.e. I favor definition 1.

John Maddock writes:

The intention was always that regex_match would reject any match candidate which didn't match the entire input string. So it would find GetValue in this case because the "Get" alternative had already been rejected as not matching. Note that the comparison with ECMAScript is somewhat moot, as ECMAScript defines the regex grammar (the bit we've imported), it does not define anything like regex_match, nor do we import from ECMAScript the behaviour of that function. So IMO the function should behave consistently regardless of the regex dialect chosen. Saying "use awk regexes" doesn't cut it, because that changes the grammar in other ways.

(John favors definition 2).

We need to clarify 31.11.2 [re.alg.match]/p2 in one of these two directions.

[2014-06-21, Rapperswil]

AM: I think there's a clear direction and consensus that we agree with John Maddock's position, and if no one else thinks we need the other function I won't ask for it.

Marshall Clow and STL to draft.

[2015-06-10, Marshall suggests concrete wording]

[2015-01-11, Telecon]

Move to Tentatively Ready.

Proposed resolution:

This wording is relative to N4527.

  1. Change 31.11.2 [re.alg.match]/2, as follows:

    template <class BidirectionalIterator, class Allocator, class charT, class traits>
      bool regex_match(BidirectionalIterator first, BidirectionalIterator last,
                       match_results<BidirectionalIterator, Allocator>& m,
                       const basic_regex<charT, traits>& e,
                       regex_constants::match_flag_type flags =
                         regex_constants::match_default);
    

    -1- Requires: The type BidirectionalIterator shall satisfy the requirements of a Bidirectional Iterator (24.2.6).

    -2- Effects: Determines whether there is a match between the regular expression e, and all of the character sequence [first,last). The parameter flags is used to control how the expression is matched against the character sequence. When determining if there is a match, only potential matches that match the entire character sequence are considered. Returns true if such a match exists, false otherwise. [Example:

    std::regex re("Get|GetValue");
    std::cmatch m;
    regex_search("GetValue", m, re);	// returns true, and m[0] contains "Get"
    regex_match ("GetValue", m, re);	// returns true, and m[0] contains "GetValue"
    regex_search("GetValues", m, re);	// returns true, and m[0] contains "Get"
    regex_match ("GetValues", m, re);	// returns false
    

    end example]

    […]
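
The example in the proposed wording can be exercised with a small compilable sketch. The Outcome struct and the run_search and run_match helpers are hypothetical names introduced here for illustration, and the results described afterwards assume a library that implements the resolved wording.

```cpp
#include <regex>
#include <string>

// Hypothetical result type: whether the call succeeded and what m[0] holds.
struct Outcome {
    bool matched;
    std::string text;
};

// The regular expression used in the proposed example.
static const std::regex example_re("Get|GetValue");

// Wraps regex_search: reports the first match found anywhere in the input.
Outcome run_search(const char* input) {
    std::cmatch m;
    const bool ok = std::regex_search(input, m, example_re);
    return Outcome{ok, ok ? m[0].str() : std::string()};
}

// Wraps regex_match: succeeds only if the entire input is matched.
Outcome run_match(const char* input) {
    std::cmatch m;
    const bool ok = std::regex_match(input, m, example_re);
    return Outcome{ok, ok ? m[0].str() : std::string()};
}
```

Under the resolved wording, run_search("GetValue") and run_search("GetValues") both report "Get", run_match("GetValue") reports "GetValue", and run_match("GetValues") fails.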