Issue 202: unique() effects unclear when predicate not an equivalence relation

This page is a snapshot from the LWG issues list, see the Library Active Issues List for more information and the meaning of CD1 status.

202. unique() effects unclear when predicate not an equivalence relation

Section: 26.7.9 [alg.unique] Status: CD1 Submitter: Andrew Koenig Opened: 2000-01-13 Last modified: 2016-01-28

Priority: Not Prioritized

View other active issues in [alg.unique].

View all other issues in [alg.unique].

View all issues with CD1 status.

Discussion:

What should unique() do if you give it a predicate that is not an equivalence relation? There are at least two plausible answers:

1. You can't, because 25.2.8 says that it it "eliminates all but the first element from every consecutive group of equal elements..." and it wouldn't make sense to interpret "equal" as meaning anything but an equivalence relation. [It also doesn't make sense to interpret "equal" as meaning ==, because then there would never be any sense in giving a predicate as an argument at all.]

2. The word "equal" should be interpreted to mean whatever the predicate says, even if it is not an equivalence relation (and in particular, even if it is not transitive).

The example that raised this question is from Usenet:

int f[] = { 1, 3, 7, 1, 2 };
int* z = unique(f, f+5, greater<int>());

If one blindly applies the definition using the predicate greater<int>, and ignore the word "equal", you get:

Eliminates all but the first element from every consecutive group of elements referred to by the iterator i in the range [first, last) for which *i > *(i - 1).

The first surprise is the order of the comparison. If we wanted to allow for the predicate not being an equivalence relation, then we should surely compare elements the other way: pred(*(i - 1), *i). If we do that, then the description would seem to say: "Break the sequence into subsequences whose elements are in strictly increasing order, and keep only the first element of each subsequence". So the result would be 1, 1, 2. If we take the description at its word, it would seem to call for strictly DEcreasing order, in which case the result should be 1, 3, 7, 2.

In fact, the SGI implementation of unique() does neither: It yields 1, 3, 7.

Proposed resolution:

Change 26.7.9 [alg.unique] paragraph 1 to:

For a nonempty range, eliminates all but the first element from every consecutive group of equivalent elements referred to by the iterator i in the range [first+1, last) for which the following conditions hold: *(i-1) == *i or pred(*(i-1), *i) != false.

Also insert a new paragraph, paragraph 2a, that reads: "Requires: The comparison function must be an equivalence relation."

[Redmond: discussed arguments for and against requiring the comparison function to be an equivalence relation. Straw poll: 14-2-5. First number is to require that it be an equivalence relation, second number is to explicitly not require that it be an equivalence relation, third number is people who believe they need more time to consider the issue. A separate issue: Andy Sawyer pointed out that "i-1" is incorrect, since "i" can refer to the first iterator in the range. Matt provided wording to address this problem.]

[Curaçao: The LWG changed "... the range (first, last)..." to "... the range [first+1, last)..." for clarity. They considered this change close enough to editorial to not require another round of review.]

Rationale:

The LWG also considered an alternative resolution: change 26.7.9 [alg.unique] paragraph 1 to:

For a nonempty range, eliminates all but the first element from every consecutive group of elements referred to by the iterator i in the range (first, last) for which the following conditions hold: *(i-1) == *i or pred(*(i-1), *i) != false.

Also insert a new paragraph, paragraph 1a, that reads: "Notes: The comparison function need not be an equivalence relation."

Informally: the proposed resolution imposes an explicit requirement that the comparison function be an equivalence relation. The alternative resolution does not, and it gives enough information so that the behavior of unique() for a non-equivalence relation is specified. Both resolutions are consistent with the behavior of existing implementations.