This page is a snapshot from the LWG issues list; see the Library Active Issues List for more information and the meaning of Resolved status.
regex_iterator and join_view don't work together very well
Section: 28.6.11 [re.iter], 25.7.14 [range.join] Status: Resolved Submitter: Barry Revzin Opened: 2022-05-12 Last modified: 2023-03-23
Priority: 2
Discussion:
Consider this example (from StackOverflow):
    #include <ranges>
    #include <regex>
    #include <iostream>
    #include <string_view>

    int main() {
        char const text[] = "Hello";
        std::regex regex{"[a-z]"};

        auto lower = std::ranges::subrange(
                std::cregex_iterator(
                    std::ranges::begin(text),
                    std::ranges::end(text),
                    regex),
                std::cregex_iterator{}
            )
            | std::views::join
            | std::views::transform([](auto const& sm) {
                  return std::string_view(sm.first, sm.second);
              });

        for (auto const& sv : lower) {
            std::cout << sv << '\n';
        }
    }
This example seems sound: lower is a range of string_view that should refer back into text, which remains in scope throughout. The std::regex object also remains in scope throughout.
Yet if we run this through AddressSanitizer, it reports a heap-use-after-free in the first call to the dereference operator of the underlying transform_view's iterator.
The problem here is ultimately that regex_iterator is a stashing iterator (it has a match_results member, and its operator* returns a reference to that member), yet it advertises itself as a forward_iterator, despite violating 24.3.5.5 [forward.iterators] p6 and 24.3.4.11 [iterator.concept.forward] p3.
Then, join_view's iterator stores an outer iterator (the regex_iterator) and an inner_iterator (an iterator into the container that the regex_iterator stashes). Copying that iterator effectively invalidates it, since the new iterator's inner iterator will refer into the container owned by the old iterator's outer iterator. These aren't (and can't be) independent copies.
In this particular example, join_view's begin iterator is copied into the transform_view's iterator, and then the original is destroyed (along with the container that the new inner iterator still points into), which leaves us with a dangling iterator.
Note that the example runs without error with libc++, because libc++ moves the iterator instead of copying it, which happens to work. But I can produce other examples, unrelated to transform_view, that fail.
This is actually two different problems:
1. regex_iterator is really an input iterator, not a forward iterator. It does not meet either the C++17 or the C++20 forward iterator requirements.
2. join_view can't handle stashing iterators, and would need to additionally store the outer iterator in a non-propagating-cache for input ranges (similar to how it already potentially stores the inner range in a non-propagating-cache).
(So potentially this could be two different LWG issues, but it seems nicer to think of them together.)
[2022-05-17; Reflector poll]
Set priority to 2 after reflector poll.
[Kona 2022-11-08; Move to Open]
Tim to write a paper
[2023-01-16; Tim comments]
The paper P2770R0 is provided with proposed wording.
[2023-03-22 Resolved by the adoption of P2770R0 in Issaquah. Status changed: Open → Resolved.]
Proposed resolution: