This page is a snapshot from the LWG issues list, see the Library Active Issues List for more information and the meaning of Resolved status.
regex_iterator and join_view don't work together very well
Section: 28.6.11 [re.iter], 25.7.14 [range.join] Status: Resolved Submitter: Barry Revzin Opened: 2022-05-12 Last modified: 2023-03-23
Priority: 2
Discussion:
Consider this example (from StackOverflow):
#include <ranges>
#include <regex>
#include <iostream>

int main() {
  char const text[] = "Hello";
  std::regex regex{"[a-z]"};

  auto lower = std::ranges::subrange(
      std::cregex_iterator(
        std::ranges::begin(text),
        std::ranges::end(text),
        regex),
      std::cregex_iterator{}
    )
    | std::views::join
    | std::views::transform([](auto const& sm) {
        return std::string_view(sm.first, sm.second);
      });

  for (auto const& sv : lower) {
    std::cout << sv << '\n';
  }
}
This example seems sound, having lower be a range of string_view that should refer
back into text, which is in scope for all this time. The std::regex object is also
in scope for all this time.
However, the program fails inside transform_view's iterator with a heap-use-after-free.
The problem here is ultimately that regex_iterator is a stashing iterator (it has a member
match_results) yet advertises itself as a forward_iterator, despite violating
24.3.5.5 [forward.iterators] p6 and 24.3.4.11 [iterator.concept.forward] p3.
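The violation can be observed directly: two regex_iterators that compare equal do not dereference to the same object, because each copy carries its own match_results. The following is a minimal sketch under that assumption; the helper name copies_refer_to_distinct_objects is hypothetical and is not part of the issue.

```cpp
#include <cassert>
#include <regex>

// Hypothetical helper: reports whether two equal regex_iterators
// dereference to *different* match_results objects.
bool copies_refer_to_distinct_objects() {
    static const char text[] = "Hello";
    const std::regex re{"[a-z]"};

    std::cregex_iterator a(text, text + 5, re);  // positioned on the first match
    std::cregex_iterator b = a;                  // an independent copy

    assert(a == b);  // the copies compare equal ...

    // ... but operator* returns a reference to each iterator's own stashed
    // match_results, so the two references name different objects.
    return &*a != &*b;
}
```

A forward iterator is expected to give references to the same object when a == b, so a result of true here is what marks regex_iterator as a stashing iterator.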
Then, join_view's iterator stores an outer iterator (the regex_iterator) and an
inner iterator (an iterator into the container that the regex_iterator stashes).
Copying that iterator effectively invalidates it, since the new iterator's inner iterator will
refer to the old iterator's outer iterator's container. These aren't (and can't be) independent copies.
In this particular example, join_view's begin iterator is copied into the
transform_view's iterator, and then the original is destroyed (which owns the container that
the new inner iterator still points to), which causes us to have a dangling iterator.
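For comparison, the result the example intends can be produced without join_view by walking each match_results while the regex_iterator that owns it is still alive. This is only a sketch of a possible workaround, not something proposed in the issue; the helper name collect_submatches is hypothetical.

```cpp
#include <cassert>   // used by the checks in the usage note
#include <cstddef>
#include <regex>
#include <string_view>
#include <vector>

// Hypothetical helper: gathers every matched sub-match of every match as a
// string_view into `text`. The views point into `text`, not into the
// match_results stashed by the iterator, so they stay valid after the loop.
std::vector<std::string_view> collect_submatches(std::string_view text,
                                                 const std::regex& re) {
    std::vector<std::string_view> out;
    const char* first = text.data();
    const char* last = first + text.size();

    for (std::cregex_iterator it(first, last, re), end; it != end; ++it) {
        // *it refers to the match_results owned by `it`, which is not
        // copied or destroyed while the inner loop runs.
        for (const std::csub_match& sm : *it) {
            if (sm.matched) {
                out.emplace_back(sm.first,
                                 static_cast<std::size_t>(sm.second - sm.first));
            }
        }
    }
    return out;
}
```

Called with "Hello" and the regex [a-z], this yields the four views e, l, l, o, which is what the range pipeline above was meant to print.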
Note that the example runs correctly with libc++ because libc++ moves the iterator instead of copying it,
which happens to work. But I can produce other examples, unrelated to transform_view, that fail.
This is actually two different problems:
1. regex_iterator is really an input iterator, not a forward iterator. It does not meet either
   the C++17 or the C++20 forward iterator requirements.
2. join_view can't handle stashing iterators, and would need to additionally store the outer
   iterator in a non-propagating-cache for input ranges (similar to how it already potentially stores the
   inner iterator in a non-propagating-cache).
(So potentially this could be two different LWG issues, but it seems nicer to think of them together.)
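To illustrate what a non-propagating cache does, here is a rough sketch modelled on the exposition-only non-propagating-cache: an optional-like wrapper whose contents are never carried over by copy or move. The class name non_propagating_cache_sketch and its interface are assumptions made for illustration, not the wording used by the standard library.

```cpp
#include <cassert>   // used by the checks in the usage note
#include <optional>
#include <utility>

// Hypothetical sketch of an optional-like cache that does not propagate.
template <class T>
class non_propagating_cache_sketch {
    std::optional<T> value_;

public:
    non_propagating_cache_sketch() = default;

    // Copying produces an empty cache; the source keeps its value.
    non_propagating_cache_sketch(const non_propagating_cache_sketch&) noexcept {}

    // Moving produces an empty cache and also empties the source.
    non_propagating_cache_sketch(non_propagating_cache_sketch&& other) noexcept {
        other.value_.reset();
    }

    // Copy assignment empties the target (unless it is a self-assignment).
    non_propagating_cache_sketch& operator=(
        const non_propagating_cache_sketch& other) noexcept {
        if (this != &other) value_.reset();
        return *this;
    }

    // Move assignment empties both sides.
    non_propagating_cache_sketch& operator=(
        non_propagating_cache_sketch&& other) noexcept {
        value_.reset();
        other.value_.reset();
        return *this;
    }

    template <class... Args>
    T& emplace(Args&&... args) {
        return value_.emplace(std::forward<Args>(args)...);
    }

    bool has_value() const noexcept { return value_.has_value(); }
    T& operator*() { return *value_; }
    const T& operator*() const { return *value_; }
};
```

Because a copy never inherits the cached value, an inner iterator held alongside such a cache cannot end up pointing into a container owned by a different copy; the copy simply starts out empty and has to be repopulated.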
[2022-05-17; Reflector poll]
Set priority to 2 after reflector poll.
[Kona 2022-11-08; Move to Open]
Tim to write a paper
[2023-01-16; Tim comments]
The paper P2770R0 is provided with proposed wording.
[2023-03-22 Resolved by the adoption of P2770R0 in Issaquah. Status changed: Open → Resolved.]
Proposed resolution: