This page is a snapshot from the LWG issues list; see the Library Active Issues List for more information and the meaning of New status.

4378. Inconsistency between std::basic_string's data() and operator[] specification

Section: 27.4.3.6 [string.access] Status: New Submitter: Peter Bindels Opened: 2025-09-16 Last modified: 2025-11-12

Priority: 4

Discussion:

From the working draft N5014, the specification for operator[] in 27.4.3.6 [string.access] p2 says:

Returns: *(begin() + pos) if pos < size(). Otherwise, returns a reference to an object of type charT with value charT(), where modifying the object to any value other than charT() leads to undefined behavior.

The specification for data() in 27.4.3.8.1 [string.accessors] p1 (and p4) says, however:

Returns: A pointer p such that p + i == addressof(operator[](i)) for each i in [0, size()].

The former allows str[str.size()] to refer to any null terminator object, while the latter requires it to be the null terminator belonging to the string itself, i.e. the object at data() + size().
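The stricter data() wording is what implementations already provide: for every i in [0, size()], including i == size(), str[i] and data()[i] are the same object. The following is a minimal sketch that checks this with std::string. The helper name subscript_matches_data is hypothetical and exists only for illustration.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical helper: returns true if operator[] and data() agree on object
// identity for every index in [0, size()], as the data() wording requires.
bool subscript_matches_data(const std::string& str) {
    for (std::size_t i = 0; i <= str.size(); ++i) {
        // Read on its own, the operator[] wording would let str[str.size()]
        // be any null terminator; the data() wording pins it to data() + size().
        if (std::addressof(str[i]) != str.data() + i) {
            return false;
        }
    }
    // The object at index size() must also hold the value charT().
    return str[str.size()] == char();
}
```

Calling it on an empty string, a short string, and a long (heap-allocated) string should return true on a conforming implementation.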

Suggested fix: change the wording for operator[] to:

Returns: *(begin() + pos) if pos <= size(). The program shall not modify the value stored at size() to any value other than charT(); otherwise, the behavior is undefined.

This brings it in line with the data() specification. Given the hardened precondition that pos <= size(), this does not change behavior for any in-contract access, and we do not define what the feature does when its preconditions are violated. I have been looking into the latter, but that will be an EWG paper instead.

[2025-10-21; Reflector poll.]

Set priority to 4 after reflector poll.

"NAD. begin() + size() is not dereferenceable and should remain that way."

"Saying "if pos <= size() is redundant given the precondition above."

"The resolution removes any guarantee that the value at str[str.size()] is charT(). Furthermore, the premise of the issue is incorrect, returning the address of a different null terminator not belonging to the string would make traversing it with other string operations UB, so it has to return a reference to a terminator that's within the same array."

"*(begin() = size()) is UB, but could use *(data() + size()) instead. Personally I'd like *end() to be valid, but that's certainly LEWG business requiring a paper."

Previous resolution [SUPERSEDED]:

This wording is relative to N5014.

  1. Modify 27.4.3.6 [string.access] as indicated:

    constexpr const_reference operator[](size_type pos) const;
    constexpr       reference operator[](size_type pos);
    

    -1- Hardened preconditions: pos <= size() is true.

    -2- Returns: *(begin() + pos) if pos <= size(). [deleted: Otherwise, returns a reference to an object of type charT with value charT(), where modifying the object to any value other than charT() leads to undefined behavior.]

    -3- Throws: Nothing.

    -4- Complexity: Constant time.

    -?- Remarks: The program shall not modify the value stored at size() to any value other than charT(); otherwise, the behavior is undefined.

[2025-11-11; Jonathan provides new wording]

We say that basic_string is a contiguous container, which makes the addressof wording in c_str() and data() redundant. The front matter says that there's a null terminator present, so we can move the rule about not modifying the terminator there instead of repeating it in operator[] and c_str().

We can also permit modifying the string contents through const_cast<char*>(str.c_str())[0]. There's no reason for that to be undefined when const_cast<string&>(str)[0] and const_cast<string&>(str).data()[0] are both allowed. The only restriction should be on changing the null terminator. Changing any other characters through c_str() const or data() const is no different to changing them through the non-const data(), and does not need to cause undefined behaviour.
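The asymmetry described above can be sketched as follows, assuming the const reference refers to a non-const std::string object (otherwise every write here would be undefined). The function name overwrite_first_two is hypothetical. The first two writes are already well-defined; the c_str() form is left as a comment because current wording makes it undefined and only the proposed resolution would permit it.

```cpp
#include <string>

// Hypothetical helper. Valid only because the caller passes a non-const
// std::string; casting away const from a truly const object and writing
// through the result is undefined behavior.
void overwrite_first_two(const std::string& str) {
    if (str.size() < 2) return;
    const_cast<std::string&>(str)[0] = 'H';         // allowed today
    const_cast<std::string&>(str).data()[1] = 'I';  // allowed today
    // The proposed resolution would additionally allow
    //   const_cast<char*>(str.c_str())[0] = 'H';
    // which the current [string.accessors] p3 makes undefined.
}
```

With std::string s("hello"), calling overwrite_first_two(s) leaves s equal to "HIllo".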

Proposed resolution:

  1. Modify 27.4.3.1 [basic.string.general] as indicated:

    -3- In all cases, [data(), data() + size()] is a valid range, data() + size() points at an object with value charT() (a "null terminator"), and size() <= capacity() is true. Non-const access to the null terminator is possible, e.g. using *(data()+size()), but the program has undefined behavior if the null terminator is modified to any value other than charT().

  2. Modify 27.4.3.6 [string.access] as indicated:

    constexpr const_reference operator[](size_type pos) const;
    constexpr reference       operator[](size_type pos);
    

    -1- Hardened preconditions: pos <= size() is true.

    -2- Returns: *(data() + pos). [deleted: *(begin() + pos) if pos < size(). Otherwise, returns a reference to an object of type charT with value charT(), where modifying the object to any value other than charT() leads to undefined behavior.]

    -3- Throws: Nothing.

    -4- Complexity: Constant time.

  3. Modify 27.4.3.8.1 [string.accessors] as indicated:

    constexpr const charT* c_str() const noexcept;
    constexpr const charT* data() const noexcept;
    constexpr charT* data() noexcept;
    

    -1- Returns: to_address(begin()). [deleted: A pointer p such that p + i == addressof(operator[](i)) for each i in [0, size()].]

    -2- Complexity: Constant time.

    [deleted: -3- Remarks: The program shall not modify any of the values stored in the character array; otherwise, the behavior is undefined.]

    [deleted:
    constexpr charT* data() noexcept;

    -4- Returns: A pointer p such that p + i == addressof(operator[](i)) for each i in [0, size()].

    -5- Complexity: Constant time.

    -6- Remarks: The program shall not modify the value stored at p + size() to any value other than charT(); otherwise, the behavior is undefined.]
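The rule that the proposed resolution places in [basic.string.general] p3 can be exercised with std::string as a minimal sketch: the terminator is reachable through a non-const path, writing charT() to it is permitted, and operator[](size()) names the same object as data() + size(). The helper name rewrite_terminator is hypothetical.

```cpp
#include <string>

// Hypothetical helper: writes charT() to the null terminator through two
// non-const paths, then checks that it still reads as charT() and that
// operator[] and data() name the same object.
bool rewrite_terminator(std::string& s) {
    *(s.data() + s.size()) = char();  // OK: stores the value it already has
    s[s.size()] = char();             // OK: same object, via operator[]
    // Storing any other value, e.g. s[s.size()] = 'x', is undefined behavior.
    return s.c_str()[s.size()] == char()
        && &s[s.size()] == s.data() + s.size();
}
```

The string's contents are unchanged afterwards, since only the terminator is written and only with its existing value.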