This is an unofficial snapshot of the ISO/IEC JTC1 SC22 WG21 Core Issues List revision 116a. See http://www.open-std.org/jtc1/sc22/wg21/ for the official list.

2024-12-19


232. Is indirection through a null pointer undefined behavior?

Section: 7.6.2.2  [expr.unary.op]     Status: NAD     Submitter: Mike Miller     Date: 5 Jun 2000

At least a couple of places in the IS state that indirection through a null pointer produces undefined behavior: 6.9.1 [intro.execution] paragraph 4 gives "dereferencing the null pointer" as an example of undefined behavior, and 9.3.4.3 [dcl.ref] paragraph 4 (in a note) uses this supposedly undefined behavior as justification for the nonexistence of "null references."

However, 7.6.2.2 [expr.unary.op] paragraph 1, which describes the unary "*" operator, does not say that the behavior is undefined if the operand is a null pointer, as one might expect. Furthermore, at least one passage gives dereferencing a null pointer well-defined behavior: 7.6.1.8 [expr.typeid] paragraph 2 says

If the lvalue expression is obtained by applying the unary * operator to a pointer and the pointer is a null pointer value (7.3.12 [conv.ptr]), the typeid expression throws the bad_typeid exception (17.7.5 [bad.typeid]).

This is inconsistent and should be cleaned up.

Bill Gibbons:

At one point we agreed that dereferencing a null pointer was not undefined; only using the resulting value had undefined behavior.

For example:

    char *p = 0;
    char *q = &*p;

Similarly, dereferencing a pointer to the end of an array should be allowed as long as the value is not used:

    char a[10];
    char *b = &a[10];   // equivalent to "char *b = &*(a+10);"

Both cases come up often enough in real code that they should be allowed.

Mike Miller:

I can see the value in this, but it doesn't seem to be well reflected in the wording of the Standard. For instance, presumably *p above would have to be an lvalue in order to be the operand of "&", but the definition of "lvalue" in 7.2.1 [basic.lval] paragraph 2 says that "an lvalue refers to an object." What's the object in *p? If we were to allow this, we would need to augment the definition to include the result of dereferencing null and one-past-the-end-of-array.

Tom Plum:

Just to add one more recollection of the intent: I was very happy when (I thought) we decided that it was only the attempt to actually fetch a value that creates undefined behavior. The words which (I thought) were intended to clarify that are the first three sentences of the lvalue-to-rvalue conversion, 7.3.2 [conv.lval]:

An lvalue (7.2.1 [basic.lval]) of a non-function, non-array type T can be converted to an rvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If the object to which the lvalue refers is not an object of type T and is not an object of a type derived from T, or if the object is uninitialized, a program that necessitates this conversion has undefined behavior.

In other words, it is only the act of "fetching", of lvalue-to-rvalue conversion, that triggers the ill-formed or undefined behavior. Simply forming the lvalue expression, and then for example taking its address, does not trigger either of those errors. I described this approach to WG14 and it may have been incorporated into C 1999.

Mike Miller:

If we admit the possibility of null lvalues, as Tom is suggesting here, that significantly undercuts the rationale for prohibiting "null references" -- what is a reference, after all, but a named lvalue? If it's okay to create a null lvalue, as long as I don't invoke the lvalue-to-rvalue conversion on it, why shouldn't I be able to capture that null lvalue as a reference, with the same restrictions on its use?

I am not arguing in favor of null references. I don't want them in the language. What I am saying is that we need to think carefully about adopting the permissive approach of saying that it's all right to create null lvalues, as long as you don't use them in certain ways. If we do that, it will be very natural for people to question why they can't pass such an lvalue to a function, as long as the function doesn't do anything that is not permitted on a null lvalue.

If we want to allow &*(p=0), maybe we should change the definition of "&" to handle dereferenced null specially, just as typeid has special handling, rather than changing the definition of lvalue to include dereferenced nulls, and similarly for the array_end+1 case. It's not as general, but I think it might cause us fewer problems in the long run.

Notes from the October 2003 meeting:

See also issue 315, which deals with the call of a static member function through a null pointer.

We agreed that the approach in the standard seems okay: p = 0; *p; is not inherently an error. An lvalue-to-rvalue conversion would give it undefined behavior.

Proposed resolution (October, 2004):

(Note: the resolution of issue 453 also resolves part of this issue.)

  1. Add the indicated words to 7.2.1 [basic.lval] paragraph 2:

    An lvalue refers to an object or function or is an empty lvalue (7.6.2.2 [expr.unary.op]).
  2. Add the indicated words to 7.6.2.2 [expr.unary.op] paragraph 1:

    The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points, if any. If the pointer is a null pointer value (7.3.12 [conv.ptr]) or points one past the last element of an array object (7.6.6 [expr.add]), the result is an empty lvalue and does not refer to any object or function. An empty lvalue is not modifiable. If the type of the expression is “pointer to T,” the type of the result is “T.” [Note: a pointer to an incomplete type (other than cv void) can be dereferenced. The lvalue thus obtained can be used in limited ways (to initialize a reference, for example); this lvalue must not be converted to an rvalue, see 7.3.2 [conv.lval].—end note]
  3. Add the indicated words to 7.3.2 [conv.lval] paragraph 1:

    If the object to which the lvalue refers is not an object of type T and is not an object of a type derived from T, or if the object is uninitialized, or if the lvalue is an empty lvalue (7.6.2.2 [expr.unary.op]), a program that necessitates this conversion has undefined behavior.
  4. Change 6.9.1 [intro.execution] as indicated:

    Certain other operations are described in this International Standard as undefined (for example, the effect of dereferencing the null pointer division by zero).

Note (March, 2005):

The 10/2004 resolution interacts with the resolution of issue 73. We added wording to 6.8.4 [basic.compound] paragraph 3 to the effect that a pointer containing the address one past the end of an array is considered to “point to” another object of the same type that might be located there. The 10/2004 resolution now says that it would be undefined behavior to use such a pointer to fetch the value of that object. There is at least the appearance of conflict here; it may be all right, but it at needs to be discussed further.

Notes from the April, 2005 meeting:

The CWG agreed that there is no contradiction between this direction and the resolution of issue 73. However, “not modifiable” is a compile-time concept, while in fact this deals with runtime values and thus should produce undefined behavior instead. Also, there are other contexts in which lvalues can occur, such as the left operand of . or .*, which should also be restricted. Additional drafting is required.

(See also issue 1102.)

CWG 2023-11-06

There is no consensus to pursue the introduction of empty lvalues, without prejudice to a potential future paper addressed to EWG. The implicit undefined behavior in 7.6.2.2 [expr.unary.op] paragraph 1 is handled in issue 2823.