1200. "surprising" char_traits<T>::int_type requirements

Section: 24.2.2 [char.traits.typedefs] Status: NAD Submitter: Sean Hunt Opened: 2009-09-03 Last modified: 2016-02-10

Priority: Not Prioritized

Discussion:

The footnote for int_type in 24.2.2 [char.traits.typedefs] says that

If eof() can be held in char_type then some iostreams implementations may give surprising results.

This implies that int_type should be a superset of char_type. However, the requirements for char16_t and char32_t define int_type to be equal to int_least16_t and int_least32_t respectively. int_least16_t is likely to be the same size as char16_t, which may lead to surprising behavior, even if eof() is not a valid UTF-16 code unit. The standard should not prescribe surprising behavior, especially without saying what it is (it's apparently not undefined, just surprising). The same applies for 32-bit types.
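The concern can be probed on a given implementation with a small sketch like the one below. The helper names int_type_is_wider and eof_round_trips are hypothetical, introduced only for illustration; the sketch assumes a C++11 compiler, and its results for char16_t and char32_t vary between standard libraries.

```cpp
#include <string>  // std::char_traits

// Hypothetical helper: true when int_type occupies more storage than
// char_type, leaving room for an eof() value outside the character range.
template <class CharT>
constexpr bool int_type_is_wider() {
    using Traits = std::char_traits<CharT>;
    return sizeof(typename Traits::int_type) > sizeof(typename Traits::char_type);
}

// Hypothetical helper: true when eof() survives a round trip through
// char_type, i.e. some character value compares equal to eof().
template <class CharT>
constexpr bool eof_round_trips() {
    using Traits = std::char_traits<CharT>;
    return Traits::eq_int_type(
        Traits::to_int_type(Traits::to_char_type(Traits::eof())),
        Traits::eof());
}
```

For char, eof_round_trips<char>() is expected to be false because to_int_type converts through unsigned char while eof() is negative. For char16_t and char32_t the answer depends on how the implementation defines eof() and to_int_type, which is the situation the footnote calls surprising.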

I personally recommend that behavior be undefined if eof() is a member of char_type, and another type be chosen for int_type (my personal favorite has always been a struct {bool eof; char_type c;}). Alternatively, the exact results of such a situation should be defined, at least so far that I/O could be conducted on these types as long as the code units remain valid. Note that the argument that no one streams char16_t or char32_t is not really valid as it would be perfectly reasonable to use a basic_stringstream in conjunction with UTF character types.
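A minimal sketch of the struct-based alternative mentioned above might look as follows. The names eof_or_char16 and flagged_char16_traits are hypothetical and not part of any standard library. The sketch only shows that such an int_type can satisfy the traits operations involving eof(); it does not establish whether existing stream implementations would accept a non-integral int_type.

```cpp
#include <string>  // std::char_traits

// Hypothetical int_type: an explicit end-of-file flag plus the code unit.
struct eof_or_char16 {
    bool eof;
    char16_t c;
};

// Hypothetical traits class that reuses std::char_traits<char16_t> but
// replaces int_type and the functions that depend on it.
struct flagged_char16_traits : std::char_traits<char16_t> {
    using int_type = eof_or_char16;

    static constexpr int_type eof() noexcept { return int_type{true, u'\0'}; }

    static constexpr int_type to_int_type(char_type ch) noexcept {
        return int_type{false, ch};
    }

    static constexpr char_type to_char_type(int_type i) noexcept { return i.c; }

    // Two eof values are equal; otherwise both flag and code unit must match.
    static constexpr bool eq_int_type(int_type a, int_type b) noexcept {
        return a.eof == b.eof && (a.eof || a.c == b.c);
    }

    // Returns i unchanged unless it is eof, in which case a non-eof value.
    static constexpr int_type not_eof(int_type i) noexcept {
        return i.eof ? int_type{false, u'\0'} : i;
    }
};
```

With this layout every char16_t value, including 0xFFFF, converts to an int_type that compares unequal to eof(), at the cost of a larger int_type.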

[ 2009-10-28 Ganesh provides two possible resolutions and expresses a preference for the second: ]

  1. Replace 24.2.3.2 [char.traits.specializations.char16_t] para 3 with:

    The member eof() shall return UINT_LEAST16_MAX [Note: this value is guaranteed to be a permanently reserved UCS-2 code position if UINT_LEAST16_MAX == 0xFFFF and it's not a UCS-2 code position otherwise — end note].

    Replace 24.2.3.3 [char.traits.specializations.char32_t] para 3 with:

    The member eof() shall return UINT_LEAST32_MAX [Note: this value is guaranteed to be a permanently reserved UCS-4 code position if UINT_LEAST32_MAX == 0xFFFFFFFF and it's not a UCS-4 code position otherwise — end note].

  2. In 24.2.3.2 [char.traits.specializations.char16_t], in the definition of char_traits<char16_t> replace the definition of nested typedef int_type with:

    namespace std {
      template<> struct char_traits<char16_t> {
        typedef char16_t         char_type;
        typedef uint_fast16_t    int_type;
         ...
    

    Replace 24.2.3.2 [char.traits.specializations.char16_t] para 3 with:

    The member eof() shall return UINT_FAST16_MAX [Note: this value is guaranteed to be a permanently reserved UCS-2 code position if UINT_FAST16_MAX == 0xFFFF and it's not a UCS-2 code position otherwise — end note].

    In 24.2.3.3 [char.traits.specializations.char32_t], in the definition of char_traits<char32_t> replace the definition of nested typedef int_type with:

    namespace std {
      template<> struct char_traits<char32_t> {
        typedef char32_t         char_type;
        typedef uint_fast32_t    int_type;
         ...
    

    Replace 24.2.3.3 [char.traits.specializations.char32_t] para 3 with:

    The member eof() shall return UINT_FAST32_MAX [Note: this value is guaranteed to be a permanently reserved UCS-4 code position if UINT_FAST32_MAX == 0xFFFFFFFF and it's not a UCS-4 code position otherwise — end note].

[ 2010 Rapperswil: ]

This seems to be an overspecification, and it is not clear what problem is being solved: these values can be used portably through the named functions, so there is no need for the value itself to be portable. Move to Tentatively NAD.
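As an illustration of using the named functions rather than the value itself, the following sketch tests and handles end-of-file without assuming any particular value for eof(). The helper names is_eof and char_or are hypothetical; only the std::char_traits members they call are standard.

```cpp
#include <string>  // std::char_traits

// Hypothetical helper: tests for end-of-file using only the named traits
// functions, never the numeric value of eof().
template <class Traits>
constexpr bool is_eof(typename Traits::int_type v) {
    return Traits::eq_int_type(v, Traits::eof());
}

// Hypothetical helper: converts v to a character, substituting fallback
// when v signals end-of-file.
template <class Traits>
constexpr typename Traits::char_type
char_or(typename Traits::int_type v, typename Traits::char_type fallback) {
    return is_eof<Traits>(v) ? fallback : Traits::to_char_type(v);
}
```

Code written this way compiles unchanged whether an implementation picks 0xFFFF, 0xFFFFFFFF, or some other value for eof(). It does not, by itself, address the case where a character converts to a value equal to eof().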

[ Moved to NAD at 2010-11 Batavia ]

Proposed resolution: