This page is a snapshot from the LWG issues list, see the Library Active Issues List for more information and the meaning of WP status.
char to sequences of wchar_t
Section: 28.5.6.4 [format.formatter.spec] Status: WP Submitter: Mark de Wever Opened: 2023-06-01 Last modified: 2024-07-08
Priority: 3
View other active issues in [format.formatter.spec].
View all other issues in [format.formatter.spec].
View all issues with WP status.
Discussion:
I noticed some interesting features introduced by the range-based formatters in C++23:
// Ill-formed in C++20 and C++23
const char* cstr = "hello";
char* str = const_cast<char*>(cstr);
std::format(L"{}", str);
std::format(L"{}", cstr);

// Ill-formed in C++20
// In C++23 they give L"['h', 'e', 'l', 'l', 'o']"
std::format(L"{}", "hello"); // A libc++ bug prevents this from working.
std::format(L"{}", std::string_view("hello"));
std::format(L"{}", std::string("hello"));
std::format(L"{}", std::vector{'h', 'e', 'l', 'l', 'o'});
An example is shown here. This only shows libc++ since libstdc++ and MSVC STL have not implemented the formatting ranges papers (P2286R8 and P2585R0) yet.
The difference between C++20 and C++23 is the existence of range formatters. These formatters use the formatter specialization formatter<char, wchar_t>, which converts the sequence of chars to a sequence of wchar_ts. In this conversion same_as<char, charT> is false, thus the requirements of the range-type s and ?s ([tab:formatter.range.type]) aren't met. So the following is ill-formed:
std::format(L"{:s}", std::string("hello")); // Not L"hello"
It is surprising that some string types can be formatted as a sequence of wide characters, but others cannot. A sequence of characters can be a sequence of UTF-8 code units. This is explicitly supported in the width estimation of string types. The conversion of char to wchar_t will convert the individual code units, which will give incorrect results for multi-byte code points. It will not transcode UTF-8 to UTF-16/32. The current behavior is not in line with the note in 28.5.6.4 [format.formatter.spec]/2:
[Note 1: Specializations such as formatter<wchar_t, char> and formatter<const char*, wchar_t> that would require implicit multibyte / wide string or character conversion are disabled. — end note]
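The code-unit problem described above can be illustrated without std::format at all. The following is a minimal sketch, assuming that a per-element char to wchar_t formatter behaves like a plain cast of each element; the helper names widen_per_code_unit and example_per_code_unit are hypothetical and are not part of any library.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: widens each char on its own, which is a model of
// what a per-element char -> wchar_t conversion does (an assumption, not
// the library's actual implementation).
std::wstring widen_per_code_unit(const std::string& s) {
    std::wstring out;
    for (char c : s) {
        // Go through unsigned char so bytes >= 0x80 do not sign-extend.
        out.push_back(static_cast<wchar_t>(static_cast<unsigned char>(c)));
    }
    return out;
}

// Hypothetical demonstration of the mismatch for a multi-byte code point.
void example_per_code_unit() {
    // U+00E9 encoded as UTF-8 occupies two code units: 0xC3 0xA9.
    const std::string utf8 = "\xC3\xA9";
    const std::wstring wide = widen_per_code_unit(utf8);

    assert(wide.size() == 2);             // two wide characters, not one
    assert(wide[0] == wchar_t(0xC3));     // each byte was widened separately
    assert(wide[1] == wchar_t(0xA9));
    assert(wide != std::wstring(1, wchar_t(0xE9)));  // not the transcoded U+00E9
}
```

For pure ASCII input the per-element widening and a real transcoding agree, which is why the problem only shows up with multi-byte code points.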
Disabling this could be done by explicitly disabling the char to wchar_t sequence formatter. Something along the lines of
template<ranges::input_range R>
  requires(format_kind<R> == range_format::sequence &&
           same_as<remove_cvref_t<ranges::range_reference_t<R>>, char>)
struct formatter<R, wchar_t> : __disabled_formatter {};
where __disabled_formatter satisfies 28.5.6.4 [format.formatter.spec]/5, would do the trick. This disables the conversion for all sequences, not only the string types. So vector, array, span, etc. would be disabled.
This approach does not disable the conversion in range_formatter, which allows users to explicitly opt in to this formatter for their own specializations.
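As a rough illustration of what a type like __disabled_formatter has to look like, here is a minimal sketch. It assumes the usual reading of the disabled-specialization requirements (not default constructible, not copy or move constructible, not copy or move assignable). The names disabled_formatter, ordinary_formatter and looks_disabled_v are hypothetical and only serve the example.

```cpp
#include <type_traits>

// Hypothetical stand-in for the issue's __disabled_formatter: deleting the
// default constructor and the copy operations also suppresses the implicit
// move operations, so all five properties checked below are false.
struct disabled_formatter {
    disabled_formatter() = delete;
    disabled_formatter(const disabled_formatter&) = delete;
    disabled_formatter& operator=(const disabled_formatter&) = delete;
};

// Hypothetical ordinary type, standing in for an enabled formatter.
struct ordinary_formatter {};

// Hypothetical trait: true when F has none of the five capabilities.
template <class F>
constexpr bool looks_disabled_v =
    !std::is_default_constructible_v<F> &&
    !std::is_copy_constructible_v<F> &&
    !std::is_move_constructible_v<F> &&
    !std::is_copy_assignable_v<F> &&
    !std::is_move_assignable_v<F>;

static_assert(looks_disabled_v<disabled_formatter>);
static_assert(!looks_disabled_v<ordinary_formatter>);
```

A formatter specialization deriving from such a base inherits the deleted operations, which is the effect the constrained partial specialization above relies on.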
An alternative would be to only disable this conversion for string type specializations (28.5.6.4 [format.formatter.spec]/2.2) where char to wchar_t is used:
template<size_t N> struct formatter<charT[N], charT>;
template<class traits, class Allocator>
  struct formatter<basic_string<charT, traits, Allocator>, charT>;
template<class traits>
  struct formatter<basic_string_view<charT, traits>, charT>;
Disabling the following two is not strictly required:
template<> struct formatter<char*, wchar_t>;
template<> struct formatter<const char*, wchar_t>;
However, if (const) char* becomes an input_range in a future version of C++, these formatters would become enabled. Disabling all five instead of the three required specializations seems like a future-proof solution.
Since there is no enabled specialization
template<> struct formatter<wchar_t, char>;
there are no issues for wchar_t to char conversions.
Do we want to allow string types of chars to be formatted as sequences of wchar_ts?
Do we want to allow non string type sequences of chars to be formatted as sequences of wchar_ts?
Should we disable char to wchar_t conversion in the range_formatter?
SG16 has indicated they would like to discuss this issue during a telecon.
[2023-06-08; Reflector poll]
Set status to SG16 and priority to 3 after reflector poll.
[2023-07-26; Mark de Wever provides wording confirmed by SG16]
[2024-03-18; Tokyo: move to Ready]
[St. Louis 2024-06-29; Status changed: Voting → WP.]
Proposed resolution:
This wording is relative to N4950.
Modify 28.5.6.4 [format.formatter.spec] as indicated:
[Drafting note: The unwanted conversion happens due to the formatter base class specialization (28.5.7.3 [format.range.fmtdef])
struct range-default-formatter<range_format::sequence, R, charT>
which is defined in the header <format>. Therefore the disabling is only needed in this header. — end drafting note]
-2- […]
The parse member functions of these formatters interpret the format specification as a std-format-spec as described in 28.5.2.2 [format.string.std].
[Note 1: Specializations such as formatter<wchar_t, char> and formatter<const char*, wchar_t> that would require implicit multibyte / wide string or character conversion are disabled. — end note]
-?- The header <format> provides the following disabled specializations:
(?.1) — The string type specializations
template<> struct formatter<char*, wchar_t>;
template<> struct formatter<const char*, wchar_t>;
template<size_t N> struct formatter<char[N], wchar_t>;
template<class traits, class Allocator>
  struct formatter<basic_string<char, traits, Allocator>, wchar_t>;
template<class traits>
  struct formatter<basic_string_view<char, traits>, wchar_t>;
-3- For any types T and charT for which neither the library nor the user provides an explicit or partial specialization of the class template formatter, formatter<T, charT> is disabled.