This page is a snapshot from the LWG issues list, see the Library Active Issues List for more information and the meaning of Open status.
std::formatter<std::filesystem::path>
Section: 31.12.6.9.2 [fs.path.fmtr.funcs] Status: Open Submitter: Jonathan Wakely Opened: 2024-04-19 Last modified: 2025-09-12
Priority: 2
Discussion:
31.12.6.9.2 [fs.path.fmtr.funcs] says:
If charT is char, path::value_type is wchar_t, and the literal encoding is UTF-8, then the escaped path is transcoded from the native encoding for wide character strings to UTF-8 with maximal subparts of ill-formed subsequences substituted with U+FFFD REPLACEMENT CHARACTER per the Unicode Standard [...]. Otherwise, transcoding is implementation-defined.
This seems to mean that the Unicode substitutions are only done
for an escaped path, i.e. when the ?
option is used. Otherwise, the form
of transcoding is completely implementation-defined.
However, this makes no sense.
An escaped string will have no ill-formed subsequences, because they will
already have been replaced as per 28.5.6.5 [format.string.escaped]:
Otherwise (X is a sequence of ill-formed code units), each code unit U is appended to E in order as the sequence \x{hex-digit-sequence}, where hex-digit-sequence is the shortest hexadecimal representation of U using lower-case hexadecimal digits.
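The rule quoted above can be illustrated for a single code unit. This is only a sketch: the helper name escape_code_unit is hypothetical and not part of the standard library, and it shows nothing more than how one ill-formed code unit U could be rendered as \x{hex-digit-sequence} using the shortest lower-case hexadecimal representation.

```cpp
#include <charconv>
#include <cstdint>
#include <string>
#include <system_error>

// Hypothetical helper (not a standard API): renders one ill-formed
// code unit U as the text \x{hex-digit-sequence}.
std::string escape_code_unit(std::uint32_t u) {
    char digits[8];  // a 32-bit value needs at most 8 hexadecimal digits
    // std::to_chars in base 16 writes lower-case digits without leading zeros,
    // which matches "shortest hexadecimal representation".
    auto [end, ec] = std::to_chars(digits, digits + sizeof digits, u, 16);
    if (ec != std::errc{}) {
        return {};  // not expected with an 8-character buffer
    }
    return "\\x{" + std::string(digits, end) + "}";
}
```

Under these assumptions, escape_code_unit(0xD800) produces \x{d800} and escape_code_unit(0x0F) produces \x{f}.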
So only unescaped strings can have ill-formed sequences by the time
we do transcoding to char, but whether or not any U+FFFD substitution
occurs is just implementation-defined.
I believe we want to specify the substitutions are done when transcoding an unescaped path (and it doesn't matter whether we specify it for escaped paths, because it's a no-op if escaping happens first, as is apparently intended).
It does matter whether we escape first or perform substitutions first.
If we escape first then every code unit in an ill-formed sequence is
individually escaped as \x{hex-digit-sequence}.
So an ill-formed sequence of two wchar_t values will be escaped as
two \x{...} strings, which are then transcoded to UTF-8.
If we transcode first (with substitutions) then the entire
ill-formed sequence is replaced with a single replacement character,
which will then be escaped as \x{fffd}.
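The difference between the two orderings can be sketched in a few lines. The following is an illustration under stated assumptions, not the wording's required implementation: wchar_t code units are modelled as char16_t (assuming a UTF-16 native wide encoding), the names Order, append_utf8 and to_utf8 are hypothetical, only ill-formed code units are escaped, and each unpaired surrogate is treated as its own ill-formed subsequence. The substitute_first branch stops after transcoding and does not model any later escaping step.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Appends the UTF-8 encoding of a Unicode scalar value to out.
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical switch between the two orderings discussed in the issue.
enum class Order { escape_first, substitute_first };

// Converts UTF-16 code units to UTF-8, handling unpaired surrogates
// according to the chosen ordering.
std::string to_utf8(std::u16string_view in, Order order) {
    static constexpr char hex[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char16_t u = in[i];
        const bool high = u >= 0xD800 && u <= 0xDBFF;
        const bool low = u >= 0xDC00 && u <= 0xDFFF;
        if (high && i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Well-formed surrogate pair: decode it and emit UTF-8.
            const char32_t cp = 0x10000 + ((char32_t(u) - 0xD800) << 10)
                                + (char32_t(in[i + 1]) - 0xDC00);
            append_utf8(out, cp);
            ++i;  // the low surrogate has been consumed as well
        } else if (high || low) {
            if (order == Order::escape_first) {
                // The code unit was already turned into ASCII text \x{....}
                // before transcoding; surrogates always need four hex digits.
                out += "\\x{";
                for (int shift = 12; shift >= 0; shift -= 4) {
                    out += hex[(u >> shift) & 0xF];
                }
                out += '}';
            } else {
                out += "\xEF\xBF\xBD";  // UTF-8 encoding of U+FFFD
            }
        } else {
            append_utf8(out, u);  // ordinary BMP code point
        }
    }
    return out;
}
```

For the three code units 'a', 0xD800, 'b', this sketch gives a\x{d800}b with Order::escape_first, and 'a' followed by the bytes EF BF BD and 'b' with Order::substitute_first, so the original invalid code unit is only visible when escaping happens first.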
SG16 should be asked to confirm that escaping first is intended,
so that an escaped string shows the original invalid code units.
For a non-escaped string, we want the ill-formed sequence to be
formatted as �, which the proposed resolution tries to ensure.
[2024-05-08; Reflector poll]
Set priority to 2 after reflector poll.
Previous resolution [SUPERSEDED]:
This wording is relative to N4981.
Modify 31.12.6.9.2 [fs.path.fmtr.funcs] as indicated:
template<class FormatContext> typename FormatContext::iterator format(const filesystem::path& p, FormatContext& ctx) const;
-5- Effects: Let s be p.generic_string<filesystem::path::value_type>() if the g option is used, otherwise p.native(). Writes s into ctx.out(), adjusted according to the path-format-spec. If charT is char, path::value_type is wchar_t, and the literal encoding is UTF-8, then the [del: escaped path] [ins: (possibly escaped) string] is transcoded from the native encoding for wide character strings to UTF-8 with maximal subparts of ill-formed subsequences substituted with U+FFFD REPLACEMENT CHARACTER per the Unicode Standard, Chapter 3.9 U+FFFD Substitution in Conversion. If charT and path::value_type are the same then no transcoding is performed. Otherwise, transcoding is implementation-defined.
Modify the entry in the index of implementation-defined behavior as indicated:
transcoding of a formatted path when charT and path::value_type differ [ins: and not converting from wchar_t to UTF-8]
[2025-06-11; SG16 comments and improves wording]
The "and not converting from wchar_t to UTF-8" wording added in the index of implementation-defined
behavior by the current proposed resolution should be changed to "and the literal encoding is not UTF-8".
[...] wchar_t to UTF-8" with "and the literal encoding
is not UTF-8". The optional change is to insert "ordinary" before "literal encoding" as well. Once that is done,
I'll have SG16 confirm they are content with the new proposed resolution.
Previous resolution [SUPERSEDED]:
This wording is relative to N5008.
Modify 31.12.6.9.2 [fs.path.fmtr.funcs] as indicated:
template<class FormatContext> typename FormatContext::iterator format(const filesystem::path& p, FormatContext& ctx) const;
-5- Effects: Let s be p.generic_string<filesystem::path::value_type>() if the g option is used, otherwise p.native(). Writes s into ctx.out(), adjusted according to the path-format-spec. If charT is char, path::value_type is wchar_t, and the [ins: ordinary] literal encoding is UTF-8, then the [del: escaped path] [ins: (possibly escaped) string] is transcoded from the native encoding for wide character strings to UTF-8 with maximal subparts of ill-formed subsequences substituted with U+FFFD REPLACEMENT CHARACTER per the Unicode Standard, Chapter 3.9 U+FFFD Substitution in Conversion. If charT and path::value_type are the same then no transcoding is performed. Otherwise, transcoding is implementation-defined.
Modify the entry in the index of implementation-defined behavior as indicated:
transcoding of a formatted path when charT and path::value_type differ [ins: and the ordinary literal encoding is not UTF-8]
[2025-07-30; SG16 meeting]
SG16 unanimously approved new wording produced during the discussion. The group concluded that the intended behavior would be best specified by introducing additional names to denote the sequence of transformations that produce the intended effect. Status → Open.
Proposed resolution:
This wording is relative to N5014.
Modify 31.12.6.9.2 [fs.path.fmtr.funcs] as indicated:
template<class FormatContext> typename FormatContext::iterator format(const filesystem::path& p, FormatContext& ctx) const;
-5- Effects: Let s be p.generic_string<filesystem::path::value_type>() if the g option is used, otherwise p.native(). [ins: Let s2 be s adjusted according to the path-format-spec. Let s3 be defined as follows:]
- [ins: (5.1) If charT is char, path::value_type is wchar_t, and the ordinary literal encoding is UTF-8, s3 is the result of transcoding s2 from the native encoding for wide character strings to UTF-8 with maximal subparts of ill-formed subsequences substituted with U+FFFD REPLACEMENT CHARACTER per the Unicode Standard, Chapter 3.9 U+FFFD Substitution in Conversion.]
- [ins: (5.2) If charT and path::value_type are the same, then s3 is the same as s2.]
- [ins: (5.3) Otherwise, s3 is the result of an implementation-defined transcoding of s2.]
[ins: Writes s3 into ctx.out().]
[del: Writes s into ctx.out(), adjusted according to the path-format-spec. If charT is char, path::value_type is wchar_t, and the literal encoding is UTF-8, then the escaped path is transcoded from the native encoding for wide character strings to UTF-8 with maximal subparts of ill-formed subsequences substituted with U+FFFD REPLACEMENT CHARACTER per the Unicode Standard, Chapter 3.9 U+FFFD Substitution in Conversion. If charT and path::value_type are the same then no transcoding is performed. Otherwise, transcoding is implementation-defined.]
Modify the entry in the index of implementation-defined behavior as indicated:
transcoding of a formatted path when charT and path::value_type differ [ins: and the ordinary literal encoding is not UTF-8]
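The three cases that define s3 in the proposed wording can be read as a selection on charT, path::value_type and the ordinary literal encoding. The sketch below only models that selection; the names Transcoding and select_transcoding are hypothetical, and whether the ordinary literal encoding is UTF-8 is passed in as a plain bool because this example makes no assumption about how an implementation detects it.

```cpp
#include <type_traits>

// Hypothetical labels for the three cases of the proposed wording.
enum class Transcoding {
    utf8_with_substitution,  // (5.1) wchar_t to UTF-8 with U+FFFD substitution
    none,                    // (5.2) s3 is the same as s2
    implementation_defined   // (5.3) everything else
};

// Returns which case would apply for a given charT and path::value_type.
template <class CharT, class PathValueType>
constexpr Transcoding select_transcoding(bool ordinary_literal_encoding_is_utf8) {
    constexpr bool narrow_from_wide =
        std::is_same_v<CharT, char> && std::is_same_v<PathValueType, wchar_t>;
    if (narrow_from_wide && ordinary_literal_encoding_is_utf8) {
        return Transcoding::utf8_with_substitution;  // (5.1)
    }
    if (std::is_same_v<CharT, PathValueType>) {
        return Transcoding::none;  // (5.2)
    }
    return Transcoding::implementation_defined;  // (5.3)
}
```

For example, select_transcoding<char, wchar_t>(true) selects case (5.1), select_transcoding<char, char>(true) selects case (5.2), and select_transcoding<char, wchar_t>(false) falls through to case (5.3), which is the situation the modified index entry describes.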