This page is a snapshot from the LWG issues list, see the Library Active Issues List for more information and the meaning of Open status.

3840. filesystem::u8path should be undeprecated

Section: D.21 [depr.fs.path.factory] Status: Open Submitter: Daniel Krügler Opened: 2022-12-10 Last modified: 2024-01-29

Priority: 3

View all other issues in [depr.fs.path.factory].

View all issues with Open status.

Discussion:

The filesystem::u8path function became deprecated with the adoption of P0482R6, but the rationale for that change is rather thin:

"The C++ standard must improve support for UTF-8 by removing the existing barriers that result in redundant tagging of character encodings, non-generic UTF-8 specific workarounds like u8path."

The u8path function is still useful if my original string source is a char sequence and I do know that the encoding of this sequence is UTF-8.

The deprecation note suggests that one should use std::u8string instead, which costs me an additional transformation and doesn't work without reinterpret_cast.

Even in the presence of char8_t, legacy code bases often are still ABI-bound to char. In the future we may solve this problem using the tools provided by P2626 instead, but right now this is not part of the standard and it wasn't at the time when u8path became deprecated. This is in my opinion a good reason to undeprecate u8path now and decide later on the appropriate time to deprecate it again (if it really turns out to be obsolete by alternative functionality).

Billy O'Neal provides a concrete example where the current deprecation status causes pain:

Example: vcpkg-tool files.cpp#L21-L45

Before p0482, we could just call std::u8path and it would do the right thing on both POSIX and Windows. After compilers started implementing '20, we have to make assumptions about the correct 'internal' std::path encoding because there is no longer a way to arrive to std::path with a char buffer that we know is UTF-8 encoded and get the correct results.

It's one of the reasons we completely ripped out use of std::filesystem on most platforms from vcpkg, so you won't see this in current sources.

[2023-01-06; Reflector poll]

Set priority to 3 after reflector poll. Set status to LEWG.

[2023-05-30; status to "Open"]

LEWG discussed this in January and had no consensus for undeprecation.

Proposed resolution:

This wording is relative to N4917.

  1. Restore the u8path declarations to 31.12.4 [fs.filesystem.syn], header <filesystem> synopsis, as indicated:

    namespace std::filesystem {
      // 31.12.6 [fs.class.path], paths
      class path;
    
      // 31.12.6.8 [fs.path.nonmember], path non-member functions
      void swap(path& lhs, path& rhs) noexcept;
      size_t hash_value(const path& p) noexcept;
      
      // [fs.path.factory], path factory functions
      template<class Source>
        path u8path(const Source& source);
      template<class InputIterator>
        path u8path(InputIterator first, InputIterator last);
    
      // 31.12.7 [fs.class.filesystem.error], filesystem errors
      class filesystem_error;
    […]
    }
    
  2. Restore the previous sub-clause [fs.path.factory] by copying the contents of D.21 [depr.fs.path.factory] to a new sub-clause [fs.path.factory] between 31.12.6.8 [fs.path.nonmember] and 31.12.6.10 [fs.path.hash] and without Note 1 as indicated:

    [Drafting note: As additional stylistic adaption we replace the obsolete Requires element by a Preconditions element plus a Mandates element (similar to that of 31.12.6.5.1 [fs.path.construct] p5).

    As a second stylistic improvement we convert the now more unusual "if […]; otherwise" construction in bullets by "Otherwise, if […]" constructions.]

    ? Factory functions [fs.path.factory]

    template<class Source>
      path u8path(const Source& source);
    template<class InputIterator>
      path u8path(InputIterator first, InputIterator last);
    

    -?- Mandates: The value type of Source and InputIterator is char or char8_t.

    -?- Preconditions: The source and [first, last) sequences are UTF-8 encoded.

    -?- Returns:

    1. (?.1) — If value_type is char and the current native narrow encoding (31.12.6.3.2 [fs.path.type.cvt]) is UTF-8, return path(source) or path(first, last).

    2. (?.2) — Otherwise, if value_type is wchar_t and the native wide encoding is UTF-16, or if value_type is char16_t or char32_t, convert source or [first, last) to a temporary, tmp, of type string_type and return path(tmp).

    3. (?.3) — Otherwise, convert source or [first, last) to a temporary, tmp, of type u32string and return path(tmp).

    -?- Remarks: Argument format conversion (31.12.6.3.1 [fs.path.fmt.cvt]) applies to the arguments for these functions. How Unicode encoding conversions are performed is unspecified.

    -?- [Example 1: A string is to be read from a database that is encoded in UTF-8, and used to create a directory using the native encoding for filenames:

    namespace fs = std::filesystem;
    std::string utf8_string = read_utf8_data();
    fs::create_directory(fs::u8path(utf8_string));
    

    For POSIX-based operating systems with the native narrow encoding set to UTF-8, no encoding or type conversion occurs.

    For POSIX-based operating systems with the native narrow encoding not set to UTF-8, a conversion to UTF-32 occurs, followed by a conversion to the current native narrow encoding. Some Unicode characters may have no native character set representation.

    For Windows-based operating systems a conversion from UTF-8 to UTF-16 occurs. — end example]

  3. Delete sub-clause D.21 [depr.fs.path.factory] in its entirety.