This page is a snapshot from the LWG issues list, see the Library Active Issues List for more information and the meaning of SG16 status.

3780. format's width estimation is too approximate and not forward compatible

Section: 22.14.2.2 [format.string.std] Status: SG16 Submitter: Corentin Jabot Opened: 2022-09-15 Last modified: 2022-10-12

Priority: 3

Discussion:

For the purpose of width estimation, format uses a fixed list of code point ranges that was originally derived from an implementation of wcwidth, with modifications (see P1868R1).

This, however, presents a number of challenges:

Instead, we propose to

Note that per UAX-11

This change:

For the following code points, the estimated width used to be 1, and is 2 after the suggested change:

For the following code points, the estimated width used to be 2, and is 1 after the suggested change:

[2022-10-12; Reflector poll]

Set priority to 3 after reflector poll. Send to SG16.

Proposed resolution:

This wording is relative to N4917.

  1. Modify 22.14.2.2 [format.string.std] as indicated:

    -12- For a string in a Unicode encoding, implementations should estimate the width of a string as the sum of estimated widths of the first code points in its extended grapheme clusters. The extended grapheme clusters of a string are defined by UAX #29. The estimated width of the following code points is 2:

    1. (12.1) — U+1100 – U+115F

    2. (12.2) — U+2329 – U+232A

    3. (12.3) — U+2E80 – U+303E

    4. (12.4) — U+3040 – U+A4CF

    5. (12.5) — U+AC00 – U+D7A3

    6. (12.6) — U+F900 – U+FAFF

    7. (12.7) — U+FE10 – U+FE19

    8. (12.8) — U+FE30 – U+FE6F

    9. (12.9) — U+FF00 – U+FF60

    10. (12.10) — U+FFE0 – U+FFE6

    11. (12.11) — U+1F300 – U+1F64F

    12. (12.12) — U+1F900 – U+1F9FF

    13. (12.13) — U+20000 – U+2FFFD

    14. (12.14) — U+30000 – U+3FFFD

    15. (?.1) — Any code point with the East_Asian_Width="W" or East_Asian_Width="F" Derived Extracted Property as described by UAX #44

    16. (?.2) — U+4DC0 – U+4DFF (Yijing Hexagram Symbols)

    17. (?.3) — U+1F300 – U+1F5FF (Miscellaneous Symbols and Pictographs)

    18. (?.4) — U+1F900 – U+1F9FF (Supplemental Symbols and Pictographs)

    The estimated width of other code points is 1.
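The table-driven estimate described in paragraph 12 can be illustrated with a minimal C++ sketch. It is not taken from any implementation: the names CodePointRange, kWideRanges, estimated_code_point_width and estimated_string_width are hypothetical, the table holds only the explicit ranges (12.1) through (12.14) listed above, and every code point is treated as its own extended grapheme cluster, which is a simplification of UAX #29 segmentation.

```cpp
#include <array>
#include <string_view>

// Closed interval [first, last] of code points (hypothetical helper type).
struct CodePointRange {
    char32_t first;
    char32_t last;
};

// The explicit ranges (12.1)-(12.14) from the wording, in ascending order.
inline constexpr std::array<CodePointRange, 14> kWideRanges{{
    {0x1100, 0x115F},   {0x2329, 0x232A},   {0x2E80, 0x303E},
    {0x3040, 0xA4CF},   {0xAC00, 0xD7A3},   {0xF900, 0xFAFF},
    {0xFE10, 0xFE19},   {0xFE30, 0xFE6F},   {0xFF00, 0xFF60},
    {0xFFE0, 0xFFE6},   {0x1F300, 0x1F64F}, {0x1F900, 0x1F9FF},
    {0x20000, 0x2FFFD}, {0x30000, 0x3FFFD},
}};

// Returns 2 for a code point inside one of the listed ranges, 1 otherwise.
constexpr int estimated_code_point_width(char32_t cp) {
    for (const CodePointRange& r : kWideRanges) {
        if (cp < r.first) {
            break;  // the table is sorted, so no later range can match
        }
        if (cp <= r.last) {
            return 2;
        }
    }
    return 1;
}

// Sums the per-code-point estimates over a UTF-32 string.
// Assumption: each code point is its own grapheme cluster, so combining
// sequences are over-counted compared with the wording above.
constexpr int estimated_string_width(std::u32string_view s) {
    int total = 0;
    for (char32_t cp : s) {
        total += estimated_code_point_width(cp);
    }
    return total;
}
```

For example, estimated_string_width(U"a\u3042b") is 4, because U+3042 falls inside U+3040 – U+A4CF. The East_Asian_Width criterion in (?.1) cannot be expressed as a short fixed table like this one; it needs data from the Unicode Character Database, which the sketch does not include.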