This page is a snapshot from the LWG issues list, see the Library Active Issues List for more information and the meaning of CD1 status.
Section: 22.9.4 [bitset.operators] Status: CD1 Submitter: Matt Austern Opened: 2001-02-05 Last modified: 2016-01-28
Priority: Not Prioritized
Discussion:
In 23.3.5.3, we are told that bitset's input operator
"Extracts up to N (single-byte) characters from is.", where is is a
stream of type basic_istream<charT, traits>.
The standard does not say what it means to extract single byte
characters from a stream whose character type, charT, is in
general not a single-byte character type. Existing implementations
differ.
A reasonable solution will probably involve widen() and/or
narrow(), since they are the supplied mechanism for converting a
single character between char and arbitrary charT.
Narrowing the input characters is not the same as widening the
literals '0' and '1', because there may be some locales in which
more than one wide character maps to the narrow character '0'.
Narrowing means that alternate representations may be used for
bitset input; widening means that they may not be.
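The difference between the two approaches can be sketched as follows. This is only an illustration: digit_by_widening and digit_by_narrowing are hypothetical helper names, not library functions, and whether a particular locale's ctype facet actually narrows alternate digits to '0' or '1' depends on the implementation and the locale.

```cpp
#include <locale>

// Hypothetical helpers for illustration only.
// Each returns 0 or 1 for a recognized binary digit, and -1 otherwise.

// Widening approach: compare the input character with the widened
// literals. Exactly one charT value is accepted per digit.
template <class charT>
int digit_by_widening(charT c, const std::locale& loc)
{
    const std::ctype<charT>& ct = std::use_facet<std::ctype<charT> >(loc);
    if (c == ct.widen('0')) return 0;
    if (c == ct.widen('1')) return 1;
    return -1;
}

// Narrowing approach: narrow the input character and compare with the
// narrow literals. Every charT value that the locale narrows to '0' or
// '1' is accepted, so alternate representations may be recognized.
template <class charT>
int digit_by_narrowing(charT c, const std::locale& loc)
{
    const std::ctype<charT>& ct = std::use_facet<std::ctype<charT> >(loc);
    const char n = ct.narrow(c, '\0');  // '\0' is the fallback for unmappable characters
    if (n == '0') return 0;
    if (n == '1') return 1;
    return -1;
}
```

With the widening helper, the set of accepted characters is fixed by what widen() returns; with the narrowing helper, it is whatever set of characters the locale chooses to map onto '0' and '1'.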
Note that for numeric input, num_get<> (22.2.2.1.2/8) compares input
characters to widened versions of narrow character literals.
From Pete Becker, in c++std-lib-8224:
Different writing systems can have different representations for the digits that represent 0 and 1. For example, in the Unicode representation of the Devanagari script (used in many of the Indic languages) the digit 0 is 0x0966, and the digit 1 is 0x0967. Calling narrow would translate those into '0' and '1'. But Unicode also provides the ASCII values 0x0030 and 0x0031 for the Latin representations of '0' and '1', as well as code points for the same numeric values in several other scripts (Tamil has no character for 0, but does have the digits 1-9), and any of these values would also be narrowed to '0' and '1'.
...
It's fairly common to intermix both native and Latin representations of numbers in a document. So I think the rule has to be that if a wide character represents a digit whose value is 0 then the bit should be cleared; if it represents a digit whose value is 1 then the bit should be set; otherwise throw an exception. So in a Devanagari locale, both 0x0966 and 0x0030 would clear the bit, and both 0x0967 and 0x0031 would set it. Widen can't do that. It would pick one of those two values, and exclude the other one.
From Jens Maurer, in c++std-lib-8233:
Whatever we decide, I would find it most surprising if bitset conversion worked differently from int conversion with regard to alternate local representations of numbers.
Thus, I think the options are:
- Have a new defect issue for 22.2.2.1.2/8 so that it will require the use of narrow().
- Have a defect issue for bitset() which describes clearly that widen() is to be used.
Proposed resolution:
Replace the first two sentences of paragraph 5 with:
Extracts up to N characters from is. Stores these characters in a temporary object str of type basic_string<charT, traits>, then evaluates the expression x = bitset<N>(str).
Replace the third bullet item in paragraph 5 with:
the next input character is neither is.widen(0) nor is.widen(1) (in which case the input character is not extracted).
Rationale:
Input for bitset should work the same way as numeric input. Using
widen does mean that alternative digit representations will not be
recognized, but this was a known consequence of the design choice.
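The practical effect can be observed with a wide stream. The sketch below assumes a library whose bitset extractor follows this resolution and a stream imbued with the default "C" locale; read_bits is a hypothetical helper name used only for illustration.

```cpp
#include <bitset>
#include <sstream>
#include <string>

// Hypothetical helper: attempts to extract an 8-bit bitset from `text`
// using the library's own operator>>. Returns true on success.
inline bool read_bits(const std::wstring& text, std::bitset<8>& out)
{
    std::wistringstream in(text);
    in >> out;
    return !in.fail();
}
```

Under these assumptions, read_bits(L"1010", b) succeeds, while a string made of the Devanagari digits 0x0967 and 0x0966 is rejected, because neither character compares equal to the widened '0' or '1'.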