character literal
From cppreference.com
Syntax
' c-char ' | (1) |
u8 ' c-char ' | (2) | (since C++17)
u ' c-char ' | (3) | (since C++11)
U ' c-char ' | (4) | (since C++11)
L ' c-char ' | (5) |
' c-char-sequence ' | (6) |
where
- c-char is either
  - a character from the source character set minus single-quote ('), backslash (\), or the newline character,
  - escape sequence, as defined in escape sequences, or
  - universal character name, as defined in escape sequences
- c-char-sequence is a sequence of two or more c-chars.
1) narrow character literal or ordinary character literal, e.g. 'a' or '\n' or '\13'. Such a literal has type char and a value equal to the representation of c-char in the execution character set. If c-char is not representable as a single byte in the execution character set, the literal has type int and an implementation-defined value.
2) UTF-8 character literal, e.g. u8'a'. Such a literal has type char (until C++20) or char8_t (since C++20) and a value equal to the ISO 10646 code point value of c-char, provided that the code point value is representable with a single UTF-8 code unit (that is, c-char is in the range 0x0-0x7F, inclusive). If c-char is not representable with a single UTF-8 code unit, the program is ill-formed.
3) UTF-16 character literal, e.g. u'貓', but not u'🍌' (u'\U0001f34c'). Such a literal has type char16_t and a value equal to the ISO 10646 code point value of c-char, provided that the code point value is representable with a single UTF-16 code unit (that is, c-char is in the range 0x0-0xFFFF, inclusive). If c-char is not representable with a single UTF-16 code unit, the program is ill-formed.
4) UTF-32 character literal, e.g. U'貓' or U'🍌'. Such a literal has type char32_t and a value equal to the ISO 10646 code point value of c-char.
5) wide character literal, e.g. L'β' or L'貓'. Such a literal has type wchar_t and a value equal to the value of c-char in the execution wide character set. If c-char is not representable in the execution wide character set (e.g. a non-BMP value on Windows where wchar_t is 16-bit), the value of the literal is implementation-defined.
6) Multicharacter literal, e.g. 'AB', has type int and implementation-defined value.
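The types and values listed above can be checked at compile time. The following is a minimal sketch, not part of the reference text: has_type is a hypothetical helper introduced only for illustration, and the u8 checks are guarded by feature-test macros because the type of a UTF-8 character literal depends on the language revision in use.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper (an assumption of this sketch): true when the argument
// has exactly the type Expected.
template <class Expected, class T>
constexpr bool has_type(T) { return std::is_same<Expected, T>::value; }

// (1) ordinary character literal
static_assert(has_type<char>('a'), "ordinary literal has type char");
// (3) UTF-16 character literal
static_assert(has_type<char16_t>(u'a'), "u'' literal has type char16_t");
// (4) UTF-32 character literal
static_assert(has_type<char32_t>(U'a'), "U'' literal has type char32_t");
// (5) wide character literal
static_assert(has_type<wchar_t>(L'a'), "L'' literal has type wchar_t");

// (2) UTF-8 character literal: char until C++20, char8_t since C++20
#if defined(__cpp_char8_t)
static_assert(has_type<char8_t>(u8'a'), "u8'' literal has type char8_t");
#elif __cplusplus >= 201703L
static_assert(has_type<char>(u8'a'), "u8'' literal has type char");
#endif

// The value of a UTF-16 or UTF-32 literal is the code point of c-char.
static_assert(u'\u03b2' == 0x03B2, "U+03B2 fits in one UTF-16 code unit");
static_assert(U'\U0001f34c' == 0x1F34C, "U+1F34C needs UTF-32");
// u'\U0001f34c' would be ill-formed: U+1F34C does not fit in one UTF-16 code unit.
```

Universal character names are used instead of the characters themselves so that the sketch does not depend on the encoding of the source file.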
Notes
Multicharacter literals were inherited by C from the B programming language. Although not specified by the C or C++ standard, most compilers implement multicharacter literals as specified in B: the values of each char in the literal initialize successive bytes of the resulting integer, in big-endian zero-padded right-adjusted order, e.g. the value of '\1' is 0x00000001 and the value of '\1\2\3\4' is 0x01020304.
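The B-style packing described above can be modeled with a small function. This is only a sketch of that convention: pack_b_style is a hypothetical name, it assumes 8-bit bytes and at most four characters, and it does not describe what any particular compiler does with an actual multicharacter literal, whose value remains implementation-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of the B convention: each character fills the next byte,
// most significant first, and the result is right-adjusted and zero-padded.
// Assumes n <= 4 and 8-bit bytes.
constexpr std::uint32_t pack_b_style(const char* s, std::size_t n,
                                     std::uint32_t acc = 0) {
    return n == 0
        ? acc
        : pack_b_style(s + 1, n - 1,
                       (acc << 8) | static_cast<unsigned char>(*s));
}

// The two examples from the note above.
static_assert(pack_b_style("\1", 1) == 0x00000001, "single char, zero-padded");
static_assert(pack_b_style("\1\2\3\4", 4) == 0x01020304, "big-endian order");
```

On a compiler that follows this convention, the literal '\1\2\3\4' would compare equal to pack_b_style("\1\2\3\4", 4); portable code should not rely on that.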
In C, character constants such as 'a' or '\n' have type int, rather than char.
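One visible consequence in C++ is that an ordinary character literal selects a char overload rather than an int overload, and its size is 1. A minimal sketch follows; the overload set which is hypothetical and exists only to show which type the literal has.

```cpp
#include <cassert>

// Hypothetical overload set (an assumption of this sketch).
constexpr int which(char) { return 1; }
constexpr int which(int)  { return 2; }

// In C++, 'a' has type char, so sizeof('a') equals sizeof(char), which is 1.
static_assert(sizeof('a') == 1, "ordinary character literal has type char");
// Overload resolution picks the char overload for the literal itself.
static_assert(which('a') == 1, "char overload selected");
// Unary plus applies integral promotion, so the int overload is selected.
static_assert(which(+'a') == 2, "promoted operand has type int");
```

In C, by contrast, sizeof('a') equals sizeof(int).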
See also
user-defined literals | literals with user-defined suffix (C++11) |