string literal
Syntax
" (unescaped_character|escaped_character)* "    (1)
L " (unescaped_character|escaped_character)* "    (2)
u8 " (unescaped_character|escaped_character)* "    (3)    (since C++11)
u " (unescaped_character|escaped_character)* "    (4)    (since C++11)
U " (unescaped_character|escaped_character)* "    (5)    (since C++11)
prefix(optional) R " delimiter( raw_characters ) delimiter"    (6)    (since C++11)
Explanation
unescaped_character - Any valid character except the double quote ", the backslash \, or the new-line character
escaped_character - See escape sequences
prefix - One of L, u8, u, U
delimiter - A character sequence made of any source character except parentheses, backslash, and spaces (it can be empty and is at most 16 characters long)
raw_characters - Any character sequence, except that it must not contain the closing sequence )delimiter"
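The delimiter rules can be illustrated with a few compile-time checks. This is a minimal sketch that assumes C++17 (for constexpr std::string_view comparisons); the variable names are illustrative only, and the file compiles only if every static_assert holds.

```cpp
#include <string_view>

// With an empty delimiter the literal ends at the first )" sequence.
// Backslashes are ordinary characters inside a raw string.
constexpr std::string_view win_path = R"(C:\dir\file.txt)";
static_assert(win_path == "C:\\dir\\file.txt");

// This text contains )" itself, so a delimiter (here xy) is required;
// the literal now ends only at )xy".
constexpr std::string_view quoted = R"xy(He said "(hi)" twice)xy";
static_assert(quoted == "He said \"(hi)\" twice");

// Raw strings may also span lines; the new-line is kept as-is.
constexpr std::string_view two_lines = R"(a
b)";
static_assert(two_lines == "a\nb");
```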
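(1) Narrow multibyte string literal. The type of an unprefixed string literal is const char[N], where N is the size of the string in code units of the execution narrow encoding, including the null terminator.
(2) Wide string literal. The type of an L"..." string literal is const wchar_t[N], where N is the size of the string in code units of the execution wide encoding, including the null terminator.
(3) UTF-8 encoded string literal. The type of a u8"..." string literal is const char[N] (until C++20) or const char8_t[N] (since C++20), where N is the size of the string in UTF-8 code units, including the null terminator.
(4) UTF-16 encoded string literal. The type of a u"..." string literal is const char16_t[N], where N is the size of the string in UTF-16 code units, including the null terminator.
(5) UTF-32 encoded string literal. The type of a U"..." string literal is const char32_t[N], where N is the size of the string in UTF-32 code units, including the null terminator.
(6) Raw string literal. Escape sequences are not processed, so every character between ( and ), including backslashes and new-lines, becomes part of the string. prefix, if present, has the same meaning as in forms (2) to (5).
Notes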
The null character ('\0', L'\0', char16_t(), etc.) is always appended to the string literal. Thus, the string literal "Hello" is a const char[6] holding the characters 'H', 'e', 'l', 'l', 'o', and '\0'.
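The terminator rule can be checked at compile time for every prefix. This sketch assumes C++17; code_units is a hypothetical helper introduced here that simply reports the array bound N of whatever literal is passed to it.

```cpp
#include <cstddef>

// Hypothetical helper: returns N for an array of N code units, i.e. the
// length of a string literal including its null terminator.
template <class CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) { return N; }

static_assert(code_units("Hello") == 6);    // const char[6]
static_assert(code_units(L"Hello") == 6);   // const wchar_t[6]
static_assert(code_units(u8"Hello") == 6);  // UTF-8: one unit per ASCII character
static_assert(code_units(u"Hello") == 6);   // const char16_t[6]
static_assert(code_units(U"Hello") == 6);   // const char32_t[6]

// sizeof measures bytes, so it scales with the code-unit type.
static_assert(sizeof(U"Hello") == 6 * sizeof(char32_t));

// The final element is always the null character of the matching type.
static_assert("Hello"[5] == '\0' && u"Hello"[5] == u'\0');
```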
The encoding of narrow multibyte string literals (1) and wide string literals (2) is implementation-defined. For example, GCC selects them with the command-line options -fexec-charset and -fwide-exec-charset.
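By contrast, forms (3) to (5) have fixed encodings, so their sizes are portable. The sketch below (assuming C++17) uses the universal character name \u0394 for the Greek capital delta so the result does not depend on how the source file is saved; a narrow "\u0394" is deliberately left out because its size depends on the execution character set.

```cpp
#include <cstddef>

// U+0394 written as a universal character name, independent of source encoding.
constexpr std::size_t utf8_units  = sizeof(u8"\u0394") - 1;  // 1-byte units, minus '\0'
constexpr std::size_t utf16_units = sizeof(u"\u0394") / sizeof(char16_t) - 1;
constexpr std::size_t utf32_units = sizeof(U"\u0394") / sizeof(char32_t) - 1;

static_assert(utf8_units == 2);   // encoded as the bytes 0xCE 0x94
static_assert(utf16_units == 1);
static_assert(utf32_units == 1);
static_assert(static_cast<unsigned char>(u8"\u0394"[0]) == 0xCE);
static_assert(static_cast<unsigned char>(u8"\u0394"[1]) == 0x94);
```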
String literals placed side-by-side are concatenated at translation phase 6 (after the preprocessor). That is, "Hello," " world!" yields the (single) string "Hello, world!". If the two strings have the same encoding prefix (or neither has one), the resulting string will have the same encoding prefix (or no prefix).
If one of the strings has an encoding prefix and the other doesn't, the one that doesn't is considered to have the same encoding prefix as the other:
L"Δx = %" PRId16 // at phase 4, PRId16 expands to "d" (on a typical implementation)
                 // at phase 6, L"Δx = %" and "d" form L"Δx = %d"
If a UTF-8 string literal and a wide string literal are side by side, the program is ill-formed. (since C++11)
Any other combination of encoding prefixes may or may not be supported by the implementation. The result of such a concatenation is implementation-defined.
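Concatenation and prefix adoption can be verified at compile time. This is a sketch assuming C++17; HYPOTHETICAL_FMT is an invented stand-in for a macro such as PRId16 (whose real expansion varies by implementation), and "Δ" is replaced by "x" to avoid any source-encoding assumptions.

```cpp
#include <string_view>

// Adjacent literals become a single array in translation phase 6,
// with only one null terminator at the end.
constexpr std::string_view greeting = "Hello," " world!";
static_assert(greeting == "Hello, world!");
static_assert(sizeof("Hello," " world!") == 14);

// An unprefixed piece takes on the prefix of its neighbour. The macro is
// expanded in phase 4, then the pieces are concatenated in phase 6.
#define HYPOTHETICAL_FMT "d"
constexpr std::wstring_view wide = L"x = %" HYPOTHETICAL_FMT;
static_assert(wide == L"x = %d");
static_assert(sizeof(L"x = %" HYPOTHETICAL_FMT) == 7 * sizeof(wchar_t));
```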
String literals have static storage duration, and thus exist in memory for the life of the program.
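One practical consequence of static storage duration is that a function may return a pointer to a string literal. The sketch below assumes C++17; weekday_kind is a hypothetical helper, and the commented-out function shows the dangling pattern to avoid.

```cpp
// Hypothetical helper: returning a pointer to a string literal is safe,
// because the literal has static storage duration and outlives the call.
constexpr const char* weekday_kind(int day) {
    switch (day) {
        case 0:
        case 6:
            return "weekend";
        default:
            return "weekday";
    }
}

// By contrast, this would dangle: buf is a local array destroyed on return.
// const char* broken() { char buf[] = "weekday"; return buf; }
```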
String literals can be used to initialize character arrays. For example, after char str[] = "foo"; the array str holds its own copy of the string "foo".
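How the array bound interacts with the literal can be shown with compile-time checks. This sketch assumes C++17; the array names are illustrative.

```cpp
// The array bound is deduced from the literal, terminator included.
constexpr char str[] = "foo";
static_assert(sizeof(str) == 4 && str[3] == '\0');

// An explicit bound may be larger; the remaining elements are zero-filled.
constexpr char padded[8] = "foo";
static_assert(padded[2] == 'o' && padded[3] == '\0' && padded[7] == '\0');

// An explicit bound that leaves no room for '\0' is ill-formed in C++,
// although C accepts it:
// char too_small[3] = "foo";
```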
The compiler is allowed, but not required, to combine storage for equal or overlapping string literals. As a result, pointers to identical string literals may or may not compare equal.
bool b = "bar" == 3 + "foobar"; // could be true or false, unspecified
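Because the pointer comparison above has no guaranteed result, code that means "same text" should compare characters. This sketch assumes C++17; same_text and same_text_c are hypothetical helper names.

```cpp
#include <cstring>
#include <string_view>

// Whether "bar" and the tail of "foobar" share storage is unspecified,
// so compare the characters, not the addresses.
constexpr bool same_text(const char* a, const char* b) {
    return std::string_view(a) == std::string_view(b);
}
static_assert(same_text("bar", 3 + "foobar"));

// The C-library equivalent, usable at run time.
inline bool same_text_c(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;
}
```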
Attempting to modify a string literal results in undefined behavior: they may be stored in read-only storage (such as .rodata) or combined with other string literals:
const char* pc = "Hello";
char* p = const_cast<char*>(pc);
p[0] = 'M'; // undefined behavior
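The well-defined alternative is to copy the literal into storage you own and modify the copy. A minimal sketch, assuming C++17; edit_copy is a hypothetical name, and it is constexpr only so the result can be checked at compile time.

```cpp
#include <string_view>

// Copy the literal into an array you own, then modify the copy.
constexpr bool edit_copy() {
    char buf[] = "Hello";  // buf is a non-const char[6] initialized from the literal
    buf[0] = 'M';          // well-defined: only the copy changes
    return std::string_view(buf) == "Mello";
}
static_assert(edit_copy());
```

At run time, a std::string constructed from the literal serves the same purpose.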
In C, string literals are of type char[], and can be assigned directly to a (non-const) char*. C++03 allowed it as well (but deprecated it, as literals are const in C++). C++11 no longer allows such assignments without a cast.
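The const-qualified array type, and the missing conversion to char*, can be observed with type traits. This sketch assumes C++17 (for the _v variable templates).

```cpp
#include <type_traits>

// A string literal is an lvalue of type const CharT[N], so decltype yields a reference.
static_assert(std::is_same_v<decltype("abc"), const char (&)[4]>);
static_assert(std::is_same_v<decltype(U"abc"), const char32_t (&)[4]>);

// It decays to const char*, but (since C++11) not to a plain char*.
static_assert(std::is_convertible_v<decltype("abc"), const char*>);
static_assert(!std::is_convertible_v<decltype("abc"), char*>);
```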
A string literal is not necessarily a C string: if a string literal has embedded null characters, it represents an array which contains more than one string.
const char* p = "abc\0def"; // std::strlen(p) == 3, but the array has size 8
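The difference between the C-string length and the array length can be checked at compile time. This sketch assumes C++17, where std::char_traits<char>::length is constexpr and behaves like std::strlen; the ""sv literal operator receives the full length from the compiler, so it keeps the embedded null.

```cpp
#include <string>
#include <string_view>

using namespace std::string_view_literals;

// Like std::strlen, char_traits::length stops at the first '\0'.
static_assert(std::char_traits<char>::length("abc\0def") == 3);
static_assert(sizeof("abc\0def") == 8);

// "..."sv is given the literal's length (7), so nothing is cut off.
constexpr std::string_view whole = "abc\0def"sv;
static_assert(whole.size() == 7);
static_assert(whole.substr(4) == "def");
```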
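A hex escape sequence consumes every hexadecimal digit that follows \x, so a valid hex digit placed right after a hex escape becomes part of that escape. In "\xfff" the value 0xfff is out of range for char, and the literal fails to compile. String literal concatenation can be used as a workaround: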
//const char* p = "\xfff"; // error: hex escape sequence out of range
const char* p = "\xff""f"; // OK: the literal is const char[3] holding {'\xff', 'f', '\0'}
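The workaround, and the contrasting behavior of octal escapes (which stop after at most three digits), can be confirmed at compile time. This sketch assumes C++17; split is an illustrative name.

```cpp
// Splitting the literal ends the hex escape before the trailing 'f'.
constexpr char split[] = "\xff" "f";
static_assert(sizeof(split) == 3);
static_assert(split[0] == '\xff' && split[1] == 'f');

// Octal escapes take at most three digits, so no split is needed:
// "\1234" is '\123' (the letter 'S' in ASCII) followed by '4'.
static_assert(sizeof("\1234") == 3);
static_assert("\1234"[0] == '\123');
```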
Example
#include <iostream>

char array1[] = "Foo" "bar";
// same as
char array2[] = { 'F', 'o', 'o', 'b', 'a', 'r', '\0' };

const char* s1 = R"foo(
Hello
World
)foo";
// same as
const char* s2 = "\nHello\nWorld\n";

int main()
{
    std::cout << array1 << '\n';
    std::cout << array2 << '\n';

    std::cout << s1;
    std::cout << s2;
}
Output:
Foobar
Foobar

Hello
World

Hello
World
See also
user-defined literals (C++11) - literals with user-defined suffix