Null-terminated multibyte strings
From cppreference.com
A null-terminated multibyte string (NTMBS), or "multibyte string", is a sequence of nonzero bytes followed by a byte with value zero (the terminating null character).
Each character stored in the string may occupy more than one byte. The encoding used to represent characters in a multibyte character string is locale-specific: it may be UTF-8, GB18030, EUC-JP, Shift-JIS, etc. For example, the char array {'\xe4','\xbd','\xa0','\xe5','\xa5','\xbd','\0'} is an NTMBS holding the string "你好" in UTF-8 multibyte encoding: the first three bytes encode the character 你, and the next three bytes encode the character 好. The same string encoded in GB18030 is the char array {'\xc4', '\xe3', '\xba', '\xc3', '\0'}, where each of the two characters is encoded as a two-byte sequence.
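As a minimal sketch of the point above, the two arrays from the text can be measured with strlen, which counts bytes rather than characters. The names hello_utf8, hello_gb18030 and ntmbs_byte_length are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <string.h>

/* The byte sequences quoted in the text: "你好" in UTF-8 and in GB18030. */
const char hello_utf8[]    = {'\xe4', '\xbd', '\xa0', '\xe5', '\xa5', '\xbd', '\0'};
const char hello_gb18030[] = {'\xc4', '\xe3', '\xba', '\xc3', '\0'};

/* Hypothetical helper: number of bytes before the terminating null.
   This is a byte count, not a character count. */
size_t ntmbs_byte_length(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}
```

Both arrays hold the same two characters, yet ntmbs_byte_length reports 6 for the UTF-8 array and 4 for the GB18030 array.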
In some multibyte encodings, any given multibyte character sequence may represent different characters depending on the previous byte sequences, known as "shift sequences". Such encodings are known as state-dependent: knowledge of the current shift state is required to interpret each character. An NTMBS is only valid if it begins and ends in the initial shift state: if a shift sequence was used, the corresponding unshift sequence has to be present before the terminating null character. Examples of such encodings are BOCU-1 and SCSU.
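The validity rule can be checked roughly as follows. This is a sketch, not a definitive validator: it assumes the string is to be interpreted in the current C locale, and the name is_valid_ntmbs is hypothetical. It walks the string with mbrtowc and then asks mbsinit whether the conversion state is back at the initial shift state.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: returns 1 if s decodes without error in the current
   locale and ends in the initial shift state, 0 otherwise. */
int is_valid_ntmbs(const char *s)
{
    assert(s != NULL);

    mbstate_t state;
    memset(&state, 0, sizeof state);      /* a zeroed mbstate_t is the initial state */

    size_t remaining = strlen(s);
    while (remaining > 0) {
        /* A null pwc asks mbrtowc to decode without storing the wide character. */
        size_t n = mbrtowc(NULL, s, remaining, &state);
        if (n == (size_t)-1 || n == (size_t)-2)
            return 0;                     /* invalid or incomplete sequence */
        if (n == 0)
            break;                        /* null character reached */
        s += n;
        remaining -= n;
    }
    return mbsinit(&state) != 0;          /* must end in the initial shift state */
}
```

In a stateless encoding such as UTF-8 the final mbsinit check always succeeds for a cleanly decoded string; it only matters for state-dependent encodings.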
A multibyte character string is layout-compatible with a null-terminated byte string (NTBS), that is, it can be stored, copied, and examined using the same facilities, except for calculating the number of characters. If the correct locale is in effect, I/O functions also handle multibyte strings. Multibyte strings can be converted to and from wide strings using the codecvt member functions and wstring_convert (both C++ facilities), or the following locale-dependent conversion functions:
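Because strlen reports bytes, counting characters needs a multibyte-aware loop. The sketch below uses mblen for this; count_characters and try_utf8_locale are hypothetical names, and the locale name "C.UTF-8" is an assumption that does not hold on every system.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: tries to select a UTF-8 locale.
   The name "C.UTF-8" is an assumption; it is not available everywhere. */
int try_utf8_locale(void)
{
    return setlocale(LC_ALL, "C.UTF-8") != NULL;
}

/* Hypothetical helper: number of multibyte characters in s under the
   current locale, or -1 if s contains an invalid sequence. */
long count_characters(const char *s)
{
    assert(s != NULL);

    long count = 0;
    size_t remaining = strlen(s);

    mblen(NULL, 0);                       /* reset mblen's internal shift state */
    while (remaining > 0) {
        int n = mblen(s, remaining);
        if (n <= 0)
            return -1;                    /* invalid multibyte sequence */
        s += n;
        remaining -= (size_t)n;
        ++count;
    }
    return count;
}
```

If a UTF-8 locale is in effect, count_characters returns 2 for the six-byte UTF-8 encoding of "你好", whereas strlen returns 6 for the same bytes.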
Multibyte/wide character conversions
Defined in header <stdlib.h>
mblen | returns the number of bytes in the next multibyte character (function)
mbtowc | converts the next multibyte character to wide character (function)
wctomb | converts a wide character to its multibyte representation (function)
mbstowcs | converts a narrow multibyte character string to wide string (function)
wcstombs | converts a wide string to narrow multibyte character string (function)
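A whole-string conversion with mbstowcs might look like the sketch below. The wrapper name to_wide and its policy of treating a full output buffer as an error are assumptions made for this illustration; mbstowcs itself does not write a terminator when the output is exactly filled.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: converts the multibyte string s into out, which has
   room for cap wide characters including the terminator. Returns the number
   of wide characters written (excluding the terminator), or (size_t)-1 if s
   is invalid in the current locale or does not fit. */
size_t to_wide(const char *s, wchar_t *out, size_t cap)
{
    assert(s != NULL && out != NULL && cap > 0);

    size_t n = mbstowcs(out, s, cap);
    if (n == (size_t)-1)
        return (size_t)-1;                /* invalid multibyte sequence */
    if (n == cap)
        return (size_t)-1;                /* buffer filled, no room for the terminator */
    return n;
}
```

A caller would set the locale first, then pass a buffer such as wchar_t buf[64] together with its element count.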
Defined in header <wchar.h>
mbsinit | checks if the mbstate_t object represents initial shift state (function)
btowc | widens a single-byte narrow character to wide character, if possible (function)
wctob | narrows a wide character to a single-byte narrow character, if possible (function)
mbrlen | returns the number of bytes in the next multibyte character, given state (function)
mbrtowc | converts the next multibyte character to wide character, given state (function)
wcrtomb | converts a wide character to its multibyte representation, given state (function)
mbsrtowcs | converts a narrow multibyte character string to wide string, given state (function)
wcsrtombs | converts a wide string to narrow multibyte character string, given state (function)
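The "given state" variants take an explicit mbstate_t instead of relying on hidden internal state. As a small sketch, a single wide character can be encoded with wcrtomb starting from the initial state; encode_wide_char is a hypothetical name.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: encodes wc into out (at least MB_LEN_MAX bytes) using
   the current locale, starting from the initial conversion state. Returns the
   number of bytes written, or (size_t)-1 if wc has no representation. */
size_t encode_wide_char(wchar_t wc, char *out)
{
    assert(out != NULL);

    mbstate_t state;
    memset(&state, 0, sizeof state);      /* explicit initial conversion state */
    return wcrtomb(out, wc, &state);
}
```

When encoding a sequence of wide characters in a state-dependent encoding, one mbstate_t object would be kept and passed to every call rather than being reset each time.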
Defined in header <uchar.h>
mbrtoc16 (C11) | generates the next 16-bit wide character from a narrow multibyte string (function)
c16rtomb (C11) | converts a 16-bit wide character to narrow multibyte string (function)
mbrtoc32 (C11) | generates the next 32-bit wide character from a narrow multibyte string (function)
c32rtomb (C11) | converts a 32-bit wide character to narrow multibyte string (function)
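The following sketch decodes the first character of a multibyte string into a char32_t with mbrtoc32. It assumes the implementation provides <uchar.h>; decode_first is a hypothetical name.

```c
#include <assert.h>
#include <string.h>
#include <uchar.h>

/* Hypothetical helper: decodes the first multibyte character of s into *out.
   Returns the number of bytes consumed, 0 if s is empty, or (size_t)-1 if the
   sequence is invalid or incomplete in the current locale. */
size_t decode_first(const char *s, char32_t *out)
{
    assert(s != NULL && out != NULL);

    mbstate_t state;
    memset(&state, 0, sizeof state);      /* initial conversion state */

    /* The terminating null is included so that an empty string yields 0. */
    size_t n = mbrtoc32(out, s, strlen(s) + 1, &state);
    if (n == (size_t)-1 || n == (size_t)-2 || n == (size_t)-3)
        return (size_t)-1;
    return n;
}
```

Whether the stored value is a Unicode code point depends on the implementation; the __STDC_UTF_32__ macro listed under Macros indicates that it is.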
Types
Defined in header <wchar.h>
mbstate_t | conversion state information necessary to iterate multibyte character strings (type)
Defined in header <uchar.h>
char16_t (C11) | 16-bit character type (typedef)
char32_t (C11) | 32-bit character type (typedef)
Macros
Defined in header <limits.h>
MB_LEN_MAX | maximum number of bytes in a multibyte character (macro constant)
Defined in header <stdlib.h>
MB_CUR_MAX | maximum number of bytes in a multibyte character in the current C locale (macro variable)
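One common use of these two macros, sketched here under the assumption that a fixed-size buffer is wanted: MB_LEN_MAX is a compile-time constant and can size an array, while MB_CUR_MAX is evaluated at run time and never exceeds it. The name encoded_length is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: number of bytes wc occupies in the current locale,
   or -1 if wc cannot be represented. */
int encoded_length(wchar_t wc)
{
    char buf[MB_LEN_MAX];                 /* large enough for any supported locale */

    assert(MB_CUR_MAX <= (size_t)MB_LEN_MAX);
    wctomb(NULL, 0);                      /* reset wctomb's internal shift state */
    return wctomb(buf, wc);               /* writes at most MB_CUR_MAX bytes */
}
```

Since MB_CUR_MAX depends on the locale in effect, it may change after a call to setlocale, while MB_LEN_MAX stays fixed.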
Defined in header <uchar.h>
__STDC_UTF_16__ (C11) | (macro constant)
__STDC_UTF_32__ (C11) | (macro constant)
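A program can test these macros at compile time. The sketch below only reports whether each macro is defined; the function names are hypothetical, and the comments reflect the C11 meaning of the macros (char16_t values are UTF-16 encoded, char32_t values are UTF-32 encoded).

```c
#include <uchar.h>

/* Hypothetical helper: 1 if the implementation states that char16_t values
   are UTF-16 encoded, 0 if it makes no such statement. */
int char16_is_utf16(void)
{
#ifdef __STDC_UTF_16__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: 1 if the implementation states that char32_t values
   are UTF-32 encoded, 0 if it makes no such statement. */
int char32_is_utf32(void)
{
#ifdef __STDC_UTF_32__
    return 1;
#else
    return 0;
#endif
}
```

Code that passes the results of mbrtoc16 or mbrtoc32 to Unicode-aware routines could use these checks to reject implementations with a different encoding.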