C16RTOMB(3C) Standard C Library Functions C16RTOMB(3C)

NAME


c16rtomb, c32rtomb, wcrtomb, wcrtomb_l - convert wide-characters to
character sequences

SYNOPSIS


#include <uchar.h>

size_t
c16rtomb(char *restrict str, char16_t c16, mbstate_t *restrict ps);

size_t
c32rtomb(char *restrict str, char32_t c32, mbstate_t *restrict ps);

#include <wchar.h>

size_t
wcrtomb(char *restrict str, wchar_t wc, mbstate_t *restrict ps);

#include <wchar.h>
#include <xlocale.h>

size_t
wcrtomb_l(char *restrict str, wchar_t wc, mbstate_t *restrict ps,
locale_t loc);

DESCRIPTION


The c16rtomb(), c32rtomb(), wcrtomb(), and wcrtomb_l() functions convert
wide-character sequences into a series of multi-byte characters. The
functions work in the following formats:

c16rtomb()
A UTF-16 code sequence, where every code point is represented by
one or two char16_t. The UTF-16 encoding will encode certain
Unicode code points as a pair of 16-bit code units,
commonly referred to as a surrogate pair.

c32rtomb()
A UTF-32 code sequence, where every code point is represented by
a single char32_t. It is illegal to pass reserved Unicode code
points.

wcrtomb(), wcrtomb_l()
Wide characters, being a 32-bit value where every code point is
represented by a single wchar_t. While the wchar_t and char32_t
are different types, in this implementation, they are similar
encodings.

The functions all work by looking at the passed in wide-character (c16,
c32, wc) and appending it to the current conversion state, ps. Once a
valid code point, based on the current locale, is found, then it will be
converted into a series of characters that are stored in str. Up to
MB_CUR_MAX bytes will be stored in str. It is the caller's responsibility
to ensure that there is sufficient space in str.
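
The space requirement above can be met by converting into a scratch buffer of MB_LEN_MAX bytes, a compile-time constant that is never smaller than MB_CUR_MAX, and copying only what fits. The following is a minimal sketch; the helper name wcs_to_mbs_checked() and its return convention are assumptions made for illustration and are not part of the library.

```c
#include <limits.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: convert the wide string src into dst, whose
 * capacity dstlen includes the terminating NUL. Returns the number of
 * bytes written (excluding the NUL), or (size_t)-1 on an encoding error
 * or if dst is too small.
 */
size_t
wcs_to_mbs_checked(const wchar_t *src, char *dst, size_t dstlen)
{
	mbstate_t mbs;
	char tmp[MB_LEN_MAX];	/* always at least MB_CUR_MAX bytes */
	size_t used = 0;

	(void) memset(&mbs, 0, sizeof (mbs));
	for (; *src != L'\0'; src++) {
		size_t ret = wcrtomb(tmp, *src, &mbs);

		if (ret == (size_t)-1)
			return ((size_t)-1);
		/* Keep one byte free for the terminating NUL. */
		if (dstlen - used <= ret)
			return ((size_t)-1);
		(void) memcpy(dst + used, tmp, ret);
		used += ret;
	}
	if (used >= dstlen)
		return ((size_t)-1);
	dst[used] = '\0';
	return (used);
}
```

Converting into the scratch buffer first means that the destination is never written past its end, at the cost of one extra copy per character.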

The functions are all influenced by the LC_CTYPE category of the current
locale for determining what is considered a valid character. For example,
in the C locale, only ASCII characters are recognized, while in a UTF-8
based locale like en_US.UTF-8, all valid Unicode code points are recognized
and will be converted into the corresponding multi-byte sequence. The
wcrtomb_l() function uses the locale passed in loc rather than the locale
of the current thread.
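
The effect of LC_CTYPE can be observed by converting the same code point under two different locales. The sketch below uses a hypothetical helper, convert_in_locale(), whose name and return convention are assumptions made for illustration. It changes the process-wide locale with setlocale(3C), so it is only suitable for single-threaded experiments.

```c
#include <locale.h>
#include <string.h>
#include <uchar.h>

/*
 * Hypothetical helper: switch LC_CTYPE to the named locale and convert
 * one code point. Returns the number of bytes stored in out, -1 on an
 * encoding error, or -2 if the locale is not available. The out buffer
 * must hold at least MB_LEN_MAX bytes.
 */
int
convert_in_locale(const char *locale, char32_t c32, char *out)
{
	mbstate_t mbs;
	size_t ret;

	if (setlocale(LC_CTYPE, locale) == NULL)
		return (-2);

	(void) memset(&mbs, 0, sizeof (mbs));
	ret = c32rtomb(out, c32, &mbs);
	return (ret == (size_t)-1 ? -1 : (int)ret);
}
```

Calling the helper with "C" and the code point 0x5149 is expected to report an encoding error, while a UTF-8 based locale, if one is installed, should accept the same value.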

The ps argument represents a multi-byte conversion state which can be used
across multiple calls to a given function (but not mixed between
functions). This allows characters to be consumed from subsequent
buffers, e.g. different values of str. The functions may be called from
multiple threads as long as they use unique values for ps. If ps is NULL,
then a function-specific buffer will be used for the conversion state;
however, this buffer is shared between all threads and its use is not recommended.

The functions all have a special behavior when NULL is passed for str.
They instead will treat it as though the NULL wide-character was passed
in c16, c32, or wc and an internal buffer (buf) will be used to write out
the results of the conversion. In other words, the functions would be
called as:

c16rtomb(buf, L'\0', ps)
c32rtomb(buf, L'\0', ps)
wcrtomb(buf, L'\0', ps)
wcrtomb_l(buf, L'\0', ps, loc)
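
One practical use of the NULL form is to return a conversion state to its initial state without supplying an output buffer. The sketch below wraps that call; the helper name reset_conversion_state() is an assumption made for illustration.

```c
#include <wchar.h>

/*
 * Hypothetical helper: reset ps by passing NULL for str. The wide
 * character argument is ignored in this form, because the call behaves
 * as though the NULL wide-character had been passed. Returns the value
 * reported by wcrtomb().
 */
size_t
reset_conversion_state(mbstate_t *ps)
{
	return (wcrtomb(NULL, L'x', ps));
}
```

After a successful call, mbsinit(3C) can be used to confirm that ps describes the initial conversion state.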

Locale Details


Not all locales in the system are Unicode based locales. For example, ISO
8859 family locales have code points with values that do not match their
counterparts in Unicode. When using these functions with non-Unicode based
locales, the code points returned will be those determined by the locale.
They will not be converted from the corresponding Unicode code point. For
example, if using the Euro sign in ISO 8859-15, these functions will not
encode the Unicode value 0x20ac into the ISO 8859-15 value 0xa4.

Regardless of the locale, the characters returned will be encoded as though
the code point were the corresponding value in Unicode. This means that
when using UTF-16, if the corresponding code point were in the range for
surrogate pairs, then the c16rtomb() function will expect to receive that
code point in that fashion.

This behavior of the c16rtomb() and c32rtomb() functions should not be
relied upon, is not portable, and is subject to change for non-Unicode
locales.

RETURN VALUES


Upon successful completion, the c16rtomb(), c32rtomb(), wcrtomb(), and
wcrtomb_l() functions return the number of bytes stored in str. Otherwise,
(size_t)-1 is returned to indicate an encoding error and errno is set.
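
Because the return type is unsigned, the failure value has to be compared against (size_t)-1 explicitly before it is used as a length. The following is a minimal sketch of that check; the helper name convert_checked() and its convention of returning the errno value are assumptions made for illustration.

```c
#include <errno.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: convert one wide character. Returns 0 on success
 * and stores the byte count in *lenp; otherwise returns the errno value
 * set by wcrtomb(). The out buffer must hold at least MB_LEN_MAX bytes.
 */
int
convert_checked(wchar_t wc, char *out, size_t *lenp)
{
	mbstate_t mbs;
	size_t ret;

	(void) memset(&mbs, 0, sizeof (mbs));
	ret = wcrtomb(out, wc, &mbs);
	if (ret == (size_t)-1)
		return (errno);	/* read errno before any other call */

	*lenp = ret;
	return (0);
}
```

Capturing errno immediately after the failed call avoids having it overwritten by later library calls such as those used for logging.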

EXAMPLES


Example 1 Converting a UTF-32 character into a multi-byte character
sequence.

#include <locale.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <err.h>
#include <stdio.h>
#include <uchar.h>

int
main(void)
{
	mbstate_t mbs;
	size_t ret;
	/*
	 * Use MB_LEN_MAX rather than MB_CUR_MAX: MB_CUR_MAX reflects the
	 * locale in effect when buf is declared, which is still the C
	 * locale here and would make buf too small.
	 */
	char buf[MB_LEN_MAX];
	char32_t val = 0x5149;
	const char *uchar_exp = "\xe5\x85\x89";

	(void) memset(&mbs, 0, sizeof (mbs));
	(void) setlocale(LC_CTYPE, "en_US.UTF-8");
	ret = c32rtomb(buf, val, &mbs);
	if (ret != strlen(uchar_exp)) {
		errx(EXIT_FAILURE, "failed to convert string, got %zu",
		    ret);
	}

	if (strncmp(buf, uchar_exp, ret) != 0) {
		errx(EXIT_FAILURE, "converted char32_t does not match "
		    "expected value");
	}

	return (0);
}

ERRORS


The c16rtomb(), c32rtomb(), wcrtomb(), and wcrtomb_l() functions will fail
if:

EINVAL The conversion state in ps is invalid.

EILSEQ An invalid character sequence has been detected.

MT-LEVEL


The c16rtomb(), c32rtomb(), wcrtomb(), and wcrtomb_l() functions are
MT-Safe as long as different mbstate_t structures are passed in ps. If ps
is NULL or different threads use the same value for ps, then the functions
are Unsafe.

INTERFACE STABILITY


Committed

SEE ALSO


mbrtoc16(3C), mbrtoc32(3C), mbrtowc(3C), newlocale(3C), setlocale(3C),
uselocale(3C), uchar.h(3HEAD), environ(7)

illumos December 2, 2023 illumos