C++ cstdlib mbtowc

int mbtowc(wchar_t* pwc , const char* s , size_t n);

The C++ <cstdlib> mbtowc function converts the multibyte character pointed to by ‘s’ into its wide character and stores the result in the wchar_t object pointed to by ‘pwc’. The third argument ‘n’ is the maximum number of bytes, beginning from the byte pointed to by ‘s’, that the function may examine for the conversion.

pwc - Pointer to the wchar_t object that receives the converted wide character. It may be a null pointer, in which case nothing is stored.

s - Pointer to the first byte of the multibyte character that is to be converted to the wchar_t type.

n - Maximum number of bytes, starting at ‘s’, to be examined for the conversion to the wchar_t type.

Return value
    a) If ‘s’ points to a valid multibyte character, it returns the number of bytes that make up that character.
    b) If ‘s’ is a null pointer, it returns nonzero if the current multibyte encoding is state-dependent and zero if it is not.
    c) If ‘s’ points to the null character ‘\0’, it returns 0.
    d) If the next ‘n’ bytes do not form a valid multibyte character, it returns -1.
    e) In every case it never returns a value greater than ‘n’ or the value of the MB_CUR_MAX macro.
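The return cases listed above can be tried out with a small sketch. The helper names `encoding_is_stateful` and `convert_first` are hypothetical and exist only for this illustration. The sketch assumes the program is still in its default ‘C’ locale.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: asks mbtowc whether the current multibyte
// encoding is state-dependent (the case where 's' is a null pointer).
bool encoding_is_stateful()
{
    return std::mbtowc(nullptr, nullptr, 0) != 0;
}

// Hypothetical helper: converts the first character of 's', examining
// at most 'n' bytes. Returns the number of bytes used, 0 for the null
// character, or -1 if the bytes do not form a valid character.
int convert_first(wchar_t* out, const char* s, std::size_t n)
{
    std::mbtowc(nullptr, nullptr, 0); // start from the initial shift state
    return std::mbtowc(out, s, n);
}
```

In the default ‘C’ locale, `convert_first(&wc, "A", 1)` returns 1 and stores L'A', while `convert_first(&wc, "", 1)` returns 0 because ‘s’ points to the null character.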

Code example : include <cstring> to use the ‘strlen’ function

wchar_t wc ;
int ret ;
const char st[]="New" ;

for( size_t i=0 ; i<strlen(st) ; i++ ) //strlen() calculates the length of the ‘st’ string
{
    ret=mbtowc( &wc , &st[i] , 1 );
    wcout<< "ret=" << ret << " wc=" << wc << endl ; //one stream for the whole line: cout and wcout should not be mixed
}

Output in Code::Blocks,

ret=1 wc=N
ret=1 wc=e
ret=1 wc=w

The first argument is the address of a wchar_t object and the second argument is a pointer to const char. We passed 1 as the third argument because each character of ‘st’ occupies 1 byte, so one byte is examined per call. Even if you pass 2 or more as the third argument, only one character is converted per call.
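Since only one character is converted per call, a whole string can be walked by advancing the pointer by the return value instead of by 1. The following is a minimal sketch of that idea, not a definitive implementation. The helper name `to_wide` is hypothetical, and the result depends on the locale that is active when it runs.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper: converts a multibyte string to a wide string
// using the encoding of the current locale. Conversion stops at the
// first byte sequence that mbtowc rejects.
std::wstring to_wide(const char* s)
{
    std::wstring out;
    std::size_t left = std::strlen(s);    // bytes still to be converted
    std::mbtowc(nullptr, nullptr, 0);     // start from the initial shift state

    while (left > 0)
    {
        wchar_t wc;
        int len = std::mbtowc(&wc, s, left); // examine at most the remaining bytes
        if (len <= 0)
            break;                           // -1: invalid or incomplete character
        out.push_back(wc);
        s += len;                            // step over the bytes just converted
        left -= static_cast<std::size_t>(len);
    }
    return out;
}
```

With the default ‘C’ locale, `to_wide("New")` gives the wide string L"New". Passing the remaining byte count as ‘n’ keeps the function from reading past the end of the string.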

The topics below discuss passing UTF-8 encoded characters and the various cases that arise when multilingual characters are passed.

Passing UTF-8 encoded characters, char16_t and char32_t type characters, and Unicode characters

Passing UTF-8 encoded characters

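UTF-8 encoded characters, also known as UTF-8 narrow multibyte characters, are converted to the corresponding wide characters according to the encoding of the current locale (the LC_CTYPE category) and stored in the first argument object. To specify a UTF-8 encoded string, attach the prefix ‘u8’ at the front of the string literal. Note that from C++20 a ‘u8’ literal has the element type char8_t, so the char array initialization used here may be rejected by some compilers in that mode. An example is shown below.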

Code example

const char uni[]=u8"\u0020\u0021\u0022\u0023\u0024\u0040\u0085" ;
wchar_t wc ;
int sz ;

for( size_t i=0 ; i<strlen(uni) ; i++ )
{
    sz=mbtowc( &wc , &uni[i] , 1 ); //sz receives the return value printed below
    wcout<< "sz=" << sz << " " << wc << endl ;
}

Output in Code::Blocks (the last two lines are the two UTF-8 bytes of \u0085, each converted on its own because ‘n’ is 1),

sz=1 !
sz=1 "
sz=1 #
sz=1 $
sz=1 @
sz=1 ┬
sz=1 à
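The output above treats every byte as a separate character. To have ‘mbtowc’ read a multi-byte UTF-8 sequence as one character, the program first has to select a UTF-8 locale and then allow more than 1 byte to be examined. The sketch below assumes such a locale is installed. The locale names it tries are assumptions that differ between systems, and the helper names `use_utf8_locale` and `count_characters` are hypothetical.

```cpp
#include <clocale>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: tries a few UTF-8 locale names (assumed names,
// they vary by system). Returns false if none of them is available.
bool use_utf8_locale()
{
    const char* names[] = { "C.UTF-8", "en_US.UTF-8", ".UTF-8" };
    for (const char* name : names)
        if (std::setlocale(LC_CTYPE, name) != nullptr)
            return true;
    return false;
}

// Hypothetical helper: counts characters (not bytes) in a multibyte
// string, or returns -1 if mbtowc rejects a byte sequence.
long count_characters(const char* s)
{
    std::size_t left = std::strlen(s);
    long count = 0;
    std::mbtowc(nullptr, nullptr, 0);        // start from the initial shift state

    while (left > 0)
    {
        wchar_t wc;
        int len = std::mbtowc(&wc, s, left); // may use several bytes per character
        if (len < 0)
            return -1;
        if (len == 0)
            break;
        s += len;
        left -= static_cast<std::size_t>(len);
        ++count;
    }
    return count;
}
```

If `use_utf8_locale()` succeeds, the 3-byte string "!\xC2\x85" (the characters ! and \u0085) is counted as 2 characters, because the two bytes of \u0085 are consumed by a single call.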

Passing char16_t and char32_t type strings

The char16_t and char32_t types typically occupy 2 bytes and 4 bytes respectively. We cannot pass char16_t or char32_t characters as the second argument to the ‘mbtowc’ function. ‘mbtowc’ accepts only the const char* type, and a pointer to char16_t or char32_t does not convert implicitly to const char*, so the calls do not compile. Hence ‘mbtowc’ cannot be used to convert characters of these types into the wchar_t type.

Code example

const char16_t c16[]=u"Normal" ;
const char32_t c32[]=U"Strange" ;
wchar_t w1 ;

mbtowc( &w1 , &c16[0] , 1 ); //error: const char16_t* is not const char*
mbtowc( &w1 , &c32[0] , 1 ); //error: const char32_t* is not const char*

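Passing multilingual Unicode characters

Multilingual Unicode characters occupy more than 1 byte each in UTF-8. A const char array can still hold them as a sequence of bytes when the compiler stores string literals as UTF-8, which is the usual default on Linux; on Windows it depends on the compiler settings. In both Windows and Linux, however, the calls shown here do not convert such a character to the wchar_t type: the third argument of 1 lets ‘mbtowc’ examine only the first byte of a character that needs several, and the default ‘C’ locale of the program is not a UTF-8 locale. Code examples are given below.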

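Code example : in Windows with Code::Blocks

wchar_t wc = 0 ;

mbtowc( &wc , "बईबईसई" , 1 );

cout<< static_cast<long>(wc) ; //prints the numeric value stored in wc

The output is not the expected character, because only the first byte of the multibyte sequence was examined. You can check it yourself.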

Code example : in Linux (Ubuntu) with Code::Blocks

wchar_t wc = 0 ;
const char c[ ]="한자" ;

mbtowc( &wc , &c[0] , 1 );

cout<< static_cast<long>(wc) ; //gave 0 here: the call failed and nothing was stored in wc
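The two examples above let ‘mbtowc’ examine a single byte. The sketch below looks for the smallest ‘n’ with which the first character of a string can be converted, which shows why 1 is not enough for these characters. It assumes a UTF-8 locale is installed. The locale names are assumptions, and the helper names `use_utf8_locale` and `bytes_needed` are hypothetical.

```cpp
#include <clocale>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: tries a few UTF-8 locale names (assumed names,
// they vary by system). Returns false if none of them is available.
bool use_utf8_locale()
{
    const char* names[] = { "C.UTF-8", "en_US.UTF-8", ".UTF-8" };
    for (const char* name : names)
        if (std::setlocale(LC_CTYPE, name) != nullptr)
            return true;
    return false;
}

// Hypothetical helper: returns the smallest 'n' for which mbtowc can
// convert the first character of 's', or 0 if no 'n' works.
std::size_t bytes_needed(const char* s)
{
    const std::size_t limit = std::strlen(s);
    for (std::size_t n = 1; n <= limit; ++n)
    {
        wchar_t wc;
        std::mbtowc(nullptr, nullptr, 0); // drop any partial character from the last attempt
        if (std::mbtowc(&wc, s, n) > 0)
            return n;
    }
    return 0;
}
```

After a successful `use_utf8_locale()`, `bytes_needed` returns 3 for the UTF-8 bytes of 한자 ("\xED\x95\x9C\xEC\x9E\x90") and 1 for "N". The reset call before each attempt is a precaution, since some implementations keep the bytes of a failed, incomplete conversion in their internal state.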