CNV_UTF8BytesFrom8Bit

CryptoSys PKI Pro Manual

Converts a string of 8-bit characters in a single-byte character set (SBCS) into a UTF-8-encoded array of bytes, with a choice of source code page.

VBA/VB6 Syntax


Public Declare Function CNV_UTF8BytesFrom8Bit Lib "diCrPKI.dll" (ByRef lpOutput As Byte, ByVal nOutBytes As Long, ByVal strInput As String, ByVal nOptions As Long) As Long

nLen = CNV_UTF8BytesFrom8Bit(lpOutput(0), nOutBytes, strInput, nOptions)

C/C++ Syntax


long __stdcall CNV_UTF8BytesFrom8Bit(unsigned char *lpOutput, long nOutBytes, const char *szInput, long nOptions);

Parameters

lpOutput: [out] array suitably dimensioned to receive output.
nOutBytes: [in] specifying the maximum number of bytes to be received.
szInput: [in] string of 8-bit characters to be converted.
nOptions: [in] Code page. Select one of:
PKI_CNV_CP_LATIN1 (0) ISO-8859-1 (Latin-1) code page (default)
PKI_CNV_CP_LATIN9 ISO-8859-15 (Latin-9) code page
PKI_CNV_CP_CP1252 Windows-1252 (CP1252) code page

Returns (VBA/C)

If successful, the return value is a positive number indicating the number of bytes in the output array, or number of bytes required if nOutBytes is set to zero; otherwise it returns a negative error code.

VBA Wrapper Syntax

Public Function cnvUTF8BytesFrom8Bit (szInput As String, nOptions As Long) As Byte()

Remarks

This function will set up to nOutBytes bytes in the output array. If nOutBytes is zero or lpOutput is NULL, it returns the required number of bytes.
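
The size-query convention described above can be tried without the library. The following is a minimal sketch in C, assuming a hypothetical stand-in function demo_utf8_from_latin1 that follows the same calling convention for the Latin-1 case only. It is not the library's implementation, and its return value when the buffer is too small is an assumption. The helper demo_convert is likewise hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in, NOT the library function: converts Latin-1 to UTF-8.
 * Writes at most nout bytes and returns the number of bytes the full output
 * needs. Returning the full size for a short buffer is an assumption. */
static long demo_utf8_from_latin1(unsigned char *out, long nout, const char *in)
{
    const unsigned char *p = (const unsigned char *)in;
    long n = 0;
    for (; *p != '\0'; p++) {
        unsigned char b[2];
        long k, len;
        if (*p < 0x80) {            /* ASCII: one byte, unchanged */
            b[0] = *p;
            len = 1;
        } else {                    /* U+0080..U+00FF: two bytes */
            b[0] = (unsigned char)(0xC0 | (*p >> 6));
            b[1] = (unsigned char)(0x80 | (*p & 0x3F));
            len = 2;
        }
        for (k = 0; k < len; k++, n++) {
            if (out != NULL && n < nout) out[n] = b[k];
        }
    }
    return n;
}

/* Query the required size, allocate exactly that, then convert.
 * Returns a malloc'd buffer (caller frees) or NULL on failure. */
static unsigned char *demo_convert(const char *in, long *plen)
{
    unsigned char *buf;
    long n;
    assert(in != NULL && plen != NULL);
    n = demo_utf8_from_latin1(NULL, 0, in);     /* first call: size only */
    if (n <= 0) return NULL;
    buf = malloc((size_t)n);
    if (buf == NULL) return NULL;
    *plen = demo_utf8_from_latin1(buf, n, in);  /* second call: fill buffer */
    return buf;
}
```

For the Latin-1 input "caf\xE9" (4 bytes), demo_convert yields the 5 bytes 63 61 66 C3 A9, which shows why the output length must be queried and not assumed equal to the input length.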

This function is for ANSI C and VBA/VB6 programmers only.

This is a more general version of CNV_UTF8BytesFromLatin1 with options for different code pages. It handles characters such as the Euro Sign € (U+20AC), which is not available in strict Latin-1 (ISO-8859-1) but is available in other SBCS code pages such as Latin-9 (ISO-8859-15) and Windows-1252. Windows-1252 differs from Latin-1 in the code range 0x80 to 0x9F, and Latin-9 differs from Latin-1 at eight code points between 0xA4 and 0xBE.

ISO-8859-1 (Latin-1)

The code range 0x80 to 0x9F is reserved for control characters (known as the C1 controls) of little or no practical use.

ISO-8859-15 (Latin-9)

Identical to ISO-8859-1 except that 8 characters in the code range 0xA4 to 0xBE are replaced by other printable characters; the code range 0x80 to 0x9F remains reserved for the C1 controls. In particular the code point 0xA4 (164) is defined as the Euro Sign (U+20AC).

Windows-1252 (CP1252)

Identical to ISO-8859-1 except 27 characters in the code range 0x80 to 0x9F are replaced by printable characters. In particular the code point 0x80 (128) is defined as the Euro Sign (U+20AC). This code page is the misleadingly labelled Windows "ANSI character set", not actually approved by ANSI, but widely used.
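
To make the differences concrete, here is a small sketch in C showing that the Euro Sign sits at a different input byte in each code page yet always produces the same UTF-8 output. The enum demo_cp and the functions demo_codepoint and demo_utf8_encode are hypothetical names for this illustration only. They special-case just the Euro Sign positions and are not how the library is implemented.

```c
#include <assert.h>

/* Hypothetical selector; these are NOT the PKI_CNV_CP_* option values. */
enum demo_cp { DEMO_LATIN1, DEMO_LATIN9, DEMO_CP1252 };

/* Map one input byte to a Unicode code point for the given code page.
 * Simplification: only the Euro Sign positions are handled; every other
 * byte is treated as Latin-1, where byte value equals code point. */
static unsigned long demo_codepoint(unsigned char b, enum demo_cp cp)
{
    if (cp == DEMO_LATIN9 && b == 0xA4) return 0x20ACUL;
    if (cp == DEMO_CP1252 && b == 0x80) return 0x20ACUL;
    return b;
}

/* Encode a code point below U+10000 as UTF-8; returns bytes written (1 to 3). */
static int demo_utf8_encode(unsigned long c, unsigned char out[3])
{
    assert(c < 0x10000UL);
    if (c < 0x80) {
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (c >> 12));
    out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (c & 0x3F));
    return 3;
}
```

With these helpers, byte 0x80 under DEMO_CP1252 and byte 0xA4 under DEMO_LATIN9 both encode to E2 82 AC, whereas the same bytes under DEMO_LATIN1 encode to C2 80 (a C1 control) and C2 A4 (the generic currency sign). Choosing the wrong code page therefore gives valid UTF-8 for the wrong character.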

Example (VBA wrapper function)

Dim strData As String
Dim lpDataUTF8() As Byte
' Chr(128) == EUR sign
strData = Chr(128) & "100.99"
Debug.Print "Latin string='" & strData & "'"
Debug.Print " (" & Len(strData) & " characters)"
lpDataUTF8 = cnvUTF8BytesFrom8Bit(strData, PKI_CNV_CP_CP1252)
Debug.Print "UTF-8=(0x)" & cnvHexStrFromBytes(lpDataUTF8)
Debug.Print "OK=       E282AC3130302E3939"
Debug.Print " (" & cnvBytesLen(lpDataUTF8) & " bytes)"
Debug.Print "cnvCheckUTF8Bytes returns " & cnvCheckUTF8Bytes(lpDataUTF8) & " (expected 3)"

Latin string='€100.99'
 (7 characters)
UTF-8=(0x)E282AC3130302E3939
OK=       E282AC3130302E3939
 (9 bytes)
cnvCheckUTF8Bytes returns 3 (expected 3)

Example (ANSI C)

The input is the string "€100.99" encoded for the Windows-1252 code page. The output is the UTF-8-encoded byte array E2 82 AC 31 30 30 2E 39 39 where E2 82 AC is the UTF-8 encoding of the Euro Sign (U+20AC).

/* Requires <assert.h>, <stdio.h>, <stdlib.h>, <string.h> and the CryptoSys PKI header */
long utflen, sbcslen;
unsigned char *cpu;
int i;
const char *sbcsdata = "\x80" "100.99";   /* Windows-1252 code page */
/* Expected UTF-8:       U+20AC EUR    1   0   0   .   9   9 */
const unsigned char utf8ok[] = "\xE2\x82\xAC\x31\x30\x30\x2E\x39\x39";

sbcslen = (long)strlen(sbcsdata);
printf("SBCS input is %ld bytes\n", sbcslen);
/* Find required length in bytes of output */
utflen = CNV_UTF8BytesFrom8Bit(NULL, 0, sbcsdata, PKI_CNV_CP_CP1252);
printf("UTF-8 output is %ld bytes\n", utflen);
assert(utflen > 0);   /* a negative value is an error code */
/* Allocate exactly the required number of bytes */
cpu = malloc((size_t)utflen);
assert(cpu != NULL);
/* Convert into the allocated buffer */
utflen = CNV_UTF8BytesFrom8Bit(cpu, utflen, sbcsdata, PKI_CNV_CP_CP1252);
assert(utflen > 0);
for (i = 0; i < utflen; i++) {  /* Display output in hex */
    printf("%02x", cpu[i]);
}
/* Check we get expected value */
assert(0 == memcmp(cpu, utf8ok, utflen));
free(cpu);

SBCS input is 7 bytes
UTF-8 output is 9 bytes
e282ac3130302e3939