CNV_CheckUTF8

CryptoSys PKI examples VB6 to VB.NET

Checks if a string is valid UTF-8.

VB6/VBA

Debug.Print "Testing CNV_CheckUTF8 ..."
Dim strData As String
Dim strDataUTF8 As String
Dim nRet As Long
Dim nLen As Long

' Our original string data is in "Latin-1" encoding
strData = "Asociación Mexicana de Estándares para el Comercio Electrónico A.C.|México|"
Debug.Print "Latin-1 string:"
Debug.Print strData

' Check if this is valid UTF-8 (it's not)
nRet = CNV_CheckUTF8(strData)
Debug.Print "CNV_CheckUTF8 returns " & nRet

' So convert to UTF-8
nLen = CNV_UTF8FromLatin1("", 0, strData)
If nLen < 0 Then
    Debug.Print "Failed to convert to UTF-8: " & nLen
    Exit Sub
End If
strDataUTF8 = String(nLen, " ")
nLen = CNV_UTF8FromLatin1(strDataUTF8, nLen, strData)
' Which may not display correctly in VB6...!
Debug.Print "UTF-8 string:"
Debug.Print strDataUTF8

' And check again (expected result = 2,
' i.e. valid UTF-8 containing at least one 8-bit ANSI character)
nRet = CNV_CheckUTF8(strDataUTF8)
Debug.Print "CNV_CheckUTF8 returns " & nRet

Output

Testing CNV_CheckUTF8 ...
Latin-1 string:
Asociación Mexicana de Estándares para el Comercio Electrónico A.C.|México|
CNV_CheckUTF8 returns 0
UTF-8 string:
AsociaciÃ³n Mexicana de EstÃ¡ndares para el Comercio ElectrÃ³nico A.C.|MÃ©xico|
CNV_CheckUTF8 returns 2

VB.NET

Console.WriteLine("Testing CNV_CheckUTF8 ...")
Dim strData As String
Dim strDataUTF8 As String
Dim nRet As Integer
''Dim nLen As Integer

' Our original string data is in "Latin-1" encoding
strData = "Asociación Mexicana de Estándares para el Comercio Electrónico A.C.|México|"
Console.WriteLine("Latin-1 string:")
Console.WriteLine(strData)
Console.WriteLine("strData.Length=" & strData.Length)

' Check if this is valid UTF-8 (it's not)
nRet = Cnv.CheckUTF8(strData)
Console.WriteLine("Cnv.CheckUTF8 returns " & nRet)

' So convert to UTF-8
Dim abData() As Byte
abData = System.Text.Encoding.GetEncoding("iso-8859-1").GetBytes(strData)
Console.WriteLine("abData.Length(iso-8859-1) =" & abData.Length)
abData = System.Text.Encoding.UTF8.GetBytes(strData)
Console.WriteLine("abData.Length(UTF-8)      =" & abData.Length)

' [FUDGE!] Force the UTF-8 bytes back into a string using the system default encoding
strDataUTF8 = System.Text.Encoding.Default.GetString(abData)
' Which may not display correctly...!
Console.WriteLine("UTF-8 string:")
Console.WriteLine(strDataUTF8)
Console.WriteLine("strDataUTF8.Length=" & strDataUTF8.Length)

' And check again (expected result = 2,
' i.e. valid UTF-8 containing at least one 8-bit ANSI character)
nRet = Cnv.CheckUTF8(strDataUTF8)
Console.WriteLine("Cnv.CheckUTF8 returns " & nRet)

Remarks

The .NET "conversion" above is a bit of a fudge. We (correctly) convert the string to a UTF-8 byte array abData() using GetBytes, and then (naughtily) use the default encoding's GetString to force those bytes back into a string. The resulting string will look like UTF-8 to our Cnv.CheckUTF8 method but may be of limited use otherwise. The trick also relies on Encoding.Default being a single-byte ANSI code page, as it is on .NET Framework under Windows; on .NET Core and later, Encoding.Default is UTF-8, so GetString simply gives back the original string.
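If the data is already held as a byte array, the string round trip can be avoided altogether. The following is a minimal sketch, not the CryptoSys PKI method: it uses only the standard System.Text.UTF8Encoding class with throwOnInvalidBytes set, and the helper name IsValidUtf8 is hypothetical. It gives only a yes/no answer, not the numeric result codes returned by Cnv.CheckUTF8.

```vbnet
Imports System
Imports System.Text

Module Utf8CheckSketch
    ' Hypothetical helper (not part of CryptoSys PKI):
    ' returns True if the bytes decode as strict UTF-8.
    Function IsValidUtf8(ByVal abData() As Byte) As Boolean
        ' Second argument (throwOnInvalidBytes) = True makes the decoder
        ' throw on a malformed sequence instead of substituting U+FFFD.
        Dim strict As New UTF8Encoding(False, True)
        Try
            strict.GetString(abData)
            Return True
        Catch ex As DecoderFallbackException
            Return False
        End Try
    End Function

    Sub Main()
        ' "México", built with ChrW so the source file encoding does not matter
        Dim strData As String = "M" & ChrW(&HE9) & "xico"

        ' Latin-1 bytes: the lone byte E9 is not a valid UTF-8 sequence
        Dim abLatin1() As Byte = Encoding.GetEncoding("iso-8859-1").GetBytes(strData)
        Console.WriteLine("Latin-1 bytes valid UTF-8? " & IsValidUtf8(abLatin1))

        ' UTF-8 bytes: é is encoded as the two bytes C3 A9
        Dim abUtf8() As Byte = Encoding.UTF8.GetBytes(strData)
        Console.WriteLine("UTF-8 bytes valid UTF-8?   " & IsValidUtf8(abUtf8))
    End Sub
End Module
```

The first check should report False and the second True, because a Latin-1 accented character is a single high byte with no continuation byte after it.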

More useful is to note that GetBytes can produce a byte array in either Latin-1 or UTF-8 encoding from the same string. Using an unambiguous byte array is the way to go if you intend to create a hash of the UTF-8 string. This will save you a lot of grief, especially when dealing with XML signatures.
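To illustrate why the choice of byte array matters for hashing, here is a short sketch that hashes the same string under both encodings. It assumes the standard .NET SHA256 class from System.Security.Cryptography rather than the CryptoSys PKI hash functions, and the ToHex helper is hypothetical.

```vbnet
Imports System
Imports System.Security.Cryptography
Imports System.Text

Module Utf8HashSketch
    ' Hypothetical helper (not part of CryptoSys PKI): hex-encode a byte array.
    Function ToHex(ByVal abData() As Byte) As String
        Return BitConverter.ToString(abData).Replace("-", "")
    End Function

    Sub Main()
        ' "México", built with ChrW so the source file encoding does not matter
        Dim strData As String = "M" & ChrW(&HE9) & "xico"

        ' Same string, two different byte sequences
        Dim abLatin1() As Byte = Encoding.GetEncoding("iso-8859-1").GetBytes(strData)
        Dim abUtf8() As Byte = Encoding.UTF8.GetBytes(strData)
        Console.WriteLine("Latin-1 bytes: " & ToHex(abLatin1))   ' 4DE97869636F
        Console.WriteLine("UTF-8 bytes:   " & ToHex(abUtf8))     ' 4DC3A97869636F

        ' Hash the bytes, not the string, so the encoding is always explicit
        Using sha As SHA256 = SHA256.Create()
            Console.WriteLine("SHA-256(Latin-1): " & ToHex(sha.ComputeHash(abLatin1)))
            Console.WriteLine("SHA-256(UTF-8):   " & ToHex(sha.ComputeHash(abUtf8)))
        End Using
    End Sub
End Module
```

The two digests differ because the inputs differ by one byte, which is exactly the kind of mismatch that makes a signature fail to verify when the signer and verifier assume different encodings.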
