Checks if a string is valid UTF-8.
VB6/VBA
Debug.Print "Testing CNV_CheckUTF8 ..."
Dim strData As String
Dim strDataUTF8 As String
Dim nRet As Long
Dim nLen As Long
' Our original string data is in "Latin-1" encoding
strData = "Asociación Mexicana de Estándares para el Comercio Electrónico A.C.|México|"
Debug.Print "Latin-1 string:"
Debug.Print strData
' Check if this is valid UTF-8 (it's not)
nRet = CNV_CheckUTF8(strData)
Debug.Print "CNV_CheckUTF8 returns " & nRet
' So convert to UTF-8
' First call with a zero-length buffer to find the required length
nLen = CNV_UTF8FromLatin1("", 0, strData)
If nLen < 0 Then
    Debug.Print "Failed to convert to UTF-8: " & nLen
    Exit Sub
End If
' Pre-dimension the output string, then convert
strDataUTF8 = String(nLen, " ")
nLen = CNV_UTF8FromLatin1(strDataUTF8, nLen, strData)
' Which may not display correctly in VB6...!
Debug.Print "UTF-8 string:"
Debug.Print strDataUTF8
' And check again (expected result = 2
' => Valid UTF-8, contains at least one 8-bit ANSI character)
nRet = CNV_CheckUTF8(strDataUTF8)
Debug.Print "CNV_CheckUTF8 returns " & nRet
Output
Testing CNV_CheckUTF8 ...
Latin-1 string:
Asociación Mexicana de Estándares para el Comercio Electrónico A.C.|México|
CNV_CheckUTF8 returns 0
UTF-8 string:
Asociación Mexicana de Estándares para el Comercio Electrónico A.C.|México|
CNV_CheckUTF8 returns 2
VB.NET
Console.WriteLine("Testing CNV_CheckUTF8 ...")
Dim strData As String
Dim strDataUTF8 As String
Dim nRet As Integer
''Dim nLen As Integer
' Our original string data is in "Latin-1" encoding
strData = "Asociación Mexicana de Estándares para el Comercio Electrónico A.C.|México|"
Console.WriteLine("Latin-1 string:")
Console.WriteLine(strData)
Console.WriteLine("strData.Length=" & strData.Length)
' Check if this is valid UTF-8 (it's not)
nRet = Cnv.CheckUTF8(strData)
Console.WriteLine("Cnv.CheckUTF8 returns " & nRet)
' So convert to UTF-8
Dim abData() As Byte
abData = System.Text.Encoding.GetEncoding("iso-8859-1").GetBytes(strData)
Console.WriteLine("abData.Length(iso-8859-1) =" & abData.Length)
abData = System.Text.Encoding.UTF8.GetBytes(strData)
Console.WriteLine("abData.Length(UTF-8) =" & abData.Length)
' [FUDGE!] Force the UTF-8 bytes back into a string using the system default encoding (see Remarks)
strDataUTF8 = System.Text.Encoding.Default.GetString(abData)
' Which may not display correctly...!
Console.WriteLine("UTF-8 string:")
Console.WriteLine(strDataUTF8)
Console.WriteLine("strDataUTF8.Length=" & strDataUTF8.Length)
' And check again (expected result = 2
' => Valid UTF-8, contains at least one 8-bit ANSI character)
nRet = Cnv.CheckUTF8(strDataUTF8)
Console.WriteLine("Cnv.CheckUTF8 returns " & nRet)
Remarks
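The .NET "conversion" above is a bit of a fudge.
We (correctly) convert the string to a byte array abData() of UTF-8 bytes using GetBytes,
and then force those bytes back into a string using the default encoding with GetString (naughty).
The resulting string will appear to be UTF-8 for the purposes of our Cnv.CheckUTF8 method but may be of limited use otherwise.
Note that the fudge relies on System.Text.Encoding.Default being an ANSI code page (typically a single-byte one such as Windows-1252), as it is in .NET Framework on Windows. In .NET Core and .NET 5 or later, Encoding.Default is always UTF-8, so GetString would simply return the original string.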
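Where the data is already held as bytes, validity can also be tested on the byte array directly, which avoids the fudged string altogether. The following is a minimal sketch that uses only the .NET base class library and not the Cnv class. The helper name IsValidUtf8 is hypothetical, and it returns only True or False rather than the numeric codes returned by Cnv.CheckUTF8.

```vbnet
Imports System
Imports System.Text

Module Utf8BytesCheckSketch
    ' Hypothetical helper (not part of the Cnv class):
    ' returns True if the bytes decode as strict UTF-8.
    Function IsValidUtf8(ByVal data As Byte()) As Boolean
        ' Second argument True => throw on invalid bytes instead of
        ' silently substituting the U+FFFD replacement character.
        Dim strict As New UTF8Encoding(False, True)
        Try
            strict.GetString(data)
            Return True
        Catch ex As ArgumentException
            ' DecoderFallbackException derives from ArgumentException
            Return False
        End Try
    End Function

    Sub Main()
        Dim strData As String = "México"
        Dim abLatin1() As Byte = Encoding.GetEncoding("iso-8859-1").GetBytes(strData)
        Dim abUtf8() As Byte = Encoding.UTF8.GetBytes(strData)
        ' The Latin-1 byte 0xE9 is not followed by a continuation byte, so this is False
        Console.WriteLine("Latin-1 bytes valid UTF-8? " & IsValidUtf8(abLatin1))
        ' The UTF-8 encoding of the same string is True
        Console.WriteLine("UTF-8 bytes valid UTF-8?   " & IsValidUtf8(abUtf8))
    End Sub
End Module
```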
More useful is to note how we can get a byte array in either the Latin-1 encoding or the UTF-8 encoding from the same string using GetBytes.
Using an unambiguous byte array is the way to go if you intend to create a hash of the UTF-8 string.
This will save you a lot of grief, especially when dealing with XML signatures.
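To illustrate why the byte array matters for hashing, here is a short sketch, again using only standard .NET classes rather than this library's own functions. SHA-256 is chosen purely as an assumption for the example; the point is that the two encodings of the same string give different byte arrays and therefore different digests.

```vbnet
Imports System
Imports System.Security.Cryptography
Imports System.Text

Module HashUtf8BytesSketch
    Sub Main()
        Dim strData As String = "México"
        ' Same string, two different byte representations
        Dim abLatin1() As Byte = Encoding.GetEncoding("iso-8859-1").GetBytes(strData)
        Dim abUtf8() As Byte = Encoding.UTF8.GetBytes(strData)
        Console.WriteLine("Latin-1 length = " & abLatin1.Length)  ' 6 bytes
        Console.WriteLine("UTF-8 length   = " & abUtf8.Length)    ' 7 bytes ("é" takes two)

        ' Hash the bytes, never the string: the digest depends on the encoding
        Using sha As SHA256 = SHA256.Create()
            Console.WriteLine("SHA-256(Latin-1) = " & BitConverter.ToString(sha.ComputeHash(abLatin1)))
            Console.WriteLine("SHA-256(UTF-8)   = " & BitConverter.ToString(sha.ComputeHash(abUtf8)))
        End Using
    End Sub
End Module
```

Deciding the encoding once, at the point where the string becomes bytes, keeps the digest reproducible regardless of how the string is later displayed or stored.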