Author Topic: tags18 utf-16  (Read 361 times)

jpf

  • Posts: 182
tags18 utf-16
« on: 1 Sep '22 - 05:17 »
Is it possible to get UTF-16 encoded strings from tags18's TAGS_Read or TAGS_ReadEx? Is it possible under Windows XP?

I tried setting TAGS_SetUTF8(BASSTRUE) and converting to UTF-16 but it doesn't seem to work. Maybe BASSTRUE isn't the right value for the argument?

I'm coding in VB6, so I don't have proper debugging tools to show Unicode strings. Also, I don't have many files with non-ANSI tags. Currently I'm testing a few files with Japanese tags, but unfortunately my XP OS shows them as squares. I don't know why. I tried all the installed fonts and none seems to show them. Cyrillic and Chinese filenames are correctly displayed in Explorer (and also in my VB6 application using a few API tricks), but Japanese characters aren't. I could use some help on this point, too.

This is my failed code to get UTF-8 to UTF-16 conversion:
Code: [Select]
Private Declare Function MultiByteToWideCharArray Lib "kernel32" Alias "MultiByteToWideChar" (ByVal CodePage As Long, ByVal dwFlags As Long, ByVal lpMultiByteStr As Long, ByVal cchMultiByte As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long

Public Function StrConvUTF8toUTF16(ByVal Txt As String) As String
    Const CP_UTF8 = 65001   '// UTF-8
    Dim lpWideCharStr() As Byte
    Dim cchWideChar As Long
   
    cchWideChar = MultiByteToWideCharArray(CP_UTF8, 0, StrPtr(Txt), LenB(Txt), 0, 0)
    If cchWideChar = 0 Then Exit Function 'error
    ReDim lpWideCharStr(cchWideChar)
    cchWideChar = MultiByteToWideChar(CP_UTF8, 0, StrPtr(Txt), LenB(Txt), VarPtr(lpWideCharStr(0)), cchWideChar)
    StrConvUTF8toUTF16 = lpWideCharStr
End Function


Ian @ un4seen

  • Administrator
  • Posts: 25430
Re: tags18 utf-16
« Reply #1 on: 1 Sep '22 - 12:21 »
You can indeed use MultiByteToWideChar (with CP_UTF8) to convert UTF-8 to UTF-16. I think the issue in your case is that "Txt" is a VB6 String which is UTF-16 already, so you're not actually passing a UTF-8 string to MultiByteToWideChar. It should probably be a Byte array instead, or a Long if it's a pointer to a null-terminated string (eg. as returned by TAGS_Read). Note that when it is null-terminated, you can simply use -1 in the input length parameter:

Code: [Select]
    cchWideChar = MultiByteToWideChar(CP_UTF8, 0, Txt, -1, 0, 0)

There's some code for converting UTF-8 to a VB string (and vice versa) here:

   www.di-mgt.com.au/howto-convert-vba-unicode-to-utf8.html
« Last Edit: 1 Sep '22 - 12:29 by Ian @ un4seen »

jpf

  • Posts: 182
Re: tags18 utf-16
« Reply #2 on: 1 Sep '22 - 19:33 »
Thanks for the tip, Ian.

I didn't get it to work, though.

I was doing at least one wrong thing:
I was getting the length of the string returned as a string pointer by TAGS_Read using lstrlenW, but UTF-8 strings aren't wide characters. According to Microsoft a UTF-8 character can occupy from 1 to 4 bytes. Also I don't know if the null terminator should be a wide character. Microsoft says something like "don't assume anything".

So, using your suggestion of passing -1 as the string length, I didn't bother to find the actual length in advance, but let MultiByteToWideChar work it out.

Code: [Select]
    Dim TagPtr As Long
    Dim Buff As String
    Dim BuffLen As Long
    Const CP_UTF8 As Long = 65001

    TAGS_SetUTF8 BASSTRUE
    TagPtr = TAGS_Read(hTag, TagId)
    If TagPtr <> 0 Then
        BuffLen = MultiByteToWideChar(CP_UTF8, 0, TagPtr, -1, 0, 0)
        If BuffLen = 0 Then Exit Sub
        Buff = String$(BuffLen, vbNullChar)
        BuffLen = MultiByteToWideChar(CP_UTF8, 0, TagPtr, -1, StrPtr(Buff), BuffLen)
        If BuffLen = 0 Then Exit Sub
        Buff = Mid(Buff, 1, BuffLen - 1) 'trim the null terminator
        If Trim(Buff) <> "" Then
            fgPl.TextMatrix(RowX, ColX) = Buff
        End If
    End If
This worked OK for tags containing only Ansi characters, like %YEAR and %TRCK, but those containing non-Ansi characters still show as question marks instead of squares. This is the same behaviour I got with my previous code.

I tried to use the code you linked but stumbled on the problem of converting the string pointer returned by TAGS_Read into a Byte array. I don't know how to do it.

The usual way to do it would be to Dim the byte array and copy the bytes from the internal buffer of tags18 into it. Something like this:

ReDim abytBuf(lLen)
'Copy the memory contents into the byte buffer
Call CopyMemory(abytBuf(0), ByVal lPtr, lLen)

But how can I get the length lLen? lstrlenW expects wide characters (and a wide null terminator?) and lstrlenA expects a single byte 0 as terminator. Neither of them would work (I tried both!). I'm lost.
I thought I could use an arbitrary, insanely large lLen and trust that MultiByteToWideChar will find the null terminator and discard the rest of the array, but that doesn't seem foolproof (unless tags are restricted to a known maximum length?).

I can't declare TAGS_Read as returning a byte array, like TAGS_Read(hTag as Long, TagId as string)() as Byte, because a VB6 byte array is a SAFEARRAY (a structure), not a string.

So I'm stuck.

With appropriate tests I ruled out other trivial problems, like the tags including invalid UTF-8 sequences and the MSHFlexGrid not being able to display UTF-16 characters. Everything works like a charm except the non-Ansi tags.

I could use more help.

If you have some working C code that I can pack into a dll and call from VB6 code, I'd like to give it a try. I've coded a few dozen stdcall functions into dlls but this specific problem is beyond me.

Thanks in advance!
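
On the question above of how to get lLen for the CopyMemory approach: UTF-8 never uses a zero byte inside a multi-byte sequence, so the single-byte terminator that lstrlenA looks for is also the terminator of a UTF-8 string, and the value it returns is the byte count. The following is only a minimal sketch under that assumption. Utf8PtrToBytes is a hypothetical helper name, and the pointer is assumed to be the Long returned by TAGS_Read with TAGS_SetUTF8 enabled.

```vb
Private Declare Function lstrlenA Lib "kernel32" (ByVal lpString As Long) As Long
Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" _
    (Destination As Any, Source As Any, ByVal Length As Long)

' Hypothetical helper: copies the null-terminated UTF-8 string at lPtr
' into abytBuf (without the terminator) and returns the number of bytes copied.
' Returns 0 and leaves abytBuf untouched if the pointer is null or the string is empty.
Public Function Utf8PtrToBytes(ByVal lPtr As Long, ByRef abytBuf() As Byte) As Long
    Dim lLen As Long

    If lPtr = 0 Then Exit Function
    lLen = lstrlenA(lPtr)                    ' length in bytes, terminator excluded
    If lLen = 0 Then Exit Function

    ReDim abytBuf(0 To lLen - 1)
    CopyMemory abytBuf(0), ByVal lPtr, lLen  ' ByVal lPtr passes the address itself
    Utf8PtrToBytes = lLen
End Function
```

The resulting array can then be handed to a byte-array based UTF-8 decoder, passing VarPtr(abytBuf(0)) and the returned length to MultiByteToWideChar instead of -1.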

Steve Grant

  • Posts: 190
Re: tags18 utf-16
« Reply #3 on: 2 Sep '22 - 09:04 »
I have used many versions of reading Tags from Tags.dll with VB6. This was because my needs changed.

Currently I use
TAGS_SetUTF8 false

Code: [Select]
Private Declare Function GetMem4 Lib "msvbvm60" (Src As Any, Dst As Any) As Long
Private Declare Function SysAllocStringByteLen Lib "oleaut32" (ByVal psz As Long, ByVal cbLen As Long) As Long
Private Declare Function lstrlenA Lib "kernel32" (ByVal lpString As Long) As Long

                Album = StrAnsiPtr(TAGS_Read(Strm, "%ALBM"))
                Year = StrAnsiPtr(TAGS_Read(Strm, "%YEAR"))
                Artist = StrAnsiPtr(TAGS_Read(Strm, "%ARTI"))
                Title = StrAnsiPtr(TAGS_Read(Strm, "%TITL")) ' etc.
                ' Remember that Title and Year are reserved words; I've just used them for your guidance.

Private Function StrAnsiPtr(ByVal lpStr As Long) As String
    GetMem4 SysAllocStringByteLen(lpStr, lstrlenA(lpStr)), ByVal VarPtr(StrAnsiPtr)
    StrAnsiPtr = StrConv(StrAnsiPtr, vbUnicode)
End Function

I also have an old version using DecodeUTF8(StrAnsiPtr(TAGS_Read(Strm, "%ALBM"))) if TAGS_SetUTF8 true. If this does not meet all your needs, let me know and I will post the other.
« Last Edit: 2 Sep '22 - 11:00 by Steve Grant »

jpf

  • Posts: 182
Re: tags18 utf-16
« Reply #4 on: 2 Sep '22 - 16:26 »
Thanks, Steve!

I tried your code. It won't get the multilingual UTF-16 characters (they're displayed as question marks), and that's my actual problem. The tags I'm currently missing have some Japanese characters in them (along with Latin). Other files have Cyrillic, Chinese, Arabic, Thai or Hebrew characters, and some even mix Cyrillic, Chinese and Latin characters in their tags. I want them all displayed correctly in the MSHFlexGrid of my player under Windows XP.

About your older version, I don't seem to have DecodeUTF8 declared in my player, but maybe it's similar to the one I used some time ago (before tags18 was available) to decode UTF-8 strings from BASS_ChannelGetTags into Ansi strings. If that's so, it won't do, either.

Real multilingual support in VB6 running under Windows XP seems hard to achieve, at least for me. If it turns out to be too difficult I'll settle for what I've achieved so far. That is: play dropped files with non-Ansi characters in their names and display their multilingual filenames correctly in the FlexGrid. I think I can work out a few other issues, like a multilingual InputBox and TextBox. A CommonDialog would be useful, too.

I started translating the VB6 code to VB.net but it's boring; there are thousands of lines. If I can stick to VB6 I'd rather do that. Besides, this player is just for my personal use.

Steve Grant

  • Posts: 190
Re: tags18 utf-16
« Reply #5 on: 2 Sep '22 - 22:16 »
OK, try this. Bear in mind that VB6 will not (in most cases) display Unicode text. You will just get ? or boxes.
Here is a Unicode capable msgbox which will show characters correctly if you decode them right.

Code: [Select]
Function MsgBox(Prompt As String, Optional Buttons As VbMsgBoxStyle = vbOKOnly, Optional Title As String) As VbMsgBoxResult
  MsgBox = CreateObject("WScript.Shell").Popup(Prompt, 0&, Title, Buttons)
End Function

Code: [Select]
' Call TAGS_SetUTF8 true before reading the tags.
Public Const CP_UTF8 As Long = 65001 ' UTF-8 Code Page
Declare Function MultiByteToWideChar Lib "kernel32" (ByVal CodePage As Long, ByVal dwFlags As Long, ByVal lpMultiByteStr As Long, _
    ByVal cchMultiByte As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long

Public Function DecodeUTF8(ByVal uText As String) As String
    Dim lDataLength As Long, Utf8() As Byte
    If Len(uText) = 0 Then Exit Function
    Utf8 = StrConv(uText, vbFromUnicode)                                                    ' Convert from unicode text to ANSI values
    lDataLength = MultiByteToWideChar(CP_UTF8, 0, VarPtr(Utf8(0)), UBound(Utf8) + 1, 0, 0)  ' Get the length of the data.
    DecodeUTF8 = String$(lDataLength, 0)                                                    ' Create a string big enough for the result
    MultiByteToWideChar CP_UTF8, 0, VarPtr(Utf8(0)), UBound(Utf8) + 1, StrPtr(DecodeUTF8), lDataLength
End Function
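
As a usage note, the two functions posted in this thread can be combined the way reply #3 describes, DecodeUTF8(StrAnsiPtr(...)). The wrapper below is only a sketch: ReadTagUnicode is a hypothetical name, TAGS_Read is assumed to be declared as returning a Long pointer, StrAnsiPtr and DecodeUTF8 are assumed to be in the same module, and TAGS_SetUTF8 is assumed to have been enabled beforehand.

```vb
' Hypothetical wrapper around the StrAnsiPtr and DecodeUTF8 functions from this thread.
' Assumes UTF-8 output was enabled earlier with TAGS_SetUTF8.
Public Function ReadTagUnicode(ByVal hStream As Long, ByVal sFormat As String) As String
    Dim lPtr As Long

    lPtr = TAGS_Read(hStream, sFormat)     ' pointer to a null-terminated UTF-8 string
    If lPtr = 0 Then Exit Function         ' nothing returned: result stays ""

    ' StrAnsiPtr copies the raw bytes into a VB String (one character per byte),
    ' DecodeUTF8 turns those bytes back into UTF-8 and converts them to UTF-16.
    ReadTagUnicode = DecodeUTF8(StrAnsiPtr(lPtr))
End Function
```

A call such as fgPl.TextMatrix(RowX, ColX) = ReadTagUnicode(Strm, "%TITL") would then place the decoded text in the grid. One thing to keep in mind is that the intermediate step goes through the system ANSI code page, so on some system locales the byte round trip may not be exact.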

jpf

  • Posts: 182
Re: tags18 utf-16
« Reply #6 on: 3 Sep '22 - 03:31 »
Thanks, Steve, you made my day!

Your DecodeUTF8 function does exactly what I need. The non-Ansi characters show properly in the MSHFlexGrid cells.

Now I have an additional problem, but not related to the tags:

I keep the filenames, tags, and other parameters of each song in a DAO 3.6 database using the "Microsoft DAO 3.6 Object Library" set as a reference in the project.

It works fine storing Ansi strings in the text fields, but if I assign a string containing non-Ansi characters, nasty things happen. In the best case, non-Ansi characters get converted to question marks. In the worst case, the table cursor is moved to a random row and the Update fails.

I know this is not related to the Bass library, so this is not the right forum to ask about it, but if you or anybody else has a suggestion, I could really use it.

Thanks again for all your help!

Steve Grant

  • Posts: 190
Re: tags18 utf-16
« Reply #7 on: 3 Sep '22 - 09:11 »
I am glad it worked out for you.

I am not an expert in databases as in my initial tests (30 years ago) I found searching arrays much much quicker and have stayed that way. Obviously SQLite has come along since and I believe that is much faster. Try https://www.vbforums.com/forumdisplay.php?1-Visual-Basic-6-and-Earlier as there are some really knowledgeable people on there.

jpf

  • Posts: 182
Re: tags18 utf-16
« Reply #8 on: 3 Sep '22 - 17:36 »
Thanks, Steve!

I just registered on the forum and posted the question there.