Topic: Tags of some files aren't read correctly

grokoder

  • Posts: 29
Tags of some files aren't read correctly
« on: 1 Jan '15 - 04:56 »
BASS doesn't read the tags of some files correctly. (These files' tags are read correctly by Windows and Songbird, so this must be a bug in BASS.)
It looks like BASS is using ASCII encoding for all non-Unicode tags. It should use the system default encoding instead.

grokoder

  • Posts: 29
Re: Tags of some files aren't read correctly
« Reply #1 on: 1 Jan '15 - 05:07 »
Update: BASS is using Windows-1252, but it should use the system default encoding instead.
« Last Edit: 1 Jan '15 - 05:20 by grokoder »

Ian @ un4seen

  • Administrator
  • Posts: 23475
Re: Tags of some files aren't read correctly
« Reply #2 on: 2 Jan '15 - 15:45 »
BASS doesn't do any translation of tags itself (except to UTF-8 with Java on Android), ie. BASS_ChannelGetTags just gives what's in the file. Are you using the TAGS add-on? If so, it will produce ANSI text by default, but you can use the "UTF8" option to tell it to produce UTF-8 text instead. For example...

Code:
title=TAGS_Read(handle, "%UTF8(%TITL)"); // get title in UTF-8 form
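For context on what producing UTF-8 from "ISO-8859-1" tag text involves, here is a minimal C sketch of that conversion step. It is not the TAGS add-on's code; the function name `latin1_to_utf8` is hypothetical, and the sketch assumes the input bytes really are ISO-8859-1 (if they are not, as later in this thread, the conversion is done correctly but still yields the wrong characters).

```c
#include <stddef.h>

/* Convert ISO-8859-1 (Latin-1) bytes to UTF-8.
 * In Latin-1 every byte value is also its Unicode code point, so bytes
 * below 0x80 are copied unchanged and bytes 0x80-0xFF become two bytes.
 * dst must have room for 2 * len + 1 bytes; the result is NUL-terminated.
 * Returns the number of bytes written, excluding the terminator. */
size_t latin1_to_utf8(const unsigned char *src, size_t len, char *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = src[i];
        if (c < 0x80) {
            dst[out++] = (char)c;                   /* ASCII: one byte */
        } else {
            dst[out++] = (char)(0xC0 | (c >> 6));   /* lead byte: 0xC2 or 0xC3 */
            dst[out++] = (char)(0x80 | (c & 0x3F)); /* continuation byte */
        }
    }
    dst[out] = '\0';
    return out;
}
```

For example, the Latin-1 byte 0xE9 ("é") becomes the two UTF-8 bytes 0xC3 0xA9.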

grokoder

  • Posts: 29
Re: Tags of some files aren't read correctly
« Reply #3 on: 3 Jan '15 - 03:55 »
I'm using the BassTags.BASS_TAG_GetFromFile function in BASS.NET. It has to do some translation, because the original tags can be ANSI strings, and this method returns a TAG_INFO object whose fields are System.String (which is always UTF-16).
I could do the translation to the proper code page myself, but I don't see any way to detect whether the original tags are ANSI or Unicode.

grokoder

  • Posts: 29
Re: Tags of some files aren't read correctly
« Reply #4 on: 5 Jan '15 - 13:13 »
So, any help with this bug?

Ian @ un4seen

  • Administrator
  • Posts: 23475
Re: Tags of some files aren't read correctly
« Reply #5 on: 5 Jan '15 - 14:33 »
What type of tags do the affected files have, eg. ID3v1?

grokoder

  • Posts: 29
Re: Tags of some files aren't read correctly
« Reply #6 on: 6 Jan '15 - 08:31 »
Quote
What type of tags do the affected files have, eg. ID3v1?
It returns this type:

   tagType = BASS_TAG_ID3V2 (Un4seen.Bass.BASSTag)

Ian @ un4seen

  • Administrator
  • Posts: 23475
Re: Tags of some files aren't read correctly
« Reply #7 on: 6 Jan '15 - 17:24 »
OK, they appear to be ID3v2 tags then. ID3v2 tags can specify the text encoding (unlike ID3v1), so there should not be any character set issues with them, assuming that the software that wrote the tags did it properly. The text can be either ISO-8859-1 or UTF-16 (or UTF-8 with ID3v2.4). Which are your files using? If it's ISO-8859-1, then it would be correct to use the Windows-1252 code page for them. If you're unsure, you can upload one of the files and I'll have a look...

   ftp.un4seen.com/incoming/
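In ID3v2, that encoding is a single byte at the start of each text frame's body. A minimal C sketch of interpreting it, assuming the byte values defined by the ID3v2.3/2.4 specs (0 = ISO-8859-1, 1 = UTF-16 with BOM, plus 2 = UTF-16BE and 3 = UTF-8 added in ID3v2.4); `id3v2_encoding_name` is a hypothetical helper, not a BASS function.

```c
#include <stddef.h>

/* Describe the text-encoding byte that starts an ID3v2 text frame's body.
 * major is the tag's major version (3 for ID3v2.3, 4 for ID3v2.4).
 * Values 2 and 3 only exist from ID3v2.4 on. Returns NULL if unknown. */
const char *id3v2_encoding_name(unsigned char enc, int major)
{
    switch (enc) {
    case 0x00: return "ISO-8859-1";
    case 0x01: return "UTF-16 with BOM";
    case 0x02: return major >= 4 ? "UTF-16BE without BOM" : NULL;
    case 0x03: return major >= 4 ? "UTF-8" : NULL;
    default:   return NULL;
    }
}
```

Note that the byte only states what the writing software claimed; it cannot reveal that "ISO-8859-1" text was actually written in another code page.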

grokoder

  • Posts: 29
Re: Tags of some files aren't read correctly
« Reply #8 on: 7 Jan '15 - 03:04 »
I've uploaded a sample file, but I don't see it in the directory listing.
I'm not sure whether these tags are "theoretically correct", but Windows and all the players I have read them correctly. The encoding used in this file is Windows-1251, not ISO-8859-1.

Ian @ un4seen

  • Administrator
  • Posts: 23475
Re: Tags of some files aren't read correctly
« Reply #9 on: 7 Jan '15 - 17:39 »
Yep, the problem is that the file's ID3v2 tags say that they contain ISO-8859-1 text when they don't. If Windows' non-Unicode code page is set to 1251 on your system, then the tags may look OK in some software (software that uses Windows' code page setting instead of ISO-8859-1), but they won't look OK in that same software on other systems that have a different code page setting. Those tags should be using UTF-16 text instead.
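To see why such tags only "look OK" on a system set to code page 1251, compare what the same bytes mean under each interpretation. A minimal C sketch with hypothetical function names; the Windows-1251 mapping covers only ASCII and 0xC0-0xFF, where that code page places the Cyrillic letters А-я (U+0410-U+044F).

```c
/* ISO-8859-1: every byte value is its own Unicode code point. */
long latin1_code_point(unsigned char b)
{
    return b;
}

/* Windows-1251, partial: 0xC0-0xDF are А..Я (U+0410..U+042F) and
 * 0xE0-0xFF are а..я (U+0430..U+044F), i.e. one contiguous run.
 * Bytes 0x80-0xBF are not handled here and return -1. */
long cp1251_code_point(unsigned char b)
{
    if (b < 0x80)
        return b;                   /* ASCII is shared by both code pages */
    if (b >= 0xC0)
        return 0x0410 + (b - 0xC0);
    return -1;
}
```

For example, the Windows-1251 bytes C0 EB E8 F1 E0 spell "Алиса", but software that honours the ISO-8859-1 flag shows them as "Àëèñà".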

I've never looked for such a thing myself, so I'm not sure if it exists, but perhaps there is software that can change ID3v2 tags from ISO-8859-1 to UTF-16 and allow you to specify a code page to use (or use the Windows setting) when doing that, which you could use to fix the tags in your file(s).

grokoder

  • Posts: 29
Re: Tags of some files aren't read correctly
« Reply #10 on: 8 Jan '15 - 02:34 »
Quote
I've never looked for such a thing myself, so I'm not sure if it exists, but perhaps there is software that can change ID3v2 tags from ISO-8859-1 to UTF-16 and allow you to specify a code page to use (or use the Windows setting) when doing that, which you could use to fix the tags in your file(s).
I can do this translation in my code, but I need to know what the original encoding of the tags was.

Ian @ un4seen

  • Administrator
  • Posts: 23475
Re: Tags of some files aren't read correctly
« Reply #11 on: 8 Jan '15 - 15:34 »
According to the ID3 specs, the text should be ISO-8859-1. When it isn't (like in your example), I don't think there is any way to be certain about what code page should be used for Unicode conversion. You may just need to try various code pages until you find one that looks right.

Google brings up this page that might be useful...

   http://digest.digitaltalker.com/How-to-Auto-Convert-MP3-ID3-Tag-Charset-to-Unicode-UTF-8/

grokoder

  • Posts: 29
Re: Tags of some files aren't read correctly
« Reply #12 on: 8 Jan '15 - 16:10 »
Quote
According to the ID3 specs, the text should be ISO-8859-1.
It can be another encoding as well. I need to know whether the tags were in ISO-8859-1 rather than UTF-16 or UTF-8, because if they were ISO-8859-1, I need to convert them using the system's code page, and if they were UTF-16, I don't. So, how can I check whether ISO-8859-1 was specified?

Ian @ un4seen

  • Administrator
  • Posts: 23475
Re: Tags of some files aren't read correctly
« Reply #13 on: 8 Jan '15 - 17:39 »
To get that information, you would need to look inside the ID3v2 tag block, which is available from BASS_ChannelGetTags (tags=BASS_TAG_ID3V2). Although it is not common, note that each tag/frame in an ID3v2 block can have a different text encoding, ie. some ISO-8859-1 and some UTF-16. If you're unfamiliar with the structure of ID3v2 tags, then you could use a 3rd-party tagging library to parse them. Perhaps such a library will even allow you to specify a code page to use when parsing "ISO-8859-1" text.
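As a rough illustration of looking inside the ID3v2 tag block, here is a minimal C sketch that walks an ID3v2.3/2.4 block and records the encoding byte of each text frame (IDs starting with "T"). It assumes the block begins with the 10-byte "ID3" header, and it deliberately skips the extended header, unsynchronisation and frame-level flags (compression, encryption, data length indicator), so it is no substitute for a real tagging library. The struct and function names are hypothetical.

```c
#include <stddef.h>

/* One text frame found in the tag. */
struct id3v2_text_frame {
    char id[5];             /* frame ID such as "TIT2", NUL-terminated */
    unsigned char encoding; /* first body byte: 0 = ISO-8859-1, 1 = UTF-16, ... */
};

/* Read a 4-byte big-endian size. Synchsafe sizes (the tag size, and frame
 * sizes in ID3v2.4) use only the low 7 bits of each byte. */
static unsigned long read_size(const unsigned char *p, int synchsafe)
{
    unsigned long v = 0;
    for (int i = 0; i < 4; i++)
        v = synchsafe ? (v << 7) | (p[i] & 0x7Fu) : (v << 8) | p[i];
    return v;
}

/* Walk an ID3v2.3/2.4 tag block and store up to max_out text frames with
 * their encoding bytes. Returns the number stored, or -1 if unusable. */
int id3v2_text_encodings(const unsigned char *tag, size_t len,
                         struct id3v2_text_frame *out, size_t max_out)
{
    if (len < 10 || tag[0] != 'I' || tag[1] != 'D' || tag[2] != '3')
        return -1;
    int major = tag[3];
    if (major != 3 && major != 4)
        return -1;
    if (tag[5] & 0xC0)              /* unsynchronisation / extended header */
        return -1;

    size_t end = 10 + read_size(tag + 6, 1);
    if (end > len)
        end = len;

    size_t count = 0, pos = 10;
    while (pos + 10 <= end && tag[pos] != 0 && count < max_out) {
        unsigned long size = read_size(tag + pos + 4, major == 4);
        if (size > end - pos - 10)  /* frame would run past the tag */
            return -1;
        if (tag[pos] == 'T' && size > 0) {
            for (int i = 0; i < 4; i++)
                out[count].id[i] = (char)tag[pos + i];
            out[count].id[4] = '\0';
            out[count].encoding = tag[pos + 10];
            count++;
        }
        pos += 10 + size;           /* skip header and body; a 0 byte starts padding */
    }
    return (int)count;
}
```

Because the encoding is recorded per frame, a single tag can legitimately report both 0 and 1, which is the mixed case mentioned above.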

grokoder

  • Posts: 29
Re: Tags of some files aren't read correctly
« Reply #14 on: 9 Jan '15 - 02:25 »
Ughm. Is it not possible to expose this information from BASS?

Ian @ un4seen

  • Administrator
  • Posts: 23475
Re: Tags of some files aren't read correctly
« Reply #15 on: 9 Jan '15 - 13:45 »
It is exposed in the ID3v2 tag block, available from BASS_ChannelGetTags. Please note that BASS doesn't parse the ID3v2 tag block itself, so it wouldn't really make sense for it to start parsing just the text encoding information. Parsing of ID3v2 tags is left up to the user, eg. using the TAGS add-on or other libraries like id3lib or taglib.

grokoder

  • Posts: 29
Re: Tags of some files aren't read correctly
« Reply #16 on: 12 Jan '15 - 03:04 »
Hm. Then, is it possible to extend Bass.AddOn.Tags to support this?

Ian @ un4seen

  • Administrator
  • Posts: 23475
Re: Tags of some files aren't read correctly
« Reply #17 on: 12 Jan '15 - 17:36 »
You mean tell what text encoding is used in the tags? I don't think that is feasible for the TAGS add-on, as it can deal with multiple tags at a time and the encoding could be different for each of them. But perhaps an option can be added to specify what code page to use when converting tags from "ISO-8859-1" to UTF-8. I'll look into it.

grokoder

  • Posts: 29
Re: Tags of some files aren't read correctly
« Reply #18 on: 13 Jan '15 - 07:30 »
Quote
But perhaps an option can be added to specify what code page to use when converting tags from "ISO-8859-1" to UTF-8. I'll look into it.
That would be great.

radio42

  • Posts: 4716
Re: Tags of some files aren't read correctly
« Reply #19 on: 13 Jan '15 - 08:13 »
Quote
But perhaps an option can be added to specify what code page to use when converting tags from "ISO-8859-1" to UTF-8. I'll look into it.
This is what I have done in BASS.NET (which can natively handle ID3v2 tags on its own): there is a "BassNet.UseBrokenLatin1Behavior" property.

As Ian already explained, many media players and taggers incorrectly treat Latin-1 fields as "default encoding" fields. As a result, a tag may end up containing Windows-1250 or Windows-1252 encoded text (those players/taggers effectively use the default Windows code page). Using the "BassNet.UseBrokenLatin1Behavior" property, you can define whether to treat Latin-1 encoded fields as genuine Latin-1 or to use the default Windows code page instead.
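Such a switch boils down to choosing which mapping to apply to the bytes of a field flagged as Latin-1. A minimal C sketch of that choice, producing UTF-16 code units like a .NET System.String holds; this is not BASS.NET's implementation, and the function name and the caller-supplied `high_table` (a single-byte code page's mapping for bytes 0x80-0xFF) are hypothetical stand-ins for what a real program would obtain from the operating system.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a field whose ID3v2 encoding byte says ISO-8859-1 into UTF-16
 * code units.
 *
 * use_broken_latin1 == 0: trust the flag; each byte is its own code point.
 * use_broken_latin1 != 0: treat the bytes as a single-byte "ANSI" code page
 *   instead. high_table[b - 0x80] gives the code point for byte b >= 0x80;
 *   bytes below 0x80 are ASCII either way.
 *
 * dst must hold len code units. Returns the number written (len). */
size_t decode_latin1_field(const unsigned char *src, size_t len,
                           int use_broken_latin1,
                           const uint16_t high_table[128],
                           uint16_t *dst)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char b = src[i];
        if (!use_broken_latin1 || b < 0x80)
            dst[i] = b;
        else
            dst[i] = high_table[b - 0x80];
    }
    return len;
}
```

With the flag off, byte 0xC0 decodes to U+00C0 ("À"); with the flag on and a Windows-1251 table, it decodes to U+0410 ("А").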

grokoder

  • Posts: 29
Re: Tags of some files aren't read correctly
« Reply #20 on: 13 Jan '15 - 15:19 »
Quote
Using the "BassNet.UseBrokenLatin1Behavior" property you can define if you want to treat Latin-1 encoded fields really as native Latin-1 or if you want to use the default windows code page instead.
Cool. Thanks, that's what I needed.

Ian @ un4seen

  • Administrator
  • Posts: 23475
Re: Tags of some files aren't read correctly
« Reply #21 on: 13 Jan '15 - 17:24 »
Here's the TAGS add-on update with a codepage option...

   www.un4seen.com/stuff/tags.zip

It adds a TAGS_SetCP function to set a codepage to use for "ISO-8859-1" tags. The default is 1252 (Windows Latin 1). To use the system default codepage, you would set it to 0 (or CP_ACP)...

Code:
TAGS_SetCP(0);

grokoder

  • Posts: 29
Re: Tags of some files aren't read correctly
« Reply #22 on: 14 Jan '15 - 01:51 »
Thanks.