unicode support request

Started by Peter, 17 Jun '03 - 12:20

Peter

Hello.

An option to load filenames with Unicode characters would be useful to me, and probably to other people too.  Some files on my HDD have characters outside the standard ANSI set in their names, and those can't currently be loaded by BASS.

Something like:
BASS_StreamCreateFile(FALSE,L"afile.mp3",0,0,BASS_FILE_UNICODE);

Is it possible to implement something similar?

Ian @ un4seen

It shouldn't be a problem to quickly add the option for BASS 1.8a :)

Peter

Thanks for the new feature.
It's working great!

Irrational86

One question, Ian: if you implement Unicode character support, why don't you just make it the default instead of requiring a flag? What's the difference between using the flag and not using it with regular non-Unicode filenames?

Ian @ un4seen

Unicode characters are 16-bit, while ANSI characters are 8-bit. So different processing is required for each, and BASS needs to know what type of string the parameter is - hence the flag :)

Irrational86

And what if you always use the flag? Will it affect the loading of non-Unicode filenames, or affect anything at all?

DanaPaul


Quote: Unicode characters are 16-bit, while ANSI characters are 8-bit. So different processing is required for each, and BASS needs to know what type of string the parameter is - hence the flag :)

Well, the Unicode ID3v2 tags in MP3 files that I've come across have the first 2 bytes of the string set to indicate a Unicode string, so I've been able to detect these (rare) Unicode strings on the fly.  Would something like this accommodate most Unicode instances?

In Delphi speak...

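// Assumes a pre-Unicode Delphi where "string" is an 8-bit AnsiString.
function FixStr(StrToFix: string): string;
var
  s: string;  // was used without being declared
begin
  Result := StrToFix;
  if Length(Result) > 1 then
  begin
    // A leading $FF $FE (the little-endian BOM) marks the string as Unicode.
    if (Ord(Result[1]) = 255) and (Ord(Result[2]) = 254) then
    begin
      if Length(Result) > 2 then
      begin
        // Skip the BOM, then convert the remaining wide characters to ANSI.
        Result := System.Copy(Result, 3, Length(Result));
        s := WideCharToString(PWideChar(Result));
        Result := StrPas(PChar(s));
      end
      else
        Result := '';
    end;
  end;
end;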


Ian @ un4seen

Here's the problem... for example, if you have the Unicode string "ABC", in byte form that is 'A',0,'B',0,'C',0,0,0 - BASS can't know if that's meant to be a Unicode "ABC" or an ANSI "A". It could check beyond the trailing 0, but that's asking for access violations :)
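
The ambiguity described above can be reproduced in a few lines of C. This is only an illustrative sketch, not BASS code: the names abc_bytes, length_as_ansi and length_as_utf16le are made up for the example, and the byte layout assumes the little-endian form given in the post.

```c
#include <stddef.h>
#include <string.h>

/* The bytes from the post: "ABC" as little-endian 16-bit characters,
   followed by a 16-bit zero terminator. */
const unsigned char abc_bytes[] = { 'A', 0, 'B', 0, 'C', 0, 0, 0 };

/* Length when the buffer is treated as an 8-bit (ANSI) string:
   counting stops at the first zero byte. */
size_t length_as_ansi(const unsigned char *p)
{
    return strlen((const char *)p);
}

/* Length in 16-bit units when the same buffer is treated as
   little-endian 16-bit characters: counting stops at the first
   pair of bytes that are both zero. */
size_t length_as_utf16le(const unsigned char *p)
{
    size_t n = 0;
    while ((p[2 * n] | (p[2 * n + 1] << 8)) != 0)
        n++;
    return n;
}
```

Called on abc_bytes, length_as_ansi sees a one-character string while length_as_utf16le sees three characters. Both readings are valid for the same memory, so the callee cannot pick the right one without being told.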

Irrational86

OK, OK, now I get it :laugh: ... thanks a lot, Ian.

DanaPaul


Quote: Here's the problem... for example, if you have the Unicode string "ABC", in byte form that is 'A',0,'B',0,'C',0,0,0

I haven't had any problem with Unicode strings that set the first 2 bytes (ahem, one DoubleByte or Word) to FFFE to indicate a Unicode string.  With that marker, flags scattered about the file or flagged function parameters are not needed.

However, you can do as you please, you're the boss :)

Ian @ un4seen

Quote: I haven't had any problem with Unicode strings that set the first 2 bytes (ahem, one DoubleByte or Word) to FFFE indicating a Unicode string.
That's only in ID3v2 tags, to indicate the byte order... you can actually check the previous byte to see if it's a Unicode string, 1 = yes :)

DanaPaul


Quote: That's only in ID3v2 tags, to indicate the byte order... you can actually check the previous byte to see if it's a Unicode string, 1 = yes :)

Oh?  I'll have to look into that.  The flag is part of the Tag frame, eh?

I haven't parsed WMA files (without Bass) yet, and I don't plan to. :)

Ian @ un4seen

Quote: Oh?  I'll have to look into that.  The flag is part of the Tag frame, eh?
Yep...
Quote: If nothing else is said a string is represented as ISO-8859-1 characters in the range $20 - $FF. Such strings are represented as <text string>, or <full text string> if newlines are allowed, in the frame descriptions. All Unicode strings use 16-bit unicode 2.0 (ISO/IEC 10646-1:1993, UCS-2). Unicode strings must begin with the Unicode BOM ($FF FE or $FE FF) to identify the byte order.

All numeric strings and URLs are always encoded as ISO-8859-1. Terminated strings are terminated with $00 if encoded with ISO-8859-1 and $00 00 if encoded as unicode. If nothing else is said newline character is forbidden. In ISO-8859-1 a new line is represented, when allowed, with $0A only. Frames that allow different types of text encoding have a text encoding description byte directly after the frame size. If ISO-8859-1 is used this byte should be $00, if Unicode is used it should be $01. Strings dependent on encoding is represented as <text string according to encoding>, or <full text string according to encoding> if newlines are allowed. Any empty Unicode strings which are NULL-terminated may have the Unicode BOM followed by a Unicode NULL ($FF FE 00 00 or $FE FF 00 00).
See the specs at www.id3.org for full details :)
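
As a rough illustration of the rule quoted above, here is a minimal C sketch that inspects the text encoding description byte and, for Unicode, the BOM that follows it. The enum and the function name classify_text_frame are hypothetical and belong to no library. The sketch also assumes the caller has already parsed the frame header and passes a pointer to the encoding byte.

```c
#include <stddef.h>

/* Hypothetical result codes for this sketch. */
enum text_encoding {
    TEXT_INVALID = 0,  /* empty input, unknown encoding byte, or missing BOM */
    TEXT_LATIN1,       /* encoding byte $00: ISO-8859-1 */
    TEXT_UTF16_LE,     /* encoding byte $01 followed by BOM $FF $FE */
    TEXT_UTF16_BE      /* encoding byte $01 followed by BOM $FE $FF */
};

/* body points at the text encoding description byte and len is the number
   of bytes available from there. On success, *text_offset is set to the
   index of the first character (past the encoding byte and any BOM). */
enum text_encoding classify_text_frame(const unsigned char *body, size_t len,
                                       size_t *text_offset)
{
    if (body == NULL || text_offset == NULL || len < 1)
        return TEXT_INVALID;

    if (body[0] == 0x00) {
        *text_offset = 1;
        return TEXT_LATIN1;
    }

    if (body[0] == 0x01 && len >= 3) {
        if (body[1] == 0xFF && body[2] == 0xFE) {
            *text_offset = 3;
            return TEXT_UTF16_LE;
        }
        if (body[1] == 0xFE && body[2] == 0xFF) {
            *text_offset = 3;
            return TEXT_UTF16_BE;
        }
    }

    return TEXT_INVALID;
}
```

This only works because the tag format stores the encoding byte next to the string. A bare filename pointer passed to a function carries no such byte, which is why the flag is still needed there.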

DanaPaul


Quote: See the specs at www.id3.org for full details :)

No need to lobby for self identifying strings, already specified in the standards. :)

A quote from id3.org... Hee hee...

(Software which does not behave according to items 1 and 2 above are categorically deemed "broken." Microsoft's Media Player is an example of such software.)