Topic: Hashing an audio file without the metadata

jasona

  • Posts: 35
Hashing an audio file without the metadata
« on: 13 Apr '10 - 11:13 »
Hi there,

I would like to compare the audio content of 2 audio files. I have thought about doing a CRC32 check on each of the files but if the metadata is different in the files then the CRC32s won't match.

Is it possible to read all the audio byte data of one file and create a hash, then read all the audio byte data of another file, create another hash, and compare the two hashes?

These files will be compared across a network so I can't send the exact file for comparison and this needs to be a fairly quick operation (< 1 sec).

Thanks,
Jason Allen

radio42

  • Posts: 4576
Re: Hashing an audio file without the metadata
« Reply #1 on: 13 Apr '10 - 12:00 »
You might create a decoding stream via BASS_StreamCreateFile (using the BASS_STREAM_DECODE flag) and then call BASS_ChannelGetData in a loop to receive all the decoded raw PCM sample data, and use that data to create a hash value.
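
The hashing half of that loop could look roughly like the sketch below. It is only an illustration, not BASS code: `crc32_of_chunks` is a hypothetical helper name, and the byte chunks stand in for whatever buffers each BASS_ChannelGetData call fills. The point it shows is that the CRC32 can be updated buffer by buffer, so the whole decoded stream never has to sit in memory.

```python
import zlib
from typing import Iterable


def crc32_of_chunks(chunks: Iterable[bytes]) -> int:
    """Fold byte chunks into one CRC32, as if they were a single buffer.

    Hypothetical helper: each chunk stands in for one buffer of decoded
    PCM data returned by the decoding loop.
    """
    crc = 0
    for chunk in chunks:
        # Passing the running value continues the checksum across chunks.
        crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


if __name__ == "__main__":
    # The chunk boundaries do not affect the result.
    print(crc32_of_chunks([b"abc", b"def"]) == crc32_of_chunks([b"abcdef"]))
```

Both machines would compute this value locally, so only the resulting 32-bit number has to cross the network.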

jasona

  • Posts: 35
Re: Hashing an audio file without the metadata
« Reply #2 on: 13 Apr '10 - 13:12 »
Not the quickest method in the world - takes about 3.5 secs to compare both files.

Would comparing a small subset of data be reliable enough? Say, a few seconds at the same position?

radio42

  • Posts: 4576
Re: Hashing an audio file without the metadata
« Reply #3 on: 13 Apr '10 - 14:23 »
Since you want to use a hash value (which is never guaranteed to be unique anyway), I guess building it from e.g. a 30-second part of the two tracks should also be pretty safe.

As an alternative, you might create the decoding stream (as already done) and then call BASS_StreamGetFilePosition with BASS_FILEPOS_START and BASS_FILEPOS_END respectively. This will give you the start and end positions, in bytes, of the audio data within the file.
You might then manually (re)open the mp3 file yourself in binary mode (not using BASS, but with a native file function) and calculate the hash over the bytes between those two positions.
This would also ensure that only the audio data (in this case the still-encoded mp3 data) is read, without any TAG data.
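
A minimal sketch of that second step, assuming the two positions have already been obtained from BASS_StreamGetFilePosition and are passed in as plain integers. `crc32_of_range` is a hypothetical helper name, and treating `end` as an exclusive byte offset is an assumption that should be checked against the BASS documentation.

```python
import zlib


def crc32_of_range(path: str, start: int, end: int, bufsize: int = 65536) -> int:
    """CRC32 of the bytes in [start, end) of the file at `path`.

    `start` and `end` are assumed to be the audio start/end offsets
    reported for the file; `end` is treated as exclusive (assumption).
    """
    crc = 0
    remaining = end - start
    with open(path, "rb") as f:
        f.seek(start)
        while remaining > 0:
            block = f.read(min(bufsize, remaining))
            if not block:  # file is shorter than the reported end position
                break
            crc = zlib.crc32(block, crc)
            remaining -= len(block)
    return crc & 0xFFFFFFFF
```

No decoding takes place here, so the cost is essentially that of reading the file once. To hash only part of the audio, a smaller `end` (for example `start` plus a fixed number of bytes) can be passed, as long as both sides use the same window.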

jasona

  • Posts: 35
Re: Hashing an audio file without the metadata
« Reply #4 on: 13 Apr '10 - 17:27 »
Perfect, that's exactly what I was looking for.

Thanks :)

Ian @ un4seen

  • Administrator
  • Posts: 20424
Re: Hashing an audio file without the metadata
« Reply #5 on: 14 Apr '10 - 15:48 »
Just a word of caution. That method will be fine with some file formats, but probably not with others. File formats that use ID3 or APE tags (as well as WAV files) should be fine, as the BASS_FILEPOS_START and BASS_FILEPOS_END file positions will take account of those, but they may not take account of other tag types (e.g. OGG/WMA/MP4) that are more integrated with the file format.