This advanced information is stored in its own MP4 track, so it isn't part of the MP4 meta header the way simple text tags are.
The data isn't stored as XML in the MP4 file anymore; it is interleaved with the rest of the MP4 stream, so that all the information (cover images, titles, links) can actually be streamed during playback.
One can dump these tracks with MP4Box. The titles and the timing they apply to are stored in a text track with the fourcc text (track 2 in the one podcast I got from Beatport). The dumped raw data alone isn't much use, as it needs the track header, which contains the actual offsets and timing information (timescale and durations).
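To make the timescale/duration part concrete, here is a minimal sketch of reading both values from a track's media header. It assumes the standard 'mdhd' box layout and that you already have the box payload (everything after the 8-byte size/type header) in hand; the function names parse_mdhd and to_seconds are made up for this example.

```python
import struct

def parse_mdhd(payload: bytes):
    """Return (timescale, duration) from the payload of an 'mdhd' box.

    Assumed layout: 1 byte version, 3 bytes flags, then creation and
    modification times (32-bit each in version 0, 64-bit each in
    version 1), then a 32-bit timescale and the duration
    (32-bit in version 0, 64-bit in version 1).
    """
    version = payload[0]
    if version == 1:
        timescale, duration = struct.unpack_from(">IQ", payload, 4 + 16)
    else:
        timescale, duration = struct.unpack_from(">II", payload, 4 + 8)
    return timescale, duration

def to_seconds(ticks: int, timescale: int) -> float:
    """Convert a tick count (e.g. a sample duration) to seconds."""
    return ticks / timescale
```

With a timescale of 600, a title that lasts 900 ticks is shown for to_seconds(900, 600) = 1.5 seconds.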
You can find all the information on what is stored where at
http://www.geocities.com/xhelmboyx/quicktime/formats/mp4-layout.txt
Yes, it's big :-) At first I thought I'd look at some libraries others have written. Then I found some huge, overly complex libraries and decided to take the dirty route and just write a parser which gets me what I want (the cover image data).
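The core of such a quick-and-dirty parser is small. What follows is only a rough sketch of the idea, not the parser I ended up with: it walks the box (atom) structure of a file held in memory and reports the handler type of each track, which is enough to spot the text track. It assumes the usual layout of a 32-bit big-endian size followed by a 4-byte type, and that the handler type sits 8 bytes into the 'hdlr' payload; iter_boxes, find_boxes and track_handlers are names invented for this example.

```python
import struct

def iter_boxes(data, start=0, end=None):
    """Yield (type, payload_start, payload_end) for each box in a range."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, kind = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if pos + 16 > end:
                break
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the range.
            size = end - pos
        if size < header or pos + size > end:
            break  # truncated or corrupt box, stop walking
        yield kind, pos + header, pos + size
        pos += size

def find_boxes(data, path, start=0, end=None):
    """Yield payload ranges of all boxes reached by following path."""
    for kind, s, e in iter_boxes(data, start, end):
        if kind == path[0]:
            if len(path) == 1:
                yield s, e
            else:
                yield from find_boxes(data, path[1:], s, e)

def track_handlers(data):
    """Return the handler fourcc of every track, in file order."""
    handlers = []
    for s, e in find_boxes(data, [b"moov", b"trak"]):
        for hs, _ in find_boxes(data, [b"mdia", b"hdlr"], s, e):
            # Skip version/flags (4 bytes) and the pre-defined field (4 bytes).
            handlers.append(data[hs + 8:hs + 12])
    return handlers
```

Calling track_handlers(open("podcast.m4a", "rb").read()) on a file like the one described above should list b"text" for the title track; the file name is just a placeholder. Reading the whole file into memory is the lazy option and fine for a one-off script, less so for large files.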