Iterate over .MP4 atoms « Python recipes

This recipe yields the atoms contained in an MP4 file. It is mostly used for extracting the tags stored in the file (artist, title, etc.) through a convenience class (M4ATags). It is implemented as a generator.

import struct

FLAGS= CONTAINER, SKIPPER, TAGITEM, IGNORE, NOVERN, XTAGITEM= [2**_ for _ in xrange(6)]

# CONTAINER: datum contains other boxes
# SKIPPER: ignore first 4 bytes of datum
# TAGITEM: "official" tag item
# IGNORE: atom is not yielded by _analyse
# NOVERN: datum is 8 bytes (two 4-byte big-endian integers)
# XTAGITEM: datum is a triplet (I believe) of "mean", "name", "data" items
CALLBACK= TAGITEM | XTAGITEM
FLAGS.append(CALLBACK)

TAGTYPES= (
    ('ftyp', 0),
    ('moov', CONTAINER),
    ('mdat', 0),
    ('udta', CONTAINER),
    ('meta', CONTAINER|SKIPPER),
    ('ilst', CONTAINER),
    ('\xa9ART', TAGITEM),
    ('\xa9nam', TAGITEM),
    ('\xa9too', TAGITEM),
    ('\xa9alb', TAGITEM),
    ('\xa9day', TAGITEM),
    ('\xa9gen', TAGITEM),
    ('\xa9wrt', TAGITEM),
    ('trkn', TAGITEM|NOVERN),
    ('\xa9cmt', TAGITEM),
    ('trak', CONTAINER),
    ('----', XTAGITEM),
    ('mdia', CONTAINER),
    ('minf', CONTAINER),
)

flagged= {}
for flag in FLAGS:
    flagged[flag]= frozenset(_[0] for _ in TAGTYPES if _[1] & flag)

def _xtra(s):
    "Convert '----' atom data into dictionaries"
    offset= 0
    result= {}
    while offset < len(s):
        atomsize= struct.unpack("!i", s[offset:offset+4])[0]
        atomtype= s[offset+4:offset+8]
        if atomtype == "data":
            result[atomtype]= s[offset+16:offset+atomsize]
        else:
            result[atomtype]= s[offset+12:offset+atomsize]
        offset+= atomsize
    return result

def _analyse(fp, offset0, offset1):
    "Walk the atom tree in a mp4 file"
    offset= offset0
    while offset < offset1:
        fp.seek(offset)
        atomsize= struct.unpack("!I", fp.read(4))[0] # atom sizes are unsigned
        atomtype= fp.read(4)
        if atomtype in flagged[CONTAINER]:
            data= ''
            for reply in _analyse(fp, offset+(atomtype in flagged[SKIPPER] and 12 or 8),
                offset+atomsize):
                yield reply
        else:
            fp.seek(offset+8)
            if atomtype in flagged[TAGITEM]:
                data=fp.read(atomsize-8)[16:]
                if atomtype in flagged[NOVERN]:
                    data= struct.unpack("!ii", data)
            elif atomtype in flagged[XTAGITEM]:
                data= _xtra(fp.read(atomsize-8))
            else:
                data= fp.read(min(atomsize-8, 32))
        if not atomtype in flagged[IGNORE]: yield atomtype, atomsize, data
        offset+= atomsize

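def mp4_atoms(pathname):
    "Yield (atomtype, atomsize, data) for every atom in the file"
    fp= open(pathname, "rb")
    try:
        fp.seek(0,2)
        size=fp.tell()
        for atom in _analyse(fp, 0, size):
            yield atom
    finally:
        # close the file even if the caller stops iterating early
        # (yield inside try/finally needs Python 2.5 or later)
        fp.close()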

class M4ATags(dict):
    "An example class reading .m4a tags"
    cvt= {
        'trkn': 'Track',
        '\xa9ART': 'Artist',
        '\xa9nam': 'Title',
        '\xa9alb': 'Album',
        '\xa9day': 'Year',
        '\xa9gen': 'Genre',
        '\xa9cmt': 'Comment',
        '\xa9wrt': 'Writer',
        '\xa9too': 'Tool',
    }
    def __init__(self, pathname=None):
        super(M4ATags, self).__init__()
        if pathname is None: return
        for atomtype, atomsize, atomdata in mp4_atoms(pathname):
            self.atom2tag(atomtype, atomdata)

    def atom2tag(self, atomtype, atomdata):
        "Insert items using descriptive key instead of atomtype"
        if atomtype == "----":
            key= atomdata['name'].title()
            value= atomdata['data'].decode("utf-8")
        else:
            try: key= self.cvt[atomtype]
            except KeyError: return
            if atomtype == "trkn":
                value= atomdata[0]
            else:
                try: value= atomdata.decode("utf-8")
                except AttributeError:
                    print `atomtype`, `atomdata`
                    raise
        self[key]= value

if __name__=="__main__":
    import sys, pprint
    r= M4ATags(sys.argv[1]) # pathname of an .mp4/.m4a file as first argument
    pprint.pprint(r)
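
The listing above is written for Python 2 (`xrange`, backtick repr, atom types as `str`). As a rough illustration of the same tree walk under Python 3, where atom types are `bytes`, here is a minimal sketch. The names `walk_atoms`, `CONTAINERS`, `SKIP_4` and `make_atom` are hypothetical and not part of the recipe, the container set is only a subset of TAGTYPES, and special sizes (0 and 1) are deliberately not handled.

```python
import io
import struct

# Assumed subset of the recipe's container atoms. 'meta' carries 4 extra
# bytes before its children, which is what the SKIPPER flag accounts for.
CONTAINERS = {b"moov", b"udta", b"meta", b"ilst", b"trak", b"mdia", b"minf"}
SKIP_4 = {b"meta"}

def walk_atoms(fp, start, end, depth=0):
    """Yield (depth, atomtype, atomsize, payload_offset) for atoms in [start, end)."""
    offset = start
    while offset + 8 <= end:
        fp.seek(offset)
        atomsize, atomtype = struct.unpack("!I4s", fp.read(8))
        if atomsize < 8:
            break  # sizes 0 and 1 have special meanings; not handled here
        payload = offset + 8
        yield depth, atomtype, atomsize, payload
        if atomtype in CONTAINERS:
            if atomtype in SKIP_4:
                payload += 4
            for item in walk_atoms(fp, payload, offset + atomsize, depth + 1):
                yield item
        offset += atomsize

def make_atom(atomtype, payload):
    """Hypothetical helper: build an atom from a 4-byte type and payload bytes."""
    return struct.pack("!I4s", len(payload) + 8, atomtype) + payload

# Synthetic tree: moov > udta > meta > ilst > (c)nam, with a dummy payload
nam = make_atom(b"\xa9nam", b"x" * 20)
ilst = make_atom(b"ilst", nam)
tree = make_atom(b"moov", make_atom(b"udta", make_atom(b"meta", b"\0\0\0\0" + ilst)))
atoms = list(walk_atoms(io.BytesIO(tree), 0, len(tree)))
for depth, atomtype, atomsize, _ in atoms:
    print("  " * depth + repr(atomtype), atomsize)
```

Unlike `_analyse`, this sketch yields containers before their children and returns payload offsets instead of reading the data, which keeps the example short.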

      

This is the result of a lot of trial and error, and it is not guaranteed to be industrial strength; it works fine, though, for tag extraction on all the .m4a files I have in my possession.

I couldn't find another Python implementation or a straightforward reference to the format, so I gathered information from here and there in order to do this; if you are more knowledgeable about MP4 items and their roles, you are very welcome to update the TAGTYPES tuple or any of the code. I assume there must be some provision for multiple values in a tag, for example, since that would explain some of the bytes I completely ignore in the code above; but since I had no such cases, nor a way to produce multiple values, I can't tell.
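
One plausible reading of the ignored bytes: the 16 bytes that `_analyse` drops from a tag item look like the header of a nested `data` atom, that is a 4-byte size, the 4-byte type `data`, one version byte plus a 3-byte type indicator, and a 4-byte locale field. The sketch below (Python 3, on synthetic bytes) splits them out. `parse_data_atom` is a hypothetical name, and the meaning of the type indicator (1 for UTF-8 text) is an assumption that has not been checked against the files mentioned above.

```python
import struct

def parse_data_atom(payload):
    """Split a tag item's payload (everything after its own 8-byte header)."""
    size, kind, verflags, locale = struct.unpack("!I4sII", payload[:16])
    if kind != b"data":
        raise ValueError("expected a 'data' atom, got %r" % (kind,))
    return {
        "version": verflags >> 24,      # top byte
        "type": verflags & 0xFFFFFF,    # low 3 bytes; assumed 1 = UTF-8 text
        "locale": locale,
        "value": payload[16:size],      # what the recipe keeps via [16:]
    }

# Synthetic payload of a title item
title = "Blue Train".encode("utf-8")
payload = struct.pack("!I4sII", 16 + len(title), b"data", 1, 0) + title
parsed = parse_data_atom(payload)
print(parsed["type"], parsed["value"].decode("utf-8"))
```

If an item can hold several values, they would presumably appear as further `data` atoms after the first one, which slicing with `[16:]` would not separate.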

Hope it helps you.

Tags: files

3 comments

Samuel Gendler 17 years, 6 months ago

Anyone know what's up with the Genre tag? None of the open source tag libs I've found are able to correctly parse the genre tag in my AAC files. iTunes is able to read it, as are other tools, so it is in there somehow, but the gen tag in this script (and every other I've tried) always comes up empty. I've finally got access to every tag I need, except for genre, so this is important.

Chris Jones 14 years, 8 months ago

For genre: You need to modify this script slightly. First, add ('gnre', TAGITEM) to the TAGTYPES list at the top. In the class attribute M4ATags.cvt, add 'gnre': 'Genre'

This will add the genre value to the dictionary. However, iTunes does not use a plain text string to encode this. It is a binary packed index value of a hard-coded set of defined genres. To decode this value:

struct.unpack('!h', genre)[0]

This yields an integer. Now take a look at http://kobesearch.cpan.org/htdocs/Audio-M4P/Audio/M4P/QuickTime.pm.html Search for "our @genre_strings", and there is the big list of them. For example, if your genre value is 21, it is Alternative.
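
A minimal sketch of that lookup, in Python 3. `GENRES` holds only the first 21 entries of the standard ID3v1 genre list, and `decode_gnre` is a hypothetical helper, not part of the recipe. The 1-based offset is an assumption inferred from the example above: 21 maps to Alternative, which is entry 20 when the ID3v1 list is counted from zero.

```python
import struct

# First 21 entries of the ID3v1 genre list (zero-based); the full list is longer.
GENRES = ["Blues", "Classic Rock", "Country", "Dance", "Disco", "Funk", "Grunge",
          "Hip-Hop", "Jazz", "Metal", "New Age", "Oldies", "Other", "Pop", "R&B",
          "Rap", "Reggae", "Rock", "Techno", "Industrial", "Alternative"]

def decode_gnre(raw):
    """Map the 2-byte 'gnre' value to a name, or None if outside this partial list."""
    (index,) = struct.unpack("!h", raw)
    if 1 <= index <= len(GENRES):
        return GENRES[index - 1]
    return None

print(decode_gnre(struct.pack("!h", 21)))
```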

Custom genres are stored in the iTunes XML.

Cheers,

David Leach 10 years, 11 months ago

The script doesn't deal with the special size values. In particular, if the atom size is 1 then the true size is a uint64_t that follows (took me a bit to figure out while playing with this script).

On line 78 (in _analyse()) replace:

data= fp.read(min(atomsize-8, 32))

with:

if atomsize == 1:
    atomsize = struct.unpack("!Q", fp.read(8))[0]
    data = fp.read(min(atomsize-16, 32))
else:
    data= fp.read(min(atomsize-8, 32))
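
The special sizes could also be handled once, at the point where the header is read, so that every branch of the walk sees the true size. A sketch in Python 3, assuming the usual convention that a 32-bit size of 1 is followed (after the type) by a 64-bit size, and that a size of 0 means the atom runs to the end of the file. `read_atom_header` is a hypothetical helper, not part of the recipe.

```python
import io
import struct

def read_atom_header(fp, offset, file_end):
    """Return (atomtype, atomsize, header_len) for the atom starting at offset."""
    fp.seek(offset)
    atomsize, atomtype = struct.unpack("!I4s", fp.read(8))
    header_len = 8
    if atomsize == 1:
        # extended size: unsigned 64-bit integer right after the type
        atomsize = struct.unpack("!Q", fp.read(8))[0]
        header_len = 16
    elif atomsize == 0:
        # atom extends to the end of the file
        atomsize = file_end - offset
    return atomtype, atomsize, header_len

# Synthetic 'mdat' atom that uses the extended size field
payload = b"\0" * 4
big = struct.pack("!I4sQ", 1, b"mdat", 16 + len(payload)) + payload
print(read_atom_header(io.BytesIO(big), 0, len(big)))
```

The caller would then read the payload from `offset + header_len` and advance by `atomsize`, instead of assuming an 8-byte header everywhere.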

Iterate over .MP4 atoms (Python recipe) by Christos Georgiou
ActiveState Code (http://code.activestate.com/recipes/496984/)
