
How to read metadata from flash video files (height, width, etc.)

Code originally stolen / ported from http://inlet-media.de/flvtool2

Python, 131 lines
from struct import unpack
from datetime import datetime

class FLVReader(dict):
    """
    Reads metadata from FLV files
    """

    # Tag types
    AUDIO = 8
    VIDEO = 9
    META = 18
    UNDEFINED = 0

    def __init__(self, filename):
        """
        Pass the filename of an flv file; the instance is then populated as a
        dictionary of its meta data.
        """
        # Open the file for binary reading
        self.file = open(filename, 'rb')
        self.signature = self.file.read(3)
        assert self.signature == b'FLV', 'Not an flv file'
        self.version = self.readbyte()
        self.typeFlags = self.readbyte()
        self.dataOffset = self.readint()
        extraDataLen = self.dataOffset - self.file.tell()
        self.extraData = self.file.read(extraDataLen)
        self.readtag()

    def readtag(self):
        # Every tag is preceded by the 4-byte size of the previous tag
        previousTagSize = self.readint()
        tagType = self.readbyte()
        dataSize = self.read24bit()
        timeStamp = self.read24bit()
        # Extended timestamp (1 byte) followed by the stream ID (3 bytes)
        unknown = self.readint()
        if tagType == self.AUDIO:
            print("Can't handle audio tags yet")
        elif tagType == self.VIDEO:
            print("Can't handle video tags yet")
        elif tagType == self.META:
            endpos = self.file.tell() + dataSize
            event = self.readAMFData()
            metaData = self.readAMFData()
            # We got the meta data.
            # Our job is done.
            # We are complete
            self.update(metaData)
        elif tagType == self.UNDEFINED:
            print("Can't handle undefined tags yet")

    def readint(self):
        data = self.file.read(4)
        return unpack('>I', data)[0]

    def readshort(self):
        data = self.file.read(2)
        return unpack('>H', data)[0]

    def readbyte(self):
        data = self.file.read(1)
        return unpack('B', data)[0]

    def read24bit(self):
        b1, b2, b3 = unpack('3B', self.file.read(3))
        return (b1 << 16) + (b2 << 8) + b3

    def readAMFData(self, dataType=None):
        if dataType is None:
            dataType = self.readbyte()
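        funcs = {
            0: self.readAMFDouble,
            1: self.readAMFBoolean,
            2: self.readAMFString,
            3: self.readAMFObject,
            8: self.readAMFMixedArray,
            10: self.readAMFArray,
            11: self.readAMFDate
        }
        func = funcs.get(dataType)
        if func is None:
            # Unknown types cannot be skipped safely: their length is not known
            raise ValueError('Unsupported AMF data type: %d' % dataType)
        return func()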

    def readAMFDouble(self):
        return unpack('>d', self.file.read(8))[0]

    def readAMFBoolean(self):
        return self.readbyte() == 1

    def readAMFString(self):
        # 16-bit length followed by that many bytes of UTF-8 text
        size = self.readshort()
        return self.file.read(size).decode('utf-8', 'replace')

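    def readAMFObject(self):
        # An AMF0 object has no length prefix: it is a run of key/value
        # pairs ended by an empty key followed by the end marker (9).
        result = {}
        while True:
            key = self.readAMFString()
            dataType = self.readbyte()
            if not key and dataType == 9:
                break
            result[key] = self.readAMFData(dataType)
        return result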

    def readAMFMixedArray(self):
        size = self.readint()
        result = {}
        for i in range(size):
            key = self.readAMFString()
            dataType = self.readbyte()
            if not key and dataType == 9:
                break
            result[key] = self.readAMFData(dataType)
        return result

    def readAMFArray(self):
        size = self.readint()
        result = []
        for i in range(size):
            result.append(self.readAMFData())
        return result

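    def readAMFDate(self):
        # AMF0 dates are milliseconds since the epoch, followed by a
        # 16-bit time zone field that is read here and discarded.
        milliseconds = self.readAMFDouble()
        self.readshort()
        return datetime.fromtimestamp(milliseconds / 1000.0)
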

if __name__ == '__main__':
    import sys
    from pprint import pprint
    if len(sys.argv) == 1:
        print('Usage: %s filename [filename]...' % sys.argv[0])
        print('Where filename is a .flv file')
        print('eg. %s myfile.flv' % sys.argv[0])
    for fn in sys.argv[1:]:
        x = FLVReader(fn)
        pprint(x)

We used it on the exelearning project (http://exelearning.org) so that we can import flash videos and display them properly without the user having to manually specify the width and height.

When you display a flash video in a web page, you must specify the width and height of the container. If the height is too small, bad things happen in Mozilla.
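As a rough illustration of that use, the sketch below turns a metadata dictionary into container dimensions. It assumes the metadata carries 'width' and 'height' keys holding numbers, which is common but not guaranteed; the function name container_size and the 320x240 fallback are made up for this example.

```python
def container_size(metadata, default=(320, 240)):
    """Return (width, height) in whole pixels for the video container.

    'width' and 'height' are assumed metadata keys; the default size is
    a hypothetical fallback for files that do not carry them.
    """
    width = metadata.get('width')
    height = metadata.get('height')
    if not width or not height:
        # Missing or zero dimensions: fall back rather than emit a 0px box
        return default
    # Metadata numbers are stored as doubles, so round to whole pixels
    return int(round(width)), int(round(height))


if __name__ == '__main__':
    print(container_size({'width': 480.0, 'height': 360.0}))
```

With the recipe above, the dictionary passed in would be the FLVReader instance itself, since it subclasses dict.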

3 comments

Wessel van Norel 17 years, 4 months ago

There is a little bug in the code listed.

The line:

self.file = open('x.flv', 'rb')

Should be:

self.file = open(filename, 'rb')

I guess you will see that quickly enough once you try to run the code.

justtoknowmoreaboutjava Jd 15 years, 3 months ago

Can you please post java version of this program?

Michael Lange 13 years, 7 months ago

Nice work, exactly what I have been looking for! I have a couple of suggestions, though, to fix a few issues. These are simple workarounds only; a decent fix would surely require some more thought :)

First and most important, FLVReader crashes on many of my files that have a keyframes tag; in fact, I could not get it to work with any of my files that have one. So I made a change that simply omits the keyframes tag (which is probably pretty much useless in most cases anyway). The change affects the readAMFMixedArray() method:

def readAMFMixedArray(self):
    size = self.readint()
    result = {}
    for i in range(size):
        key = self.readAMFString()
        dataType = self.readbyte()
        if not key and dataType == 9:
            break
        if key == 'keyframes':
            # some files have a keyframes tag, which looks
            # like this when read with flvlib:
            # 'keyframes': {'times': [0.0,
            #                        0.634,
            #                        0.834],
            #              'filepositions': [9971.0,
            #                                27958.0,
            #                               37502.0]}
            # but with possibly thousands of entries, which causes FLVReader to crash,
            # so omit keyframes here, we probably don't need it anyway
            break
        result[key] = self.readAMFData(dataType)
    return result
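For comparison, here is a minimal sketch of parsing a keyframes-style structure instead of skipping it. It works on in-memory bytes and relies on two details of the AMF0 encoding: an object (type 3) has no count prefix and simply runs until an empty key followed by the end marker 9, and a strict array (type 10) has a count followed by that many typed values. The function names and the sample data are hypothetical, and treating the mixed-array count as a hint is an assumption of this sketch.

```python
import io
import struct

END_MARKER = 9  # AMF0 object-end marker


def read_string(stream):
    """Read an AMF0 short string: 16-bit big-endian length, then the bytes."""
    (size,) = struct.unpack('>H', stream.read(2))
    return stream.read(size).decode('utf-8')


def read_pairs(stream):
    """Read key/value pairs until an empty key followed by the end marker."""
    result = {}
    while True:
        key = read_string(stream)
        (data_type,) = struct.unpack('B', stream.read(1))
        if not key and data_type == END_MARKER:
            return result
        result[key] = read_value(stream, data_type)


def read_value(stream, data_type=None):
    """Read one AMF0 value; only the types needed for keyframes are handled."""
    if data_type is None:
        (data_type,) = struct.unpack('B', stream.read(1))
    if data_type == 0:     # number: 8-byte big-endian double
        return struct.unpack('>d', stream.read(8))[0]
    if data_type == 3:     # object: key/value pairs with no count prefix
        return read_pairs(stream)
    if data_type == 8:     # mixed array: 32-bit count, then key/value pairs
        stream.read(4)     # assumption: the count is only a hint, so skip it
        return read_pairs(stream)
    if data_type == 10:    # strict array: 32-bit count, then that many values
        (count,) = struct.unpack('>I', stream.read(4))
        return [read_value(stream) for _ in range(count)]
    raise ValueError('unhandled AMF0 type %d' % data_type)


def pack_string(text):
    """Encode a short string the way read_string expects it."""
    raw = text.encode('utf-8')
    return struct.pack('>H', len(raw)) + raw


def pack_numbers(values):
    """Encode a strict array of doubles (sample data helper)."""
    body = b''.join(b'\x00' + struct.pack('>d', v) for v in values)
    return b'\x0a' + struct.pack('>I', len(values)) + body


# Hand-built sample resembling the keyframes value quoted above
sample = (b'\x03'
          + pack_string('times') + pack_numbers([0.0, 0.634, 0.834])
          + pack_string('filepositions')
          + pack_numbers([9971.0, 27958.0, 37502.0])
          + pack_string('') + b'\x09')

keyframes = read_value(io.BytesIO(sample))
```

Nothing here depends on the number of entries, so a long keyframes list should only cost time, not a crash.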

Second, any call to readAMFObject() causes a crash here with an AttributeError (although these calls do not seem to occur anymore with keyframes disabled), so to be on the safe side I added a try...except to catch this:

def readAMFObject(self):
    data = self.readAMFMixedArray()
    try:
        result = object()
        result.__dict__.update(data)  # ??? can this ever work ???
    except AttributeError:
        result = {}
    return result

Finally (and mainly cosmetic), I thought there should be a self.file.close() at the end of __init__().
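One way to get the file closed without an explicit close() call is a with block, which also closes it when parsing raises. The sketch below reads only the 9-byte FLV header that way; read_flv_header is a hypothetical helper for illustration, not part of the recipe.

```python
import struct


def read_flv_header(path):
    """Return (version, type_flags, data_offset) from an FLV file header."""
    # The with block closes the file even if an exception is raised
    with open(path, 'rb') as flv:
        header = flv.read(9)
    if len(header) < 9 or header[:3] != b'FLV':
        raise ValueError('Not an flv file: %r' % path)
    # After the 3-byte signature: version (1 byte), flags (1 byte),
    # and the offset of the first tag (4 bytes, big-endian)
    version, type_flags, data_offset = struct.unpack('>BBI', header[3:])
    return version, type_flags, data_offset
```

The same pattern could wrap the body of __init__() in the recipe, at the cost of no longer keeping self.file around afterwards.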