
A queue data structure, for string data only, which looks like a File object. This class takes care of the list.append and "".join mess, which is needed for fast string concatenation.

Python, 46 lines
class StringQueue(object):
    def __init__(self, data=""):
        self.l_buffer = []
        self.s_buffer = ""
        self.write(data)

    def write(self, data):
        #check type here, as wrong data type will cause error on self.read,
        #which may be confusing.
        if not isinstance(data, str):
            raise TypeError("argument 1 must be string, not %s" % type(data).__name__)
        #append data to list, no need to "".join just yet.
        self.l_buffer.append(data)

    def _build_str(self):
        #build a new string out of list
        new_string = "".join(self.l_buffer)
        #join string buffer and new string
        self.s_buffer = "".join((self.s_buffer, new_string))
        #clear list
        self.l_buffer = []

    def __len__(self):
        #calculate length without needing to _build_str
        return sum(len(i) for i in self.l_buffer) + len(self.s_buffer)

    def read(self, count=None):
        #if string doesn't have enough chars to satisfy caller, or caller is
        #requesting all data
        if count is None or count > len(self.s_buffer): self._build_str()
        #if i don't have enough bytes to satisfy caller, return nothing.
        if count is not None and count > len(self.s_buffer): return ""
        #get data requested by caller
        result = self.s_buffer[:count]
        #remove requested data from string buffer
        self.s_buffer = self.s_buffer[len(result):]
        return result


if __name__ == "__main__":
    sq = StringQueue()
    sq.write('some data')
    print(sq.read(4))
    sq.write('_and_some_more_data_!')
    print(sq.read(4))
    print(sq.read())

I use this class to buffer network data as it arrives over a socket. I think it is a little smarter than appending to and slicing up a string: the string joining is performed only when the read method is called and there is not enough already-joined string data to satisfy the caller.

  • I've added a type check to the write method. The read method now returns an empty string if there are not enough bytes in the queue to satisfy the caller.
  • Added a __len__ method.
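
The behaviour described above can be sketched for Python 3, where socket data arrives as bytes rather than str. This is an illustrative port, not the author's code: the name ByteQueue and its attribute names are assumptions made for this example.

```python
class ByteQueue:
    """Python 3 sketch of the recipe's idea, holding bytes instead of str.

    ByteQueue is a hypothetical name; it is not part of the original recipe.
    """

    def __init__(self, data=b""):
        self._chunks = []      # pending writes, not yet joined
        self._buffer = b""     # joined data, ready to be sliced by read()
        self.write(data)

    def write(self, data):
        if not isinstance(data, bytes):
            raise TypeError("argument 1 must be bytes, not %s" % type(data).__name__)
        self._chunks.append(data)

    def __len__(self):
        # Length is computed without joining anything.
        return sum(len(c) for c in self._chunks) + len(self._buffer)

    def read(self, count=None):
        # Join only when the caller wants everything, or more than is already joined.
        if count is None or count > len(self._buffer):
            self._buffer += b"".join(self._chunks)
            self._chunks = []
        if count is not None and count > len(self._buffer):
            return b""         # not enough data yet; nothing is consumed
        result = self._buffer[:count]
        self._buffer = self._buffer[len(result):]
        return result


q = ByteQueue()
q.write(b"some data")
first = q.read(4)          # the first four bytes
q.write(b"_and_more")
too_much = q.read(100)     # empty: fewer than 100 bytes are queued
rest = q.read()            # everything that is left
```

Note that a request for more data than is queued consumes nothing, so the caller can simply retry after the next write.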

10 comments

Tim Delaney 18 years, 9 months ago

(c)StringIO. This need is pretty much met by StringIO and cStringIO. The performance difference between cStringIO and "".join(iterable) depends almost entirely on the actual data, so there is generally no performance reason to prefer one over the other.

Ori Peleg 18 years, 9 months ago

Right, but it's still nice. This recipe effectively maintains two positions, one for writing and one for reading. This way you can keep adding data for read() using write(), which is useful for socket communication.

Here's an implementation that leverages StringIO:

def independent_position(func):
    attrname = "_%s_pos" % id(func)
    def wrapper(self, *args, **kwargs):
        self.seek(getattr(self, attrname, 0))
        try: return func(self, *args, **kwargs)
        finally: setattr(self, attrname, self.tell())
    return wrapper

from StringIO import StringIO
class StringQueue(StringIO):
    write = independent_position(StringIO.write)
    read  = independent_position(StringIO.read)
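
The StringIO module used above exists only in Python 2. A possible Python 3 port of the same idea, assuming io.StringIO as the base class, might look like the sketch below. The class name StringIOQueue is chosen here only to avoid clashing with the recipe's StringQueue.

```python
import io


def independent_position(func):
    # Each wrapped method remembers its own file position on the instance.
    attrname = "_%s_pos" % id(func)

    def wrapper(self, *args, **kwargs):
        self.seek(getattr(self, attrname, 0))
        try:
            return func(self, *args, **kwargs)
        finally:
            setattr(self, attrname, self.tell())
    return wrapper


class StringIOQueue(io.StringIO):
    # Hypothetical name for this sketch; same technique as the comment above.
    write = independent_position(io.StringIO.write)
    read = independent_position(io.StringIO.read)


sq = StringIOQueue()
sq.write("abc")
head = sq.read(2)      # reads from the read position, not the write position
sq.write("def")        # appends at the write position
tail = sq.read()       # continues where the previous read stopped
```

Unlike the recipe, read(n) here follows file semantics and returns fewer than n characters when fewer are available.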

S W (author) 18 years, 9 months ago

Agreed, StringIO doesn't fit the requirement exactly. Neat hack! StringIO does not provide independent read and write positions, hence the above recipe.

I think I like yours more than the original. :-)

Ori Peleg 18 years, 9 months ago

I was needing such a thing for a while... ... but kept putting it off. I liked your recipe, and then Tim's comment made me think. :-)

Is there a simple way to use cStringIO instead of StringIO? Can't subclass it...

Tiago Macambira 18 years, 9 months ago

cStringIO lacks write. cStringIO lacks write(), AFAIK.

Anyway, the StringIO approach would be more memory hungry: all data written and read would be kept in memory for as long as the queue is in use. The recipe, on the other hand, holds only unread data, which for most network apps should be enough.
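
A small illustration of that memory difference, using Python 3's io.StringIO as a stand-in for the Python 2 module discussed here (the variable names are made up for this example):

```python
import io

# File-like approach: consumed data stays inside the buffer.
buf = io.StringIO()
buf.write("x" * 1000)
buf.seek(0)
buf.read(900)
retained_by_stringio = len(buf.getvalue())   # consumed characters are still held

# Recipe's approach: read() slices consumed data off the front.
pending = "x" * 1000
consumed, pending = pending[:900], pending[900:]
retained_by_slicing = len(pending)           # only unread characters remain
```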

Ian Bicking 18 years, 9 months ago

EOF Bug. There's a bug in this, which is hard to get rid of. "" (the empty string) signals EOF; however, if there is no queued data, this does not actually imply there's no additional data.

Sadly there's not a very good way around this. You could change the file protocol and say "" is a reasonable return, and you should catch EOFError to signal the end (assuming you also add a .close() method -- though would it mean you close the reader or the writer interface?).
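
One way that changed protocol could look is sketched below. It is only an assumption about how such an interface might be designed: the class name, the choice to close the writer side, and the file-style partial reads are all decisions made for this example, not part of the recipe.

```python
class ClosableStringQueue:
    """Hypothetical variant: "" means "no data yet", EOFError means "finished"."""

    def __init__(self):
        self._chunks = []
        self._closed = False

    def write(self, data):
        if self._closed:
            raise ValueError("write to closed queue")
        self._chunks.append(data)

    def close(self):
        # Assumption: closing is done by the writer to say no more data will come.
        self._closed = True

    def read(self, count=None):
        data = "".join(self._chunks)
        if not data:
            if self._closed:
                raise EOFError("queue closed and drained")
            return ""              # nothing queued right now, but not end of file
        if count is None:
            count = len(data)
        result, rest = data[:count], data[count:]
        self._chunks = [rest] if rest else []
        return result


cq = ClosableStringQueue()
cq.write("abc")
part = cq.read(2)
remainder = cq.read()
nothing_yet = cq.read()    # empty string: the writer may still add data
cq.close()
```

After close(), the next read on the drained queue raises EOFError instead of returning an empty string.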

The other way is to block when no data is waiting, in the hope that some other thread will add data. You then have to set up the proper events to do that correctly, which is a bit painful as well, and of course only makes sense in a threaded environment, since without threads there's no way data could appear while you are blocking on a read.
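
A minimal sketch of the blocking variant, assuming Python 3 and threading.Condition for the signalling. The class name and the close() method are hypothetical additions for this example.

```python
import threading


class BlockingStringQueue:
    """Hypothetical blocking variant: read() waits for data or for close()."""

    def __init__(self):
        self._chunks = []
        self._size = 0
        self._closed = False
        self._cond = threading.Condition()

    def write(self, data):
        with self._cond:
            self._chunks.append(data)
            self._size += len(data)
            self._cond.notify_all()    # wake any reader waiting for data

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()    # wake readers so they can see EOF

    def read(self, count):
        """Block until `count` characters are queued or the writer has closed."""
        with self._cond:
            self._cond.wait_for(lambda: self._size >= count or self._closed)
            data = "".join(self._chunks)
            result, rest = data[:count], data[count:]
            self._chunks = [rest] if rest else []
            self._size = len(rest)
            return result              # "" can now only mean end of data


bq = BlockingStringQueue()


def producer():
    bq.write("hello ")
    bq.write("world")
    bq.close()


worker = threading.Thread(target=producer)
worker.start()
first_eight = bq.read(8)     # waits until at least 8 characters have arrived
leftover = bq.read(100)      # waits for close(), then returns what is left
at_eof = bq.read(1)          # empty string, because the queue is closed and drained
worker.join()
```

With this design an empty string is returned only after close(), so a consumer loop written for ordinary files terminates at the right time.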

S W (author) 18 years, 9 months ago

EOF Bug? I'm not sure I follow.

I can't see where I would need an EOFError, except to mimic the File object more closely. In all instances where I've used this class, I've not found a need to implement this.

Ian Bicking 18 years, 9 months ago

example. An example might go like this:

import os
import threading
import time

def poll_file(filename, squeue):
    last_size = 0
    while 1:
        size = os.stat(filename).st_size
        if size <= last_size:
            #nothing new yet, wait and check again
            time.sleep(1)
            continue
        f = open(filename, 'rb')
        f.seek(last_size)
        content = f.read()
        f.close()
        last_size += len(content)
        squeue.write(content)

my_squeue = StringQueue()
threading.Thread(target=poll_file, args=('f1.txt', my_squeue)).start()

def consumer():
    while 1:
        content = my_squeue.read(1000)
        if not content:
            #with StringQueue this also happens when no input is waiting
            break
        print('New content: %r' % (content,))

It's a little contrived, but anyway... the point is, if my_squeue was a normal file, then it would return strings in response to .read(), until it returned the empty string, at which point EOF would have been reached and the loop should terminate. But with StringQueue, it will return an empty string if there's no input waiting, but it doesn't mean EOF was reached, just that there's no waiting input.

Of course, this is only an issue if you are really treating the queue just like you would a file. But if you weren't, then a list or Queue.Queue object would work just as well.

S W (author) 18 years, 8 months ago

Queue.Queue and list. "But if you weren't, then a list or Queue.Queue object would work just as well"

I use this class to read data from a network socket, where I generally know exactly how many bytes I need to read. As stated above: 'This class takes care of the list.append and "".join mess, which is needed for fast string concatenation.'

It is really just a simple wrapper around a list object, not a complete File-like class.
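
The use case described here, reading an exact number of bytes once they have arrived, could be sketched as follows in Python 3. The wire format (a 4-byte big-endian length prefix) and the names ByteQueue and Framer are assumptions made for this illustration. The author's actual protocol is not stated.

```python
import struct


class ByteQueue:
    """Compact list-backed byte queue in the spirit of the recipe (hypothetical)."""

    def __init__(self):
        self._chunks = []

    def write(self, data):
        self._chunks.append(data)

    def __len__(self):
        return sum(len(c) for c in self._chunks)

    def read(self, count):
        if count > len(self):
            return b""                     # not enough data yet
        data = b"".join(self._chunks)
        self._chunks = [data[count:]]
        return data[:count]


class Framer:
    """Splits a byte stream into messages with an assumed 4-byte length prefix."""

    HEADER = struct.Struct("!I")

    def __init__(self):
        self.queue = ByteQueue()
        self._need = None                  # body length of the message in progress

    def feed(self, data):
        """Queue newly received bytes and return every message now complete."""
        self.queue.write(data)
        messages = []
        while True:
            if self._need is None:
                header = self.queue.read(self.HEADER.size)
                if not header:
                    break                  # header not fully received
                (self._need,) = self.HEADER.unpack(header)
            if len(self.queue) < self._need:
                break                      # body not fully received
            messages.append(self.queue.read(self._need))
            self._need = None
        return messages


wire = struct.pack("!I", 5) + b"hello" + struct.pack("!I", 3) + b"abc"
framer = Framer()
got_1 = framer.feed(wire[:3])      # partial header only
got_2 = framer.feed(wire[3:10])    # completes the first message
got_3 = framer.feed(wire[10:])     # completes the second message
```

In a real program the chunks passed to feed() would come from socket.recv(), which may split messages at arbitrary points.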

Robert Kern 18 years, 8 months ago

Only if initialized with a string.

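import cStringIO
#created without an initial value: the object is writable
writable = cStringIO.StringIO()
writable.write("You can write to me ...")
#created from a string: the object is read-only and has no write method
unwritable = cStringIO.StringIO("... but not to me.")

Created by S W on Fri, 17 Jun 2005 (PSF)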