
[Python-ideas] Create a StringBuilder class and use it everywhere

From: k_bx <k...@ya.ru>
Thu, 25 Aug 2011 12:28:14 +0300
Hi!

There's a certain problem in Python right now: when people need to build a string from pieces, they very often do something like this::

    def main_pure():
        b = u"initial value"
        for i in xrange(30000):
            b += u"more data"
        return b

The bad thing about it is that a new string is created every time you do +=, so it performs badly on CPython (and horribly on PyPy). If people used, for example, a list of strings instead, performance would be much better::

    def main_list_append():
        b = [u"initial value"]
        for i in xrange(3000000):
            b.append(u"more data")
        return u"".join(b)

The results are::

    k...@kost-laptop:~/tmp$ time python string_bucket_pure.py 

    real	0m7.194s
    user	0m3.590s
    sys	0m3.580s
    k...@kost-laptop:~/tmp$ time python string_bucket_append.py 

    real	0m0.417s
    user	0m0.330s
    sys	0m0.080s
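
To reproduce a comparison like this on a current interpreter, here is a minimal Python 3 sketch using the standard `timeit` module. The function names and loop counts are hypothetical choices. Recent CPython versions can sometimes extend a `str` in place during `+=`, so the gap you see may be much smaller than in the 2011 numbers above.

```python
import timeit


def build_with_iadd(n):
    # Repeated += may copy the whole growing string on each iteration
    # (CPython can sometimes resize in place, but that is an implementation detail).
    b = "initial value"
    for _ in range(n):
        b += "more data"
    return b


def build_with_join(n):
    # Collect the pieces in a list and join them once at the end.
    parts = ["initial value"]
    for _ in range(n):
        parts.append("more data")
    return "".join(parts)


if __name__ == "__main__":
    n = 10000
    # Both approaches must produce the same string before timing them.
    assert build_with_iadd(n) == build_with_join(n)
    for func in (build_with_iadd, build_with_join):
        seconds = timeit.timeit(lambda: func(n), number=5)
        print("%s: %.3fs for 5 runs" % (func.__name__, seconds))
```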

Fantastic, isn't it?

Also, let's forget about speed for a moment and think about semantics: your task is "build a string from its pieces", or in other words "build a string from a list of pieces", so from this point of view using [] and u"".join is semantically better.
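
The "build a string from a list of pieces" reading maps directly onto `str.join`, which accepts any iterable of strings, not only a list. Here is a small Python 3 sketch. The helper name `build_from_pieces` is hypothetical.

```python
def build_from_pieces(pieces):
    """Build one string from an iterable of string pieces.

    str.join accepts any iterable of str, so the pieces can come from a
    list, a generator expression, or any other iterable.
    """
    return "".join(pieces)


# A list of pieces...
greeting = build_from_pieces(["initial value", " ", "more data"])
# ...or a generator expression that produces the pieces lazily.
repeated = build_from_pieces("more data" for _ in range(3))
```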

Java has had its StringBuilder class for a long time (I'm not really into Java, I've just been told about it), and I think Python should have its own StringBuilder::

    class StringBuilder(object):
        """Use it instead of doing += for building unicode strings from pieces"""
        def __init__(self, val=u""):
            self.val = val
            self.appended = []

        def __iadd__(self, other):
            self.appended.append(other)
            return self

        def __unicode__(self):
            self.val = u"".join((self.val, u"".join(self.appended)))
            self.appended = []
            return self.val
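
Python 3 has no separate `unicode()` builtin, so `__unicode__` is never called there. Here is a hypothetical Python 3 port of the proposed class. It assumes `__str__` takes over the role of `__unicode__` and otherwise keeps the same design: a cached joined value plus a list of pending pieces.

```python
class StringBuilder(object):
    """Accumulate string pieces with += and join them only when needed.

    Hypothetical Python 3 port: __str__ replaces __unicode__.
    """

    def __init__(self, val=""):
        self.val = val        # already-joined prefix (the cache)
        self.appended = []    # pieces added since the last join

    def __iadd__(self, other):
        # Store the piece instead of building a new string right away.
        self.appended.append(other)
        return self

    def __str__(self):
        if self.appended:
            # Join once, fold the result into the cached prefix, and clear
            # the pending pieces so a repeated str() call does no extra work.
            self.val = "".join([self.val] + self.appended)
            self.appended = []
        return self.val


# Usage: the same pattern as main_bucket(), with a small loop count.
b = StringBuilder("initial value ")
for _ in range(3):
    b += "more data"
result = str(b)
```

Because `__iadd__` returns `self`, `b += ...` keeps rebinding `b` to the same builder object rather than to a new string.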

Why a StringBuilder class instead of just [] + u"".join? I have two reasons:

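1. It caches the result: converting it to unicode joins the pending pieces once and stores the joined value, so that work isn't repeated.
2. You can document it. When a programmer looks at the [] + u"".join idiom, he doesn't see _WHY_ it is done that way, while when he sees a StringBuilder class he can go ahead and read its help().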

Performance of StringBuilder is OK compared to [] + u"".join (I've increased the number of += from 30000 to 30000000)::

    def main_bucket():
        b = StringBuilder(u"initial value ")
        for i in xrange(30000000):
            b += u"more data"
        return unicode(b)

For CPython::

    k...@kost-laptop:~/tmp$ time python string_bucket_bucket.py 

    real	0m12.944s
    user	0m11.670s
    sys	0m1.260s

    k...@kost-laptop:~/tmp$ time python string_bucket_append.py 

    real	0m3.540s
    user	0m2.830s
    sys	0m0.690s

For PyPy 1.6::

    (pypy)k...@kost-laptop:~/tmp$ time python string_bucket_bucket.py 

    real	0m18.593s
    user	0m12.930s
    sys	0m5.600s

    (pypy)k...@kost-laptop:~/tmp$ time python string_bucket_append.py 

    real	0m16.214s
    user	0m11.750s
    sys	0m4.280s

Of course, a C implementation could make things faster on CPython, I guess, but compared to the += method that doesn't really matter right now. The point is to be explicit.

P.S.: Also, why not use cStringIO?
1. It's not semantically right to create a file-like object just to join multiple string pieces into one.
2. If you suggest using it in your code right away: you can see that still no one uses it, because people want += (and StringBuilder gives them +=).
3. It's somewhat slow on PyPy right now :-)
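
For comparison, here is the file-like approach argued against above. `cStringIO` exists only in Python 2, and in Python 3 the in-memory text stream is `io.StringIO`. This is a minimal sketch, and the helper name is hypothetical.

```python
import io


def build_with_stringio(pieces):
    # io.StringIO is an in-memory text stream: write() appends to it and
    # getvalue() returns everything written so far as a single str.
    buf = io.StringIO()
    for piece in pieces:
        buf.write(piece)
    return buf.getvalue()


result = build_with_stringio(["initial value"] + ["more data"] * 3)
```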

Thanks.
_______________________________________________
Python-ideas mailing list
Pyth...@python.org
http://mail.python.org/mailman/listinfo/python-ideas
