Welcome, guest | Sign In | My Account | Store | Cart

This is a C-level implementation of a fast, re-entrant, optimistic lock for CPython. It is written in Cython. Under normal conditions, it is about 10x faster than threading.RLock because it avoids all locking unless two or more threads try to acquire it at the same time. Under congestion, it is still about 10% faster than RLock due to being implemented in Cython.

Python, 117 lines
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
from cpython cimport pythread
from cpython.exc cimport PyErr_NoMemory

cdef class FastRLock:
    """Fast, re-entrant locking.

    Under uncongested conditions, the lock is never acquired but only
    counted.  Only when a second thread comes in and notices that the
    lock is needed, it acquires the lock and notifies the first thread
    to release it when it's done.  This is all made possible by the
    wonderful GIL.
    """
    cdef pythread.PyThread_type_lock _real_lock
    cdef long _owner            # ID of thread owning the lock
    cdef int _count             # re-entry count
    cdef int _pending_requests  # number of pending requests for real lock
    cdef bint _is_locked        # whether the real lock is acquired

    def __cinit__(self):
        self._owner = -1
        self._count = 0
        self._is_locked = False
        self._pending_requests = 0
        self._real_lock = pythread.PyThread_allocate_lock()
        if self._real_lock is NULL:
            PyErr_NoMemory()

    def __dealloc__(self):
        if self._real_lock is not NULL:
            pythread.PyThread_free_lock(self._real_lock)
            self._real_lock = NULL

    def acquire(self, bint blocking=True):
        return lock_lock(self, pythread.PyThread_get_thread_ident(), blocking)

    def release(self):
        if self._owner != pythread.PyThread_get_thread_ident():
            raise RuntimeError("cannot release un-acquired lock")
        unlock_lock(self)

    # compatibility with threading.RLock

    def __enter__(self):
        # self.acquire()
        return lock_lock(self, pythread.PyThread_get_thread_ident(), True)

    def __exit__(self, t, v, tb):
        # self.release()
        if self._owner != pythread.PyThread_get_thread_ident():
            raise RuntimeError("cannot release un-acquired lock")
        unlock_lock(self)

    def _is_owned(self):
        return self._owner == pythread.PyThread_get_thread_ident()


cdef inline bint lock_lock(FastRLock lock, long current_thread, bint blocking) nogil:
    # Note that this function *must* hold the GIL when being called.
    # We just use 'nogil' in the signature to make sure that no Python
    # code execution slips in that might free the GIL

    if lock._count:
        # locked! - by myself?
        if current_thread == lock._owner:
            lock._count += 1
            return 1
    elif not lock._pending_requests:
        # not locked, not requested - go!
        lock._owner = current_thread
        lock._count = 1
        return 1
    # need to get the real lock
    return _acquire_lock(
        lock, current_thread,
        pythread.WAIT_LOCK if blocking else pythread.NOWAIT_LOCK)

cdef bint _acquire_lock(FastRLock lock, long current_thread, int wait) nogil:
    # Note that this function *must* hold the GIL when being called.
    # We just use 'nogil' in the signature to make sure that no Python
    # code execution slips in that might free the GIL

    if not lock._is_locked and not lock._pending_requests:
        # someone owns it but didn't acquire the real lock - do that
        # now and tell the owner to release it when done. Note that we
        # do not release the GIL here as we must absolutely be the one
        # who acquires the lock now.
        if not pythread.PyThread_acquire_lock(lock._real_lock, wait):
            return 0
        #assert not lock._is_locked
        lock._is_locked = True
    lock._pending_requests += 1
    with nogil:
        # wait for the lock owning thread to release it
        locked = pythread.PyThread_acquire_lock(lock._real_lock, wait)
    lock._pending_requests -= 1
    #assert not lock._is_locked
    #assert lock._count == 0
    if not locked:
        return 0
    lock._is_locked = True
    lock._owner = current_thread
    lock._count = 1
    return 1

cdef inline void unlock_lock(FastRLock lock) nogil:
    # Note that this function *must* hold the GIL when being called.
    # We just use 'nogil' in the signature to make sure that no Python
    # code execution slips in that might free the GIL

    #assert lock._owner == pythread.PyThread_get_thread_ident()
    #assert lock._count > 0
    lock._count -= 1
    if lock._count == 0:
        lock._owner = -1
        if lock._is_locked:
            pythread.PyThread_release_lock(lock._real_lock)
            lock._is_locked = False

The FastRLock implementation optimises for the non-congested case. It works by exploiting the availability of the GIL. Since it knows that it holds the GIL when the acquire()/release() methods are called, it can safely check the lock for being held by other threads and just count any re-entries as long as it is always the same thread that acquires it. This is a lot faster than actually acquiring the underlying lock.

When a second thread wants to acquire the lock as well, it first checks the lock count and finds out that the lock is already owned. If the underlying lock is also held by another thread already, it then just frees the GIL and asks for acquiring the lock, just like RLock does. If the underlying lock is not held, however, it acquires it immediately and basically hands over the ownership by telling the current owner to free it when it's done. Then, it falls back to the normal non-owner behaviour that asks for the lock and will eventually acquire it when it gets released. This makes sure that the real lock is only acquired when at least two threads want it.

All of these operations are basically atomic because any thread that modifies the lock state always holds the GIL. Note that the implementation must not call any Python code while handling the lock, as calling into Python may lead to a context switch which hands over the GIL to another thread and thus breaks atomicity. Therefore, the code misuses Cython's 'nogil' annotation to make sure that no Python code slips in accidentally.

Here are some timings for 1) five acquire-release cycles ('lock_unlock'), 2) five acquire calls followed by five release calls (nested locking, 'reentrant_lock_unlock'), 3) a mixed and partly nested sequence of acquire and release calls ('mixed_lock_unlock') and 4) five acquire-release cycles that do not block. All four are benchmarked for the single threaded case and the multi threaded case with 10 threads. I also tested it with 20 threads only to see that it then takes about twice the time. Note also that the congested case is substantially slower for both locks, so I only looped 1000x here to get useful timings instead of 100000x for the single threaded case.

Testing threading.RLock

sequential (x100000):
lock_unlock              : 1.408 sec
reentrant_lock_unlock    : 1.089 sec
mixed_lock_unlock        : 1.212 sec
lock_unlock_nonblocking  : 1.415 sec

threaded 10T (x1000):
lock_unlock              : 1.188 sec
reentrant_lock_unlock    : 1.039 sec
mixed_lock_unlock        : 1.068 sec
lock_unlock_nonblocking  : 1.199 sec

Testing FastRLock

sequential (x100000):
lock_unlock              : 0.122 sec
reentrant_lock_unlock    : 0.124 sec
mixed_lock_unlock        : 0.137 sec
lock_unlock_nonblocking  : 0.156 sec

threaded 10T (x1000):
lock_unlock              : 0.911 sec
reentrant_lock_unlock    : 0.938 sec
mixed_lock_unlock        : 0.953 sec
lock_unlock_nonblocking  : 0.916 sec

1 comment

Stefan Behnel (author) 13 years, 7 months ago  # | flag

This is mostly equivalent to the new RLock implementation in Python 3.2. Here is the same benchmark compared to my latest Py3.2a1 build (SVN rev. 83958):

Testing threading.RLock

sequential (x100000):
lock_unlock              : 0.134 sec
reentrant_lock_unlock    : 0.120 sec
mixed_lock_unlock        : 0.151 sec
lock_unlock_nonblocking  : 0.177 sec

threaded 10T (x1000):
lock_unlock              : 0.885 sec
reentrant_lock_unlock    : 0.972 sec
mixed_lock_unlock        : 0.883 sec
lock_unlock_nonblocking  : 0.911 sec

Testing FastRLock

sequential (x100000):
lock_unlock              : 0.093 sec
reentrant_lock_unlock    : 0.093 sec
mixed_lock_unlock        : 0.104 sec
lock_unlock_nonblocking  : 0.112 sec

threaded 10T (x1000):
lock_unlock              : 0.943 sec
reentrant_lock_unlock    : 0.871 sec
mixed_lock_unlock        : 0.920 sec
lock_unlock_nonblocking  : 0.908 sec

So, in the single-threaded case, the C implementation in Py3.2 is only about 20-50% slower than the Cython implementation here, whereas it is more or less as fast in the congested case.