[TCLCORE] [PATCH] Initial patch for C10k problem in TCL

From: Daniel Hans <dah...@gmail.com>
Fri, 11 Sep 2009 01:11:27 +0200
Hello,

my name is Daniel Hans. A year ago I took part in Google Summer of
Code as a student working on TCL. A few months ago I asked on the
TCL-core mailing list for ideas for a class project that would
optimize something in TCL and make the language more efficient.

Tomasz Kosiak suggested that I could work on the C10k problem [0]. In
short, a server written in TCL can run into problems when it has to
handle a large number of connections at the same time.

The problem is caused by some constraints of the select function,
which is used by TCL to watch descriptor activity.

First, select uses fd_set structures to define which descriptors
should be watched. An fd_set has a soft limit on its size: 128 bytes
by default, which is 1024 bits, one per descriptor. So up to 1024
connections can be handled by default, and that behavior is often
satisfactory. But what if someone wants more? Of course we can change
the limit. The best way is probably (from what I have read) to
recompile the kernel, but in my tests it was enough to change one
constant in the sys/select.h header file. Let us say that one wants to
have about 10000 connections. Then we run into a much worse
disadvantage of the select function.
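
The size arithmetic above can be checked with a minimal sketch like
the following. It is not part of the patch; it assumes a POSIX system,
and the function names are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <sys/select.h>

/* Number of descriptors one fd_set can describe, derived from its size
 * (one bit per descriptor). */
size_t fd_set_capacity_bits(void)
{
    return sizeof(fd_set) * CHAR_BIT;
}

/* Returns 1 if fd may be used with FD_SET/FD_ISSET, 0 otherwise.
 * Passing a descriptor >= FD_SETSIZE to these macros is undefined. */
int fd_fits_in_fd_set(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* FD_SET with the range check made explicit. */
void checked_fd_set(int fd, fd_set *set)
{
    assert(fd_fits_in_fd_set(fd));
    FD_SET(fd, set);
}
```

With the default 128-byte fd_set mentioned above, the first function
would return 1024, so a descriptor numbered 1024 or higher cannot be
watched by select without changing the limit.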

The problem is that the function monitors all requested descriptors
and, when some of them become active, it returns their count and
overwrites the fd_set structures. It does not tell us directly which
descriptors are ready, so we need to iterate through the whole set and
check each of them individually, although probably only a few of them
are actually active.

Let us take a look at the tclUnixNotfy.c file, which is responsible
for these actions. The select function is mainly used in
Tcl_WaitForEvent:

    numFound = select(tsdPtr->numFdBits,
            &(tsdPtr->readyMasks.readable),
            &(tsdPtr->readyMasks.writable),
            &(tsdPtr->readyMasks.exceptional), timeoutPtr);

And here is the problem: numFound is not actually used, and we simply
iterate through all open descriptors:

    for (filePtr = tsdPtr->firstFileHandlerPtr; (filePtr != NULL);
            filePtr = filePtr->nextPtr) {

        mask = 0;
        if (FD_ISSET(filePtr->fd, &(tsdPtr->readyMasks.readable))) {
            mask |= TCL_READABLE;
        }
        if (FD_ISSET(filePtr->fd, &(tsdPtr->readyMasks.writable))) {
            mask |= TCL_WRITABLE;
        }
        if (FD_ISSET(filePtr->fd, &(tsdPtr->readyMasks.exceptional))) {
            mask |= TCL_EXCEPTION;
        }
        …
    }

Each descriptor is checked to see whether it is active. Whenever we
deal with many descriptors, this is the problem described above.
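
The cost of that scan can be made visible with a small standalone
sketch. It is not Tcl code: it assumes a POSIX system, uses pipes in
place of sockets, and all names in it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

#define SELECT_DEMO_MAX 64      /* hypothetical limit for this demo */

/* Poll nfds descriptors with select() and store the readable ones in
 * ready[]. *checks counts the FD_ISSET tests that were needed.
 * Returns the number of readable descriptors, or -1 on error. */
int scan_readable_with_select(const int *fds, int nfds, int *ready,
                              int *checks)
{
    fd_set readable;
    struct timeval timeout = {0, 0};    /* return immediately */
    int maxFd = -1, numFound, numReady = 0, i;

    FD_ZERO(&readable);
    for (i = 0; i < nfds; i++) {
        assert(fds[i] >= 0 && fds[i] < FD_SETSIZE);
        FD_SET(fds[i], &readable);
        if (fds[i] > maxFd) maxFd = fds[i];
    }
    *checks = 0;
    numFound = select(maxFd + 1, &readable, NULL, NULL, &timeout);
    if (numFound < 0) return -1;

    /* select() gives only a count, so every descriptor is tested. */
    for (i = 0; i < nfds; i++) {
        (*checks)++;
        if (FD_ISSET(fds[i], &readable)) ready[numReady++] = fds[i];
    }
    return numReady;
}

/* Demo: open numPipes pipes, make exactly one readable, and return how
 * many FD_ISSET checks were needed. *numReady receives the number of
 * readable descriptors found (-1 if the demo could not be set up). */
int demo_select_scan(int numPipes, int *numReady)
{
    int readEnds[SELECT_DEMO_MAX], writeEnds[SELECT_DEMO_MAX];
    int ready[SELECT_DEMO_MAX], checks = 0, opened = 0, i;

    *numReady = -1;
    if (numPipes < 1 || numPipes > SELECT_DEMO_MAX) return -1;
    for (i = 0; i < numPipes; i++) {
        int p[2];
        if (pipe(p) != 0) break;
        readEnds[i] = p[0];
        writeEnds[i] = p[1];
        opened++;
    }
    if (opened == numPipes && write(writeEnds[0], "x", 1) == 1) {
        *numReady = scan_readable_with_select(readEnds, numPipes, ready,
                                              &checks);
    }
    for (i = 0; i < opened; i++) {
        close(readEnds[i]);
        close(writeEnds[i]);
    }
    return checks;
}
```

Calling demo_select_scan with 8 pipes finds one readable descriptor
but still performs 8 checks; the number of checks grows with the
number of watched descriptors, not with the number of active ones.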

I am sending an initial version of a patch which tries to resolve that
problem for Linux and BSD operating systems.

The solution is simple and straightforward: instead of the inefficient
select, use a more scalable mechanism. Unfortunately, there is no
single mechanism that works on all operating systems. For Linux, we
have epoll. For FreeBSD, we have kqueue. They all share one very neat
feature: not only do they return the number of active descriptors,
they also fill in structures that identify the descriptors we need to
process. Therefore, when 5 descriptors out of 10000 are active, we get
exactly those 5 and need not check all the others.
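
For comparison, here is a minimal Linux-only sketch of the epoll
approach. It is not the code from the patch: the names are
hypothetical, pipes stand in for sockets, and the epoll set is created
and destroyed in one call only to keep the example short (a real
notifier would presumably register each descriptor once and reuse the
set across waits).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

#define EPOLL_DEMO_MAX 64       /* hypothetical limit for this demo */

/* Register nfds descriptors with epoll and store the readable ones in
 * ready[]. Returns the number of ready descriptors, or -1 on error.
 * The result loop runs once per ready descriptor only. */
int collect_readable_with_epoll(const int *fds, int nfds, int *ready,
                                int maxReady)
{
    struct epoll_event ev = {0}, events[EPOLL_DEMO_MAX];
    int epfd, numFound, i;

    assert(fds != NULL && ready != NULL);
    if (maxReady < 1) return -1;
    if (maxReady > EPOLL_DEMO_MAX) maxReady = EPOLL_DEMO_MAX;

    epfd = epoll_create1(0);
    if (epfd < 0) return -1;
    for (i = 0; i < nfds; i++) {
        ev.events = EPOLLIN;            /* interested in readability */
        ev.data.fd = fds[i];            /* handed back by epoll_wait */
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i], &ev) != 0) {
            close(epfd);
            return -1;
        }
    }
    numFound = epoll_wait(epfd, events, maxReady, 0);  /* 0: no wait */
    for (i = 0; i < numFound; i++) {
        ready[i] = events[i].data.fd;
    }
    close(epfd);
    return numFound;
}

/* Demo: open numPipes pipes, make the first one readable, and return
 * the number of descriptors epoll reports (-1 on any failure). */
int demo_epoll_ready_count(int numPipes)
{
    int readEnds[EPOLL_DEMO_MAX], writeEnds[EPOLL_DEMO_MAX];
    int ready[EPOLL_DEMO_MAX], numReady = -1, opened = 0, i;

    if (numPipes < 1 || numPipes > EPOLL_DEMO_MAX) return -1;
    for (i = 0; i < numPipes; i++) {
        int p[2];
        if (pipe(p) != 0) break;
        readEnds[i] = p[0];
        writeEnds[i] = p[1];
        opened++;
    }
    if (opened == numPipes && write(writeEnds[0], "x", 1) == 1) {
        numReady = collect_readable_with_epoll(readEnds, numPipes, ready,
                                               EPOLL_DEMO_MAX);
        /* The reported descriptor must be the one that was written to. */
        if (numReady == 1 && ready[0] != readEnds[0]) numReady = -1;
    }
    for (i = 0; i < opened; i++) {
        close(readEnds[i]);
        close(writeEnds[i]);
    }
    return numReady;
}
```

Here the kernel returns the ready descriptor itself in events[], so no
pass over the idle descriptors is needed after the wait.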

I have performed some simple tests with the patch, and it turned out
that both kqueue and epoll perform much better than select with a
large number of descriptors. Unfortunately, I was not able to perform
any tests on OpenSolaris, because I had problems installing that
system on a virtual machine.

The tests were set up so that there is a large number of connections,
and for each connection there is a small amount of work to be
performed.

My testing environment contained only three machines, so actual
results could vary and be even better.

As I mentioned above, this is an initial version of the patch. If you
have any suggestions (about substance as well as style), or find bugs
or anything else, I would really appreciate hearing about it and I
will try to fix it as soon as possible.

I am sending two patches. The first one updates the tclUnixNotfy.c
file with the new code. The second one updates the configure file. If
you want to turn the many-connections support on, you need to run
./configure --enable-connections.

A slight disadvantage of the new solution is that some static-sized
arrays are used instead of lists of fileHandlers, and their size may
be large for 10000 connections, but I think the cost is worth it if
you want to handle many connections. By the way, that is why I added
the --enable-connections option: so that nobody is forced to use it.
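
To illustrate the trade-off, here is a minimal sketch of a static
table indexed by descriptor number. It only shows the general idea and
is not the data structure from the patch; MAX_CONNECTIONS, HandlerSlot
and the function names are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CONNECTIONS 10000   /* hypothetical compile-time limit */

typedef void (HandlerProc)(void *clientData, int mask);

typedef struct {
    HandlerProc *proc;          /* NULL means the slot is unused */
    void *clientData;
    int mask;
} HandlerSlot;

/* One slot per possible descriptor, allocated whether used or not. */
static HandlerSlot handlerTable[MAX_CONNECTIONS];

/* Store a handler for fd. Returns 0 on success, -1 if fd is out of
 * range or proc is NULL. */
int register_handler(int fd, int mask, HandlerProc *proc, void *clientData)
{
    if (fd < 0 || fd >= MAX_CONNECTIONS || proc == NULL) return -1;
    handlerTable[fd].proc = proc;
    handlerTable[fd].clientData = clientData;
    handlerTable[fd].mask = mask;
    return 0;
}

/* Constant-time lookup by descriptor; no list traversal is needed. */
HandlerSlot *lookup_handler(int fd)
{
    if (fd < 0 || fd >= MAX_CONNECTIONS) return NULL;
    return handlerTable[fd].proc != NULL ? &handlerTable[fd] : NULL;
}

/* Invoke the handler registered for fd; fd must have one. */
void dispatch_event(int fd)
{
    HandlerSlot *slot = lookup_handler(fd);
    assert(slot != NULL);
    slot->proc(slot->clientData, slot->mask);
}

/* Fixed memory cost of the table, paid even with few connections. */
size_t handler_table_bytes(void)
{
    return sizeof(handlerTable);
}

/* Example handler: counts how many times it was called. */
void count_events(void *clientData, int mask)
{
    (void)mask;
    (*(int *)clientData)++;
}
```

The table gives a direct mapping from a descriptor reported by epoll
or kqueue to its handler, at the price of a fixed allocation of
MAX_CONNECTIONS slots regardless of how many are in use.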

As I said, I know this is just an initial version of the patch against
version 8.5.7 (which, by the way, was created with the standard Linux
diff; if that is the wrong format for TCL, please let me know which
one is correct). I will appreciate any comments.

If the C10k problem is not important for TCL, I am sorry for spamming
the list :-)

Regards,
Daniel Hans

[0] http://www.kegel.com/c10k.html

_______________________________________________
Tcl-Core mailing list
Tcl-...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tcl-core