When to not just use socket.close() « Python recipes

I have implemented a "broken" client/server to show how socket.shutdown can be more useful than a simple socket.close operation. (I rewrote this to hopefully be more clear)

The close operation is not atomic. It implicitly tries to send any remaining data in _addition_ to closing a descriptor. Splitting this close operation up with the aid of the shutdown command can help avoid bugs. It gives the server one final way to say, "something went wrong". The server also learns when the client did not end correctly, because the socket should still be open after the client has finished sending data. If, for example, the client function exits unexpectedly and Python closes the socket for you, the server cannot send any data back.

In the code below, the client and server have different ideas about what the end marker should be. The recv_end function is written to look for an end marker, and as long as both sides agree on the marker, it works. The socket.shutdown is for when something goes wrong.

def recv_end(the_socket):
    #deliberately different from the client's marker, to show the failure case
    End='SERVER WRONG MARKER'
    total_data=[];got_end=False
    while True:
        data=the_socket.recv(8192)
        if not data: break #the client is done sending
        if End in data:
            total_data.append(data[:data.find(End)])
            got_end=True
            break
        total_data.append(data)
        if len(total_data)>1:
            #check if end_of_data was split
            last_pair=total_data[-2]+total_data[-1]
            if End in last_pair:
                total_data[-2]=last_pair[:last_pair.find(End)]
                total_data.pop()
                got_end=True
                break
    return (got_end,''.join(total_data))
         
def basic_server(sock):
    got_end,data = recv_end(sock)
    if not got_end:
        sock.sendall('ERROR:no end!') #<--- not possible w/close()
    else: sock.sendall(data*2)
    sock.shutdown(1)
    sock.close()
    
import socket
Port=4444
def start_server():
    sock=socket.socket(socket.AF_INET,socket.SOCK_STREAM)
    sock.bind(('',Port))
    sock.listen(5)
    print 'started on',Port
    while True:
        newsock,address=sock.accept()
        basic_server(newsock)

def send_data(data):
    sock=socket.socket(socket.AF_INET,socket.SOCK_STREAM)
    sock.connect(('localhost',Port))
    print 'connected'
    sock.sendall(data+'CLIENT WRONG MARKER')
    print 'sent',data
    sock.shutdown(1) #done sending; the socket can still receive
    print 'shutdown'
    result=[]
    while True:
        got=sock.recv(2) #tiny reads, for demonstration only
        if not got: break
        result.append(got)
    sock.close()
    return ''.join(result)
        
if __name__=='__main__':
    start_server()

      

What you can rely on TCP for: 1) no duplication, 2) packets arrive in order.

What you cannot rely on TCP for: how the data is broken up.

What this means is that you always have to loop, since the data could arrive in little bits. Hence, both the server and client loop. It also means that you have to take care of how to end the conversation, which is where shutdown comes into play.
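The looping can be sketched as follows. This is a minimal Python 3 illustration (the recipe itself is Python 2); the helper name recv_until_eof and the use of socket.socketpair() in place of a real network connection are assumptions made for the example.

```python
import socket

def recv_until_eof(sock, bufsize=4096):
    """Collect everything the peer sends until it stops sending."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:              # b'' means the peer finished sending
            break
        chunks.append(chunk)
    return b''.join(chunks)

# Demonstration on a connected local pair (no network needed).
a, b = socket.socketpair()
a.sendall(b'hello ')
a.sendall(b'world')
a.shutdown(socket.SHUT_WR)         # tell b that no more data is coming
received = recv_until_eof(b, bufsize=2)   # tiny buffer forces many recv calls
a.close()
b.close()
```

The two sendall calls and the 2-byte reads do not line up, yet the receiver still reassembles the full byte stream, which is the only thing the loop relies on.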

In bi-directional communication, by default, a client can know when it is done sending, but it cannot know if it is done receiving. Likewise, the server cannot know whether the client is done sending.

You could do something like put a byte count in front of the data, or have an end marker so the server can know if it got all of the bytes. However, that introduces a problem. What if the byte count is wrong or the end marker never arrives? With a socket.close() the server cannot tell the client, "Strange. You are done sending data to me, but I didn't get all the data", since the client connection is not left open after the client is done sending.
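A byte-count prefix and the "wrong count" case might look like the sketch below. It is Python 3, and the 4-byte big-endian header and the names send_msg, recv_exact and recv_msg are assumptions for illustration, not part of the recipe.

```python
import socket
import struct

HEADER = struct.Struct('!I')   # assumed format: 4-byte big-endian length

def send_msg(sock, payload):
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_exact(sock, n):
    """Read exactly n bytes, or return None if the peer stops sending first."""
    buf = b''
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            return None
        buf += chunk
    return buf

def recv_msg(sock):
    header = recv_exact(sock, HEADER.size)
    if header is None:
        return None
    (length,) = HEADER.unpack(header)
    return recv_exact(sock, length)

client, server = socket.socketpair()

# A correct message: the header matches the payload.
send_msg(client, b'a1234')
good = recv_msg(server)

# A wrong byte count: the header promises 10 bytes, only 5 follow.
client.sendall(HEADER.pack(10) + b'b1234')
client.shutdown(socket.SHUT_WR)
short = recv_msg(server)       # the stream ended before 10 bytes arrived

client.close()
server.close()
```

When recv_msg comes back empty-handed, the server knows the count was wrong. Whether it can still say so to the client depends on the client having left its receiving side open.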

With a socket.shutdown(1) the client can still be told by the server that something was wrong and take appropriate measures.

The shutdown command has three options: 0 = done receiving (socket.SHUT_RD), 1 = done sending (socket.SHUT_WR), 2 = both (socket.SHUT_RDWR)

The code above focuses on 1, to get rid of the implicit send in a close operation. Notice how in send_data the close operation is (relatively) far away from the shutdown. This allows the server to send the client any parting comment.
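As a rough sketch of that parting comment, here is the same exchange in Python 3, run in one process on a socket.socketpair() rather than over a real connection. The END marker and the single-process layout are assumptions made for illustration.

```python
import socket

END = b'END'   # hypothetical end marker for this sketch

client, server = socket.socketpair()

# Client: send data without the marker, then half-close.
client.sendall(b'a1234')
client.shutdown(socket.SHUT_WR)    # same as shutdown(1): done sending, can still receive

# Server: read to the end of the stream, then report what it saw.
request = b''
while True:
    chunk = server.recv(8192)
    if not chunk:
        break
    request += chunk
if END in request:
    server.sendall(request[:request.find(END)] * 2)
else:
    server.sendall(b'ERROR:no end!')
server.close()

# Client: the receiving side is still open, so the reply gets through.
reply = b''
while True:
    chunk = client.recv(8192)
    if not chunk:
        break
    reply += chunk
client.close()
```

Had the client called close() where it calls shutdown(), there would be no socket left on its side to read the error from.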

Just run the code to start the server. The client is set to recv only 2 bytes at a time for demonstration purposes (it should be something like 8192). To send data to the server, import the module (call it shut_srv or whatever) and call send_data for the client side.

data=('a1234','b1234','c1234','d1234','e1234')
for d in data:
    print shut_srv.send_data(d)

You will get a response like:

connected
sent a1234
shutdown
ERROR:no end!
connected
sent b1234
shutdown
ERROR:no end!
connected
sent c1234
shutdown
ERROR:no end!
connected
sent d1234
shutdown
ERROR:no end!
connected
sent e1234
shutdown
ERROR:no end!

If you make the markers the same, the response should be:

connected
sent a1234
shutdown
a1234a1234
connected
sent b1234
shutdown
b1234b1234
connected
sent c1234
shutdown
c1234c1234
connected
sent d1234
shutdown
d1234d1234
connected
sent e1234
shutdown
e1234e1234

Tags: network

9 comments

Josiah Carlson 19 years ago # | flag

And? Why bother making this into a recipe? So...data gets broken up into pieces over a socket. This is a known issue with sockets, and usually happens when you are trying to send something larger than ethernet frames or the buffer size of your TCP/IP stack. Sometimes bits of data are also concatenated.

People who really care about getting their data generally use protocols which are precise about where certain kinds of data separate themselves, whether that means prefixing your data with the amount of data that is going to be sent, or using some sort of data terminator. Both of these cases are handled properly by the asynchat module, which is a better start than raw sockets.

John Nielsen (author) 19 years ago # | flag

good socket practice. The issue isn't that data is broken up; in fact, I have another recipe that talks about exactly what you talk about. Perhaps I emphasized that point too much. Using this method may help avoid having to write the code you (and I) talked about with a prefix or some such to say when you are done. (Though you of course need to be aware that Python can do an implicit close on your socket object.)

The primary issue is how you shut down sockets. Because it is bi-directional communication, closing and sending data at the same time can result in subtle bugs. So, it is important to know how you can be explicit, since I do not see any talk of it in the Python books I have.

If you want to talk about it off this site you can send me email at pyguy2@yahoo.com. It is obvious I should probably elaborate more on the issues in this recipe when I get time.

Josiah Carlson 19 years ago # | flag

Everything you say assumes that the socket will be "shutdown" or something equivalent. The only time such things happen is if one builds it into the protocol. Real socket connections that fail randomly don't get shutdown(), and cause errors that your code isn't even addressing.

As I said before, one could use asyncore (and/or asynchat) and overload handle_error(), which handles all of the crap, and lets you implement your protocol (if you take the proper cues from other available modules). Part of one's protocol could be to send an empty 'packet' (when using a length/data pair, the length would be 0), which could signal a shutdown/close/whatever. Alternatively, Twisted includes features that make handling errors and implementing arbitrary protocols easier than raw sockets (I tend to prefer asyncore, because it comes with Python, and it fits the way my brain works).
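The length/data pair with an empty 'packet' as the end signal could be sketched like this on raw sockets. It is Python 3, and the 4-byte header and the names send_frame, read_exact and read_frames are hypothetical, chosen only for this illustration; asyncore and asynchat are not used.

```python
import socket
import struct

HEADER = struct.Struct('!I')   # assumed format: 4-byte big-endian length

def send_frame(sock, payload=b''):
    sock.sendall(HEADER.pack(len(payload)) + payload)

def read_exact(sock, n):
    buf = b''
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError('connection ended mid-frame')
        buf += chunk
    return buf

def read_frames(sock):
    """Yield payloads until a zero-length frame arrives."""
    while True:
        (length,) = HEADER.unpack(read_exact(sock, HEADER.size))
        if length == 0:        # empty frame: the sender says it is done
            return
        yield read_exact(sock, length)

a, b = socket.socketpair()
for part in (b'one', b'two'):
    send_frame(a, part)
send_frame(a)                  # length 0 signals the end of the request
frames = list(read_frames(b))
a.close()
b.close()
```

Here the end of the request is carried inside the protocol, so both directions of the socket stay open until one side decides to close.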

John Nielsen (author) 19 years ago # | flag

I am still missing the point (did you look at the changes in my code?). Yes, asyncore is good and I have used it before. I have also played with twisted. I too prefer asyncore, mainly because I can assume it is present.

The point of the recipe is to show why one may want to use shutdown since almost no one talks about it. With standard close, since the client closes the socket and finishes up sending, the server cannot tell the client that something is wrong. (For example, if the end marker is wrong, or the byte count is wrong).

Did you see my example with the broken server? If a bug forces the client to exit and python calls close() automatically on the socket, the server would know something is wrong since it always gets the last word.

If the client screws up and ends the discussion incorrectly but still calls shutdown, the client is leaving the communication channel open, so the server can tell the client something is wrong, since the socket is still there.

Josiah Carlson 19 years ago # | flag

shutdown() and select(). Using socket.shutdown(0/1/2) is no better at communicating the fact that the client no longer wants to send any more data than an explicit protocol saying the same. In fact, in the absence of a protocol saying that the client /will/ shutdown their sending portion of the socket (the server's receiving portion) when a request is finished, the server generally /will/ close the connection, thinking that the client disconnected.

What do I mean?

>>> import socket
>>> import threading
>>> import select
>>> def l():
...     global incoming, addr
...     a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
...     a.bind(('localhost', 9999))
...     a.listen(1)
...     incoming, addr = a.accept()
...     a.close()
...     print addr
...
>>> def c():
...     global outgoing
...     a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
...     a.connect(('localhost', 9999))
...     outgoing = a
...     print "connected"
...
>>> threading.Thread(target=l).start()
>>> c()
connected
>>> ('127.0.0.1', 2202)

>>> outgoing.send('hello')
5
>>> outgoing.shutdown(1)
>>> incoming.recv(5)
'hello'
>>> select.select([incoming.fileno()], [], [], 0)
([656], [], [])

Now, that incoming socket, which select just said was readable, has no data waiting. According to asyncore, twisted, and every other asynchronous package I've seen, when a socket is readable(), but has no data waiting, then the remote machine has disconnected. So /any/ server written using such a semantic will do a socket.close(), and the client, which thought it was doing a good thing by using shutdown() to signal to the server that "no more information is coming your way", actually killed itself.
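For readers on a current interpreter, the same observation can be reproduced with a short Python 3 sketch. It uses socket.socketpair() instead of a listening socket and a thread, which is an assumption made to keep the example small; the last three lines go one step beyond the session above and check the reverse direction.

```python
import select
import socket

incoming, outgoing = socket.socketpair()
outgoing.sendall(b'hello')
outgoing.shutdown(socket.SHUT_WR)

first = incoming.recv(5)                   # the data sent before the shutdown
readable, _, _ = select.select([incoming], [], [], 0)
second = incoming.recv(5)                  # readable, but nothing to read

# The reverse direction has not been shut down.
incoming.sendall(b'bye')
reply = outgoing.recv(3)

incoming.close()
outgoing.close()
```

The readable-but-empty result is what a select-based server sees; what it does next (close, or send something back first) is up to the server.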

John Nielsen (author) 19 years ago # | flag

if all you have is close, you force _all_ clients to assume. With a socket.close the client assumes that it does not matter how the server reacts to its last bit of data (when the data arrives successfully).

If all you had available to you was close(), you then _force_ all clients to have to assume that it never matters how the server reacts to the final bit of data.

Sure, asyncore does not take advantage of a client doing a shutdown(1). But that does not mean a shutdown is not useful.

John Nielsen (author) 19 years ago # | flag

the client still did a good thing, the server just didn't care. I should respond more directly to when you say the client "killed itself" when it thought it was doing a good thing. The client was in fact irrevocably _done_ sending. So, what happened wasn't a bad thing. The server just didn't care to send anything back; the server didn't care that the socket was left open for it.

If all you have is close() then the client is forced to assume that it does not matter how the server reacts to the final bit of data that arrives successfully. Sometimes that is an ok assumption to make, but when it is not you need shutdown semantics or the client will have to start up another socket connection to find out how the server reacted to the last thing it sent.

How can a higher level protocol fix that?

Josiah Carlson 19 years ago # | flag

I'll just answer your final question...

Sometimes that is an ok assumption to make, but when it is not you need
shutdown semantics or the client will have to start up another socket
connection to find out how the server reacted to the last thing it sent.
How can a higher level protocol fix that?

Easy, the client should never use shutdown. When the client gets a response to the last request it wants to make, it can just close its socket and be on its way. Heck, if it wants to make another request, and the connection is still open, it can! Amazing how it works. After the client has closed its connection (because it has all the requests and responses it wants), the server gets a 'readable without data' on the socket, and closes its end, as it should. That's how everyone writes socket clients and servers (though some have explicit "close the connection" requests or "closing the connection" responses). Don't change semantics just to change semantics when the current semantic runs most (if not all) of the TCP/IP backed protocols on the internet.

Josiah Carlson 19 years ago # | flag

"the server cannot know whether the client is done sending."

In standard socket semantics, if the client closes the connection completely then the server knows that the client doesn't want to send (or receive) anything more.

"What if the byte count is wrong"

Then you implemented your protocol incorrectly, or the connection hung.

"With a socket.close() the server cannot tell the client, ''Strange.
You are done sending data to me, but I didn't get all the data'',
since the client connection is not left open after the client is done
sending."

That's why you leave the socket completely open until the server has told the client, "I got your request". If you also include checksums and timeouts in your protocol, then responses like "your request timed out" or "your request is malformed", can be sent to the client.
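A timeout-driven error reply along these lines might look like the following Python 3 sketch. The 0.2 second limit, the message texts and the use of socket.socketpair() are assumptions for illustration, and checksums are left out.

```python
import socket

client, server = socket.socketpair()
server.settimeout(0.2)              # hypothetical limit for one request

client.sendall(b'partial request')  # the client stalls and never finishes

try:
    data = server.recv(8192)        # the partial request arrives
    more = server.recv(8192)        # nothing else comes, so this times out
    status = b'OK'
except socket.timeout:
    status = b'ERROR:your request timed out'

server.sendall(status)              # the socket is still fully open
reply = client.recv(8192)

client.close()
server.close()
```

Because neither side has closed or shut down anything, the server can still deliver the error once its timer fires.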

Note that in the case of the client using sock.shutdown(), there is actually data sent to the server, signaling that the connection has been shutdown in that particular way. If that data gets to the server, then so does any data that the client sent earlier, because of that whole "packets arrive in-order" thing. Now, here's the crucial bit; if the data didn't make it there, then neither does the shutdown notification. The server just notices that the socket is no longer readable. Without timeouts (on either the client or server end), the server will happily wait until TCP/IP times out the connection (which can be on the order of 5 minutes for some stacks, from http://www.netbook.cs.purdue.edu/othrpags/qanda219.htm ).

"With a socket.shutdown(1) the client can still be told by the server
that something was wrong and take appropriate measures."

With no socket.shutdown(), the client can still be told by the server that everything was fine or something was wrong, and take appropriate measures to report success, resend the request, report failure to the user, etc.

What you seem to be missing is that there are more robust features of higher level protocols that handle "the client is done with requests", as well as the various error conditions that can arise; error conditions that can make socket.shutdown() ambiguous to the server.


When to not just use socket.close() (Python recipe) by John Nielsen
ActiveState Code (http://code.activestate.com/recipes/408997/)
