The Python asyncore and asynchat modules
The Python standard library provides two modules—asyncore
and
asynchat
—to help in writing concurrent network servers using
event-based designs. The documentation does not give good examples,
so I am making some notes.
Overview
The basic idea behind the asyncore
module is that:
- there is a function, asyncore.loop(), that does select() on a bunch of ‘channels’. Channels are thin wrappers around sockets.
- when select returns, it reports which sockets have data waiting to be read, which ones are now free to send more data, and which ones have errors; loop() examines the event and the socket’s state to create a higher level event;
- it then calls a method on the channel corresponding to the higher level event.
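The steps above can be sketched with the select module directly. This is not asyncore's own code, only an illustration of what select() reports for a socket; the socket pair and the variable names are made up for the example.

```python
import select
import socket

# A connected pair of sockets stands in for a client and a server connection.
left, right = socket.socketpair()

# Nothing has been sent yet: `right` is free to send, but has nothing to read.
readable, writable, errored = select.select([right], [right], [right], 0)
before = (right in readable, right in writable)

# Once the peer sends data, select() reports `right` as readable.
left.sendall(b"ping")
readable, _, _ = select.select([right], [], [], 1.0)
after = right in readable

data = right.recv(1024)
left.close()
right.close()
```

asyncore's loop() does this kind of bookkeeping for every channel and turns the raw results into calls such as handle_read and handle_write.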
asyncore
provides a low-level, but flexible API to build network
servers. asynchat
builds upon asyncore
and provides an API that is
more suitable for request/response type of protocols.
asyncore
The asyncore module’s API consists of:
- the loop() function, to be called by a driver program written by you;
- the dispatcher class, to be subclassed by you to do useful stuff. The dispatcher class is what is called ‘channel’ elsewhere.
+-------------+           +--------+
| driver code |---------> | loop() |
+-------------+           +--------+
       |                      |
       |                      | loop-dispatcher API (a)
       |                      |
       |               +--------------+
       |               | dispatcher   |
       +-------------->| subclass     |
                       +--------------+
                              |
                              | dispatcher-logic API (b)
                              |
                       +--------------+
                       | server logic |
                       +--------------+
This is all packaged nicely in an object oriented way. So, we have
the dispatcher
class, that extends/wraps around the socket class (from
the socket
module in the Python standard library). It provides all
the socket
class’ methods, as well as methods to handle the higher
level events. You are supposed to subclass dispatcher
and implement
the event handling methods to do something useful.
The loop-dispatcher API
The loop function looks like this:
loop( [timeout[, use_poll[, map[,count]]]])
What is the map? It is a dictionary whose keys are the
file-descriptors, or fds, of the socket (i.e., socket.fileno()
), and
whose values are the dispatcher objects which you want to handle events on that socket/fd.
When we create a new dispatcher
object, it automatically gets added to a
global list of sockets (which is invisible to us, and managed behind the scenes).
The loop()
function does a select()
on this
list.
We can override the list that loop looks at by providing an explicit map. But then we must make sure that the dispatchers we create are added to, and removed from, this map (the dispatcher constructor accepts a map argument for this purpose). (Hmm… we might always want
to use explicit maps; then our loop calls will be thread safe and we
will be able to launch multiple threads, each calling loop on
different maps.)
Methods a dispatcher subclass should implement
loop()
needs the dispatcher
to implement some methods:
- readable(): should return True if you want the fd to be observed for read events;
- writable(): should return True if you want the fd to be observed for write events.
If either readable
or writable
returns True
, the corresponding fd will be examined
for errors also. Obviously, it makes no sense to have a dispatcher
which returns False
for both readable
and writable
.
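To make the division of labour concrete, here is a minimal sketch of one pass of a loop()-like function over an explicit map. It is not asyncore's implementation: poll_once and RecordingChannel are hypothetical names, and the channel only implements the handful of methods discussed here.

```python
import select
import socket


class RecordingChannel:
    """Hypothetical channel: collects whatever arrives on its socket."""

    def __init__(self, sock, channel_map):
        self.sock = sock
        self.received = b""
        channel_map[sock.fileno()] = self  # register under our fd

    def readable(self):
        return True   # always interested in incoming data

    def writable(self):
        return False  # nothing to send in this example

    def handle_read(self):
        self.received += self.sock.recv(1024)

    def handle_write(self):
        pass


def poll_once(channel_map, timeout=1.0):
    """One pass over the map: ask each channel what it wants, then dispatch."""
    want_read = [fd for fd, ch in channel_map.items() if ch.readable()]
    want_write = [fd for fd, ch in channel_map.items() if ch.writable()]
    if not want_read and not want_write:
        return
    # Any fd watched for reading or writing is also watched for errors.
    want_error = sorted(set(want_read) | set(want_write))
    ready_read, ready_write, _ = select.select(
        want_read, want_write, want_error, timeout)
    for fd in ready_read:
        channel_map[fd].handle_read()
    for fd in ready_write:
        channel_map[fd].handle_write()


channels = {}
peer, sock = socket.socketpair()
channel = RecordingChannel(sock, channels)
peer.sendall(b"hello")
poll_once(channels)
peer.close()
sock.close()
```

Because the map is an ordinary dictionary passed in by the caller, nothing in this sketch depends on global state.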
Some other methods that loop calls on dispatchers are:
- handle_read: socket is readable; dispatcher.recv() can be used to actually get the data
- handle_write: socket is writable; dispatcher.send(data) can be used to actually send the data
- handle_error: socket encountered an error
- handle_expt: socket received OOB data (not really used in practice)
- handle_close: socket was closed remotely or locally
For server dispatchers, loop delivers one more event:
- handle_accept: a new incoming connection can be accept()ed. Call the accept() method to really accept the connection. To create a server socket, call the bind() and listen() methods on it first.
Client sockets get this event:
- handle_connect: connection to remote endpoint has been made. To initiate the connection, first call the connect() method on it.
Client sockets are discussed in the asyncore
documentation so I will not discuss them here.
Other socket methods are available in dispatcher: create_socket, close, set_reuse_addr. They are not called by loop but are available so that your code can call them when it needs to create a new socket, close an existing socket, and tell the OS to set the SO_REUSEADDR flag on the server socket.
How to write a server using asyncore
The standard library documentation gives a client example, but not a
server example. Here are some notes on the latter.
- Subclass dispatcher to create a listening socket
- In its handle_accept method, create new dispatchers. They’ll get added to the global socket map.
Note: the handlers must not block or take too much time… or the server won’t be concurrent. This is because when multiple sockets get an event, loop
calls their dispatchers one-by-one, in the same thread.
Do not bypass the socket-like methods that dispatcher provides in order to call the low level socket functions directly. They do funky things to detect higher level events. For example, how does asyncore figure out that the socket is closed? If I remember correctly, there are two ways to detect whether a non-blocking socket is closed:
- select() returns a read event, but when you call recv()/read() you get zero bytes;
- you call send()/write() and it fails with an error (sending zero bytes is not an error).
(I wish I had a copy of Unix Network Programming by Stevens handy
right now.) dispatcher will detect both events above and, if either of them occurs, will call handle_close. This frees you from having to look at low-level events and lets you think in terms of higher level events.
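The two ways of noticing a closed connection can be tried out with a plain socket pair. This is a sketch using the socket and select modules directly, not asyncore; the variable names are made up, and whether the send fails on the very first attempt depends on the platform and socket type, so the code only records what happened.

```python
import select
import socket

ours, theirs = socket.socketpair()
theirs.close()  # the remote end goes away

# 1. select() reports a read event, but recv() returns zero bytes.
readable, _, _ = select.select([ours], [], [], 1.0)
got_read_event = ours in readable
eof_seen = got_read_event and ours.recv(1024) == b""

# 2. Sending real data to the closed peer is expected to fail with an error.
try:
    ours.send(b"x")
    send_failed = False
except OSError:
    send_failed = True

ours.close()
```

A dispatcher hides both checks behind a single handle_close call.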
The code for a server based on asyncore
is below:
asyncore_echo_server.py
import logging
import asyncore
import socket

logging.basicConfig(
    level=logging.DEBUG,
    format="%(created)-15s %(msecs)d %(levelname)8s %(thread)d %(name)s %(message)s")
log = logging.getLogger(__name__)

BACKLOG = 5
SIZE = 1024


class EchoHandler(asyncore.dispatcher):

    def __init__(self, conn_sock, client_address, server):
        self.server = server
        self.client_address = client_address
        # recv() returns bytes on Python 3, so the buffer is bytes too
        self.buffer = b""

        # We dont have anything to write, to start with
        self.is_writable = False

        # Create ourselves, but with an already provided socket
        asyncore.dispatcher.__init__(self, conn_sock)
        log.debug("created handler; waiting for loop")

    def readable(self):
        return True  # We are always happy to read

    def writable(self):
        # But we might not have anything to send all the time
        return self.is_writable

    def handle_read(self):
        log.debug("handle_read")
        data = self.recv(SIZE)
        log.debug("after recv")
        if data:
            log.debug("got data")
            self.buffer += data
            self.is_writable = True  # sth to send back now
        else:
            log.debug("got null data")

    def handle_write(self):
        log.debug("handle_write")
        if self.buffer:
            sent = self.send(self.buffer)
            log.debug("sent data")
            self.buffer = self.buffer[sent:]
        else:
            log.debug("nothing to send")
        if len(self.buffer) == 0:
            self.is_writable = False

    # Will this ever get called? Does loop() call
    # handle_close() if we called close, to start with?
    def handle_close(self):
        log.debug("handle_close")
        log.info("conn_closed: client_address=%s:%s" %
                 (self.client_address[0], self.client_address[1]))
        self.close()
        #pass


class EchoServer(asyncore.dispatcher):

    allow_reuse_address = False
    request_queue_size = 5
    address_family = socket.AF_INET
    socket_type = socket.SOCK_STREAM

    def __init__(self, address, handlerClass=EchoHandler):
        self.address = address
        self.handlerClass = handlerClass

        asyncore.dispatcher.__init__(self)
        self.create_socket(self.address_family, self.socket_type)

        if self.allow_reuse_address:
            self.set_reuse_addr()

        self.server_bind()
        self.server_activate()

    def server_bind(self):
        self.bind(self.address)
        log.debug("bind: address=%s:%s" % (self.address[0], self.address[1]))

    def server_activate(self):
        self.listen(self.request_queue_size)
        log.debug("listen: backlog=%d" % self.request_queue_size)

    def fileno(self):
        return self.socket.fileno()

    def serve_forever(self):
        asyncore.loop()

    # TODO: try to implement handle_request()

    # Internal use
    def handle_accept(self):
        pair = self.accept()
        if pair is None:
            # accept() can return None if the connection is already gone
            return
        (conn_sock, client_address) = pair
        if self.verify_request(conn_sock, client_address):
            self.process_request(conn_sock, client_address)

    def verify_request(self, conn_sock, client_address):
        return True

    def process_request(self, conn_sock, client_address):
        log.info("conn_made: client_address=%s:%s" %
                 (client_address[0], client_address[1]))
        self.handlerClass(conn_sock, client_address, self)

    def handle_close(self):
        self.close()
and to use it:
import asyncore_echo_server

interface = "0.0.0.0"
port = 8080
server = asyncore_echo_server.EchoServer((interface, port))
server.serve_forever()
Comments
Thank you.
Thanks a lot.
Thanks a lot. This is an article I’ve been looking for for ages ;)
Nice post.
There is one typo I’ve noticed, self.set_resue_addr() should be self.set_reuse_addr()
great post, thanx
Thanks for posting, will make it easier to get going with this class.
Very useful one for beginners
Great article!!!
I have a question.
Can I combine this server with an application implemented with the cmd library? cmdloop() is blocking.
And how?
Thanks
@Antonis: since cmdloop() is blocking and so is asyncore.loop(), you have to execute them in different threads. For example:
import cmd
import threading
import asyncore_echo_server
interpreter = cmd.Cmd()
interpreter.prompt = "prompt> "
t1 = threading.Thread(target=interpreter.cmdloop)
t1.start()
server = asyncore_echo_server.EchoServer((interface, port))
t2 = threading.Thread(target=server.serve_forever)
t2.start()
The other thing to consider is that you’d probably want cmdloop and the echo server to communicate with each other. For example, the server should print what it receives to the console, and send what the user types to the network.
One way to do this is to create two queues Q1 and Q2. Your command interpreter should read the user input, perhaps transform or process it, and put the result in Q1. The EchoHandler.handle_write should read Q1 (instead of the “buffer” above) and send its contents out.
The EchoHandler’s handle_read method should read the network data and put it in Q2. Your command interpreter should, perhaps just after the user has entered a command, read the contents of Q2 and print it to console.
This is the general idea. When to read/write to the queues is highly application dependent.
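A minimal sketch of the two-queue idea, with the network side replaced by a stand-in thread so that it runs on its own. The names to_network, from_network and fake_network_side are hypothetical; in the real server the Q1 reads and Q2 writes would live in EchoHandler.handle_write and handle_read.

```python
import queue
import threading

to_network = queue.Queue()    # Q1: user input waiting to be sent
from_network = queue.Queue()  # Q2: network data waiting to be printed


def fake_network_side():
    """Stands in for the handler: takes from Q1, 'echoes' into Q2."""
    while True:
        item = to_network.get()
        if item is None:  # sentinel: shut down
            break
        from_network.put(b"echo: " + item)


worker = threading.Thread(target=fake_network_side)
worker.start()

to_network.put(b"hello")             # what the command interpreter would do
reply = from_network.get(timeout=5)  # ... and later read back and print
to_network.put(None)
worker.join()
```

Inside a real handler the queue reads should be non-blocking (get_nowait), since a handler that blocks stalls every other connection served by the same loop.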
If I wanted to create a simple proxy it seems that I’d have to create 2 threads and probably use Queue as well. One thread to handle receiving and sending of data to the proxy and back to the client, and one thread to handle sending/receiving data to/from a website. If I understand what you’ve said above, I would also need to create a Queue to handle passing information back and forth between the two threads.
After reading your posts, my understanding is that asyncore would function normally (in my suggested example above) and the use of threads would prevent any client side processing hangs/slow responses from slowing down the asyncore function itself.
Is that model correct? Or is my understanding flawed?
Thanks!
@Curious: your understanding is correct. Your scenario is conceptually the same as the one posted earlier by @Antonis, except in that case there was only one “client” (the cmdloop) and one “website” (the echoserver). In your case, however, presumably there will be many clients connecting to your server, asking for various URLs?
If so, then your server needs to be more sophisticated: the thread getting the data back from the website has to “remember” which client originally requested that data and ensure it is sent to only that client.
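One way to “remember” the requester is to tag every queue item with a client identifier. The sketch below is only an illustration under that assumption: the fetcher function, the integer client ids and the outboxes dictionary are hypothetical, and the fetching thread fabricates a body instead of doing real HTTP.

```python
import queue
import threading

requests = queue.Queue()   # (client_id, url) pairs from the client-facing side
responses = queue.Queue()  # (client_id, body) pairs from the fetching side


def fetcher():
    """Hypothetical fetching thread; fabricates a body instead of doing HTTP."""
    while True:
        job = requests.get()
        if job is None:  # sentinel: shut down
            break
        client_id, url = job
        responses.put((client_id, ("body of " + url).encode()))


outboxes = {1: [], 2: []}  # per-client data waiting to be sent back

thread = threading.Thread(target=fetcher)
thread.start()
requests.put((1, "http://example.com/a"))
requests.put((2, "http://example.com/b"))
for _ in range(2):
    client_id, body = responses.get(timeout=5)
    outboxes[client_id].append(body)  # route by tag, not by arrival order
requests.put(None)
thread.join()
```

The tag travels with the data in both directions, so the client-facing side never has to guess which response belongs to which connection.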
(Interesting, I can’t reply to your reply.)
I thought it was similar to the one presented by @Antonis but I wasn’t positive. I’m not a programmer by trade.
Oh my, I had planned on all the necessary functions requiring the request and then returning both the request and response to the requestor. At this point I see how that’s not going to fully solve the problem of who sent what data and needs what response. I’ll somehow need to tag each thread.
Is there some pythonic book that describes these types of scenarios?
Thanks for the useful posts here! And for responding to my question.
Any good book on network programming should cover these topics, perhaps with other languages/frameworks.
“Twisted Network Programming Essentials” (http://www.amazon.com/Twisted-Network-Programming-Essentials-Fettig/dp/0596100329/ref=sr_1_1?ie=UTF8&s=books&qid=1279776775&sr=1-1) is a good book on how to write non-blocking IO based network clients and servers. It will cover your scenario, but use the Twisted framework. Which is interesting, because Twisted takes the core idea behind the asyncore module and takes it to a new level.
Thank you! This saved me a lot of time.
Thank you! This tutorial really shows how Python abstracts the C select() and epoll() functions.
But I have two questions:
– I don’t understand why your EchoHandler class has a server parameter, and why the constructor is called with that parameter.
– I would also like to know whether, when we use an explicit map, we can act on this map ourselves, like adding file descriptors and such things.
Or is it just there so that we have a reference to the map we’re using, and it is for the main loop (asyncore.loop()) to act on the given map?
Thanks again.
@Veron:
Good question. In the sample code, the server parameter is useless and can be removed. My EchoHandler does not use the server parameter because it is rather simple. In a real server, handlers would typically be reading/storing some state that is independent of a connection or that needs to survive across connections. This state can be conveniently kept in the server object and accessed by the individual handlers via the server parameter passed to them. Of course, instead of passing the server object to the handlers, you could pass in any object that is suitable for storing the state. So again, yes, in the sample code, you can get rid of the server parameter.
If you pass in a map to loop explicitly, then you must: (a) add new dispatchers to it when you create them (when a new socket is accepted); (b) remove dispatchers from it when their socket is closed. The dispatcher code does the above two things for you, using the global map. If you are not using the global map, then you have to do these things yourself.
Thanks parijatmishara for your great answer.
However, I’d like to know whether one can access the global map in order to get a specific file descriptor/dispatcher and write data to this file descriptor.
Or is it better, or even required, to provide a specific map if we want to do such things?
The problem with accessing the global map is that you don’t have a reference to it — you don’t know its name and which module it is defined in — unless you are willing to dig into the source code of the asyncore module.
Actually, you DON’T want to know the name of the global map variable — it is an implementation detail that may change from version to version of the asyncore module, potentially breaking your code.
Hence, if you want to get hold of an fd and do operations yourself, you are better off using your own map.