
\section{\module{SocketServer} ---
         A framework for network servers}

\declaremodule{standard}{SocketServer}
\modulesynopsis{A framework for network servers.}


The \module{SocketServer} module simplifies the task of writing network
servers.

There are four basic server classes: \class{TCPServer} uses the
Internet TCP protocol, which provides for continuous streams of data
between the client and server.  \class{UDPServer} uses datagrams, which
are discrete packets of information that may arrive out of order or be
lost while in transit.  The less frequently used
\class{UnixStreamServer} and \class{UnixDatagramServer} classes are
similar, but use \UNIX{} domain sockets; they're not available on
non-\UNIX{} platforms.  For more details on network programming, consult
a book such as W. Richard Stevens' \citetitle{UNIX Network Programming}
or Ralph Davis's \citetitle{Win32 Network Programming}.

These four classes process requests \dfn{synchronously}; each request
must be completed before the next request can be started.  This isn't
suitable if each request takes a long time to complete, either because it
requires a lot of computation or because it returns a lot of data
that the client is slow to process.  The solution is to create a
separate process or thread to handle each request; the
\class{ForkingMixIn} and \class{ThreadingMixIn} mix-in classes can be
used to support asynchronous behavior.

Creating a server requires several steps.  First, you must create a
request handler class by subclassing the \class{BaseRequestHandler}
class and overriding its \method{handle()} method; this method will
process incoming requests.  Second, you must instantiate one of the
server classes, passing it the server's address and the request
handler class.  Finally, call the \method{handle_request()} or
\method{serve_forever()} method of the server object to process one or
many requests.
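
The three steps can be sketched as follows.  This is a minimal
illustration rather than a recommended design: the names
\class{EchoHandler} and \function{echo_once()} are hypothetical, the
fallback import assumes the module is called \module{socketserver} in
newer Python releases, and the helper serves a single request in-process
so that the example terminates.

```python
import socket

try:
    import SocketServer  # module name used throughout this section
except ImportError:
    # Assumption: newer Python releases ship the module as "socketserver".
    import socketserver as SocketServer


class EchoHandler(SocketServer.BaseRequestHandler):
    """Step 1: subclass BaseRequestHandler and override handle()."""

    def handle(self):
        data = self.request.recv(1024)   # self.request is the client socket
        self.request.sendall(data)       # send the same bytes back


def echo_once(message):
    """Serve exactly one request in-process and return the client's reply."""
    # Step 2: instantiate a server class with an address and the handler
    # class.  Port 0 asks the operating system for any free port.
    server = SocketServer.TCPServer(("127.0.0.1", 0), EchoHandler)
    client = socket.create_connection(server.server_address)
    try:
        client.sendall(message)
        # Step 3: process a single request; a long-running server would
        # call server.serve_forever() here instead.
        server.handle_request()
        return client.recv(1024)
    finally:
        client.close()
        server.server_close()
```

A real service would normally run the server in its own process and call
\method{serve_forever()}; the in-process client is only there to make the
sketch self-checking.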

When inheriting from \class{ThreadingMixIn} for threaded connection
behavior, you should explicitly declare how you want your threads
to behave on an abrupt shutdown.  The \class{ThreadingMixIn} class
defines an attribute \var{daemon_threads}, which indicates whether
or not the server should wait for thread termination.  Set the flag
to \constant{True} if request threads should not keep the process
alive; the default is \constant{False}, meaning that Python
will not exit until all threads created by \class{ThreadingMixIn} have
exited.
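
As a small sketch of declaring the shutdown behavior explicitly, the
flag can be set as a class variable on the combined server class.  The
class names below are hypothetical, and the fallback import assumes the
module is named \module{socketserver} in newer Python releases.

```python
try:
    import SocketServer  # module name used throughout this section
except ImportError:
    # Assumption: newer Python releases ship the module as "socketserver".
    import socketserver as SocketServer


class ThreadedTCPServer(SocketServer.ThreadingMixIn, SocketServer.TCPServer):
    # Request threads become daemon threads: the interpreter may exit
    # without waiting for them to finish.
    daemon_threads = True


class PatientThreadedTCPServer(SocketServer.ThreadingMixIn,
                               SocketServer.TCPServer):
    # Spelling out the default: Python waits for request threads at exit.
    daemon_threads = False
```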

Server classes have the same external methods and attributes, no
matter what network protocol they use:

\setindexsubitem{(SocketServer protocol)}

\subsection{Server Creation Notes}

There are five classes in an inheritance diagram, four of which represent
synchronous servers of four types:

\begin{verbatim}
        +------------+
        | BaseServer |
        +------------+
              |
              v
        +-----------+        +------------------+
        | TCPServer |------->| UnixStreamServer |
        +-----------+        +------------------+
              |
              v
        +-----------+        +--------------------+
        | UDPServer |------->| UnixDatagramServer |
        +-----------+        +--------------------+
\end{verbatim}

Note that \class{UnixDatagramServer} derives from \class{UDPServer}, not
from \class{UnixStreamServer}; the only difference between an IP and a
\UNIX{} stream server is the address family, which is simply repeated in
both \UNIX{} server classes.

Forking and threading versions of each type of server can be created using
the \class{ForkingMixIn} and \class{ThreadingMixIn} mix-in classes.  For
instance, a threading UDP server class is created as follows:

\begin{verbatim}
    from SocketServer import ThreadingMixIn, UDPServer
    class ThreadingUDPServer(ThreadingMixIn, UDPServer): pass
\end{verbatim}

The mix-in class must come first, since it overrides the
\method{process_request()} method that \class{UDPServer} would otherwise
supply.  Setting class variables such as \var{daemon_threads} also changes
the behavior of the underlying server mechanism.
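
The effect of the base class order can be checked by asking which class
supplies \method{process_request()}.  The sketch below is illustrative
only: \class{MisorderedUDPServer} and \function{defining_class()} are
hypothetical names, the fallback import assumes the module is named
\module{socketserver} in newer Python releases, and the observation that
the synchronous version lives in \class{BaseServer} is an assumption
about the module's source.

```python
import inspect

try:
    import SocketServer  # module name used throughout this section
except ImportError:
    # Assumption: newer Python releases ship the module as "socketserver".
    import socketserver as SocketServer


class ThreadingUDPServer(SocketServer.ThreadingMixIn, SocketServer.UDPServer):
    pass


class MisorderedUDPServer(SocketServer.UDPServer, SocketServer.ThreadingMixIn):
    pass


def defining_class(cls, name):
    """Return the first class in the method resolution order defining name."""
    for klass in inspect.getmro(cls):
        if name in vars(klass):
            return klass
    return None
```

With the mix-in first, \code{defining_class(ThreadingUDPServer,
'process_request')} returns \class{ThreadingMixIn}; with the order
reversed, the synchronous implementation inherited by \class{UDPServer}
is found first and the mix-in has no effect.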

To implement a service, you must derive a class from
\class{BaseRequestHandler} and redefine its \method{handle()} method.  You
can then run various versions of the service by combining one of the server
classes with your request handler class.  The request handler class must be
different for datagram and stream services; this difference can be hidden by
using the handler subclasses \class{StreamRequestHandler} or
\class{DatagramRequestHandler}.
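
One way to write a service once and offer it over both transports is to
keep the protocol logic in a function that only sees file-like objects.
The following is a sketch under that assumption; \function{shout()} and
the two handler names are hypothetical, and the fallback import assumes
the module is named \module{socketserver} in newer Python releases.

```python
try:
    import SocketServer  # module name used throughout this section
except ImportError:
    # Assumption: newer Python releases ship the module as "socketserver".
    import socketserver as SocketServer


def shout(rfile, wfile):
    """Protocol logic shared by both transports: upper-case one line."""
    wfile.write(rfile.readline().upper())


class StreamShoutHandler(SocketServer.StreamRequestHandler):
    def handle(self):
        shout(self.rfile, self.wfile)


class DatagramShoutHandler(SocketServer.DatagramRequestHandler):
    def handle(self):
        shout(self.rfile, self.wfile)


# Either handler can now be paired with a matching server class, e.g.
#   SocketServer.TCPServer(address, StreamShoutHandler)
#   SocketServer.UDPServer(address, DatagramShoutHandler)
```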

Of course, you still have to use your head!  For instance, it makes no sense
to use a forking server if the service contains state in memory that can be
modified by different requests, since the modifications in the child process
would never reach the initial state kept in the parent process and passed to
each child.  In this case, you can use a threading server, but you will
probably have to use locks to protect the integrity of the shared data.
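
A threading server with shared state might be organised along these
lines.  This is only a sketch: \class{SharedCounter},
\class{CountingHandler}, \class{CountingServer} and the \member{counter}
attribute are hypothetical names, and the fallback import assumes the
module is named \module{socketserver} in newer Python releases.

```python
import threading

try:
    import SocketServer  # module name used throughout this section
except ImportError:
    # Assumption: newer Python releases ship the module as "socketserver".
    import socketserver as SocketServer


class SharedCounter(object):
    """State shared by every request thread, guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self):
        with self._lock:          # only one thread updates at a time
            self._value += 1
            return self._value

    def value(self):
        with self._lock:
            return self._value


class CountingHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        # self.server is the server instance that owns the shared state.
        self.server.counter.increment()


class CountingServer(SocketServer.ThreadingMixIn, SocketServer.TCPServer):
    def __init__(self, server_address, handler_class=CountingHandler):
        SocketServer.TCPServer.__init__(self, server_address, handler_class)
        self.counter = SharedCounter()
```

Keeping the lock inside the shared object, rather than in each handler,
means a handler cannot forget to take it.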

On the other hand, if you are building an HTTP server where all data is
stored externally (for instance, in the file system), a synchronous class
will essentially render the service ``deaf'' while one request is being
handled -- which may be for a very long time if a client is slow to receive
all the data it has requested.  Here a threading or forking server is
appropriate.

In some cases, it may be appropriate to process part of a request
synchronously, but to finish processing in a forked child depending on the
request data.  This can be implemented by using a synchronous server and
doing an explicit fork in the \method{handle()} method of the request
handler class.

Another approach to handling multiple simultaneous requests in an
environment that supports neither threads nor \function{fork()} (or where
these are too expensive or inappropriate for the service) is to maintain an
explicit table of partially finished requests and to use \function{select()}
to decide which request to work on next (or whether to handle a new incoming
request).  This is particularly important for stream services where each
client can potentially be connected for a long time (if threads or
subprocesses cannot be used).

%XXX should data and methods be intermingled, or separate?
% how should the distinction between class and instance variables be
% drawn?

\subsection{Server Objects}

\begin{funcdesc}{fileno}{}
Return an integer file descriptor for the socket on which the server
is listening.  The descriptor is most commonly passed to
\function{select.select()}, to allow monitoring multiple servers in the
same process.
\end{funcdesc}
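
Monitoring several servers in one process could look like the sketch
below.  It relies on \function{select.select()} accepting any object
with a \method{fileno()} method; \class{HelloHandler},
\function{serve_ready()} and \function{demo()} are hypothetical names,
and the fallback import assumes the module is named
\module{socketserver} in newer Python releases.

```python
import select
import socket

try:
    import SocketServer  # module name used throughout this section
except ImportError:
    # Assumption: newer Python releases ship the module as "socketserver".
    import socketserver as SocketServer


class HelloHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        self.request.sendall(b"hello")


def serve_ready(servers, timeout):
    """Handle one request on each server that has a connection pending."""
    # The server objects are passed directly; select() calls fileno()
    # on each of them.
    readable, _, _ = select.select(servers, [], [], timeout)
    for server in readable:
        server.handle_request()
    return readable


def demo():
    """Run two servers in one process; only the second gets a client."""
    first = SocketServer.TCPServer(("127.0.0.1", 0), HelloHandler)
    second = SocketServer.TCPServer(("127.0.0.1", 0), HelloHandler)
    client = socket.create_connection(second.server_address)
    try:
        ready = serve_ready([first, second], 5.0)
        return ready == [second], client.recv(16)
    finally:
        client.close()
        first.server_close()
        second.server_close()
```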

\begin{funcdesc}{handle_request}{}
Process a single request.  This function calls the following methods
in order: \method{get_request()}, \method{verify_request()}, and
\method{process_request()}.  If the user-provided \method{handle()}
method of the handler class raises an exception, the server's
\method{handle_error()} method will be called.
\end{funcdesc}
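
The calling sequence can be observed by overriding each hook to record
its name before delegating to the default implementation.  This is an
illustrative sketch: \class{TracingTCPServer} and
\function{trace_one_request()} are hypothetical names, and the fallback
import assumes the module is named \module{socketserver} in newer Python
releases.

```python
import socket

try:
    import SocketServer  # module name used throughout this section
except ImportError:
    # Assumption: newer Python releases ship the module as "socketserver".
    import socketserver as SocketServer


class TracingTCPServer(SocketServer.TCPServer):
    """Record which hook methods handle_request() calls, in order."""

    def __init__(self, server_address, handler_class):
        self.calls = []
        SocketServer.TCPServer.__init__(self, server_address, handler_class)

    def get_request(self):
        self.calls.append("get_request")
        return SocketServer.TCPServer.get_request(self)

    def verify_request(self, request, client_address):
        self.calls.append("verify_request")
        return SocketServer.TCPServer.verify_request(
            self, request, client_address)

    def process_request(self, request, client_address):
        self.calls.append("process_request")
        SocketServer.TCPServer.process_request(self, request, client_address)


def trace_one_request():
    # BaseRequestHandler's default handle() does nothing, which is
    # enough for tracing the server side.
    server = TracingTCPServer(("127.0.0.1", 0),
                              SocketServer.BaseRequestHandler)
    client = socket.create_connection(server.server_address)
    try:
        server.handle_request()
        return server.calls
    finally:
        client.close()
        server.server_close()
```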

\begin{funcdesc}{serve_forever}{}
Handle an infinite number of requests.  This simply calls
\method{handle_request()} inside an infinite loop.
\end{funcdesc}

\begin{datadesc}{address_family}
The family of protocols to which the server's socket belongs.
\constant{socket.AF_INET} and \constant{socket.AF_UNIX} are two
possible values.
\end{datadesc}

\begin{datadesc}{RequestHandlerClass}
The user-provided request handler class; an instance of this class is
created for each request.
\end{datadesc}

\begin{datadesc}{server_address}
The address on which the server is listening.  The format of addresses
varies depending on the protocol family; see the documentation for the
socket module for details.  For Internet protocols, this is a tuple
containing a string giving the address, and an integer port number:
\code{('127.0.0.1', 80)}, for example.
\end{datadesc}

\begin{datadesc}{socket}
The socket object on which the server will listen for incoming requests.
\end{datadesc}

% XXX should class variables be covered before instance variables, or
% vice versa?

The server classes support the following class variables:

\begin{datadesc}{allow_reuse_address}
Whether the server will allow the reuse of an address. This defaults
to \constant{False}, and can be set in subclasses to change the policy.
\end{datadesc}

\begin{datadesc}{request_queue_size}
The size of the request queue.  If it takes a long time to process a
single request, any requests that arrive while the server is busy are
placed into a queue, up to \member{request_queue_size} requests.  Once
the queue is full, further requests from clients will get a
``Connection refused'' error.  The default value is usually 5, but this
can be overridden by subclasses.
\end{datadesc}

\begin{datadesc}{socket_type}
The type of socket used by the server; \constant{socket.SOCK_STREAM}
and \constant{socket.SOCK_DGRAM} are two possible values.
\end{datadesc}
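
Since these are class variables, a subclass can change the policy
without touching the constructor.  The sketch below is illustrative:
\class{BusyTCPServer} and \function{describe()} are hypothetical names,
the value 32 is arbitrary, and the fallback import assumes the module is
named \module{socketserver} in newer Python releases.

```python
import socket

try:
    import SocketServer  # module name used throughout this section
except ImportError:
    # Assumption: newer Python releases ship the module as "socketserver".
    import socketserver as SocketServer


class BusyTCPServer(SocketServer.TCPServer):
    allow_reuse_address = True   # permit reuse of the server address
    request_queue_size = 32      # allow a longer queue than the default


def describe(server_class):
    """Summarise the class variables of a server class."""
    return {
        "stream": server_class.socket_type == socket.SOCK_STREAM,
        "inet": server_class.address_family == socket.AF_INET,
        "reuse": bool(server_class.allow_reuse_address),
        "backlog": server_class.request_queue_size,
    }
```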

There are various server methods that can be overridden by subclasses
of base server classes like \class{TCPServer}; these methods aren't
useful to external users of the server object.

% should the default implementations of these be documented, or should
% it be assumed that the user will look at SocketServer.py?

\begin{funcdesc}{finish_request}{}
Actually processes the request by instantiating
\member{RequestHandlerClass} and calling its \method{handle()} method.
\end{funcdesc}

\begin{funcdesc}{get_request}{}
Must accept a request from the socket, and return a 2-tuple containing
the \emph{new} socket object to be used to communicate with the
client, and the client's address.
\end{funcdesc}

\begin{funcdesc}{handle_error}{request, client_address}
This function is called if the \member{RequestHandlerClass}'s
\method{handle()} method raises an exception.  The default action is
to print the traceback to standard output and continue handling
further requests.
\end{funcdesc}
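
A server that should keep a record of failures instead of printing them
could override \method{handle_error()} roughly as follows.  This is a
sketch, not the module's behavior: \class{ErrorLogMixIn},
\class{LoggingTCPServer} and the \member{error_log} attribute are
hypothetical, and the fallback import assumes the module is named
\module{socketserver} in newer Python releases.

```python
import traceback

try:
    import SocketServer  # module name used throughout this section
except ImportError:
    # Assumption: newer Python releases ship the module as "socketserver".
    import socketserver as SocketServer


class ErrorLogMixIn(object):
    """Collect tracebacks in a list instead of printing them."""

    def handle_error(self, request, client_address):
        # Called while the exception is still being handled, so
        # traceback.format_exc() describes the failure in handle().
        if not hasattr(self, "error_log"):
            self.error_log = []
        self.error_log.append((client_address, traceback.format_exc()))


class LoggingTCPServer(ErrorLogMixIn, SocketServer.TCPServer):
    pass
```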

\begin{funcdesc}{process_request}{request, client_address}
Calls \method{finish_request()} to create an instance of the
\member{RequestHandlerClass}.  If desired, this function can create a
new process or thread to handle the request; the \class{ForkingMixIn}
and \class{ThreadingMixIn} classes do this.
\end{funcdesc}

% Is there any point in documenting the following two functions?
% What would the purpose of overriding them be: initializing server
% instance variables, adding new network families?

\begin{funcdesc}{server_activate}{}
Called by the server's constructor to activate the server.  The default
behavior just calls \method{listen()} on the server's socket.
May be overridden.
\end{funcdesc}

\begin{funcdesc}{server_bind}{}
Called by the server's constructor to bind the socket to the desired
address.  May be overridden.
\end{funcdesc}
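
One plausible reason to override these two methods is to adjust the
listening socket before it is bound, or to note that activation has
happened.  The sketch below assumes exactly that use;
\class{KeepAliveTCPServer} and the \member{activated} attribute are
hypothetical, and the fallback import assumes the module is named
\module{socketserver} in newer Python releases.

```python
import socket

try:
    import SocketServer  # module name used throughout this section
except ImportError:
    # Assumption: newer Python releases ship the module as "socketserver".
    import socketserver as SocketServer


class KeepAliveTCPServer(SocketServer.TCPServer):
    def server_bind(self):
        # Adjust the listening socket, then let the default bind happen.
        self.socket.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        SocketServer.TCPServer.server_bind(self)

    def server_activate(self):
        # The default starts listening; record that it was called.
        SocketServer.TCPServer.server_activate(self)
        self.activated = True
```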

\begin{funcdesc}{verify_request}{request, client_address}
Must return a Boolean value; if the value is \constant{True}, the request will be
processed, and if it's \constant{False}, the request will be denied.
This function can be overridden to implement access controls for a server.
The default implementation always returns \constant{True}.
\end{funcdesc}
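
A simple access control could be sketched as a mix-in that checks the
client's host against an allow list.  The names \class{AllowListMixIn},
\class{LocalOnlyTCPServer} and \member{allowed_hosts} are hypothetical,
and the fallback import assumes the module is named
\module{socketserver} in newer Python releases.

```python
try:
    import SocketServer  # module name used throughout this section
except ImportError:
    # Assumption: newer Python releases ship the module as "socketserver".
    import socketserver as SocketServer


class AllowListMixIn(object):
    """Only serve clients whose host appears in allowed_hosts."""

    allowed_hosts = ("127.0.0.1",)

    def verify_request(self, request, client_address):
        # For Internet protocols client_address is a (host, port) tuple.
        return client_address[0] in self.allowed_hosts


class LocalOnlyTCPServer(AllowListMixIn, SocketServer.TCPServer):
    pass
```

As with the forking and threading mix-ins, the mix-in is listed first so
that its \method{verify_request()} takes precedence over the default.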

\subsection{RequestHandler Objects}

The request handler class must define a new \method{handle()} method,
and can override any of the following methods.  A new instance is
created for each request.

\begin{funcdesc}{finish}{}
Called after the \method{handle()} method to perform any clean-up
actions required.  The default implementation does nothing.  If
\method{setup()} or \method{handle()} raises an exception, this
function will not be called.
\end{funcdesc}

\begin{funcdesc}{handle}{}
This function must do all the work required to service a request.
The default implementation does nothing.
Several instance attributes are available to it; the request is
available as \member{self.request}; the client address as
\member{self.client_address}; and the server instance as
\member{self.server}, in case it needs access to per-server
information.

The type of \member{self.request} is different for datagram and stream
services.  For stream services, \member{self.request} is a socket
object; for datagram services, \member{self.request} is a pair
containing the datagram string and the server's socket.
However, this can be hidden by using the request handler subclasses
\class{StreamRequestHandler} or \class{DatagramRequestHandler}, which
override the \method{setup()} and \method{finish()} methods, and
provide \member{self.rfile} and \member{self.wfile} attributes.
\member{self.rfile} and \member{self.wfile} can be read or written,
respectively, to get the request data or return data to the client.
\end{funcdesc}
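
The three instance attributes might be used together as in the sketch
below, written for a stream service.  \class{GreetingHandler},
\class{GreetingServer} and the \member{banner} attribute are
hypothetical, and the fallback import assumes the module is named
\module{socketserver} in newer Python releases.

```python
try:
    import SocketServer  # module name used throughout this section
except ImportError:
    # Assumption: newer Python releases ship the module as "socketserver".
    import socketserver as SocketServer


class GreetingHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        host, port = self.client_address      # who is calling
        banner = self.server.banner           # per-server information
        reply = "%s %s:%d\n" % (banner, host, port)
        self.request.sendall(reply.encode("ascii"))   # stream socket


class GreetingServer(SocketServer.TCPServer):
    banner = "hello"    # hypothetical attribute read by the handler
```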

\begin{funcdesc}{setup}{}
Called before the \method{handle()} method to perform any
initialization actions required.  The default implementation does
nothing.
\end{funcdesc}
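
The order in which the three hooks run for a successful request can be
observed with a handler that records each call.  This sketch assumes
that instantiating the handler is what triggers the hooks, which is how
the implementations known to the editor behave;
\class{LifecycleHandler}, \class{RecordingServerStub} and
\function{run_lifecycle()} are hypothetical names, and the fallback
import assumes the module is named \module{socketserver} in newer Python
releases.

```python
try:
    import SocketServer  # module name used throughout this section
except ImportError:
    # Assumption: newer Python releases ship the module as "socketserver".
    import socketserver as SocketServer


class LifecycleHandler(SocketServer.BaseRequestHandler):
    """Append the name of each hook to server.events as it runs."""

    def setup(self):
        self.server.events.append("setup")

    def handle(self):
        self.server.events.append("handle")

    def finish(self):
        self.server.events.append("finish")


class RecordingServerStub(object):
    """Stand-in for a real server, so no socket is needed."""

    def __init__(self):
        self.events = []


def run_lifecycle():
    stub = RecordingServerStub()
    # Creating the handler instance runs the hooks, as the server does
    # for each incoming request.
    LifecycleHandler(None, ("127.0.0.1", 0), stub)
    return stub.events
```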