author | andreask <andreask> | 2014-03-08 00:21:59 (GMT) |
---|---|---|
committer | andreask <andreask> | 2014-03-08 00:21:59 (GMT) |
commit | 5d34eaa0d47d6d00cc46f01093e65a567f28b559 (patch) | |
tree | 94d01d82ddf45c71b6e60ef151453da157245d07 | |
parent | 5a1cac2f139731c8a4cacfc7dce7b8c456e860f4 (diff) | |
download | tcl-5d34eaa0d47d6d00cc46f01093e65a567f28b559.zip tcl-5d34eaa0d47d6d00cc46f01093e65a567f28b559.tar.gz tcl-5d34eaa0d47d6d00cc46f01093e65a567f28b559.tar.bz2 |
socket -async and gets/puts stall on windows (Ticket [336441ed59])
win_sock_async_connect_race_fix
This is a change for a problem which is pretty much impossible to test
for in the testsuite, as it is a race condition rooted in a problem with
Windows and as such cannot be reliably induced from the Tcl side,
neither from script nor from C.
The problem affects only sockets which are opened -async.
At the time of the socket's creation the core will remember this fact
in the SocketState flags (SOCKET_ASYNC_CONNECT <SAC>) and in the
events to look for with select (FD_CONNECT).
Then, to handle the possibility that the script writes to or reads from
the socket before the connection has completed, the driver functions
Tcp(Input|Output)Proc check for the flag, and if it is still set enter
WaitForSocketEvent (FD_CONNECT) <WFSE> to sync-wait for the connection
before continuing to actually read/write.
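
The guard described above can be modelled in portable C. This is only a sketch of the idea, not the code in tclWinSock.c: DrvSocket, DrvWaitForConnect, DrvOutput and the flag value are hypothetical stand-ins, and the wait is assumed to always succeed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for SOCKET_ASYNC_CONNECT; value chosen for this model. */
#define DRV_ASYNC_CONNECT 0x1

typedef struct {
    int flags;      /* models infoPtr->flags */
    int waitCalls;  /* counts how often the sync-wait was entered */
} DrvSocket;

/* Stand-in for WaitForSocketEvent(FD_CONNECT). Assumption: the connect
 * completes, so the flag is cleared and the wait reports success. */
static int DrvWaitForConnect(DrvSocket *s)
{
    s->waitCalls++;
    s->flags &= ~DRV_ASYNC_CONNECT;
    return 1;
}

/* Models the check at the top of the output driver function: wait for
 * the connection only while the async-connect flag is still set.
 * Returns the number of bytes "written", or -1 if the wait failed. */
static int DrvOutput(DrvSocket *s, const char *buf, int len)
{
    assert(s != NULL && buf != NULL);
    if (s->flags & DRV_ASYNC_CONNECT) {
        if (!DrvWaitForConnect(s)) {
            return -1;
        }
    }
    return len; /* pretend every byte was sent */
}
```

In this model the first write on a fresh -async socket enters the wait once; later writes see the cleared flag and skip it.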
Unfortunately Windows sometimes deigns to not deliver FD_CONNECT,
skipping directly to FD_READ|WRITE. When that happens the unmodified
WFSE gets stuck in WFSO (WaitForSingleObject), and hangs the entire
Tcl process.
The core actually already has code to deal with that situation, in
part. This code is found in the SOCKET_MESSAGE branch of the big
switch in SocketProc(). When it finds <SAC> still set in the flags,
i.e. not reset by an FD_CONNECT event, it unconditionally clears the
flag and forces an FD_WRITE for the other parts of the driver. (My
change adds a comment at the location in question, as a marker.)
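
A simplified model of that fallback, for illustration. SpSocket, SpHandleMessage and the constants are hypothetical names, the event values merely mirror the usual Winsock bits, and the real handler does more bookkeeping (error conversion, for instance) than is shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the Winsock event bits and Tcl's flag. */
enum { SP_FD_READ = 0x01, SP_FD_WRITE = 0x02, SP_FD_CONNECT = 0x10 };
#define SP_ASYNC_CONNECT 0x1

typedef struct {
    int flags;        /* models infoPtr->flags */
    int readyEvents;  /* models infoPtr->readyEvents */
    int lastError;    /* models infoPtr->lastError */
} SpSocket;

/* Models the SOCKET_MESSAGE handling for one delivered event. */
static void SpHandleMessage(SpSocket *s, int event, int error)
{
    assert(s != NULL);
    if (event & SP_FD_CONNECT) {
        /* Normal path: the connect event arrived, clear the flag. */
        s->flags &= ~SP_ASYNC_CONNECT;
        if (error != 0) {
            s->lastError = error;
        }
    }
    if (s->flags & SP_ASYNC_CONNECT) {
        /* Fallback: some other event arrived first. Clear the flag
         * anyway and force a writable indication. */
        s->flags &= ~SP_ASYNC_CONNECT;
        if (error != 0) {
            s->lastError = error;
        }
        s->readyEvents |= SP_FD_WRITE;
    }
    s->readyEvents |= event;
}
```

Delivering only SP_FD_READ to a socket that still has SP_ASYNC_CONNECT set leaves the flag cleared and both the read and write bits ready, without SP_FD_CONNECT ever appearing in readyEvents.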
This code works when Windows delivers the first event before the
script manages to read/write from the new socket, because then the
driver functions will see the cleared flag and not enter WFSE to wait
for FD_CONNECT in the first place.
However, if the script was fast enough to already be inside WFSE
waiting for FD_CONNECT, then the main thread is stuck and the change
made by SocketProc() does not help.
The commit here fixes that issue by extending WFSE to recognize the
reset of SAC by SocketProc() as a valid break condition when it waits
for FD_CONNECT, thus preventing it from getting stuck.
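
The effect of the extra break condition can be shown with a small single-threaded simulation. Everything here is an assumption made for illustration: WfSocket, WfDeliver and WfWait are hypothetical, the blocking wait is replaced by consuming a scripted list of events, and "ran out of events while still waiting" stands in for the real hang.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the Winsock event bits and Tcl's flag. */
enum { WF_FD_READ = 0x01, WF_FD_WRITE = 0x02, WF_FD_CONNECT = 0x10 };
#define WF_ASYNC_CONNECT 0x1

typedef struct {
    int flags;        /* models infoPtr->flags */
    int readyEvents;  /* models infoPtr->readyEvents */
} WfSocket;

/* Simplified SocketProc(): clears the async flag on the first event of
 * any kind, forcing a writable indication if it was not a connect. */
static void WfDeliver(WfSocket *s, int event)
{
    if (s->flags & WF_ASYNC_CONNECT) {
        s->flags &= ~WF_ASYNC_CONNECT;
        if (!(event & WF_FD_CONNECT)) {
            s->readyEvents |= WF_FD_WRITE;
        }
    }
    s->readyEvents |= event;
}

/* Simplified wait loop. Returns 1 when the wait ends, 0 when the
 * scripted events run out first (the real code would block forever).
 * withFix selects whether the new break condition is active. */
static int WfWait(WfSocket *s, int events,
                  const int *msgs, size_t n, int withFix)
{
    size_t i = 0;

    assert(s != NULL);
    for (;;) {
        if (s->readyEvents & events) {
            return 1;
        }
        if (withFix && (events == WF_FD_CONNECT)
                && !(s->flags & WF_ASYNC_CONNECT)) {
            return 1;   /* flag was reset behind our back: stop waiting */
        }
        if (i == n) {
            return 0;   /* nothing more will arrive: stuck */
        }
        WfDeliver(s, msgs[i++]);
    }
}
```

With a script that contains only WF_FD_READ, the loop without the fix never sees WF_FD_CONNECT and reports being stuck, while the loop with the fix notices the cleared flag and returns.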
-rw-r--r-- | win/tclWinSock.c | 10 |
1 files changed, 10 insertions, 0 deletions
diff --git a/win/tclWinSock.c b/win/tclWinSock.c
index 9fa01c9..badfc7a 100644
--- a/win/tclWinSock.c
+++ b/win/tclWinSock.c
@@ -1211,6 +1211,15 @@ WaitForSocketEvent(
 	    break;
 	} else if (infoPtr->readyEvents & events) {
 	    break;
+	} else if ((events == FD_CONNECT) &&
+		   !(infoPtr->flags & SOCKET_ASYNC_CONNECT)) {
+	    /* When waiting for FD_CONNECT Windows may not deliver this event,
+	     * causing us to get stuck. However, SocketProc()'s SOCKET_MESSAGE
+	     * handler has special code which detects this and resets the
+	     * infoPtr->flags async bit anyway (See (xxx)). That we can detect
+	     * here and break the loop as if we had gotten FD_CONNECT.
+	     */
+	    break;
 	} else if (infoPtr->flags & SOCKET_ASYNC) {
 	    *errorCodePtr = EWOULDBLOCK;
 	    result = 0;
@@ -2327,6 +2336,7 @@ SocketProc(
 		}
 	    }
 
+	    /* (xxx) See corresponding marker in WaitForSocketEvent as well */
 	    if (infoPtr->flags & SOCKET_ASYNC_CONNECT) {
 		infoPtr->flags &= ~(SOCKET_ASYNC_CONNECT);
 		if (error != ERROR_SUCCESS) {