author andreask <andreask> 2014-03-08 00:21:59 (GMT)
committer andreask <andreask> 2014-03-08 00:21:59 (GMT)
commit 5d34eaa0d47d6d00cc46f01093e65a567f28b559
tree 94d01d82ddf45c71b6e60ef151453da157245d07
parent 5a1cac2f139731c8a4cacfc7dce7b8c456e860f4
socket -async and gets/puts stall on windows (Ticket [336441ed59]) win_sock_async_connect_race_fix
This change addresses a problem that is practically impossible to cover in the test suite: it is a race condition triggered by Windows behavior, and as such it cannot be reliably induced from the Tcl side, neither from a script nor from C.

The problem affects only sockets opened with -async. When such a socket is created, the core records this fact in the SocketState flags (SOCKET_ASYNC_CONNECT <SAC>) and in the set of events to select for (FD_CONNECT). Then, to handle the possibility that the script writes to or reads from the socket before the connection has completed, the driver functions Tcp(Input|Output)Proc check for the flag and, if it is still set, enter WaitForSocketEvent(FD_CONNECT) <WFSE> to wait synchronously for the connection before actually reading or writing.

Unfortunately, Windows sometimes does not deliver FD_CONNECT and skips directly to FD_READ|FD_WRITE. When that happens, the unmodified WFSE gets stuck in WFSO (WaitForSingleObject) and hangs the entire Tcl process.

The core already has code to deal with that situation, in part. This code is found in the SOCKET_MESSAGE branch of the big switch in SocketProc(). When it finds <SAC> still set in the flags, i.e. not reset by an FD_CONNECT event, it unconditionally clears the flag and forces an FD_WRITE. (This change adds a comment at that location as a marker.)

That code works when Windows delivers the first event before the script manages to read from or write to the new socket, because the driver functions then see the cleared flag and never enter WFSE to wait for FD_CONNECT in the first place. However, if the script was fast enough to already be in WFSE waiting for FD_CONNECT, the main thread is stuck and the change made by SocketProc() does not help.

This commit fixes that issue by extending WFSE to recognize the reset of <SAC> by SocketProc() as a valid break condition while it waits for FD_CONNECT, which prevents it from getting stuck.
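The break condition described above can be modeled in isolation. The following is only a sketch under stated assumptions, not the code in win/tclWinSock.c: the `SockModel` struct, the `EV_*` and `FLAG_*` constants, and the functions `wait_step`, `socket_message` and `run_scenario` are hypothetical stand-ins for SocketInfo, the Winsock FD_* bits, the SOCKET_* flags, one pass of the WFSE loop, and the SOCKET_MESSAGE handler. The leading error check is assumed, since the diff does not show that branch.

```c
#include <assert.h>

/* Hypothetical stand-ins for the Winsock FD_* event bits. */
enum { EV_READ = 1, EV_WRITE = 2, EV_CONNECT = 4 };
/* Hypothetical stand-ins for SOCKET_ASYNC and SOCKET_ASYNC_CONNECT. */
enum { FLAG_ASYNC = 1, FLAG_ASYNC_CONNECT = 2 };

/* Minimal model of the per-socket state the wait loop looks at. */
typedef struct {
    int flags;
    int readyEvents;
    int lastError;
} SockModel;

typedef enum {
    KEEP_WAITING,          /* loop would block again on the ready event */
    STOP_ERROR,
    STOP_READY,
    STOP_CONNECT_IMPLIED,  /* the branch added by this commit */
    STOP_WOULDBLOCK
} WaitStep;

/* One pass of the wait loop's decision chain. with_fix == 0 models the
 * unmodified loop, with_fix != 0 the loop with the new branch. */
static WaitStep wait_step(const SockModel *s, int events, int with_fix)
{
    if (s->lastError) {                 /* assumed, not shown in the diff */
        return STOP_ERROR;
    } else if (s->readyEvents & events) {
        return STOP_READY;
    } else if (with_fix && (events == EV_CONNECT)
            && !(s->flags & FLAG_ASYNC_CONNECT)) {
        return STOP_CONNECT_IMPLIED;
    } else if (s->flags & FLAG_ASYNC) {
        return STOP_WOULDBLOCK;
    }
    return KEEP_WAITING;
}

/* Model of the SOCKET_MESSAGE handling: record the event and, if the
 * async-connect flag survived, clear it and force a writable event. */
static void socket_message(SockModel *s, int event)
{
    s->readyEvents |= event;
    if (s->flags & FLAG_ASYNC_CONNECT) {
        s->flags &= ~FLAG_ASYNC_CONNECT;
        s->readyEvents |= EV_WRITE;
    }
}

/* Blocking socket opened -async; the script is already waiting for the
 * connect when Windows delivers a read event instead of a connect event.
 * Returns 1 if the waiter is released afterwards, 0 if it stays stuck. */
static int run_scenario(int with_fix)
{
    SockModel s = { FLAG_ASYNC_CONNECT, 0, 0 };

    assert(wait_step(&s, EV_CONNECT, with_fix) == KEEP_WAITING);
    socket_message(&s, EV_READ);
    return wait_step(&s, EV_CONNECT, with_fix) != KEEP_WAITING;
}
```

In this model `run_scenario(0)` leaves the waiter stuck, because the ready events never contain the connect bit, while `run_scenario(1)` releases it through the new branch.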
-rw-r--r-- win/tclWinSock.c 10
1 files changed, 10 insertions, 0 deletions
diff --git a/win/tclWinSock.c b/win/tclWinSock.c
index 9fa01c9..badfc7a 100644
--- a/win/tclWinSock.c
+++ b/win/tclWinSock.c
@@ -1211,6 +1211,15 @@ WaitForSocketEvent(
             break;
         } else if (infoPtr->readyEvents & events) {
             break;
+        } else if ((events == FD_CONNECT) &&
+                !(infoPtr->flags & SOCKET_ASYNC_CONNECT)) {
+            /* When waiting for FD_CONNECT Windows may not deliver this event,
+             * causing us to get stuck. However, SocketProc()'s SOCKET_MESSAGE
+             * handler has special code which detects this and resets the
+             * infoPtr->flags async bit anyway (See (xxx)). That we can detect
+             * here and break the loop as if we had gotten FD_CONNECT.
+             */
+            break;
         } else if (infoPtr->flags & SOCKET_ASYNC) {
             *errorCodePtr = EWOULDBLOCK;
             result = 0;
@@ -2327,6 +2336,7 @@ SocketProc(
             }
         }
+        /* (xxx) See corresponding marker in WaitForSocketEvent as well */
         if (infoPtr->flags & SOCKET_ASYNC_CONNECT) {
             infoPtr->flags &= ~(SOCKET_ASYNC_CONNECT);
             if (error != ERROR_SUCCESS) {