While testing TLS connectivity, we came across a condition in NetX 6.1.8 that causes a connection deadlock with HTTPS connections.
Once the deadlock occurs, neither the IP thread nor the HTTPS web server thread will process new connections on port 443.
How to trigger
We use the nx_web_http_server add-on component to implement TLS-encrypted web services.
The deadlock is triggered by producing repeated, concurrent and short-lived HTTPS connections on port 443 (by clicking the refresh button in the Chrome browser) which the NetX TLS stack terminates due to errors (error code NX_SECURE_TLS_NO_SUPPORTED_CIPHERS 0x10E or NX_NOT_CONNECTED 0x38).
Other info
The problem does not seem to occur if HTTPS connections are made sequentially (using wget as a client, for example).
It only occurs when a browser is used, which opens multiple connections at once so that some of them are queued.
This suggests that the issue is related to the listen queueing mechanism (NX_WEB_HTTP_SERVER_SESSION_MAX set to 2, NX_WEB_HTTP_SERVER_MAX_PENDING set to 4).
We were not able to reproduce it with plain HTTP traffic. This suggests that it is also related to TLS, and maybe to the clean-up of an unsuccessful TLS session in function _nx_tcpserver_connect_process() while a new connection arrives at the same time.
This is also independent of the two race conditions reported in #42 and #43. Patches for those have been applied, and the issue occurs with or without those patches.
System's status after the occurrence
The IP thread is still running; this has been confirmed with logs and the debugger. Other IP services still work.
However, it won't touch port 443 again because it has queued connections in its listen list.
According to file nx_tcp_packet_process.c, line 597, the listen socket is checked for NULL:
if ((listen_ptr -> nx_tcp_listen_socket_ptr) &&
((tcp_header_ptr -> nx_tcp_header_word_3 & NX_TCP_RST_BIT) == NX_NULL))
and if it is NULL, according to this comment in line 772:
/* The application needs to call relisten with a new server request to process this queued
connection. */
the IP thread will not process it and depends on the HTTPS web server thread to do a re-listen.
The HTTPS web server thread is also still running; it cyclically calls _nx_tcpserver_relisten in its event loop.
But the HTTPS web server thread won't perform a re-listen because the socket is in NX_TCP_LISTEN_STATE.
So we end up with the IP thread not dealing with the new connection because the listen socket has been set to NULL, and the server thread not re-listening because there is still a socket in listen state.
The listen queue's state and the server's socket state don't seem to line up any more.
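The mismatch described above can be written down as a single predicate. The following is only a minimal sketch using simplified, hypothetical stand-in types (every `model_*` name is made up for illustration and is not part of NetX); it just restates the three conditions we observe at the same time: no listen socket attached to the listen request, connections waiting in the queue, and the server's session socket still in listen state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the NetX structures. Only the
   fields relevant to the mismatch are modelled; these are NOT NetX types. */
typedef enum {
    MODEL_TCP_CLOSED,
    MODEL_TCP_LISTEN_STATE,
    MODEL_TCP_ESTABLISHED
} model_tcp_state;

typedef struct {
    model_tcp_state state;   /* state of the server's session socket */
    unsigned int    port;    /* local port of that socket */
} model_socket;

typedef struct {
    model_socket *listen_socket;  /* NULL: IP thread waits for a re-listen */
    unsigned int  port;           /* port this listen request belongs to */
    unsigned int  queue_current;  /* connections waiting in the listen queue */
} model_listen_request;

/* Returns true when the listen request and the server socket disagree in
   the way described: the IP thread waits for a re-listen (no listen socket,
   queued connections), while the server thread sees a socket that is
   already listening and therefore does not re-listen. */
static bool model_is_deadlocked(const model_listen_request *listen,
                                const model_socket *server_socket)
{
    assert(listen != NULL && server_socket != NULL);

    return listen->listen_socket == NULL
        && listen->port == server_socket->port
        && listen->queue_current > 0
        && server_socket->state == MODEL_TCP_LISTEN_STATE;
}
```

In this model, a listen request with a NULL socket pointer and a non-empty queue on port 443, combined with a server socket in listen state on the same port, is reported as dead-locked; a request that still has its listen socket attached is not.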
TraceX snapshot of an occurrence with annotations
This TraceX file is attached.
How to demonstrate
Adding the following code snippet after line 209 of nx_tcpserver.c will log that a deadlock was triggered and also recover from it by unaccepting and re-listening the socket.
else if (server_ptr -> nx_tcpserver_listen_session -> nx_tcp_session_socket.nx_tcp_socket_state == NX_TCP_LISTEN_STATE)
{
    struct NX_TCP_LISTEN_STRUCT *listen_ptr;

    listen_ptr = server_ptr -> nx_tcpserver_ip -> nx_ip_tcp_active_listen_requests;
    if (listen_ptr)
    {
        do
        {
            /* Dead-lock condition: no listen socket attached, same port as the
               server's listen session, and connections waiting in the queue. */
            if (listen_ptr -> nx_tcp_listen_socket_ptr == NULL &&
                listen_ptr -> nx_tcp_listen_port == server_ptr -> nx_tcpserver_listen_session -> nx_tcp_session_socket.nx_tcp_socket_port &&
                listen_ptr -> nx_tcp_listen_queue_current > 0)
            {
                _nx_trace_event_insert(5000,
                                       listen_ptr -> nx_tcp_listen_socket_ptr,
                                       server_ptr -> nx_tcpserver_listen_session -> nx_tcp_session_socket.nx_tcp_socket_state,
                                       listen_ptr -> nx_tcp_listen_queue_current, 0,
                                       NX_TRACE_ALL_EVENTS, 0, 0);
                SEGGER_RTT_printf(0, "Got you! Connection dead-locked on port %d (queue: %d of %d)!!!!!!!!!!!!!!!!!!!!!!\n",
                                  listen_ptr -> nx_tcp_listen_port,
                                  listen_ptr -> nx_tcp_listen_queue_current,
                                  listen_ptr -> nx_tcp_listen_queue_maximum);
                //assert(listen_ptr -> nx_tcp_listen_socket_ptr);

                // Recover from dead-lock:
                nx_tcp_server_socket_unaccept(&server_ptr -> nx_tcpserver_listen_session -> nx_tcp_session_socket);
                status = nx_tcp_server_socket_relisten(server_ptr -> nx_tcpserver_ip,
                                                       server_ptr -> nx_tcpserver_listen_port,
                                                       &server_ptr -> nx_tcpserver_listen_session -> nx_tcp_session_socket);
                if ((status != NX_SUCCESS) && (status != NX_CONNECTION_PENDING))
                {
                    SEGGER_RTT_printf(0, "%d, %d\n", status, __LINE__);
                    return NX_TCPSERVER_FAIL;
                }
            }

            /* Move to the next listen request. */
            listen_ptr = listen_ptr -> nx_tcp_listen_next;
        } while (listen_ptr != server_ptr -> nx_tcpserver_ip -> nx_ip_tcp_active_listen_requests);
    }
}
tracex2.zip