Comments (22)
I don't think this is a false positive. Thread sanitizer is correct that this code is a data race. It's a data race where you ignore the result of the race, but C says that data races are undefined behavior whether or not you ignore the result.
The issue isn't that TSan doesn't understand the object is read-only. It's not read-only. It is logically read-only, but the next
and prev
pointers are legitimately not read-only. TSan just doesn't understand that you then ignore the bad reads:
Lines 163 to 166 in 1977c00
But TSan is correct to not understand that because C's rules don't care.
from openssl.
While that will silence the report, it won't fix the bug in OpenSSL. The data race is still undefined behavior in the language.
from openssl.
Thanks for the detailed report (link to log helped). I think this is kind of false positive. I must admit I don't all details how thread sanitizer works internally. So my analysis here might be quite wrong.
The mutex M2
is rw-lock whcih belongs to SSL context (according to this T-SAN output here):
E Mutex M2 (0x721000275940) created at:
E #0 pthread_rwlock_init <null> (clickhouse+0x723e9bd) (BuildId: a38098db2d267457b06b8d4a8b29e4d4d8b98a0a)
E #1 CRYPTO_THREAD_lock_new build_docker/./contrib/openssl/crypto/threads_pthread.c:54:9 (clickhouse+0x2044f271) (BuildId: a38098db2d267457b06b8d4a8b29e4d4d8b98a0a)
E #2 SSL_CTX_new_ex build_docker/./contrib/openssl/ssl/ssl_lib.c:3883:17 (clickhouse+0x20244a35) (BuildId: a38098db2d267457b06b8d4a8b29e4d4d8b98a0a)
E #3 SSL_CTX_new build_docker/./contrib/openssl/ssl/ssl_lib.c:4120:12 (clickhouse+0x2024533d) (BuildId: a38098db2d267457b06b8d4a8b29e4d4d8b98a0a)
E #4 Poco::Net::Context::createSSLContext() build_docker/./base/poco/NetSSL_OpenSSL/src/Context.cpp (clickhouse+0x1d240abc) (BuildId: a38098db2d267457b06b8d4a8b29e4d4d8b98a0a)
</snip>
Mutexes M0
and M1
don't belong to OpenSSL they seem to be created by DB application.
E Mutex M1 (0x722c0000a800) created at:
E #0 pthread_mutex_init <null> (clickhouse+0x723d7bf) (BuildId: a38098db2d267457b06b8d4a8b29e4d4d8b98a0a)
E #1 std::__1::__libcpp_recursive_mutex_init[abi:v15000](pthread_mutex_t*) build_docker/./contrib/llvm-project/libcxx/include/__threading_support:269:10 (clickhouse+0x2082e9a4) (BuildId: a38098db2d267457b06b8d4a8b29e4d4d8b98a0a)
E #2 std::__1::recursive_mutex::recursive_mutex() build_docker/./contrib/llvm-project/libcxx/src/mutex.cpp:61:14 (clickhouse+0x2082e9a4)
E #3 Poco::Net::SecureSocketImpl::SecureSocketImpl(Poco::AutoPtr<Poco::Net::SocketImpl>, Poco::AutoPtr<Poco::Net::Context>) build_docker/./base/poco/NetSSL_OpenSSL/src/SecureSocketImpl.cpp:64:19 (clickhouse+0x1d2591a3) (BuildId: a38098db2d267457b06b8d4a8b29e4d4d8b98a0a)
E #4 Poco::Net::SecureStreamSocketImpl::SecureStreamSocketImpl(Poco::AutoPtr<Poco::Net::Context>) build_docker/./base/poco/NetSSL_OpenSSL/src/SecureStreamSocketImpl.cpp:26:2 (clickhouse+0x1d25f8db) (BuildId: a38098db2d267457b06b8d4a8b29e4d4d8b98a0a)
E #5 Poco::Net::SecureStreamSocket::SecureStreamSocket() build_docker/./base/poco/NetSSL_OpenSSL/src/SecureStreamSocket.cpp:30:19 (clickhouse+0x1d25c987) (BuildId: a38098db2d267457b06b8d4a8b29e4d4d8b98a0a)
E #6 std::__1::__unique_if<Poco::Net::SecureStreamSocket>::__unique_single std::__1::make_unique[abi:v15000]<Poco::Net::SecureStreamSocket>() build_docker/./contrib/llvm-project/libcxx/include/__memory/unique_ptr.h:714:32 (clickhouse+0x195c81ee) (BuildId: a38098db2d267457b06b8d4a8b29e4d4d8b98a0a)
E #7 DB::Connection::connect(DB::ConnectionTimeouts const&) build_docker/./src/Client/Connection.cpp:128:26 (clickhouse+0x195c81ee)
and stack here belongs to M0
:
E Mutex M0 (0x722c0016c010) created at:
E #0 pthread_mutex_init <null> (clickhouse+0x723d7bf) (BuildId: a38098db2d267457b06b8d4a8b29e4d4d8b98a0a)
E #1 std::__1::__libcpp_recursive_mutex_init[abi:v15000](pthread_mutex_t*) build_docker/./contrib/llvm-project/libcxx/include/__threading_support:269:10 (clickhouse+0x2082e9a4) (BuildId: a38098db2d267457b06b8d4a8b29e4d4d8b98a0a)
E #2 std::__1::recursive_mutex::recursive_mutex() build_docker/./contrib/llvm-project/libcxx/src/mutex.cpp:61:14 (clickhouse+0x2082e9a4)
E #3 Poco::Net::SecureSocketImpl::SecureSocketImpl(Poco::AutoPtr<Poco::Net::SocketImpl>, Poco::AutoPtr<Poco::Net::Context>) build_docker/./base/poco/NetSSL_OpenSSL/src/SecureSocketImpl.cpp:64:19 (clickhouse+0x1d2591a3) (BuildId: a38098db2d267457b06b8d4a8b29e4d4d8b98a0a)
E #4 Poco::Net::SecureStreamSocketImpl::SecureStreamSocketImpl(Poco::AutoPtr<Poco::Net::Context>) build_docker/./base/poco/NetSSL_OpenSSL/src/SecureStreamSocketImpl.cpp:26:2 (clickhouse+0x1d25f8db) (BuildId: a38098db2d267457b06b8d4a8b29e4d4d8b98a0a)
E #5 Poco::Net::SecureStreamSocket::SecureStreamSocket() build_docker/./base/poco/NetSSL_OpenSSL/src/SecureStreamSocket.cpp:30:19 (clickhouse+0x1d25c987) (BuildId: a38098db2d267457b06b8d4a8b29e4d4d8b98a0a)
E #6 std::__1::__unique_if<Poco::N
</snip>
So what really matters to OpenSSL is M2
(our mutex). The mutex is held as we insert session to cache created by thread T672
. The matching code which does that can be found here:
712 int SSL_CTX_add_session(SSL_CTX *ctx, SSL_SESSION *c)
713 {
714 int ret = 0;
715 SSL_SESSION *s;
716
717 /*
718 * add just 1 reference count for the SSL_CTX's session cache even though
719 * it has two ways of access: each session is in a doubly linked list and
720 * an lhash
721 */
722 SSL_SESSION_up_ref(c);
723 /*
724 * if session c is in already in cache, we take back the increment later
725 */
726
727 if (!CRYPTO_THREAD_write_lock(ctx->lock)) {
728 SSL_SESSION_free(c);
729 return 0;
730 }
731 s = lh_SSL_SESSION_insert(ctx->sessions, c);
732
733 /*
734 * s != NULL iff we already had a session with the given PID. In this
735 * case, s == c should hold (then we did not really modify
736 * ctx->sessions), or we're in trouble.
737 */
So this where we grab the lock for SSL context (a.k.a. M2
) and insert a chunk of memory (session) into global list (a.k.a. session cache). What's worth to note is that function lh_SSL_SESSION_insert()
modifies link members (prev, next pointers) of session which is inserted to cache. The modification happens under M2
protection and thread sanitizer makes a note of it (I think).
Later there is a thread T674
which calls a dup on session created by T672
. I'm sure the look up operation did happen after the protection M2
. However as soon as T674
finds the session it grabs the reference and drops M2
. Because the session is immutable. The memcpy() operation copies the whole session including prev,next pointers which were modified under protection of M2
when T672
was calling lh_SSL_SESSION_insert()
. And that's the why the T-san assert is triggered.
So thread sanitizer is right we are reading memory without holding M2
but this is harmful here because thread sanitizer does not understand the object is read-only here. May be we can copy the object member by member avoiding to copy {prev, next} members instead of doing memcpy(), to make thread sanitizer happy. don't know if it would help. At least that's my understanding of the report.
from openssl.
Be careful also with struct padding. Since actually quite a lot of fields that can't be copied (e.g. all the allocated buffers), it might make sense to just move the copyable fields into a smaller embedded struct and memcpy that. Or just copy field by field if there aren't many of them.
from openssl.
@rschu1ze thank you for testing.
@mattcaswell thanks I had forgotten about the refcount. Also to @davidben's comment, can you elaborate on which buffers may be uncopyable? From my read that might include (but not be limited to):
- psk_identity_hint
- psk_identity
- srp_username
- ticket_appdata
Do those need to be ignored? Also do we need to up_ref things like cipher, peer_chain, peer_rpk, etc to avoid having those values freed early when a duplicate session holds them?
@rschu1ze there may be a subsequent test I will ask you to do based on the answer to the above questions
from openssl.
Yepp, I'll push #24629 (comment) tomorrow (too tired now). But since we (ClickHouse) are mere users of OpenSSL you or other OpenSSL experts might want to double-check.
from openssl.
I'm not sure if this helps you, but since you appear to be using the aws python sdk, we recently closed this issue:
#24480
Which is in a completely different location, but also related to the aws python sdk, in which we found that parts of the sdk were creating SSL_CTX objects that it was sharing among SSL objects (which is fine) but mutating the shared objects while they were in use in other threads (which is not).
Not sure if it relates here, but it might be worth looking at.
from openssl.
We might try to provide thread sanitizer with a hint to ignore ssl_session_dup build()
. this might help us with clang's sanitizer.
from openssl.
Could we move the next/prev members to the end of the structure and memcpy everything but those pointers?
from openssl.
Best to avoid the refcount too, since that's also synchronized.
from openssl.
FWIW, I'm not sure of the feasibility, but RCU might be a solution here. the lock/unlock functions for rcu are already annotated to convince tsan that there is no race condition here. The addition of a read side lock should be effectively no performance impact (ideally), but it sounds like a reasonably easy replacement.
from openssl.
I think RCU is way, way overkill here. There is no complex synchronization or anything going on here. Just an overbroad convenience memcpy that never needed to copy the offending fields in the first place.
from openssl.
If the memcpy can be constrained to only touch fields that aren't protected on the write side, then yes, I would agree, but I'm not currently looking at the code, so I'm not sure of the feasibility there.
from openssl.
If the memcpy can be constrained to only touch fields that aren't protected on the write side, then yes, I would agree, but I'm not currently looking at the code, so I'm not sure of the feasibility there.
As noted above I think this can be done simply by moving the offending fields to the end of the structure.
from openssl.
I made a temporary fix (since this blocks the next release of ClickHouse), and would replace it by a more comprehensive fix once available.
from openssl.
diff --git a/ssl/ssl_local.h b/ssl/ssl_local.h
index 9083ec2f3b..3a96c6a136 100644
--- a/ssl/ssl_local.h
+++ b/ssl/ssl_local.h
@@ -540,11 +540,6 @@ struct ssl_session_st {
* load the 'cipher' structure */
unsigned int kex_group; /* TLS group from key exchange */
CRYPTO_EX_DATA ex_data; /* application specific data */
- /*
- * These are used to make removal of session-ids more efficient and to
- * implement a maximum cache size.
- */
- struct ssl_session_st *prev, *next;
struct {
char *hostname;
@@ -574,6 +569,12 @@ struct ssl_session_st {
size_t ticket_appdata_len;
uint32_t flags;
SSL_CTX *owner;
+
+ /*
+ * These are used to make removal of session-ids more efficient and to
+ * implement a maximum cache size.
+ */
+ struct ssl_session_st *prev, *next;
};
/* Extended master secret support */
diff --git a/ssl/ssl_sess.c b/ssl/ssl_sess.c
index d326eadb04..67b342df49 100644
--- a/ssl/ssl_sess.c
+++ b/ssl/ssl_sess.c
@@ -138,7 +138,7 @@ static SSL_SESSION *ssl_session_dup_intern(const SSL_SESSION *src, int ticket)
dest = OPENSSL_malloc(sizeof(*dest));
if (dest == NULL)
return NULL;
- memcpy(dest, src, sizeof(*dest));
+ memcpy(dest, src, sizeof(*dest) - sizeof(SSL_SESSION *) * 2);
/*
* Set the various pointers to NULL so that we can call SSL_SESSION_free in
@rschu1ze would you be willing to test the above patch? Its based on the suggestion @mattcaswell made above to move the prev/next pointers to the end of the struct and avoid copying them in dup_intern.
from openssl.
Thanks. Yes, I will test in a moment.
from openssl.
@nhorman - @davidben also suggested moving the refcount too
from openssl.
The patch in #24629 (comment) resolves the problem.
from openssl.
Side note: This is slightly ridiculous but I just discovered that my colleague @alesapin noticed and fixed the same issue four years ago --> ClickHouse#6 I guess the issue was not reported upstream back then. This was shortly before ClickHouse migrated to boringssl and recently back to OpenSSL.
from openssl.
ok, @rschu1ze is the implication there that you have a patch for this locally? Care to push it to a PR here so we can complete this?
from openssl.
associated pr complete, closing. Thank you!
from openssl.
Related Issues (20)
- OSSL_STORE_LOADER_free() has conflicting documentation HOT 5
- iteration 19 evokes undefined behaviour [gentoo] HOT 7
- Unable to load PKCS7 object with EC key HOT 2
- Error finalizing cipher loop" when running openssl speed -evp -decrypt for AES-GCM HOT 7
- Issue while installing the riscv-pk -- ../pk/pk.c:188: Error: unknown CSR `senvcfg' HOT 2
- Unable to install OpenSSL on Wndows 11 HOT 4
- s_client seems not using client certificate when connecting via proxy HOT 5
- v3.3.0: CMS + EC keys + PKCS#11 does not work HOT 4
- no NDK arm-linux-androideabi-gcc on $PATH at (eval 12) line 143 error in windows and make[1]: *** [Makefile:3285: apps/libapps.a] Error 127 error when using wsl in windows HOT 10
- Provide a way to access OpenSSL source code via plain HTTP now that ftp.openssl.org is no more HOT 8
- Are RC5 and MDC2 algorithms are disabled by default in OpenSSL 3.3.0? HOT 1
- s_client/s_server: Read PSK from file instead of taking it as CLI parameter HOT 3
- How to visualize providers function call paths?
- 3.3.1: ${prefix} missing in /usr/lib/pkg-config HOT 13
- Base 64 decoding truncation
- memory leak in OPENSSL_config HOT 5
- `apps/openssl.cnf` default is to not enforce TLS. Should default to: enforce TLS HOT 8
- `SSL_get_ex_data_X509_STORE_CTX_idx` does not respect `OSSL_LIB_CTX` HOT 4
- OpenSSL 3.0.8 - How to fallback to default provider when property fips=yes set and FIPS provider is loaded
- OS Zoo CI currently broken HOT 12
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from openssl.