Comments (6)
Thank you for such a detailed report!
Your analysis looks correct.
I don't remember the code precisely, but likely this simple fix will not work in multi-threaded environment, when events of the event_base could be processed from multiple threads, since in this case this deferred callback can be runned in parallel with evdns_base_clear_nameservers_and_suspend
, and this means that you cannot modify any members of it.
It is "ok" right now because pending_cb
checked/modified under evdns_base lock, but actually after a brief look there are more problems here, since you may get use-after-free because reply_run_callback
could be already executed and free then handler, while you will look at some of it's fields...
This can be fixed by adding some refcnt (like the code for bufferevent does), but this will require locks right now (since atomic is "on pause" - #1378), though unlikely this will be a hot path
We are using an older version of libsdwan but, as far as I can see, the current latest upstream version seems to have the same issue.
Out of curiosity, what is libsdwan?
from libevent.
Out of curiosity, what is libsdwan?
oh, I meant to say we are using an older version of libevent :-) libsdwan is our internal library.
from libevent.
@stablebits you were writing about the fix, any progress? or we can close this?
from libevent.
@azat my last comment was incorrect and because there was no reaction from you yet, I simply deleted it.
Below is my initial take (I got distracted by other activitie, so this hasn't been tested yet):
Comments:
evdns_base_clear_nameservers_and_suspend() -> evdns_cancel_request()
is accessing probe_request
(struct evdns_request
) while holding the base
lock. They see probe_request
through server->probe_request
, which suggests that nameserver_probe_callback()
hasn't ran yet. Otherwise, server->probe_request
would have been NULL
. nameserver_probe_callback()
resets it while holding ns->base
, which is the same base
lock. As long as reply_run_callback()
always runs some handle->user_callback
(in our case nameserver_probe_callback()
) before destroying evdns_request
, at least the object itself should still exist if server->probe_request != NULL
.
However, there are 2 issues. As you mentioned, while evdns_base_clear_nameserver_and_suspend() -> evdns_cancel_request()
is accessing probe_request
, reply_run_callback()
may already be running (not just scheduled to run) and so:
- It accesses
handle->*
fields without holding thebase
lock, which means thatevdns_base_clear_*
can't change them safely. nameserver_probe_callback()
may be waiting for thebase
lock here (*):
if (result == DNS_ERR_CANCEL) {
/* We canceled this request because the nameserver came up
* for some other reason. Do not change our opinion about
* the nameserver. */
return;
}
EVDNS_LOCK(ns->base); <===== Waiting here (*)
ns->probe_request = NULL;
if (result == DNS_ERR_NONE || result == DNS_ERR_NOTEXIST) {
/* this is a good reply */
nameserver_up(ns);
} else {
[...]
ready to access the destroyed ns
object once the base
lock becomes available.
In order to address these 2 issues, this patch wraps the code accessing handle
and calinghandle->user_callback
in reply_run_callback
with the base
lock.
It may well be that I missed some cases though. We'll run this code on our systems for a 1-2 weeks to see whether it solves the problem (there are a few crashes per day on average).
from libevent.
Hi @stablebits, sorry for the delay I couldn't allocate time for libevent for quit a long time...
evdns_base_clear_nameservers_and_suspend() -> evdns_cancel_request() is accessing probe_request (struct evdns_request) while holding the base lock. They see probe_request through server->probe_request, which suggests that nameserver_probe_callback() hasn't ran yet. Otherwise, server->probe_request would have been NULL. nameserver_probe_callback() resets it while holding ns->base, which is the same base lock. As long as reply_run_callback() always runs some handle->user_callback (in our case nameserver_probe_callback()) before destroying evdns_request, at least the object itself should still exist if server->probe_request != NULL.
That is correct.
In order to address these 2 issues, this patch wraps the code accessing handle and calinghandle->user_callback in reply_run_callback with the base lock.
Yes, your analysis looks correct.
Can you please submit your patch as a PR? With a few comments:
reply_run_callback
don't needuser_pointer
anymorenameserver_probe_callback
do not need lock anymore (sincereply_run_callback
already holds it)
Apart from it, great analysis, thank you!
from libevent.
@azat here is a pull request #1561, thanks!
from libevent.
Related Issues (20)
- Libevent 2.2 stable release
- Warnings reported by Infer HOT 2
- Bug in make_ws_frame (ws.c), bug in get_ws_frame (ws.c) HOT 1
- Suggest hash-pin GitHub Actions with sensitive permissions and a Dependency-Update-Tool HOT 8
- Which version is currently most suitable for libevent? HOT 2
- How to get the reqeust queue length of one evhttp_connection? HOT 1
- libevent debugging HOT 1
- [bug] Event activated by event_active on another thread may loss while event timeout HOT 4
- clang: error: linker command failed with exit code 1 (use -v to see invocation) make[1]: *** [sample/le-proxy] Error 1 HOT 1
- Fix MbedTLS and ZLIB for windows on CI HOT 1
- May bug in evhttp_connection_free_on_completion when the evhttp_connection fail to connect HOT 2
- Security advisory link is broken HOT 2
- find_dependency for MBedTLS failed for LibeventConfig.cmake HOT 1
- Test Failure and Warnings in http/bad_request under SELECT Event Mechanism on Raspberry Pi HOT 2
- install DESTINATIONs inconsistent.
- How to implement a websocket client? HOT 6
- bufferevent_write does not immediately write data to the socket HOT 4
- Use checksec to scan the libevent_pthreads-2.1.so.7.0.1 file FORTIFY item is 'No' HOT 5
- How to deal with bufferevent pointer when used externally? HOT 5
- Upgrading libevent based TCP server implementation from commit id 665d79f1 to ec8d7a5a results in random crashes HOT 5
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from libevent.