Git Product home page Git Product logo

Comments (6)

azat avatar azat commented on June 18, 2024

Thank you for such a detailed report!
Your analysis looks correct.

I don't remember the code precisely, but likely this simple fix will not work in multi-threaded environment, when events of the event_base could be processed from multiple threads, since in this case this deferred callback can be runned in parallel with evdns_base_clear_nameservers_and_suspend, and this means that you cannot modify any members of it.

It is "ok" right now because pending_cb checked/modified under evdns_base lock, but actually after a brief look there are more problems here, since you may get use-after-free because reply_run_callback could be already executed and free then handler, while you will look at some of it's fields...

This can be fixed by adding some refcnt (like the code for bufferevent does), but this will require locks right now (since atomic is "on pause" - #1378), though unlikely this will be a hot path

We are using an older version of libsdwan but, as far as I can see, the current latest upstream version seems to have the same issue.

Out of curiosity, what is libsdwan?

from libevent.

stablebits avatar stablebits commented on June 18, 2024

Out of curiosity, what is libsdwan?

oh, I meant to say we are using an older version of libevent :-) libsdwan is our internal library.

from libevent.

azat avatar azat commented on June 18, 2024

@stablebits you were writing about the fix, any progress? or we can close this?

from libevent.

stablebits avatar stablebits commented on June 18, 2024

@azat my last comment was incorrect and because there was no reaction from you yet, I simply deleted it.

Below is my initial take (I got distracted by other activitie, so this hasn't been tested yet):

evdns-fix.patch

Comments:

evdns_base_clear_nameservers_and_suspend() -> evdns_cancel_request() is accessing probe_request (struct evdns_request) while holding the base lock. They see probe_request through server->probe_request, which suggests that nameserver_probe_callback() hasn't ran yet. Otherwise, server->probe_request would have been NULL. nameserver_probe_callback() resets it while holding ns->base, which is the same base lock. As long as reply_run_callback() always runs some handle->user_callback (in our case nameserver_probe_callback()) before destroying evdns_request, at least the object itself should still exist if server->probe_request != NULL.

However, there are 2 issues. As you mentioned, while evdns_base_clear_nameserver_and_suspend() -> evdns_cancel_request() is accessing probe_request, reply_run_callback() may already be running (not just scheduled to run) and so:

  1. It accesses handle->* fields without holding the base lock, which means that evdns_base_clear_* can't change them safely.
  2. nameserver_probe_callback() may be waiting for the base lock here (*):
     if (result == DNS_ERR_CANCEL) {
         /* We canceled this request because the nameserver came up
          * for some other reason.  Do not change our opinion about
          * the nameserver. */
         return;
     }

     EVDNS_LOCK(ns->base);     <===== Waiting here (*)
     ns->probe_request = NULL;
     if (result == DNS_ERR_NONE || result == DNS_ERR_NOTEXIST) {
         /* this is a good reply */
         nameserver_up(ns);
     } else {
[...]

ready to access the destroyed ns object once the base lock becomes available.

In order to address these 2 issues, this patch wraps the code accessing handle and calinghandle->user_callback in reply_run_callback with the base lock.

It may well be that I missed some cases though. We'll run this code on our systems for a 1-2 weeks to see whether it solves the problem (there are a few crashes per day on average).

from libevent.

azat avatar azat commented on June 18, 2024

Hi @stablebits, sorry for the delay I couldn't allocate time for libevent for quit a long time...

evdns_base_clear_nameservers_and_suspend() -> evdns_cancel_request() is accessing probe_request (struct evdns_request) while holding the base lock. They see probe_request through server->probe_request, which suggests that nameserver_probe_callback() hasn't ran yet. Otherwise, server->probe_request would have been NULL. nameserver_probe_callback() resets it while holding ns->base, which is the same base lock. As long as reply_run_callback() always runs some handle->user_callback (in our case nameserver_probe_callback()) before destroying evdns_request, at least the object itself should still exist if server->probe_request != NULL.

That is correct.

In order to address these 2 issues, this patch wraps the code accessing handle and calinghandle->user_callback in reply_run_callback with the base lock.

Yes, your analysis looks correct.

Can you please submit your patch as a PR? With a few comments:

  • reply_run_callback don't need user_pointer anymore
  • nameserver_probe_callback do not need lock anymore (since reply_run_callback already holds it)

Apart from it, great analysis, thank you!

from libevent.

stablebits avatar stablebits commented on June 18, 2024

@azat here is a pull request #1561, thanks!

from libevent.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.