Git Product home page Git Product logo

Comments (11)

ask avatar ask commented on August 27, 2024 1

Thanks for looking into this! Clearly a bug

I merged #55 for now as that one is easier to reason about, but hope to merge #56 as well

from mode.

jbooth-mastery avatar jbooth-mastery commented on August 27, 2024

Also broken on: 3.8.0, using just the builtin examples, on master:

jbooth@m /home/jbooth/oss/mode (master) $ python examples/tutorial.py
[2020-01-02 10:54:16,276] [16560] [INFO] [^Worker]: Starting... 
[2020-01-02 10:54:16,276] [16560] [INFO] [^-App]: Starting... 
[2020-01-02 10:54:16,333] [16560] [WARNING] WRITING GRAPH TO image.png 
[2020-01-02 10:54:16,354] [16560] [INFO] [^--Websockets]: Starting... 
[2020-01-02 10:54:16,354] [16560] [WARNING] STARTING WEBSOCKET SERVER 
[2020-01-02 10:54:16,354] [16560] [INFO] [^--Webserver]: Starting... 
[2020-01-02 10:54:16,355] [16560] [WARNING] examples/tutorial.py:53: DeprecationWarning: Application.make_handler(...) is deprecated, use AppRunner API instead
  handler = self._handler = self._app.make_handler() 
[2020-01-02 10:54:16,356] [16560] [INFO] [^--Webserver]: Serving on port 8000 
[2020-01-02 10:54:17,357] [16560] [INFO] [^--UserCache]: Starting... 
^C-INT- -INT- -INT- -INT- -INT- -INT-
[2020-01-02 10:54:25,363] [16560] [INFO] [^Worker]: Signal received: Signals.SIGINT (2) 
[2020-01-02 10:54:25,364] [16560] [INFO] [^Worker]: Stopping... 
[2020-01-02 10:54:25,364] [16560] [INFO] [^-App]: Stopping... 
[2020-01-02 10:54:25,365] [16560] [INFO] [^--UserCache]: Stopping... 
[2020-01-02 10:54:25,366] [16560] [INFO] [^--Webserver]: Stopping... 
[2020-01-02 10:54:25,372] [16560] [INFO] [^--Webserver]: Closing server 
[2020-01-02 10:54:25,372] [16560] [INFO] [^--Webserver]: Waiting for server to close handle 
[2020-01-02 10:54:25,373] [16560] [INFO] [^--Webserver]: Shutting down web application 
[2020-01-02 10:54:25,373] [16560] [INFO] [^--Webserver]: Waiting for handler to shut down 
[2020-01-02 10:54:25,374] [16560] [INFO] [^--Webserver]: Cleanup 
^C-INT- -INT- -INT- -INT- -INT- -INT-
[2020-01-02 10:55:17,589] [16560] [INFO] [^--Websockets]: Stopping... 
[2020-01-02 10:55:17,589] [16560] [INFO] [^Worker]: Gathering service tasks... 
[2020-01-02 10:55:17,589] [16560] [INFO] [^Worker]: Gathering all futures... 
[2020-01-02 10:55:18,591] [16560] [INFO] [^Worker]: Closing event loop 
^CException ignored in: <module 'threading' from '/home/jbooth/.asdf/installs/python/3.8.0/lib/python3.8/threading.py'>
Traceback (most recent call last):
  File "/home/jbooth/.asdf/installs/python/3.8.0/lib/python3.8/threading.py", line 1388, in _shutdown
    lock.acquire()
KeyboardInterrupt: 
jbooth@m /home/jbooth/oss/mode (master) $ 

from mode.

jbooth-mastery avatar jbooth-mastery commented on August 27, 2024

Also broken in python 3.6.9, but it exits fine in python 3.6.3. Aha! Going to trace that down to a single python version.

from mode.

jbooth-mastery avatar jbooth-mastery commented on August 27, 2024

3.6.6 passes. 3.6.7 fails, requiring two ctrl-C's. 3.6.7 passes if I let it sit and idle long enough before the ctrl-C?

3.6.7 is heisenbug. Sometimes ctrl-C exits cleanly. Sometimes not. This is true back to at least 3.6.3.

from mode.

jbooth-mastery avatar jbooth-mastery commented on August 27, 2024

Okay, having walked away in the middle of a test, here's what I've learned: there's two branch points for what looks like failure.

An early ctrl-C will result in a hanging process, for exactly one minute. In general, in the example, if you let it run until it prints the REMOVING EXPIRED USERS line and quickly ctrl-C you can avoid the first hang. If you hit the first hang it is between these two:

jbooth@m /home/jbooth/oss/mode (master) $ python --version;python examples/tutorial.py
Python 3.7.4
[2020-01-02 12:42:47,116] [10916] [INFO] [^Worker]: Starting... 
[2020-01-02 12:42:47,116] [10916] [INFO] [^-App]: Starting... 
[2020-01-02 12:42:47,178] [10916] [WARNING] WRITING GRAPH TO image.png 
[2020-01-02 12:42:47,198] [10916] [INFO] [^--Websockets]: Starting... 
[2020-01-02 12:42:47,198] [10916] [WARNING] STARTING WEBSOCKET SERVER 
[2020-01-02 12:42:47,199] [10916] [INFO] [^--Webserver]: Starting... 
[2020-01-02 12:42:47,199] [10916] [WARNING] examples/tutorial.py:53: DeprecationWarning: Application.make_handler(...) is deprecated, use AppRunner API instead
  handler = self._handler = self._app.make_handler() 
[2020-01-02 12:42:47,201] [10916] [INFO] [^--Webserver]: Serving on port 8000 
^C[2020-01-02 12:42:47,800] [10916] [INFO] [^--UserCache]: Starting... 
-INT- -INT- -INT- -INT- -INT- -INT-
[2020-01-02 12:42:47,801] [10916] [INFO] [^Worker]: Signal received: Signals.SIGINT (2) 
[2020-01-02 12:42:47,802] [10916] [INFO] [^Worker]: Stopping... 
[2020-01-02 12:42:47,802] [10916] [INFO] [^-App]: Stopping... 
[2020-01-02 12:42:47,803] [10916] [INFO] [^--UserCache]: Stopping... 
[2020-01-02 12:42:47,804] [10916] [INFO] [^--Webserver]: Stopping... 
[2020-01-02 12:42:48,203] [10916] [INFO] [^--Webserver]: Closing server 
[2020-01-02 12:42:48,203] [10916] [INFO] [^--Webserver]: Waiting for server to close handle 
[2020-01-02 12:42:48,204] [10916] [INFO] [^--Webserver]: Shutting down web application 
[2020-01-02 12:42:48,204] [10916] [INFO] [^--Webserver]: Waiting for handler to shut down 
[2020-01-02 12:42:48,205] [10916] [INFO] [^--Webserver]: Cleanup 
... a minute passes ...
[2020-01-02 12:43:47,865] [10916] [INFO] [^--Websockets]: Stopping... 

After that timeout, there's a second failure point, based around the threading changes in 3.7.4. In 3.7.4 and later with those changes, we hang as the interpreter is trying to exit:

[2020-01-02 12:43:47,865] [10916] [INFO] [^--Websockets]: Stopping... 
[2020-01-02 12:43:47,866] [10916] [INFO] [^Worker]: Gathering service tasks... 
[2020-01-02 12:43:47,866] [10916] [INFO] [^Worker]: Gathering all futures... 
[2020-01-02 12:43:48,869] [10916] [INFO] [^Worker]: Closing event loop 
... the interpreter hangs (hard) here ...

and then hang what looks like indefinitely (multi-minute) until a second ctrl-C is thrown, at which point the exception in the threading module happens:

[2020-01-02 12:43:47,865] [10916] [INFO] [^--Websockets]: Stopping... 
[2020-01-02 12:43:47,866] [10916] [INFO] [^Worker]: Gathering service tasks... 
[2020-01-02 12:43:47,866] [10916] [INFO] [^Worker]: Gathering all futures... 
[2020-01-02 12:43:48,869] [10916] [INFO] [^Worker]: Closing event loop 
^CException ignored in: <module 'threading' from '/home/jbooth/.asdf/installs/python/3.7.4/lib/python3.7/threading.py'>
Traceback (most recent call last):
  File "/home/jbooth/.asdf/installs/python/3.7.4/lib/python3.7/threading.py", line 1308, in _shutdown
    lock.acquire()
KeyboardInterrupt
jbooth@m /home/jbooth/oss/mode (master) $ date
Thu 02 Jan 2020 12:50:10 PM CST

from mode.

jbooth-mastery avatar jbooth-mastery commented on August 27, 2024

Debugging this is super confusing in part because mode's WorkerThread is overriding threading.Thread's _is_stopped attribute. This means if you look at the thread state in the repl/pdb, it will erroneously report the thread as stopped because self._is_stopped is a threading.Event which is not falsey. That seems to make WorkerThread.stop incorrect -- self.is_alive() is always false, so it'll never execute the self.join()

from mode.

jbooth-mastery avatar jbooth-mastery commented on August 27, 2024

Trying to get better debugging by running the example under python -X dev examples/tutorial.py I can't even get it launched. Python throws RuntimeErrors about non-thread-safe asyncio loop usage.

from mode.

jbooth-mastery avatar jbooth-mastery commented on August 27, 2024

After fixing the thread safety issues, it tentatively looks like the thread_loop isn't getting properly stopped and closed. When it isn't closed, the thread doesn't properly stop. The unstopped thread prevents interpreter exit. This doesn't fix @r313pp's crashing example (although it also not properly exits post-crash via ctrl-c!), but it does fix the tutorial so it properly exits on ctrl-c. It is causing test failures I haven't yet investigated. I'll try to hit both of those, probably next week, and get a branch and PR up.

diff --git a/mode/threads.py b/mode/threads.py
index 4a5c7ca..635d5c8 100644
--- a/mode/threads.py
+++ b/mode/threads.py
@@ -189,6 +189,9 @@ class ServiceThread(Service):
             # shutdown here, since _shutdown_thread will not execute.
             self.set_shutdown()
             raise
+        finally:
+            self.loop.stop()
+            self.loop.close()
 
     async def stop(self) -> None:
         if self._started.is_set():
@@ -208,8 +211,6 @@ class ServiceThread(Service):
         await self.on_thread_stop()
         self.set_shutdown()
         await self._default_stop_futures()
-        if self._thread is not None:
-            self._thread.stop()
         await self._default_stop_exit_stacks()
 
     async def _serve(self) -> None:
diff --git a/mode/utils/futures.py b/mode/utils/futures.py
index 0f8e45a..1e280d5 100644
--- a/mode/utils/futures.py
+++ b/mode/utils/futures.py
@@ -155,7 +155,7 @@ def maybe_set_result(fut: Optional[asyncio.Future],
                      result: Any) -> bool:
     """Set future result if not already done."""
     if fut is not None and not fut.done():
-        fut.set_result(result)
+        notify(fut, result)
         return True
     return False
 
@@ -165,4 +165,5 @@ def notify(fut: Optional[asyncio.Future], result: Any = None) -> None:
     # can be used to turn a Future into a lockless, single-consumer condition,
     # for multi-consumer use asyncio.Condition
     if fut is not None and not fut.done():
-        fut.set_result(result)
+        loop = fut.get_loop()
+        loop.call_soon_threadsafe(fut.set_result, result)
diff --git a/mode/utils/locks.py b/mode/utils/locks.py
index 5c42ee4..eba6e64 100644
--- a/mode/utils/locks.py
+++ b/mode/utils/locks.py
@@ -45,12 +45,13 @@ class Event:
         All coroutines waiting for it to become true are awakened.
         Coroutine that call wait() once the flag is true will not block at all.
         """
+        from .futures import notify
+
         if not self._value:
             self._value = True
 
-            for fut in self._waiters:
-                if not fut.done():
-                    fut.set_result(True)
+            for fut in self._waiters.copy():
+                notify(fut, True)
 
     def clear(self) -> None:
         """Reset the internal flag to false.

from mode.

r313pp avatar r313pp commented on August 27, 2024

@jbooth-mastery have you tried my PR #55?
Although I haven't addressed correct loop closing problem(may be I'll do it by the end of this week), but threadsafe async calls and thread hang are fixed, at least in my cases.

from mode.

jbooth-mastery avatar jbooth-mastery commented on August 27, 2024

I'd missed your branch. It doesn't seem to work correctly for me:

jbooth@m /home/jbooth/oss/mode (service-thread-hangs-fix) $ python --version;python -X dev -W default examples/tutorial.py 
Python 3.8.0
/home/jbooth/oss/mode/.direnv/python-3.8.0/lib/python3.8/site-packages/aiohttp/helpers.py:107: DeprecationWarning: "@coroutine" decorator is deprecated since Python 3.8, use "async def" instead
  def noop(*args, **kwargs):  # type: ignore
[2020-01-06 09:29:17,035] [13861] [INFO] [^Worker]: Starting... 
[2020-01-06 09:29:17,036] [13861] [INFO] [^-App]: Starting... 
[2020-01-06 09:29:17,110] [13861] [WARNING] WRITING GRAPH TO image.png 
[2020-01-06 09:29:17,136] [13861] [INFO] [^--Websockets]: Starting... 
[2020-01-06 09:29:17,136] [13861] [WARNING] STARTING WEBSOCKET SERVER 
[2020-01-06 09:29:17,137] [13861] [WARNING] Executing <Task pending name='Task-1' coro=<Service.start() running at /home/jbooth/oss/mode/mode/services.py:732> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x7fcc01a2d510>()] created at /home/jbooth/oss/mode/mode/threads.py:160> cb=[_run_until_complete_cb() at /home/jbooth/.asdf/installs/python/3.8.0/lib/python3.8/asyncio/base_events.py:180] created at /home/jbooth/oss/mode/mode/worker.py:263> took 0.103 seconds 
[2020-01-06 09:29:17,138] [13861] [INFO] [^--Webserver]: Starting... 
[2020-01-06 09:29:17,138] [13861] [WARNING] examples/tutorial.py:53: DeprecationWarning: Application.make_handler(...) is deprecated, use AppRunner API instead
  handler = self._handler = self._app.make_handler() 
[2020-01-06 09:29:17,141] [13861] [INFO] <Server sockets=(<asyncio.TransportSocket fd=9, family=AddressFamily.AF_INET6, type=SocketKind.SOCK_STREAM, proto=6, laddr=('::', 8000, 0, 0)>, <asyncio.TransportSocket fd=10, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('0.0.0.0', 8000)>)> is serving 
[2020-01-06 09:29:17,141] [13861] [INFO] [^--Webserver]: Serving on port 8000 
[2020-01-06 09:29:17,141] [13861] [WARNING] /home/jbooth/oss/mode/mode/services.py:719: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
  done, pending = await asyncio.wait( 
[2020-01-06 09:29:17,142] [13861] [INFO] [^--UserCache]: Starting... 
[2020-01-06 09:29:17,142] [13861] [WARNING] /home/jbooth/oss/mode/mode/services.py:625: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
  await asyncio.wait_for( 
^C-INT- -INT- -INT- -INT- -INT- -INT-
[2020-01-06 09:29:23,735] [13861] [INFO] [^Worker]: Signal received: Signals.SIGINT (2) 
[2020-01-06 09:29:23,736] [13861] [INFO] [^Worker]: Stopping... 
[2020-01-06 09:29:23,737] [13861] [INFO] [^-App]: Stopping... 
[2020-01-06 09:29:23,738] [13861] [INFO] [^--UserCache]: Stopping... 
[2020-01-06 09:29:23,740] [13861] [WARNING] /home/jbooth/oss/mode/mode/services.py:903: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
  await asyncio.wait( 
[2020-01-06 09:29:23,746] [13861] [INFO] [^--Webserver]: Stopping... 
[2020-01-06 09:29:23,747] [13861] [ERROR] [^-App]: Error while stopping child <Webserver: stopping >: RuntimeError('Non-thread-safe operation invoked on an event loop other than the current one') 
Traceback (most recent call last):
  File "/home/jbooth/oss/mode/mode/services.py", line 854, in _default_stop_children
    await asyncio.shield(child.stop())
  File "/home/jbooth/oss/mode/mode/threads.py", line 196, in stop
    await super().stop()
  File "/home/jbooth/oss/mode/mode/services.py", line 832, in stop
    self._stopped.set()
  File "/home/jbooth/oss/mode/mode/utils/locks.py", line 53, in set
    fut.set_result(True)
  File "/home/jbooth/.asdf/installs/python/3.8.0/lib/python3.8/asyncio/base_events.py", line 713, in call_soon
    self._check_thread()
  File "/home/jbooth/.asdf/installs/python/3.8.0/lib/python3.8/asyncio/base_events.py", line 750, in _check_thread
    raise RuntimeError(
RuntimeError: Non-thread-safe operation invoked on an event loop other than the current one
[2020-01-06 09:29:23,749] [13861] [INFO] [^--Websockets]: Stopping... 
[2020-01-06 09:29:23,751] [13861] [INFO] [^Worker]: Gathering service tasks... 
[2020-01-06 09:29:23,751] [13861] [INFO] [^Worker]: Gathering all futures... 
/home/jbooth/oss/mode/mode/worker.py:318: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
  await asyncio.sleep(1.0, loop=self.loop)
[2020-01-06 09:29:24,753] [13861] [INFO] [^Worker]: Closing event loop 
... process hangs ...

I'm going to try to fix the tests I broke and then will look at merging your changes into my branch and will put it up as a PR. I'm hoping to get that all done today or tomorrow as this bug is a blocker for my team.

from mode.

jbooth-mastery avatar jbooth-mastery commented on August 27, 2024

I'll update the conflicts out of it and let's see how it looks?

from mode.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.