Comments (14)
With the help of this post:https://stackoverflow.com/questions/15349997/assertionerror-when-threading-in-python
We found out that the library under test has a subclass of threading.Thread()
but with a different signature. We suspect that the code need to be refactored to behave better. It looks like the bug is not in green, but the fact that green get stuck is concerning.
I suppose it got stuck because the code under test got stuck.
from green.
Fixed in Green 3.3.0, which I just released. 🥳 🎈 🎉
from green.
We got the issue 8 out of 10 times on our CI pipeline. By adding -s 1
to disable multithreading for this directory we passed 5 out of 5 times. So we have a temporary workaround.
from green.
Glad to hear the problem was somewhere else. 😉
from green.
@CleanCut unfortunately I combed over the library that is causing green to freeze and I could only find a small bit of code where we subclass threading.Thread so we can collect the return value of run(). We do not use threads anywhere else in that library and we are not using multiprocessing
either.
I was not able to reproduce locally on macOS either natively or under the same docker container we use in our CI. It is very consistent on our CI system with python 3.9.6.
I see that you wrote green.process.LoggingDaemonlessPool._repopulate_pool()
back in 2015, and that the official Pool
class now has a _repopulate_pool_static()
that was added 2y ago, bypassing your _repopulate_pool()
. I believe that LoggingDaemonlessPool
needs to be updated with its own _repopulate_pool_static()
.
That's my 1AM analysis and might be wrong but so far I really think the bug is in green
.
from green.
I did some experiments and patching green/process.py with no success. I'll investigate further and will let you know.
from green.
I did some experiments and patching green/process.py with no success. I'll investigate further and will let you know.
👍🏻 I'll wait to hear what you find.
from green.
I've patched multiprocessing.process.BaseProcess
to get more context information:
def __init__(self, group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None):
if group is not None:
import inspect
import sys
import traceback
print()
print('current process:', current_process())
print(f'sys.argv: {sys.argv}')
print(
f'group: {group!r}, target: {target}, name: {name}, args: {args}, kwargs: {kwargs}'
)
print('target signature: ', inspect.signature(target))
traceback.print_stack()
initializer_or_finalizer = args[2]
print(
'InitializerOrFinalizer module:',
repr(initializer_or_finalizer.module_part),
'function:',
repr(initializer_or_finalizer.function_part),
)
print()
assert group is None, 'group argument must be None for now'
This is what I get:
current process: <_MainProcess name='MainProcess' parent=None started>
sys.argv: ['/root/.pyenv/versions/3.9.6/lib/python3.9/site-packages/green/__main__.py']
group: <multiprocessing.context.ForkContext object at 0x7ffb5da34250>, target: <function worker at 0x7ffb5d7543a0>, name: None, args: (<multiprocessing.queues.SimpleQueue object at 0x7ffb23f6a1f0>, <multiprocessing.queues.SimpleQueue object at 0x7ffb254337c0>, <green.runner.InitializerOrFinalizer object at 0x7ffb585d75e0>, (), None, True), kwargs: {}
target signature: (inqueue, outqueue, initializer=None, initargs=(), maxtasks=None, wrap_exception=False, finalizer=None, finalargs=())
File "/root/.pyenv/versions/3.9.6/lib/python3.9/threading.py", line 930, in _bootstrap
self._bootstrap_inner()
File "/root/.pyenv/versions/3.9.6/lib/python3.9/threading.py", line 973, in _bootstrap_inner
self.run()
File "/root/.pyenv/versions/3.9.6/lib/python3.9/threading.py", line 910, in run
self._target(*self._args, **self._kwargs)
File "/root/.pyenv/versions/3.9.6/lib/python3.9/multiprocessing/pool.py", line 513, in _handle_workers
cls._maintain_pool(ctx, Process, processes, pool, inqueue,
File "/root/.pyenv/versions/3.9.6/lib/python3.9/multiprocessing/pool.py", line 337, in _maintain_pool
Pool._repopulate_pool_static(ctx, Process, processes, pool,
File "/root/.pyenv/versions/3.9.6/lib/python3.9/multiprocessing/pool.py", line 319, in _repopulate_pool_static
w = Process(ctx, target=worker,
File "/root/.pyenv/versions/3.9.6/lib/python3.9/multiprocessing/process.py", line 100, in __init__
traceback.print_stack()
InitializerOrFinalizer module: '' function: ''
Somehow we end up in the init of BaseProcess with a group argument set to ForkContext
which causes an assert to be raised.
I do not have deep experience with the multiprocessing library so my understanding is limited.
The other issue is that this has been very difficult to reproduce. It is very consistently failing under our CI pipeline conditions but changing some elements avoids the issue:
- always works fine on my laptop, either native macOS or with the same container.
- we use a python process wrapper (calls pylint, mypy, green ... through subprocess.run()) and calling green directly, the issue does not show up.
- splitting the library which fails into smaller chunks and everything passes, so I was not able to identify if there is a 'conflict' with some of our code.
Let me know if the data I dumped helps you and if there are other things you would like me to dump. Unfortunately I cannot share a reproducible case since the docker image and our code is private.
from green.
I'll pause my investigation for a few days. I tried many different combinations but I am still unable to pinpoint any specific test or combination of tests that trigger the issue. It seems that the number of tests might be a factor since we have a few hundred tests in that directory.
The next thing I will try is to revert the two threading patches added in python 3.9.6 and see if they are the ones triggering the issue.
from green.
Good news! I was able to reproduce the problem! 🎉
That diff you linked to was super helpful. You were correct that the problem was because of the addition of _repopulate_pool_static
, which broke repopulating pool subprocesses after they have died. Since green
sets the processes to never die, this usually wouldn't be a problem, but somehow your workload is managing to kill all of the subprocesses -- and then since the pool has no workers pulling jobs from the queue, everything just hangs forever.
I was able to reproduce the problem in aa572e1 by adding a -X, --max--maxtasksperchild
option and setting that to 1, and then trying to run more suites than subprocesses that get launched. I hit a hang in Python 3.8 and 3.9, but Python 3.7 works fine.
Now to fix the problem...
from green.
Thanks a lot for the quick fix!!! I'll try 3.3.0 in our CI pipeline tomorrow and will let you know how it goes.
from green.
🤞🏻
from green.
I've tested green 3.3.0 and I can confirm that the AssertionError stack trace is gone, however the tests are still stuck in parallel mode and never return. Passing the -s 1
option to green bypasses the problem so we are not stuck but there is still a problem somewhere.
I'll try to run some more experiments over the weekend. I'm wondering if adding a timeout per test might help me pinpoint which test might cause the situation to trigger.
The good new is that thanks to your new CLI option I was able to reproduce on my laptop under macOS which will help a lot.
green -r -v -s 32 --maxtasksperchild 1
. This will allow me to cycle through experiments much faster.
Here is what I suspect might be happening. We run our CI under Kubernetes where the container is not getting the full CPU capacity of the host, but many processes in the container still believe the process has access to all the CPU/RAM resources. Without -s
being set green is probably setting a number of cores (16) the host is claiming but the container is only allocated 4 k8s CPUs. Setting -s 4
seems to fix the issue as well and TBH doing more than 4 is way overkill so I'll set that as the default to pass to green from our wrapper.
I did confirm that os.cpu_count()
, which Pool()
calls when process is None, is returning 16 on our CI pipelines.
As far as I'm concern green is now even better today than it was last week :-)
Thanks a lot.
from green.
Here is what I suspect might be happening. We run our CI under Kubernetes where the container is not getting the full CPU capacity of the host, but many processes in the container still believe the process has access to all the CPU/RAM resources. Without
-s
being set green is probably setting a number of cores (16) the host is claiming but the container is only allocated 4 k8s CPUs. Setting-s 4
seems to fix the issue as well and TBH doing more than 4 is way overkill so I'll set that as the default to pass to green from our wrapper.
Oh, good. I'm glad you found a workaround. If we can find both a) a way to determine that we are running in k8s (an environment variable that k8s always sets, maybe?), AND b) a way for Python to determine a better CPU number in k8s, then we could have green adjust it's auto cpu count under k8s.
If we do go down this path, let's open a new issue to have the discussion in.
As far as I'm concern green is now even better today than it was last week :-)
Thanks a lot.
You are most welcome!
from green.
Related Issues (20)
- django.db.utils.ProgrammingError: relation "users_user" does not exist HOT 3
- Combine methode name and doc string in output HOT 2
- Question: What is a "test" in context of --processes HOT 1
- SyntaxError not catched when named explicte but unittest does HOT 4
- Confusing error when I have errors in source code HOT 4
- Problem with -r (coverage) output HOT 1
- Missing git tag for 3.4.2 release HOT 1
- green might hang on async tests HOT 3
- shutil cleanup sometimes fails on python 3.8 HOT 5
- running green -vvv green.test.test_runner on PyPy + macOS errors out HOT 2
- Feature request: option to debug tests
- Python 3.12 support HOT 4
- Join The Project Team HOT 7
- stopTest issue with Python 3.12.1 and skipped tests HOT 15
- Move Release Process to GitHub Actions HOT 7
- Please add me to the project as a maintainer HOT 2
- Fix lack of coverage in CI HOT 3
- Change default branch on coveralls to "main" HOT 2
- Upload wheels to pypi HOT 4
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from green.