Comments (22)
I'm not sure whether it's because of that PR, but @jackhumphries is the mastermind behind cpp anyway ;)
from ray.
Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/3826#018ea456-8ee6-4b39-9a95-ef990a519880
from ray.
Still flaky.
from ray.
CI test linux://:mutable_object_test is flaky. Recent failures:
- https://buildkite.com/ray-project/postmerge/builds/3913#018ec3d8-3c63-4f0d-bcba-77a0fb0a220d
- https://buildkite.com/ray-project/postmerge/builds/3908#018ec24b-fd56-4ea1-881f-7bbfaa5538b6
- https://buildkite.com/ray-project/postmerge/builds/3907#018ec144-7665-4469-9e75-72520d13b2cc
- https://buildkite.com/ray-project/postmerge/builds/3897#018ebe98-b2b1-48e9-8f41-fe6ba0dada6e
- https://buildkite.com/ray-project/postmerge/builds/3891#018eb723-8ce2-4fee-82d3-e21a17b5ceb2
DataCaseName-linux://:mutable_object_test-END
Managed by OSS Test Policy
from ray.
Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/3943#018ec9e9-e96f-4739-ab74-a33168c56721
from ray.
CI test linux://:mutable_object_test is flaky. Recent failures:
- https://buildkite.com/ray-project/postmerge/builds/3946#018eca00-78d4-468c-83ee-cea39e555ae9
- https://buildkite.com/ray-project/postmerge/builds/3913#018ec3d8-3c65-4cce-ae1c-ac03b11f5e46
- https://buildkite.com/ray-project/postmerge/builds/3913#018ec3d8-3c63-4f0d-bcba-77a0fb0a220d
- https://buildkite.com/ray-project/postmerge/builds/3908#018ec24b-fd56-4ea1-881f-7bbfaa5538b6
- https://buildkite.com/ray-project/postmerge/builds/3907#018ec144-7665-4469-9e75-72520d13b2cc
DataCaseName-linux://:mutable_object_test-END
Managed by OSS Test Policy
from ray.
Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/3981#018ed132-2b83-4f27-b32e-3658b1368755
from ray.
From initial investigation, the test is "flaky" because it takes a long time to run. In ReadAcquire() in src/ray/object_manager/common.cc, there is a polling loop that all readers enter as they wait on the writer to update the object. In each loop, the readers increment and decrement the semaphore (which has a maximum value of 1), so there is significant contention on the semaphore.
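To make the contention pattern concrete, here is a minimal sketch of that kind of polling loop. It is not the code from common.cc: `Header`, `PollingRead`, `Publish`, and `RunPollingDemo` are hypothetical names, and the sketch assumes a POSIX binary semaphore guarding a version counter.

```cpp
#include <semaphore.h>

#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for a shared object header (not Ray's real layout).
struct Header {
  sem_t sem;            // binary semaphore: value is 0 or 1
  int64_t version = 0;  // bumped by the writer on every update
};

// Reader side: spin until the writer has published `wanted` or newer.
// Every pass decrements and then increments the semaphore, so N waiting
// readers all hammer the same semaphore. Returns the number of passes.
int64_t PollingRead(Header &h, int64_t wanted) {
  int64_t passes = 0;
  while (true) {
    while (sem_trywait(&h.sem) != 0) {
      // Semaphore held by someone else: retry immediately (busy-wait).
    }
    ++passes;
    const bool ready = h.version >= wanted;
    sem_post(&h.sem);
    if (ready) return passes;
  }
}

// Writer side: has to win the same semaphore the readers are spinning on.
void Publish(Header &h) {
  while (sem_wait(&h.sem) != 0) {
    // Retry if interrupted by a signal (EINTR).
  }
  ++h.version;
  sem_post(&h.sem);
}

// Starts `num_readers` polling readers, publishes once, and returns the
// final version after every reader has observed the update.
int64_t RunPollingDemo(int num_readers) {
  Header h;
  const int rc = sem_init(&h.sem, /*pshared=*/0, /*value=*/1);
  assert(rc == 0);
  (void)rc;
  std::vector<std::thread> readers;
  for (int i = 0; i < num_readers; ++i) {
    readers.emplace_back([&h] { PollingRead(h, 1); });
  }
  Publish(h);
  for (auto &t : readers) t.join();
  sem_destroy(&h.sem);
  return h.version;
}
```

The more readers are waiting, the more often the writer loses the race for the semaphore, which would be consistent with the test running slowly rather than failing outright.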
from ray.
A quick fix to reduce contention would be to add sched_yield() between sem_post() and TryToAcquireSemaphore(). A good long-term fix would be to use a futex to sleep on the version number to avoid polling.
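Both suggestions can be sketched roughly as follows. This is an illustration under assumptions, not a patch: `Header`, `YieldingRead`, `FutexWaitForChange`, `FutexPublish`, and the two demo functions are hypothetical names, the futex variant is Linux-only, and it assumes the version number is (or is mirrored in) a 32-bit word, since futexes operate on 32-bit values.

```cpp
#include <linux/futex.h>
#include <sched.h>
#include <semaphore.h>
#include <sys/syscall.h>
#include <unistd.h>

#include <atomic>
#include <climits>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical header, as in a simple semaphore-guarded version counter.
struct Header {
  sem_t sem;
  int64_t version = 0;
};

// Quick fix: give up the CPU after releasing (or failing to take) the
// semaphore, so the writer gets a chance to acquire it.
void YieldingRead(Header &h, int64_t wanted) {
  while (true) {
    if (sem_trywait(&h.sem) == 0) {
      const bool ready = h.version >= wanted;
      sem_post(&h.sem);
      if (ready) return;
    }
    sched_yield();  // sits between sem_post() and the next acquire attempt
  }
}

int64_t RunYieldDemo(int num_readers) {
  Header h;
  sem_init(&h.sem, /*pshared=*/0, /*value=*/1);
  std::vector<std::thread> readers;
  for (int i = 0; i < num_readers; ++i)
    readers.emplace_back([&h] { YieldingRead(h, 1); });
  while (sem_wait(&h.sem) != 0) {}  // writer: retry on EINTR
  ++h.version;
  sem_post(&h.sem);
  for (auto &t : readers) t.join();
  sem_destroy(&h.sem);
  return h.version;
}

// Long-term idea: sleep in the kernel until the version word changes.
// Assumes std::atomic<uint32_t> has the layout of a plain uint32_t.
static_assert(sizeof(std::atomic<uint32_t>) == sizeof(uint32_t), "layout");

void FutexWaitForChange(std::atomic<uint32_t> &version, uint32_t seen) {
  while (version.load(std::memory_order_acquire) == seen) {
    // The kernel re-checks that the word still equals `seen` before
    // sleeping, so a wake-up cannot be lost between the load and the call.
    syscall(SYS_futex, reinterpret_cast<uint32_t *>(&version), FUTEX_WAIT,
            seen, nullptr, nullptr, 0);
  }
}

void FutexPublish(std::atomic<uint32_t> &version) {
  version.fetch_add(1, std::memory_order_release);
  syscall(SYS_futex, reinterpret_cast<uint32_t *>(&version), FUTEX_WAKE,
          INT_MAX, nullptr, nullptr, 0);  // wake every sleeping reader
}

uint32_t RunFutexDemo(int num_readers) {
  std::atomic<uint32_t> version{0};
  std::vector<std::thread> readers;
  for (int i = 0; i < num_readers; ++i)
    readers.emplace_back([&version] { FutexWaitForChange(version, 0); });
  FutexPublish(version);
  for (auto &t : readers) t.join();
  return version.load();
}
```

The sketch uses FUTEX_WAIT and FUTEX_WAKE without the private flag, on the assumption that the version word would live in memory shared across processes. With the futex variant, waiting readers never touch the semaphore, so the writer has nothing to contend with while they sleep.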
from ray.
I'd be inclined to keep it as is, because this test should run quickly. It's just unnecessary contention that shouldn't be there in the first place due to polling.
from ray.
CI test linux://:mutable_object_test is consistently_failing. Recent failures:
- https://buildkite.com/ray-project/postmerge/builds/4001#018ed618-837e-404e-8f2f-41358e50c2a2
- https://buildkite.com/ray-project/postmerge/builds/4002#018ed621-787a-4b89-8f93-675aac48b2b5
- https://buildkite.com/ray-project/postmerge/builds/4000#018ed611-63b8-43ac-b1dc-1ed38f6e46a0
- https://buildkite.com/ray-project/postmerge/builds/3946#018eca00-78d4-468c-83ee-cea39e555ae9
DataCaseName-linux://:mutable_object_test-END
Managed by OSS Test Policy
from ray.
Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/4001#018ed618-8383-4ee6-a050-2a3bc9d23cc9
from ray.
CI test linux://:mutable_object_test is flaky. Recent failures:
DataCaseName-linux://:mutable_object_test-END
Managed by OSS Test Policy
from ray.
Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/3268#018e0b9c-f120-4741-a4a0-745efa45d938
from ray.
CI test linux://:mutable_object_test is flaky. Recent failures:
- https://buildkite.com/ray-project/postmerge/builds/4020#018ee433-c8be-4c70-a12e-72bd591d9956
- https://buildkite.com/ray-project/postmerge/builds/4017#018ee3dc-05e9-41cd-bfe7-91be1b63dff6
- https://buildkite.com/ray-project/postmerge/builds/4014#018ee391-ecc6-40f6-8e6a-7e43cc32c157
- https://buildkite.com/ray-project/postmerge/builds/4004#018edc21-075a-4a9f-81d1-9cb6d7742440
- https://buildkite.com/ray-project/postmerge/builds/4001#018ed618-8381-4ea0-a0dc-96ec70a4402a
DataCaseName-linux://:mutable_object_test-END
Managed by OSS Test Policy
from ray.
Test passed on latest run: https://buildkite.com/ray-project/postmerge/builds/4037#018ee8c7-e7ed-4143-8548-a8ee2261c2a4
from ray.