Description
Branch: https://github.com/eclipse-zenoh/zenoh/tree/rust-master
Zenoh Rust Version: 21b346a
I was using the zenoh-ffi to implement a couple of pubsub nodes (for use with ROS 2) and discovered some issues with Zenoh.
These issues break any implementation that uses the FFI to publish and subscribe within the same process while other publishers or subscribers run in different processes on the same network.
Specifically, for processes that both publish and subscribe with Zenoh:
- Messages published by a process are sometimes not received by that same process when another process is actively subscribed to some other Zenoh topic (related or not), even though other processes can still receive those messages.
- Callbacks for messages published by a process are sometimes invoked with mismatched topics when the messages are received by that same process, again when another process is actively subscribed to some other Zenoh topic (related or not).
Note that these issues are non-deterministic: run the examples a few times and they will show up.
Additional note: I am writing to the topics using zn_write_wrid().
Examples
I have prepared a minimal package that exhibits this behaviour. Please find it here, and make sure you are on the cmake-only branch.
Build instructions are found in the README.
NOTE: The processes in each example are started in the order listed, and the resource IDs as well as the topic keys are printed for convenience.
Example 1: Dropped Intra-process Messages
If a single pubsub process is started, everything works as expected.
zenoh_minimal_cpp/build/bin$ ./test_pubsub /test
Subscription expression to /test (1)
<<< Published 36 bytes to /test: 'Message #0 from /test (1) to topic /test'
>>> Received 36 bytes on /test: 'Message #0 from /test (1) to topic /test'
<<< Published 36 bytes to /test: 'Message #1 from /test (1) to topic /test'
>>> Received 36 bytes on /test: 'Message #1 from /test (1) to topic /test'
<<< Published 36 bytes to /test: 'Message #2 from /test (1) to topic /test'
>>> Received 36 bytes on /test: 'Message #2 from /test (1) to topic /test'
However, once we start a subscriber on an unrelated topic before starting the pubsub process, the pubsub process sometimes fails to receive its own messages.
Process 1: Unrelated Subscriber
zenoh_minimal_cpp/build/bin$ ./test_sub /topic
Subscription expression to /topic
Process 2: Pubsub
zenoh_minimal_cpp/build/bin$ ./test_pubsub /test
Subscription expression to /test (1)
<<< Published 36 bytes to /test: 'Message #0 from /test (1) to topic /test'
<<< Published 36 bytes to /test: 'Message #1 from /test (1) to topic /test'
<<< Published 36 bytes to /test: 'Message #2 from /test (1) to topic /test'
More perplexingly, if another subscriber (this time subscribed to the topic that is being published to) is started after the unrelated subscriber, it is sometimes able to receive the messages even though the pubsub process is not. This suggests the problem probably isn't with the size_t topic IDs that are assigned to each key expression on resource declaration.
(Also note that this only happens sometimes, indicating a possible race condition.)
Process 1: Unrelated Subscriber
zenoh_minimal_cpp/build/bin$ ./test_sub /topic
Subscription expression to /topic
Process 2: Related Subscriber
zenoh_minimal_cpp/build/bin$ ./test_sub /test
Subscription expression to /test
>>> Received 36 bytes on /test: 'Message #0 from /test (1) to topic /test'
>>> Received 36 bytes on /test: 'Message #1 from /test (1) to topic /test'
>>> Received 36 bytes on /test: 'Message #2 from /test (1) to topic /test'
Process 3: Pubsub
zenoh_minimal_cpp/build/bin$ ./test_pubsub /test
Subscription expression to /test (1)
<<< Published 36 bytes to /test: 'Message #0 from /test (1) to topic /test'
<<< Published 36 bytes to /test: 'Message #1 from /test (1) to topic /test'
<<< Published 36 bytes to /test: 'Message #2 from /test (1) to topic /test'
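Because the drops only appear in some runs, it can help to check each run's console output mechanically instead of by eye. Below is a minimal C++ sketch, assuming the log format printed by test_pubsub above ("<<< Published ..." and ">>> Received ..." lines). find_dropped is a hypothetical helper, not part of zenoh or zenoh-ffi; it returns every published payload that never shows up in a received line of the same process.

```cpp
#include <cassert>  // assert() is handy when calling this from a quick test harness
#include <regex>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not zenoh API): given the captured stdout of one
// test_pubsub run, return the payloads that were published but never
// received back by the same process.
std::vector<std::string> find_dropped(const std::string& log) {
    // Matches lines such as:
    //   <<< Published 36 bytes to /test: 'Message #0 from /test (1) to topic /test'
    //   >>> Received 36 bytes on /test: 'Message #0 from /test (1) to topic /test'
    static const std::regex line_re(
        R"(^(<<<|>>>) (?:Published|Received) \d+ bytes (?:to|on) \S+: '(.*)'$)");

    std::vector<std::string> published;
    std::multiset<std::string> received;
    std::istringstream in(log);
    std::string line;
    std::smatch m;
    while (std::getline(in, line)) {
        if (!std::regex_match(line, m, line_re)) continue;  // e.g. "Subscription expression ..."
        if (m[1].str() == "<<<") {
            published.push_back(m[2].str());
        } else {
            received.insert(m[2].str());
        }
    }

    std::vector<std::string> dropped;
    for (const auto& payload : published) {
        auto it = received.find(payload);
        if (it == received.end()) {
            dropped.push_back(payload);  // published, never delivered locally
        } else {
            received.erase(it);  // consume one match so repeated payloads are counted correctly
        }
    }
    return dropped;
}
```

Fed the output of the failing Process 2 run in the first scenario of this example, this would report all three messages as dropped, while the single-process run at the top of the example would report none.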
Example 2: Mixed Callbacks
If we initialise an unrelated subscriber, a related subscriber, and a pubsub process that publishes to multiple topics, the pubsub process receives subscriber callbacks with mismatched topics, while the related subscriber receives them correctly.
NOTE: This issue seems to disappear if we do not start the unrelated subscriber.
The order in which the subscribers are started does not seem to matter; all that matters is the presence of the unrelated subscriber.
(You can also try starting two pubsub processes on the same topic: you should sometimes see half of the messages get their callbacks mixed up, namely the messages coming from the same process.)
Process 1: Unrelated Subscriber
zenoh_minimal_cpp/build/bin$ ./test_sub /unrelated
Subscription expression to /unrelated
Process 2: Related Subscriber
zenoh_minimal_cpp/build/bin$ ./test_sub /topic_a /topic_b /topic_c
Subscription expression to /topic_a
Subscription expression to /topic_b
Subscription expression to /topic_c
>>> Received 46 bytes on /topic_c: 'Message #0 from /topic_c (3) to topic /topic_c'
>>> Received 46 bytes on /topic_b: 'Message #0 from /topic_b (2) to topic /topic_b'
>>> Received 46 bytes on /topic_a: 'Message #0 from /topic_a (1) to topic /topic_a'
>>> Received 46 bytes on /topic_c: 'Message #1 from /topic_c (3) to topic /topic_c'
>>> Received 46 bytes on /topic_b: 'Message #1 from /topic_b (2) to topic /topic_b'
>>> Received 46 bytes on /topic_a: 'Message #1 from /topic_a (1) to topic /topic_a'
>>> Received 46 bytes on /topic_a: 'Message #2 from /topic_a (1) to topic /topic_a'
>>> Received 46 bytes on /topic_b: 'Message #2 from /topic_b (2) to topic /topic_b'
>>> Received 46 bytes on /topic_c: 'Message #2 from /topic_c (3) to topic /topic_c'
Process 3: Multi-Pubsub
zenoh_minimal_cpp/build/bin$ ./test_pubsub /topic_a /topic_b /topic_c
Subscription expression to /topic_a (1)
Subscription expression to /topic_b (2)
Subscription expression to /topic_c (3)
<<< Published 46 bytes to /topic_c: 'Message #0 from /topic_c (3) to topic /topic_c'
>>> Received 46 bytes on /topic_b: 'Message #0 from /topic_c (3) to topic /topic_c' <---- Mismatched topics
<<< Published 46 bytes to /topic_b: 'Message #0 from /topic_b (2) to topic /topic_b'
>>> Received 46 bytes on /topic_a: 'Message #0 from /topic_b (2) to topic /topic_b' <---- Mismatched topics
<<< Published 46 bytes to /topic_a: 'Message #0 from /topic_a (1) to topic /topic_a'
<<< Published 46 bytes to /topic_c: 'Message #1 from /topic_c (3) to topic /topic_c'
<<< Published 46 bytes to /topic_b: 'Message #1 from /topic_b (2) to topic /topic_b'
<<< Published 46 bytes to /topic_a: 'Message #1 from /topic_a (1) to topic /topic_a'
>>> Received 46 bytes on /topic_a: 'Message #1 from /topic_b (2) to topic /topic_b' <---- Mismatched topics
>>> Received 46 bytes on /topic_b: 'Message #1 from /topic_c (3) to topic /topic_c' <---- Mismatched topics
<<< Published 46 bytes to /topic_c: 'Message #2 from /topic_c (3) to topic /topic_c'
<<< Published 46 bytes to /topic_b: 'Message #2 from /topic_b (2) to topic /topic_b'
>>> Received 46 bytes on /topic_b: 'Message #2 from /topic_c (3) to topic /topic_c' <---- Mismatched topics
>>> Received 46 bytes on /topic_a: 'Message #2 from /topic_b (2) to topic /topic_b' <---- Mismatched topics
<<< Published 46 bytes to /topic_a: 'Message #2 from /topic_a (1) to topic /topic_a'
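Since every payload names the topic it was published to ("... to topic /topic_c"), mismatched callbacks can be flagged automatically by comparing that topic with the topic reported by the receiving callback. Below is a minimal C++ sketch, assuming the log format shown above; find_mismatched_callbacks and Mismatch are hypothetical names, not part of zenoh or zenoh-ffi.

```cpp
#include <cassert>  // assert() is handy when calling this from a quick test harness
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record (not zenoh API) for one callback whose topic does not
// match the destination topic written inside the payload.
struct Mismatch {
    std::string callback_topic;  // topic reported by the subscriber callback
    std::string payload_topic;   // topic the sender says it published to
};

std::vector<Mismatch> find_mismatched_callbacks(const std::string& log) {
    // Matches e.g.
    //   >>> Received 46 bytes on /topic_b: 'Message #0 from /topic_c (3) to topic /topic_c'
    // regex_search (rather than regex_match) ignores trailing annotations
    // such as "<---- Mismatched topics".
    static const std::regex recv_re(
        R"(^>>> Received \d+ bytes on (\S+): 'Message #\d+ from \S+ \(\d+\) to topic (\S+)')");

    std::vector<Mismatch> mismatches;
    std::istringstream in(log);
    std::string line;
    std::smatch m;
    while (std::getline(in, line)) {
        if (!std::regex_search(line, m, recv_re)) continue;  // not a received-message line
        if (m[1].str() != m[2].str()) {
            mismatches.push_back({m[1].str(), m[2].str()});
        }
    }
    return mismatches;
}
```

Run over the Process 3 output above, this would return the six lines marked "Mismatched topics", while the Process 2 output would return none.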
Misc
I am tagging @codebot and @gbiggs here so they can follow the issue as well.
- CH3EERS!