Git Product home page Git Product logo

Comments (12)

smolkaj avatar smolkaj commented on August 20, 2024 1

Do you push the forwarding pipeline as soon as the P4Runtime server is started?

Yes, pretty much. We're seeing this issue in a unit test which is pretty much straight line code without any delays.

If we start the switch without a P4 config and we wait for start_and_return to return before starting the P4Runtime server, I think we end up with a deadlock, as no config can be pushed until the server is started?

Ohhhh, good catch! Perhaps we can swap the calls only iff an initial config was specified?

from behavioral-model.

antoninbas avatar antoninbas commented on August 20, 2024 1

I would recommend just locking the config_mutex when calling start_and_return_:

start_and_return_();

Since targets are using this callback to validate the initial config, it makes sense to lock the mutex before calling it.

I hope I am not missing something, but this should work:

void
SwitchWContexts::start_and_return() {
  std::unique_lock<std::mutex> config_lock(config_mutex);
  if (!config_loaded && !enable_swap) {
    Logger::get()->error(
        "The switch was started with no P4 and config swap is disabled");
  }
  config_loaded_cv.wait(config_lock, [this]() { return config_loaded; });
  start();  // DevMgr::start
  start_and_return_();
  // Starts any registered periodically-executing externs
  PeriodicTaskList::get_instance().start();
}

After the condition variable's wait call returns, the lock will be held. Holding the lock while calling these other functions should be fine (I am not super familiar with PeriodicTaskList, but I would say it's probably fine). It's also possible to release the lock just before calling PeriodicTaskList::get_instance().start() if we want to be on the safe side.

from behavioral-model.

matthewtlam avatar matthewtlam commented on August 20, 2024 1

I believe that the initial config from the JSON file gets set in the SwitchWContexts::init_from_options_parser function on line 263 in p4lang/behavioral_model/src/bm_sim/switch.cpp. That function gets called on p4lang/behavioral_model/targets/simple_switch_grpc/switch_runner.cpp:561, before starting PIGrpcServerRunV2 (P4RT server) on line 644.

from behavioral-model.

smolkaj avatar smolkaj commented on August 20, 2024

Pasting the line TSAN is complaining about here for convenience:

// src/bm_sim/context.cpp
int
Context::do_swap() {
  if (!swap_ordered) return 1;
  boost::unique_lock<boost::shared_mutex> lock(request_mutex);
  p4objects = p4objects_rt;  // <------------------ TSAN error
  swap_ordered = false;
  send_swap_status_notification(SwapStatus::SWAP_COMPLETED);
  return 0;
}

from behavioral-model.

matthewtlam avatar matthewtlam commented on August 20, 2024

As a side note, running the TSAN tool while removing the BMv2 Config JSON in the command line arguments (and using the --no-p4 instead) provides no race conditions

from behavioral-model.

smolkaj avatar smolkaj commented on August 20, 2024

I found the following context in switch.h:

//! Both switch classes support live swapping of P4-JSON configurations. To
//! enable it you need to provide the correct flag to the constructor (see
//! bm::SwitchWContexts::SwitchWContexts()). Swaps are ordered through the
//! runtime interfaces. We ensure that during the actual swap operation
//! (bm::SwitchWContexts::do_swap() method), there is no Packet instance
//! inflight, which we achieve using the process_packet_mutex mutex). The final
//! step of the swap is to call bm::SwitchWContexts::swap_notify_(), which
//! targets can override if they need to perform some operations as part of the
//! swap. Targets are guaranteed that no Packet instances exist as that
//! time. Note that swapping configurations may invalidate pointers that you are
//! still using, and it is your responsibility to refresh them.

from behavioral-model.

smolkaj avatar smolkaj commented on August 20, 2024

Just an educated guess, but on the surface it seems like this may be a race between the configuration through the command line and the configuration through P4Runtime? If so, this could be fixed by not starting the P4Runtime server until after the initial configuration.

from behavioral-model.

smolkaj avatar smolkaj commented on August 20, 2024

Here is the complete ThreadSanitizer report:

==================
WARNING: ThreadSanitizer: data race (pid=3936)
  Write of size 8 at 0x724c00015c50 by thread T62 (mutexes: write M0):
    #0 swap<bm::P4Objects *> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__utility/swap.h:44:7 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x104b998) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #1 swap third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__memory/shared_ptr.h:697:5 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x104b998)
    #2 operator= third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__memory/shared_ptr.h:655:21 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x104b998)
    #3 bm::Context::do_swap() third_party/p4lang_behavioral_model/src/bm_sim/context.cpp:867:13 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x104b998)
    #4 bm::SwitchWContexts::do_swap() third_party/p4lang_behavioral_model/src/bm_sim/switch.cpp:471:25 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x1123127) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #5 bm::SwitchWContexts::swap_configs() third_party/p4lang_behavioral_model/src/bm_sim/switch.cpp:368:17 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x1122f02) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #6 _pi_update_device_end third_party/p4lang_behavioral_model/PI/src/pi_imp.cpp:112:38 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xfc6314) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #7 pi_update_device_end third_party/p4lang_PI/src/pi.c:221:10 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xfb9449) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #8 pi::fe::proto::DeviceMgrImp::pipeline_config_set(p4::v1::SetForwardingPipelineConfigRequest_Action, p4::v1::ForwardingPipelineConfig const&) third_party/p4lang_PI/proto/frontend/src/device_mgr.cpp:617:19 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xf35667) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #9 pi::fe::proto::DeviceMgr::pipeline_config_set(p4::v1::SetForwardingPipelineConfigRequest_Action, p4::v1::ForwardingPipelineConfig const&) third_party/p4lang_PI/proto/frontend/src/device_mgr.cpp:3120:16 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xf34d39) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #10 pi::server::(anonymous namespace)::P4RuntimeServiceImpl::SetForwardingPipelineConfig(grpc::ServerContext*, p4::v1::SetForwardingPipelineConfigRequest const*, p4::v1::SetForwardingPipelineConfigResponse*) third_party/p4lang_PI/proto/server/pi_server.cpp:459:31 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xf2e51e) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #11 grpc::Status std::__tsan::__function::__policy_invoker<grpc::Status (p4::v1::grpc::P4Runtime::Service*, grpc::ServerContext*, p4::v1::SetForwardingPipelineConfigRequest const*, p4::v1::SetForwardingPipelineConfigResponse*)>::__call_impl<std::__tsan::__function::__default_alloc_func<p4::v1::grpc::P4Runtime::Service::Service()::$_2, grpc::Status (p4::v1::grpc::P4Runtime::Service*, grpc::ServerContext*, p4::v1::SetForwardingPipelineConfigRequest const*, p4::v1::SetForwardingPipelineConfigResponse*)>>(std::__tsan::__function::__policy_storage const*, p4::v1::grpc::P4Runtime::Service*, grpc::ServerContext*, p4::v1::SetForwardingPipelineConfigRequest const*, p4::v1::SetForwardingPipelineConfigResponse*) p4runtime.grpc.pb.cc (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x11c5743) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #12 grpc::internal::RpcMethodHandler<p4::v1::grpc::P4Runtime::Service, p4::v1::SetForwardingPipelineConfigRequest, p4::v1::SetForwardingPipelineConfigResponse, proto2::MessageLite, proto2::MessageLite>::RunHandler(grpc::internal::MethodHandler::HandlerParameter const&) <null> (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x11c59d8) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #13 grpc::Server::SyncRequest::ContinueRunAfterInterception() third_party/grpc/src/cpp/server/server_cc.cc:471:14 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x1f733ae) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #14 grpc::Server::SyncRequest::Run(std::__tsan::shared_ptr<grpc::Server::GlobalCallbacks> const&, bool) third_party/grpc/src/cpp/server/server_cc.cc:459:7 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x1f73052) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #15 grpc::Server::SyncRequestThreadManager::DoWork(void*, bool, bool) third_party/grpc/src/cpp/server/server_cc.cc:830:15 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x1f7287a) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #16 grpc::ThreadManager::MainWorkLoop() third_party/grpc/src/cpp/thread_manager/thread_manager.cc:214:9 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x1f8847f) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #17 Run third_party/grpc/src/cpp/thread_manager/thread_manager.cc:48:13 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x1f89231) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #18 operator() third_party/grpc/src/cpp/thread_manager/thread_manager.cc:40:69 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x1f89231)
    #19 grpc::ThreadManager::WorkerThread::WorkerThread(grpc::ThreadManager*)::$_0::__invoke(void*) third_party/grpc/src/cpp/thread_manager/thread_manager.cc:40:7 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x1f89231)
    #20 (anonymous namespace)::ThreadBodyWithCleanup(void (*)(void*), void*, (anonymous namespace)::ThreadInternalsGoogleThread*) third_party/grpc/google_specific/src/core/support/thd_google.cc:117:3 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8291) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #21 __invoke<void (*)(void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *), void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__type_traits/invoke.h:150:25 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8468) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #22 invoke<void (*)(void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *), void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/invoke.h:28:10 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8468)
    #23 operator()<void (*)(void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *), void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/bind_front.h:36:12 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8468)
    #24 operator()<void> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/perfect_forward.h:77:12 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8468)
    #25 __invoke<std::__tsan::__bind_front_t<void (*)(void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *), void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *> > third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__type_traits/invoke.h:150:25 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8468)
    #26 invoke<std::__tsan::__bind_front_t<void (*)(void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *), void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *> > third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/invoke.h:28:10 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8468)
    #27 InvokeR<void, std::__tsan::__bind_front_t<void (*)(void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *), void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *>, void> third_party/absl/functional/internal/any_invocable.h:132:3 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8468)
    #28 void absl::internal_any_invocable::RemoteInvoker<false, void, std::__tsan::__bind_front_t<void (*)(void (*)(void*), void*, (anonymous namespace)::ThreadInternalsGoogleThread*), void (*)(void*), void*, (anonymous namespace)::ThreadInternalsGoogleThread*>&&>(absl::internal_any_invocable::TypeErasedState*) third_party/absl/functional/internal/any_invocable.h:368:10 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8468)
    #29 operator() third_party/absl/functional/internal/any_invocable.h:876:1 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x133a644) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #30 ClosureThread::Run() thread/thread.h:464:25 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x133a644)
    #31 Thread::ThreadBody(void*) thread/thread.cc:1354:16 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3d96d20) (BuildId: d5ce5c58fe208b528b4cddb59015805a)

  Previous read of size 8 at 0x724c00015c50 by main thread:
    #0 operator-> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__memory/shared_ptr.h:727:12 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xef3056) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #1 field_exists third_party/p4lang_behavioral_model/include/bm/bm_sim/context.h:453:12 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xef3056)
    #2 field_exists third_party/p4lang_behavioral_model/include/bm/bm_sim/switch.h:165:32 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xef3056)
    #3 field_exists third_party/p4lang_behavioral_model/include/bm/bm_sim/switch.h:984:12 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xef3056)
    #4 SimpleSwitch::check_queueing_metadata() third_party/p4lang_behavioral_model/targets/simple_switch/simple_switch.cpp:442:26 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xef3056)
    #5 SimpleSwitch::start_and_return_() third_party/p4lang_behavioral_model/targets/simple_switch/simple_switch.cpp:275:3 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xef2c84) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #6 bm::SwitchWContexts::start_and_return() third_party/p4lang_behavioral_model/src/bm_sim/switch.cpp:87:3 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x11209e3) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #7 sswitch_grpc::SimpleSwitchGrpcRunner::init_and_start(bm::OptionsParser const&) third_party/p4lang_behavioral_model/targets/simple_switch_grpc/switch_runner.cpp:670:18 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xede378) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #8 main third_party/p4lang_behavioral_model/targets/simple_switch_grpc/main.cpp:241:23 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xedb22a) (BuildId: d5ce5c58fe208b528b4cddb59015805a)

  Location is heap block of size 424 at 0x724c00015c40 allocated by main thread:
    #0 operator new(unsigned long) third_party/llvm/llvm-project/compiler-rt/lib/tsan/rtl/tsan_new_delete.cpp:64:3 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xed857c) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #1 __libcpp_operator_new<unsigned long> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/new:271:10 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x111fec5) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #2 __libcpp_allocate third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/new:295:10 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x111fec5)
    #3 allocate third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__memory/allocator.h:125:32 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x111fec5)
    #4 __allocate_at_least<std::__tsan::allocator<bm::Context> > third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__memory/allocate_at_least.h:41:19 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x111fec5)
    #5 __vallocate third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/vector:742:25 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x111fec5)
    #6 vector third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/vector:1139:5 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x111fec5)
    #7 bm::SwitchWContexts::SwitchWContexts(unsigned long, bool) third_party/p4lang_behavioral_model/src/bm_sim/switch.cpp:58:23 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x111fec5)
    #8 bm::Switch::Switch(bool) third_party/p4lang_behavioral_model/src/bm_sim/switch.cpp:578:7 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x11242f8) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #9 SimpleSwitch::SimpleSwitch(bool, unsigned int, unsigned long) third_party/p4lang_behavioral_model/targets/simple_switch/simple_switch.cpp:203:5 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xef1380) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #10 sswitch_grpc::SimpleSwitchGrpcRunner::SimpleSwitchGrpcRunner(bool, std::__tsan::basic_string<char, std::__tsan::char_traits<char>, std::__tsan::allocator<char>>, unsigned int, std::__tsan::basic_string<char, std::__tsan::char_traits<char>, std::__tsan::allocator<char>>, unsigned int, std::__tsan::shared_ptr<sswitch_grpc::SSLOptions>, unsigned long) third_party/p4lang_behavioral_model/targets/simple_switch_grpc/switch_runner.cpp:505:25 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xedd8ac) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #11 sswitch_grpc::SimpleSwitchGrpcRunner::get_instance(bool, std::__tsan::basic_string<char, std::__tsan::char_traits<char>, std::__tsan::allocator<char>>, unsigned int, std::__tsan::basic_string<char, std::__tsan::char_traits<char>, std::__tsan::allocator<char>>, unsigned int, std::__tsan::shared_ptr<sswitch_grpc::SSLOptions>, unsigned long) third_party/p4lang_behavioral_model/targets/simple_switch_grpc/switch_runner.h:67:35 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xedbeb3) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #12 main third_party/p4lang_behavioral_model/targets/simple_switch_grpc/main.cpp:233:18 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xedb1f4) (BuildId: d5ce5c58fe208b528b4cddb59015805a)

  Mutex M0 (0x726400015b98) created at:
    #0 pthread_mutex_lock third_party/llvm/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1341:3 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xe582fd) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #1 std::__tsan::__libcpp_mutex_lock(pthread_mutex_t*) third_party/llvm/llvm-project/libcxx/include/__thread/support/pthread.h:96:10 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x421ba49) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #2 std::__tsan::mutex::lock() third_party/llvm/llvm-project/libcxx/src/mutex.cpp:29:12 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x421ba49)
    #3 unique_lock third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__mutex/unique_lock.h:41:11 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x112117f) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #4 bm::SwitchWContexts::init_objects(std::__tsan::basic_string<char, std::__tsan::char_traits<char>, std::__tsan::allocator<char>> const&, unsigned long, std::__tsan::shared_ptr<bm::TransportIface>) third_party/p4lang_behavioral_model/src/bm_sim/switch.cpp:195:34 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x112117f)
    #5 bm::SwitchWContexts::init_from_options_parser(bm::OptionsParser const&, std::__tsan::shared_ptr<bm::TransportIface>, std::__tsan::unique_ptr<bm::DevMgrIface, std::__tsan::default_delete<bm::DevMgrIface>>) third_party/p4lang_behavioral_model/src/bm_sim/switch.cpp:263:14 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x1121760) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #6 sswitch_grpc::SimpleSwitchGrpcRunner::init_and_start(bm::OptionsParser const&) third_party/p4lang_behavioral_model/targets/simple_switch_grpc/switch_runner.cpp:561:31 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xeddd8a) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #7 main third_party/p4lang_behavioral_model/targets/simple_switch_grpc/main.cpp:241:23 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xedb22a) (BuildId: d5ce5c58fe208b528b4cddb59015805a)

  Thread T62 'grpcpp_sync_server' (tid=4043, running) created by thread T54 at:
    #0 pthread_create third_party/llvm/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1022:3 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xe56739) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #1 Thread::CreatePthread(pthread_attr_t&) thread/thread.cc:502:13 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3d9660d) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #2 Thread::Start() thread/thread.cc:678:3 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3d9739a) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #3 (anonymous namespace)::ThreadInternalsGoogleThread::Start() third_party/grpc/google_specific/src/core/support/thd_google.cc:106:36 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af83c3) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #4 Start third_party/grpc/src/core/lib/gprpp/thd.h:152:14 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x1f885e8) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #5 Start third_party/grpc/src/cpp/thread_manager/thread_manager.h:124:25 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x1f885e8)
    #6 grpc::ThreadManager::MainWorkLoop() third_party/grpc/src/cpp/thread_manager/thread_manager.cc:186:23 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x1f885e8)
    #7 Run third_party/grpc/src/cpp/thread_manager/thread_manager.cc:48:13 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x1f89231) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #8 operator() third_party/grpc/src/cpp/thread_manager/thread_manager.cc:40:69 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x1f89231)
    #9 grpc::ThreadManager::WorkerThread::WorkerThread(grpc::ThreadManager*)::$_0::__invoke(void*) third_party/grpc/src/cpp/thread_manager/thread_manager.cc:40:7 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x1f89231)
    #10 (anonymous namespace)::ThreadBodyWithCleanup(void (*)(void*), void*, (anonymous namespace)::ThreadInternalsGoogleThread*) third_party/grpc/google_specific/src/core/support/thd_google.cc:117:3 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8291) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #11 __invoke<void (*)(void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *), void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__type_traits/invoke.h:150:25 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8468) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #12 invoke<void (*)(void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *), void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/invoke.h:28:10 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8468)
    #13 operator()<void (*)(void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *), void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/bind_front.h:36:12 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8468)
    #14 operator()<void> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/perfect_forward.h:77:12 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8468)
    #15 __invoke<std::__tsan::__bind_front_t<void (*)(void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *), void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *> > third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__type_traits/invoke.h:150:25 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8468)
    #16 invoke<std::__tsan::__bind_front_t<void (*)(void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *), void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *> > third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/invoke.h:28:10 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8468)
    #17 InvokeR<void, std::__tsan::__bind_front_t<void (*)(void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *), void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *>, void> third_party/absl/functional/internal/any_invocable.h:132:3 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8468)
    #18 void absl::internal_any_invocable::RemoteInvoker<false, void, std::__tsan::__bind_front_t<void (*)(void (*)(void*), void*, (anonymous namespace)::ThreadInternalsGoogleThread*), void (*)(void*), void*, (anonymous namespace)::ThreadInternalsGoogleThread*>&&>(absl::internal_any_invocable::TypeErasedState*) third_party/absl/functional/internal/any_invocable.h:368:10 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8468)
    #19 operator() third_party/absl/functional/internal/any_invocable.h:876:1 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x133a644) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #20 ClosureThread::Run() thread/thread.h:464:25 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x133a644)
    #21 Thread::ThreadBody(void*) thread/thread.cc:1354:16 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3d96d20) (BuildId: d5ce5c58fe208b528b4cddb59015805a)

SUMMARY: ThreadSanitizer: data race third_party/p4lang_behavioral_model/src/bm_sim/context.cpp:867:13 in bm::Context::do_swap()
==================

from behavioral-model.

smolkaj avatar smolkaj commented on August 20, 2024

Swapping the relative order of the calls to simple_switch->start_and_return() and PIGrpcServerRunV2 in SimpleSwitchGrpcRunner::init_and_start, such that the latter comes second, seems to do the trick. I suppose this makes it so that the P4Runtime server only comes up after the switch has been started and initialized.

from behavioral-model.

antoninbas avatar antoninbas commented on August 20, 2024

Do you push the forwarding pipeline as soon as the P4Runtime server is started? It's strange that these 2 things happen concurrently.

I think it would be fine to swap these 2 function calls as you suggested. I don't see any potential drawback or issue.

What's interesting is that the function ultimately responsible for the issue (check_queueing_metadata) is not really relevant for simple_switch_grpc and P4Runtime:

SimpleSwitch::check_queueing_metadata() {
. It dates back to the days of a single static JSON pipeline.

^ I was wrong there. check_queueing_metadata is also called whenever there is a pipeline swap, which is the correct thing to do. Although in that context, the same race condition does not seem possible, as check_queueing_metadata is called from do_swap itself, and we hold a unique lock on process_packet_mutex for the entire duration.

I think the solution you propose (changing order of function calls) may not work in the end. I see that start_and_return waits for a P4 config to be available:

void
SwitchWContexts::start_and_return() {
{
std::unique_lock<std::mutex> config_lock(config_mutex);
if (!config_loaded && !enable_swap) {
Logger::get()->error(
"The switch was started with no P4 and config swap is disabled");
}
config_loaded_cv.wait(config_lock, [this]() { return config_loaded; });
}
start(); // DevMgr::start
start_and_return_();
// Starts any registered periodically-executing externs
PeriodicTaskList::get_instance().start();
}

If we start the switch without a P4 config and we wait for start_and_return to return before starting the P4Runtime server, I think we end up with a deadlock, as no config can be pushed until the server is started?

from behavioral-model.

smolkaj avatar smolkaj commented on August 20, 2024

Where does the initial config from the JSON file get set? I guess we would want to verify that at least that bit happens before the P4RT server is started, right?

from behavioral-model.

smolkaj avatar smolkaj commented on August 20, 2024

Thanks for verifying that, that looks correct to me. And thanks @antoninbas for the suggestion!

from behavioral-model.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.