
spdk.github.io's Issues

Devmem TCP

Devmem TCP is a feature developed by Google, based on patches from Google and Meta, and already deployed in Google's production fleet, with a strong push to merge it into the Linux kernel by the end of 2024. SPDK could leverage this approach to achieve zero-copy TCP receive flows.

Blog post:
  • https://www.linkedin.com/pulse/devmem-tcp-back-future-rakesh-cheerla-buplc

Slides:
  • Devmem TCP: https://netdevconf.info/0x17/sessions/talk/device-memory-tcp.html
  • Zero copy network RX using io_uring: https://netdevconf.info/0x17/docs/netdev-0x17-paper24-talk-slides/Zero%20Copy%20Rx%20with%20io_uring%20-%20NetDev%20Conf%202023-1.pdf

Patches:
  • Device memory TCP kernel patch series: https://patchwork.kernel.org/project/linux-media/list/?series=861412&state=%2A&archive=both
  • Zero copy Rx using io_uring kernel patch series: https://patchwork.kernel.org/project/io-uring/list/?series=834772&state=%2A&archive=both
  • Devmem TCP + zero copy kernel tree: https://github.com/spikeh/linux/tree/zcrx/next
  • liburing: https://github.com/spikeh/liburing/tree/zcrx/next
  • Benchmarking tool: https://github.com/spikeh/netbench/tree/zcrx/next

SPDK patch series on the same topic:
https://review.spdk.io/gerrit/c/spdk/spdk/+/17597/28

NVMf iov pool per thread

In the current design, an iov array is allocated per request. Embedding the iov array in the request type doesn't scale when the number of requests grows significantly. Such an iov resource could instead be managed under memory domain translate/invalidate; with a reduced "time under usage" and a LIFO approach, it would become more cache friendly.

NVMF/NVME questions

  1. NVMF Target - Persistent Reservation - Load and store should be asynchronous, otherwise they become blocking calls. This might be a problem if the load/store is slow.

  2. NVMF Target - Custom discovery filter - There are some pre-defined filter options. Should there be an option to provide a custom filter, e.g. via an SPDK_NVMF_TGT_DISCOVERY_MATCH_TRANSPORT_CUSTOM enum value?

  3. NVMF Target - Using an SPDK module as a library - The TCP server is not owned by SPDK. Can it read/write to something else? How are RDMA memory region setup and ownership handled?

  4. NVMF Initiator - Persistent Reservation - Should the bdev nvme module have a wrapper to reserve paths for the namespace it is managing?

Enabling more community CIs in SPDK

Discuss challenges with and opportunities for enabling more community-driven CI systems:

  • Triggering/scheduling driven from Gerrit
  • Presenting or merging final results from multiple systems
  • Selecting tests from autotest.sh
  • Making CI results required or optional, and deciding whether negative results should be blocking

Shared iobuf cache per thread

In the current design, each module has its own private cache. Usually only one such buffer is needed in the IO path, but it may be obtained from a different iobuf module depending on configuration (e.g. bdev or nvmf). This design forces N pools (small/large), each with a private cache, and significantly increases memory requirements.

NUMA

Currently SPDK does not try to do NUMA local allocations.

We currently have iobuf, which consolidates many of the memory pools in SPDK. It seems relatively straightforward to have multiple pools, one per NUMA node, and to allocate from the pool matching the NUMA node of the calling spdk_thread. Freeing must likewise return the buffer to the correct pool.

Some complications:

  1. dynamic scheduler - ideally we don't move an spdk_thread from one NUMA node to another, but what if that spdk_thread is completely idle?
  2. validation that buffers are freed to the correct pool - today it is perfectly valid for one spdk_thread to allocate a buffer and another spdk_thread to free it. We want to keep this behavior, but we need to make sure that if spdk_thread A on NUMA node 0 allocates a buffer and spdk_thread B on NUMA node 1 frees it, the buffer goes back into the NUMA node 0 pool.

NVMe/TCP requests pool per thread

Currently, each NVMe queue pair allocates its own requests and uses a queue (TAILQ) for get/put operations. This means that for a large number of queue pairs (N), there are N such TAILQs. It would be more effective for cache to have a single per-thread queue so that requests can be reused sooner, keeping cache lines hot. Such a change of ownership may impose some design traps that need to be discussed.
