
spdk.github.io's Issues

Devmem TCP

Devmem TCP is a feature developed by Google, based on patches from Google and Meta, and already deployed in Google's production fleet, with a strong push to merge it into the Linux kernel by the end of 2024. SPDK could leverage this approach to achieve zero-copy TCP receive flows.

Blog post:
  • https://www.linkedin.com/pulse/devmem-tcp-back-future-rakesh-cheerla-buplc

Slides:
  • Devmem TCP: https://netdevconf.info/0x17/sessions/talk/device-memory-tcp.html
  • Zero copy network RX using io_uring: https://netdevconf.info/0x17/docs/netdev-0x17-paper24-talk-slides/Zero%20Copy%20Rx%20with%20io_uring%20-%20NetDev%20Conf%202023-1.pdf

Patches:
  • Device memory TCP kernel patch series: https://patchwork.kernel.org/project/linux-media/list/?series=861412&state=%2A&archive=both
  • Zero copy Rx using io_uring kernel patch series: https://patchwork.kernel.org/project/io-uring/list/?series=834772&state=%2A&archive=both
  • Devmem TCP + zero copy kernel tree: https://github.com/spikeh/linux/tree/zcrx/next
  • liburing: https://github.com/spikeh/liburing/tree/zcrx/next
  • Benchmarking tool: https://github.com/spikeh/netbench/tree/zcrx/next

SPDK patch series on the same topic:
https://review.spdk.io/gerrit/c/spdk/spdk/+/17597/28

NVMf iov pool per thread

In the current design, an iov array is allocated per request. Embedding the iov array in the request type doesn't scale when the number of requests grows significantly. Such an iov resource could instead be managed under memory domain translate/invalidate; with a reduced "time under usage" and a LIFO approach, it would become more cache friendly.

NVMF/NVME questions

  1. NVMF Target - Persistent Reservation - Load and store should be asynchronous, otherwise they become blocking calls. This might be a problem if the load/store is slow.

  2. NVMF Target - Custom discovery filter - There are some pre-defined filter options. Should there be an option to provide a custom filter, e.g. via an SPDK_NVMF_TGT_DISCOVERY_MATCH_TRANSPORT_CUSTOM enum value?

  3. NVMF Target - Using an SPDK module as a library - The TCP server is not owned by SPDK. Can it read/write to something else? How are RDMA memory region setup and ownership handled?

  4. NVMF Initiator - Persistent Reservation - Should the bdev nvme module have a wrapper to reserve paths for the namespace it is managing?

Enabling more community CIs in SPDK

Discuss challenges with and opportunities for enabling more community-driven CI systems:

  • Triggering/scheduling driven from Gerrit
  • Presenting or merging final results from multiple systems
  • Selecting tests from autotest.sh
  • Making CI results required or optional, and deciding whether negative results should be blocking

Shared iobuf cache per thread

In the current design, each module has its own private cache. Usually only one such buffer is needed in the IO path, but it may be obtained from a different iobuf module depending on configuration (e.g. bdev or nvmf). This design forces N pools (small/large), each with a private cache, and significantly increases memory requirements.

NUMA

Currently SPDK does not try to do NUMA local allocations.

We currently have iobuf, which consolidates many of the memory pools in SPDK. It seems relatively straightforward to have multiple pools, one per NUMA node, and to allocate from the pool matching the NUMA node of the calling spdk_thread. Freeing must likewise return the buffer to the correct pool.

Some complications:

  1. dynamic scheduler - ideally we don't move an spdk_thread from one NUMA node to another, but what if that spdk_thread is completely idle?
  2. validation that buffers are freed to the correct pool - today it is perfectly valid for one spdk_thread to allocate a buffer and another spdk_thread to free it. We want to keep this behavior, but we need to make sure that if spdk_thread A on NUMA node 0 allocates a buffer and spdk_thread B on NUMA node 1 frees it, the buffer goes back into the NUMA node 0 pool.

NVMe/TCP requests pool per thread

Currently, each NVMe queue pair allocates its own requests and uses a queue (TAILQ) for get/put operations. This means that for a large number of queue pairs (N), there are N such TAILQs. It would be more effective for cache to have a single per-thread queue so that requests can be reused sooner, keeping cache lines hot. Such a change of ownership may impose some design traps that need to be discussed.
