axboe / liburing
Library providing helpers for the Linux kernel io_uring support
License: MIT License
liburing
--------

This is the io_uring library, liburing. liburing provides helpers to set up and
tear down io_uring instances, and also a simplified interface for applications
that don't need (or want) to deal with the full kernel side implementation.

For more info on io_uring, please see: https://kernel.dk/io_uring.pdf

Subscribe to [email protected] for io_uring related discussions and
development for both kernel and userspace. The list is archived here:
https://lore.kernel.org/io-uring/

kernel version dependency
-------------------------

liburing itself is not tied to any specific kernel release, so it's possible
to use the newest liburing release even on older kernels (and vice versa).
Newer features may only be available on more recent kernels, obviously.

ulimit settings
---------------

io_uring accounts the memory it needs under the RLIMIT_MEMLOCK rlimit, which
can be quite low on some setups (64K). The default is usually enough for most
use cases, but bigger rings or things like registered buffers deplete it
quickly. root isn't under this restriction, but regular users are. Going into
detail on how to bump the limit on various systems is beyond the scope of this
little blurb, but check /etc/security/limits.conf for user specific settings,
or /etc/systemd/user.conf and /etc/systemd/system.conf for systemd setups.
This affects kernels 5.11 and earlier; newer kernels are less dependent on
RLIMIT_MEMLOCK, as it is only used for registering buffers.

Regression tests
----------------

The bulk of liburing is actually regression/unit tests for both liburing and
the kernel io_uring support. Please note that this suite isn't expected to
pass on older kernels, and may even crash or hang older kernels!

Building liburing
-----------------

    #
    # Prepare build config (optional).
    #
    #  --cc  specifies the C compiler.
    #  --cxx specifies the C++ compiler.
    #
    ./configure --cc=gcc --cxx=g++;

    # Build liburing.
    make -j$(nproc);

    # Install liburing (headers, shared/static libs, and manpage).
    sudo make install;

See './configure --help' for more information about build config options.

FFI support
-----------

By default, the build results in 4 lib files:

    2 shared libs: liburing.so, liburing-ffi.so
    2 static libs: liburing.a, liburing-ffi.a

liburing's main public interface lives in liburing.h as 'static inline'
functions. Languages and applications that can't use 'static inline'
functions, or that wish to consume liburing purely as a binary dependency,
should link against the FFI variants. liburing-ffi contains definitions for
every 'static inline' function.

License
-------

All software contained within this repo is dual licensed LGPL and MIT, see
COPYING and LICENSE, except for a header coming from the kernel which is dual
licensed GPL with a Linux-syscall-note exception and MIT, see COPYING.GPL and
<https://spdx.org/licenses/Linux-syscall-note.html>.

Jens Axboe 2022-05-19
Currently io_uring_peek_cqe filters out all CQEs with user_data set to LIBURING_UDATA_TIMEOUT, while io_uring_peek_batch_cqe does not. Is this the desired behavior or not?
Sample code:
#include "liburing.h"
#include <stdio.h>
#include <errno.h>
int main(int argc, char const *argv[])
{
    int ret;
    struct io_uring ring;
    struct io_uring_sqe *sqe;
    struct io_uring_cqe *cqe;

    ret = io_uring_queue_init(32, &ring, 0);
    if (ret) {
        fprintf(stderr, "queue init failed: %d\n", ret);
        return ret;
    }

    sqe = io_uring_get_sqe(&ring);
    if (!sqe) {
        fprintf(stderr, "sqe get failed\n");
        return 1;
    }
    // this one gets filtered
    io_uring_prep_nop(sqe);
    io_uring_sqe_set_data(sqe, (void *)LIBURING_UDATA_TIMEOUT);

    ret = io_uring_submit_and_wait(&ring, 1);
    if (ret != 1) {
        fprintf(stderr, "submit failed: %d\n", ret);
        return 1;
    }

    ret = io_uring_peek_cqe(&ring, &cqe);
    if (ret != -EAGAIN) {
        fprintf(stderr, "peek failed: %d\n", ret);
        return ret;
    }

    sqe = io_uring_get_sqe(&ring);
    if (!sqe) {
        fprintf(stderr, "sqe get failed\n");
        return 1;
    }
    // this one is not filtered
    io_uring_prep_nop(sqe);
    io_uring_sqe_set_data(sqe, (void *)LIBURING_UDATA_TIMEOUT);

    ret = io_uring_submit_and_wait(&ring, 1);
    if (ret != 1) {
        fprintf(stderr, "submit failed: %d\n", ret);
        return ret;
    }

    ret = io_uring_peek_batch_cqe(&ring, &cqe, 1);
    if (ret != 1) {
        fprintf(stderr, "peek batch failed, expected 1, got: %d\n", ret);
        return ret;
    }

    if (cqe->user_data != LIBURING_UDATA_TIMEOUT) {
        fprintf(stderr, "LIBURING_UDATA_TIMEOUT expected\n");
        return 1;
    }
    return 0;
}
Threading support can be very useful in async programming, for example thread joining and condvar waiting.
Futex support would be a good start, IMO.
Hi, I've probably found another problem, triggered when my tests run in parallel.
Should it be OK to set up a separate io_uring independently in multiple threads?
I've created a simple test to reproduce it (at least on my Ryzen 7 3700X with Fedora 31, kernel 5.3.12). Basically, it fails during the io_uring_setup call with ENOMEM.
I've tried to follow this guide to figure something out, but I'm not into kernel dev, so this all seems like too much voodoo for me ;-)
Anyway, here's the trace output if it helps somehow:
1) | __x64_sys_io_uring_setup() {
1) | io_uring_setup() {
1) | capable() {
1) | ns_capable_common() {
1) <...>-633600 => <...>-633604
1) | security_capable() {
1) <...>-633604 => <...>-633600
1) 0.230 us | cap_capable();
1) 0.762 us | }
1) 1.202 us | }
1) 1.623 us | }
1) 0.270 us | free_uid();
1) 2.735 us | }
1) 3.467 us | }
And here is a test to reproduce it.
#include <stdio.h>
#include <pthread.h>
#include "liburing.h"

struct thread_info_t {
    pthread_t tid;
    int num;
};

static void *doTest(void *arg) {
    struct io_uring ring;
    struct io_uring_cqe *cqe;
    struct io_uring_sqe *sqe;
    struct thread_info_t *ti;
    int ret;

    ti = (struct thread_info_t *)arg;
    printf("%d: start\n", ti->num);

    ret = io_uring_queue_init(128, &ring, 0);
    if (ret) {
        printf("%d: ring setup failed: %d\n", ti->num, ret);
        return arg;
    }
    sqe = io_uring_get_sqe(&ring);
    if (!sqe) {
        printf("%d: get sqe failed\n", ti->num);
        return arg;
    }
    io_uring_prep_nop(sqe);
    ret = io_uring_submit(&ring);
    if (ret <= 0) {
        printf("%d: sqe submit failed: %d\n", ti->num, ret);
        return arg;
    }
    ret = io_uring_wait_cqe(&ring, &cqe);
    if (ret < 0) {
        printf("%d: wait completion %d\n", ti->num, ret);
        return arg;
    }
    io_uring_cqe_seen(&ring, cqe);
    printf("%d: done\n", ti->num);
    return NULL;
}

int main(int argc, char *argv[])
{
    struct thread_info_t threads[10];
    int ret;
    void *res;

    for (int i = 0; i < 10; i++) {
        threads[i].num = i;
        ret = pthread_create(&threads[i].tid, NULL, doTest, &threads[i]);
        if (ret) {
            fprintf(stderr, "Thread create failed\n");
            return 1;
        }
    }
    for (int i = 0; i < 10; i++) {
        ret = pthread_join(threads[i].tid, &res);
        if (ret) {
            fprintf(stderr, "Thread join failed\n");
            return 1;
        }
        if (res) {
            fprintf(stderr, "Test failed\n");
            return 1;
        }
    }
    return 0;
}
One of my outputs is:
0: start
1: start
2: start
3: start
4: start
4: ring setup failed: -12
5: start
5: ring setup failed: -12
6: start
0: done
6: ring setup failed: -12
3: done
7: start
7: ring setup failed: -12
1: done
8: start
8: ring setup failed: -12
9: start
9: ring setup failed: -12
2: done
Test failed
I have an open socket from which I'm reading data, with the following behavior:
(1) If the socket is blocking, then preparing a read [io_uring_prep_readv] and submitting it [io_uring_submit] causes the submit to block until data can be read from the socket.
(2) If the socket is non-blocking, then doing (1) causes EAGAIN to be returned on the CQE, unless the socket has data available.
(3) If the socket is non-blocking, then polling the socket for input [io_uring_prep_poll_add + POLLIN], flagging it with IOSQE_IO_LINK, followed by a consecutive read SQE causes the error EINVAL to be returned on the poll CQE.
Ideally (3) would work and perform the read when the socket receives input. To get it to work, I have to split up the poll and the read, and only submit the latter after I receive the former.
Likewise when sending data on a socket. In the rare occurrence where the output buffer is full, instead of registering a POLLOUT and retrying the write, it'd be nice to send the data and only have to worry about my total outstanding operations.
Perhaps I'm missing something, but can this be supported? Thanks!
This issue only happens when you submit multiple read requests.
#include <liburing.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>

char str1[32768];
char str2[32768];
struct io_uring ring;

void prep(int fd, char *str) {
    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    struct iovec iov = {
        .iov_base = str,
        .iov_len = sizeof(str1),
    };
    io_uring_prep_readv(sqe, fd, &iov, 1, 0);
    sqe->user_data = fd;
    io_uring_submit(&ring);
    printf("SUBMIT: %d\n", fd);
}

void wait() {
    struct io_uring_cqe *cqe;
    io_uring_wait_cqe(&ring, &cqe);
    printf("FINISH: %d with res %d\n", (int)cqe->user_data, cqe->res);
    io_uring_cqe_seen(&ring, cqe);
}

int main() {
    io_uring_queue_init(32, &ring, 0);
    int fd1 = open("/path/to/large/file1", O_RDONLY);
    int fd2 = open("/path/to/large/file2", O_RDONLY);
    prep(fd1, str1);
    prep(fd2, str2);
    wait();
    wait();
    close(fd2);
    close(fd1);
    io_uring_queue_exit(&ring);
}
Before executing the program, run sync && echo 3 > /proc/sys/vm/drop_caches && swapoff -a && swapon -a
You will get -EFAULT when debugging the program using GDB
(gdb) run
Starting program: /root/test/./test
SUBMIT: 8
SUBMIT: 9
FINISH: 9 with res -14
FINISH: 8 with res 32768
[Inferior 1 (process 16788) exited normally]
Missing separate debuginfos, use: debuginfo-install glibc-2.17-292.el7.x86_64
If you run it directly, the program will crash with segfault.
$ ./test
SUBMIT: 4
SUBMIT: 5
FINISH: 4 with res 32768
FINISH: 5 with res 3568
[1] 16893 segmentation fault (core dumped) ./test
If you run the program again without dropping caches, it will work as expected
Linux localhost 5.4.0-1.el7.elrepo.x86_64 #1 SMP Mon Nov 25 09:18:09 EST 2019 x86_64 x86_64 x86_64 GNU/Linux
Original post: hakasenyang/openssl-patch#22 (comment)
EDIT: verified on 5.5rc too
Is there a way to manage user access/permissions?
For example, say I use liburing in a web server running as root, perhaps with IORING_SETUP_SQPOLL. I wouldn't want my web server to run everything with root privileges; maybe some of the operations (read/write, ...) need to run as other users.
Maybe this is something we could set on the sqe:
sqe = io_uring_get_sqe(ring)
sqe.setuid = 123
On error it would fail with a permission denied error.
One use case for IOSQE_IO_LINK is zero-copy IO operations, but it's hard to determine how many bytes were actually read.
Take an echo server, for example. It's just ACCEPT -> RECV -> SEND -> CLOSE, but it's hard or impossible to do in a zero-copy way. The problem is:
For 2, man 2 read says:
It is not an error if this number is smaller than the number of bytes requested; this may happen for example because fewer bytes are actually available right now (maybe because we were close to end-of-file, or because we are reading from a pipe, or from a terminal), or because read() was interrupted by a signal.
So even a simple READ -> WRITE link chain may not always be reliable.
I suggest adding a flag called IOSQE_IO_USE_PREV_RES (the name is not decided), which works only with IORING_OP_{WRITE,SEND}, must be used together with IOSQE_IO_LINK, and indicates that the current operation's buffer size is set by the previous operation's return code. If the previous return code is <= 0, the operation should generate an error.
What do you think?
Why do the prep helpers have off_t in their arguments? Example:
liburing/src/include/liburing.h
Line 166 in 8f24d3c
liburing/src/include/liburing/io_uring.h
Line 22 in 8f24d3c
I have a socket open that has an asynchronous recvmsg (io_uring_prep_recvmsg + io_uring_sqe_set_data) outstanding. No data is being supplied by the other end. Subsequently, the recvmsg is being canceled (io_uring_prep_cancel). The CQE for the cancel is giving -114 (-EALREADY) which is expected, however the CQE for the recvmsg is receiving -512, which is not.
Not sure where the result is being generated. In the case it's a kernel issue, I'm testing with https://kernel.ubuntu.com/~kernel-ppa/mainline/daily/2019-12-01/.
It's a fact that users don't want to maintain an offset themselves. Seeking operations are still widely used, but there's no way to do them using io_uring (lseek+read+lseek is not atomic).
I think readv/writev without offsets are still reasonable for asio: operations on different fds can still run in parallel, as shown in ucontext-cp.
libuv currently has to punt these operations to the threadpool. Let's support it natively.
Cancellation is very important for socket programming. I know canceling an IO operation requires hardware/driver support, but if the operation hasn't started yet (i.e. EAGAIN), it should be cancelable.
Otherwise, we have to fall back to IORING_OP_POLL_ADD.
The operations defined in https://github.com/axboe/liburing/blob/master/src/include/liburing/io_uring.h#L81-L88 are missing man page descriptions:
IORING_OP_CONNECT,
IORING_OP_FALLOCATE,
IORING_OP_OPENAT,
IORING_OP_CLOSE,
IORING_OP_FILES_UPDATE,
IORING_OP_STATX,
IORING_OP_READ,
IORING_OP_WRITE,
When benchmarking an echo server written with io_uring, I found that adding a poll_add sqe before readv/recvmsg could result in about a 30% performance boost:
https://github.com/CarterLi/io_uring-echo-server/blob/switch/io_uring_echo_server.c#L14
131729 request/sec
VS 98694 request/sec
using rust_echo_bench.
That was unexpected. AFAIK readv/recvmsg are async operations themselves; adding a poll_add sqe shouldn't help, and should instead cause an extra context switch (because it will wake io_uring_enter).
After some investigation, I found that the program without poll_add creates lots of kernel threads called io_wqe_worker, while the program with poll_add doesn't.
I don't know how poll_add works, but it seems that poll_add has a much lower cost than async read. Is this expected? And, maybe a silly question: could we implement async read as poll_add plus a nonblocking read?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <liburing.h>
int main() {
    char buffer[1024];
    struct io_uring ring;
    struct iovec iov;
    struct sockaddr_in saddr;
    struct msghdr msg;
    struct io_uring_sqe *sqe;
    struct io_uring_cqe *cqe;
    int sockfd = 0, clientfd = 0, ret;

    io_uring_queue_init(32, &ring, 0);

    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0) {
        perror("socket");
        goto err;
    }
    saddr = (struct sockaddr_in) {
        .sin_family = AF_INET,
        .sin_addr = {
            .s_addr = htonl(INADDR_ANY),
        },
        .sin_port = htons(12345),
    };
    ret = bind(sockfd, (struct sockaddr *)&saddr, sizeof(saddr));
    if (ret < 0) {
        perror("bind");
        goto err;
    }
    ret = listen(sockfd, 32);
    if (ret < 0) {
        perror("listen");
        goto err;
    }
    clientfd = accept(sockfd, NULL, NULL);
    if (clientfd < 0) {
        perror("accept");
        goto err;
    }

    iov = (struct iovec) {
        .iov_base = buffer,
        .iov_len = sizeof(buffer),
    };
    msg = (struct msghdr) {
        .msg_namelen = sizeof(struct sockaddr_in),
        .msg_iov = &iov,
        .msg_iovlen = 1,
    };
    sqe = io_uring_get_sqe(&ring);
    io_uring_prep_recvmsg(sqe, clientfd, &msg, 0);
    ret = io_uring_submit_and_wait(&ring, 1);
    if (ret <= 0) {
        perror("io_uring_submit_and_wait");
        goto err;
    }
    io_uring_peek_cqe(&ring, &cqe);
    if (cqe->res < 0) {
        printf("recvmsg failed: %d\n", cqe->res);
        goto err;
    }

err:
    io_uring_queue_exit(&ring);
    close(clientfd);
    close(sockfd);
}
$ clang -g -luring -o test test.c
$ ./test # On another terminal: curl -v localhost:12345
recvmsg failed: -14 # Not constantly
$ uname -a # https://kernel.ubuntu.com/~kernel-ppa/mainline/daily/2019-12-15/
Linux carter-virtual-machine 5.5.0-999-generic #201912142104 SMP Sun Dec 15 02:07:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
I can't reproduce it on Linux 5.4; it may relate to https://lore.kernel.org/io-uring/[email protected]/T/#m919b41ecbf5049c15df15e8cbf2ff982acc37cc9
io_uring_get_sqe sometimes fails to find a vacant sqe when SQPOLL is enabled, even though there is free space. Running the following test case always produces io_uring_get_sqe failed, space left: 8:
#include <errno.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/poll.h>
#include "liburing.h"
#define NUM_ENTRIES 8
int setup_and_run();
int main(int argc, char *argv[])
{
    for (int j = 0; j < 100; j++)
    {
        int ret = setup_and_run();
        if (ret)
        {
            return ret;
        }
    }
    return 0;
}

int setup_and_run()
{
    struct io_uring_sqe *sqe;
    struct io_uring_cqe *cqe;
    struct io_uring_params p;
    struct io_uring ring;
    int ret, data;

    memset(&p, 0, sizeof(p));
    p.flags = IORING_SETUP_SQPOLL;
    ret = io_uring_queue_init_params(NUM_ENTRIES, &ring, &p);
    if (ret)
    {
        fprintf(stderr, "ring create failed: %d\n", ret);
        return 1;
    }
    if (p.sq_entries != NUM_ENTRIES)
    {
        fprintf(stderr, "ring create failed, wanted %d sq entries, got: %d entries\n", NUM_ENTRIES, p.sq_entries);
        return 1;
    }

    for (int i = 0; i < NUM_ENTRIES; i++)
    {
        sqe = io_uring_get_sqe(&ring);
        if (!sqe)
        {
            fprintf(stderr, "io_uring_get_sqe failed\n");
            return 1;
        }
        io_uring_prep_nop(sqe);
        io_uring_sqe_set_data(sqe, (void *)(unsigned long)42);
    }
    ret = io_uring_submit(&ring);
    if (!ret)
    {
        fprintf(stderr, "io_uring_submit failed\n");
        return -1;
    }

    ret = io_uring_wait_cqe(&ring, &cqe);
    if (ret == 0)
    {
        data = (unsigned long)io_uring_cqe_get_data(cqe);
        if (data != 42)
        {
            fprintf(stderr, "invalid data: %d\n", data);
            return data;
        }
        int space_left = io_uring_sq_space_left(&ring);
        sqe = io_uring_get_sqe(&ring);
        if (sqe == NULL)
        {
            fprintf(stderr, "io_uring_get_sqe failed, space left: %d\n", space_left);
            return 1;
        }
    }
    else
    {
        fprintf(stderr, "io_uring_wait_cqe failed: %d\n", ret);
        return ret;
    }
    io_uring_queue_exit(&ring);
    return 0;
}
// test.c
#include <liburing.h>
#include <stdio.h>
#include <time.h>
int main() {
    struct io_uring ring;
    io_uring_queue_init(32, &ring, 0);
    printf("0: %ld\n", time(NULL));
    {
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_nop(sqe);
        io_uring_submit(&ring);

        struct __kernel_timespec ts = {
            .tv_sec = 10,
            .tv_nsec = 0,
        };
        struct io_uring_cqe *cqe;
        io_uring_wait_cqe_timeout(&ring, &cqe, &ts);
        io_uring_cqe_seen(&ring, cqe);
        printf("1: %ld\n", time(NULL));
    }
    {
        struct __kernel_timespec ts = {
            .tv_sec = 1,
            .tv_nsec = 0,
        };
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_timeout(sqe, &ts, 0, 0);
        io_uring_submit(&ring);

        struct io_uring_cqe *cqe;
        io_uring_wait_cqe(&ring, &cqe);
        io_uring_cqe_seen(&ring, cqe);
        printf("2: %ld\n", time(NULL));
    }
    io_uring_queue_exit(&ring);
    return 0;
}
Actual: The last io_uring_prep_timeout waits for 10s
$ clang test.c -luring -o test
$ ./test
0: 1575128130
1: 1575128130
2: 1575128140
Expected: it should only wait for 1s.
Linux carter-virtual-machine 5.4.0-999-generic #201911282213 SMP Fri Nov 29 03:17:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
https://kernel.ubuntu.com/~kernel-ppa/mainline/daily/2019-11-29/
Hi, I have a question about io_uring_enter and EAGAIN.
When to_submit is zero, can io_uring_enter return EAGAIN?
When to_submit is not zero, can io_uring_enter return 0? And if so, when does it return 0, and when EAGAIN?
I noticed that for IORING_OP_TIMEOUT, if the completion event count is not set, it defaults to 1. That's not very useful, in my opinion. I suggest that if sqe->off equals 0, IORING_OP_TIMEOUT should act like a plain timer; that is to say, IORING_OP_TIMEOUT wouldn't be completed by other requests' completions.
With this change, timerfd could be partially replaced (interval timers are not a good fit for io_uring, though).
Yes, it's a breaking change. sqe->off == -1 is also worth considering.
There is no way to know whether a file descriptor supports {read,write}_iter, and sometimes we don't even know the type of an fd, for example STDIN/OUT/ERR_FILENO. We have to try IORING_OP_READV first; if we get -EINVAL, we have to fall back to IORING_OP_POLL_ADD and a plain read, which is inconvenient and slow.
Why don't the man pages contain any documentation about this library?
And is this faster than epoll? If so, by what factor?
Need a space in
Line 236 in c0fcb7f
I've been trying and failing to read from an fd opened with eventfd through IORING_OP_READ:
int fd = eventfd(0, 0);
io_uring_prep_read(sqe, fd, &event, sizeof(eventfd_t), 0);
This fails consistently with EINVAL.
Polling with io_uring_prep_poll_add(sqe, fd, 0) and then reading with read(fd, &event, sizeof(eventfd_t)) works.
Am I missing something, or is reading from an eventfd directly through io_uring not supported?
Thanks
Hi,
I looked through the files under test and examples, and found that some iov_base buffers are allocated by posix_memalign and others by malloc, or are even char * literals.
I then did some experiments. It seems that liburing does not care about memory alignment. Is this true? Thanks in advance.
liburing/test/500f9fbadef8-test.c
Line 27 in 1ed37c5
Line 20 in 5569609
This program never terminates:
#include "liburing.h"
#include <stdio.h>
int main(int argc, char *argv[])
{
    struct io_uring_sqe *sqe;
    struct io_uring_cqe *cqe;
    struct io_uring ring;
    int ring_flags, ret, data;

    ring_flags = IORING_SETUP_SQPOLL;
    ret = io_uring_queue_init(64, &ring, ring_flags);
    if (ret) {
        fprintf(stderr, "ring create failed: %d\n", ret);
        return 1;
    }
    sqe = io_uring_get_sqe(&ring);
    if (!sqe) {
        fprintf(stderr, "sqe get failed\n");
        return 1;
    }
    io_uring_prep_nop(sqe);
    io_uring_sqe_set_data(sqe, (void *)(unsigned long)42);
    io_uring_submit_and_wait(&ring, 1);

    ret = io_uring_peek_cqe(&ring, &cqe);
    if (ret) {
        fprintf(stderr, "cqe get failed\n");
        return 1;
    }
    data = (unsigned long)io_uring_cqe_get_data(cqe);
    if (data != 42) {
        fprintf(stderr, "invalid data: %d\n", data);
        return 1;
    }
    return 0;
}
Changing this line
Line 177 in 4cc37de
to
if (sq_ring_needs_enter(ring, &flags) || wait_nr) {
fixes it. If I'm wrong, I'm sorry.
I'm issuing an accept() SQE subsequently followed by a connect() SQE. The connect result is success, however the accept() returns with CQE status ENOTCONN. Ignoring the error, the connected socket is fine and can issue I/O.
I presume this has something to do with the asynchronous connect case. Running with the Linux kernel at e31736d9fae841e8a1612f263136454af10f476a (12/14).
We support IORING_OP_{READ,WRITE}_FIXED but don't support IORING_OP_{SEND,RECV}_FIXED.
Is that problematic? Will it result in type narrowing?
Noticed this:
liburing/src/include/liburing.h
Line 267 in 2e7d744
Then I found this:
https://git.kernel.dk/cgit/linux-block/tree/fs/io_uring.c?h=for-5.5/io_uring-post#n2353
Is it reserved for future use? It seems kind of strange and not very useful to me.
For an IOSQE_IO_LINK chain, people usually don't care about individual operations before the whole link chain is completed. Currently io_uring_wait_cqe is woken for every operation's completion. io_uring_wait_cqes can partially resolve this issue, but it has its own limitations: for example, with an IORING_OP_ACCEPT operation pending for new connections (which needs io_uring_wait_cqes(1)) and multiple RECV-SEND chains serving existing connections (which need io_uring_wait_cqes(2)), we end up having to use io_uring_wait_cqe.
Suggestion: add a new flag named IOSQE_IO_NO_AWAKE which indicates that an operation should not wake io_uring_enter. It could resolve both problems:
Since IOSQE_IO_NO_AWAKE is set when preparing operations, we don't need to touch the global event loop.
IOSQE_IO_NO_AWAKE can be set for sqes separately. For example, if we set RECV(IOSQE_IO_NO_AWAKE)-SEND, then io_uring_wait_cqe should work fine.
Sorry for my bad English if I can't explain myself clearly.
A program written by me became a zombie process for some reason. I didn't fork any other processes, nor do anything special, just normal stuff.
I was testing IOSQE_IO_LINK, FIXED_FILES and FIXED_BUFFERS, if that helps.
It can't be consistently reproduced, but it has happened several times. I couldn't kill the process. When I was rebooting the system, I got:
$ uname -a 23:46:16
Linux carter-virtual-machine 5.5.0-999-generic #202002082109 SMP Sun Feb 9 02:13:41 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <liburing.h>
int main() {
    char buffer[1024];
    struct io_uring ring;
    struct sockaddr_in saddr;
    struct io_uring_sqe *sqe;
    struct io_uring_cqe *cqe;
    int sockfd = 0, clientfd = 0, ret;

    io_uring_queue_init(32, &ring, 0);

    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0) {
        perror("socket");
        goto err;
    }
    saddr = (struct sockaddr_in) {
        .sin_family = AF_INET,
        .sin_addr = {
            .s_addr = htonl(INADDR_ANY),
        },
        .sin_port = htons(12345),
    };
    ret = bind(sockfd, (struct sockaddr *)&saddr, sizeof(saddr));
    if (ret < 0) {
        perror("bind");
        goto err;
    }
    ret = listen(sockfd, 32);
    if (ret < 0) {
        perror("listen");
        goto err;
    }
    clientfd = accept(sockfd, NULL, NULL);
    if (clientfd < 0) {
        perror("accept");
        goto err;
    }

    sqe = io_uring_get_sqe(&ring);
    io_uring_prep_recv(sqe, clientfd, buffer, sizeof(buffer), 0);
    ret = io_uring_submit_and_wait(&ring, 1);
    if (ret <= 0) {
        perror("io_uring_submit_and_wait");
        goto err;
    }
    io_uring_peek_cqe(&ring, &cqe);
    if (cqe->res < 0) {
        printf("recv failed: %d\n", cqe->res);
        goto err;
    }

err:
    io_uring_queue_exit(&ring);
    close(clientfd);
    close(sockfd);
}
$ gcc recv.c -o recv -luring
$ ./recv # On another terminal: curl -v localhost:12345
recv failed: -14
$ uname -a # https://kernel.ubuntu.com/~kernel-ppa/mainline/daily/2020-01-31/
Linux carter-virtual-machine 5.5.0-999-generic #202001302109 SMP Fri Jan 31 02:15:05 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
I think there is something wrong with io_uring_cqe.res. When I call something like:
cqe = io_uring_cqe()
... io_uring_prep_readv
cqe.res  # outputs, say, "5"
cqe = io_uring_cqe()
... io_uring_prep_readv
cqe.res  # outputs the same value as read 1, "5"
while read 2's content length/buffer size is totally different!
Not sure what's going on.
I am testing the following code, which spawns 5 threads and drives IOs on each of them. It randomly hangs both with and without the IORING_SETUP_SQPOLL flag.
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <stdbool.h>
#include "liburing.h"
#define BS 4096
#define QD 32
#define MAX_OBJECTS 5
static struct io_uring ring[MAX_OBJECTS];
static int dev_fd;
static int ios;
static bool sqpoll;
static struct iovec iov;

static void *setup_iov_base(size_t size)
{
    void *buf;
    int fd;

    if (posix_memalign(&buf, BS, size) != 0) {
        printf("mem aligned failed\n");
        return NULL;
    }
    fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0) {
        printf("Failed to open urandom. rc=%d\n", fd);
        return NULL;
    }
    read(fd, buf, size);
    close(fd);
    return buf;
}

static int init(void)
{
    struct io_uring_params p = { 0 };
    int i, rc;

    if (sqpoll) {
        p.flags = IORING_SETUP_SQPOLL;
        printf("Initializing liburing with SQPOLL flag\n");
    }
    dev_fd = open("/dev/nvme1n1", O_RDWR | O_DIRECT);
    if (dev_fd < 0) {
        printf("Failed to open nvme device. rc=%d\n", dev_fd);
        return dev_fd;
    }
    for (i = 0; i < MAX_OBJECTS; i++) {
        rc = io_uring_queue_init_params(QD, &ring[i], &p);
        if (rc != 0) {
            printf("queue_init failed. rc=%d\n", rc);
            return rc;
        }
        if (sqpoll) {
            rc = io_uring_register_files(&ring[i], &dev_fd, 1);
            if (rc < 0) {
                printf("Failed to register files. rc=%d\n", rc);
                return rc;
            }
        }
    }
    iov.iov_base = setup_iov_base(BS);
    iov.iov_len = BS;
    return 0;
}

static inline void submit_to_kernel(char *failure_message, int thread_id)
{
    int rc;

    rc = io_uring_submit(&ring[thread_id]);
    if (rc < 0) {
        printf("%s. rc=%d\n", failure_message, rc);
    }
}

static struct io_uring_sqe *get_sqe(int tid, int *yield)
{
    struct io_uring_sqe *sqe;

    while ((sqe = io_uring_get_sqe(&ring[tid])) == NULL) {
        submit_to_kernel("Failure to wake napping thread", tid);
        *yield = *yield + 1;
        pthread_yield();
    }
    return sqe;
}

static void *submit_io(void *input)
{
    off_t offset = 0;
    int thread_id = *((int *)input);
    int total_ios = ios;
    int yield = 0;

    while (total_ios != 0) {
        struct io_uring_sqe *sqe = get_sqe(thread_id, &yield);

        if (sqpoll) {
            io_uring_prep_writev(sqe, 0, &iov, 1, offset);
            sqe->flags |= IOSQE_FIXED_FILE;
        } else {
            io_uring_prep_writev(sqe, dev_fd, &iov, 1, offset);
        }
        sqe->user_data = offset;
        total_ios--;
        if (total_ios % QD == 0) {
            submit_to_kernel("Failed to submit new IO", thread_id);
        }
        offset += BS;
    }
    printf("[thread_id=%d] Submission complete. yield=%d\n", thread_id, yield);
    return NULL;
}

static void *reap_io_completions(void *input)
{
    int thread_id = *((int *)input);
    int failed_ios = 0, rc;
    int total_ios = ios;

    while (total_ios != 0) {
        struct io_uring_cqe *cqe = NULL;

        rc = io_uring_wait_cqe(&ring[thread_id], &cqe);
        if (rc < 0 || cqe->res != BS) {
            printf("thread_id=%d rc=%d cqe->res=%d offset=%llu\n", thread_id, rc, cqe->res, cqe->user_data);
            failed_ios++;
        }
        total_ios--;
        io_uring_cqe_seen(&ring[thread_id], cqe);
    }
    printf("[thread_id=%d] Failed IO count=%d\n", thread_id, failed_ios);
    return NULL;
}

int main(int argc, char *argv[])
{
    pthread_t submit[MAX_OBJECTS], complete[MAX_OBJECTS];
    int t_ids[MAX_OBJECTS];
    int i, rc;

    if (argc != 3) {
        printf("Expected three arguments\n");
        return -EINVAL;
    }
    ios = atoi(argv[1]);
    sqpoll = atoi(argv[2]) == 1;

    rc = init();
    if (rc != 0) {
        return rc;
    }
    for (i = 0; i < MAX_OBJECTS; i++) {
        t_ids[i] = i;
        rc = pthread_create(&submit[i], NULL, submit_io, &t_ids[i]);
        if (rc < 0) {
            printf("Failed to create submit thread. rc=%d\n", rc);
            return rc;
        }
        rc = pthread_create(&complete[i], NULL, reap_io_completions, &t_ids[i]);
        if (rc < 0) {
            printf("Failed to create complete thread. rc=%d\n", rc);
            return rc;
        }
    }
    for (i = 0; i < MAX_OBJECTS; i++) {
        pthread_join(submit[i], NULL);
        printf("Reaped submit thread_id=%d\n", i);
        pthread_join(complete[i], NULL);
        printf("Reaped complete thread_id=%d\n", i);
        io_uring_queue_exit(&ring[i]);
    }
    close(dev_fd);
    return 0;
}
Following are example outputs.
Success without SQPOLL
[root@ip-10-0-58-7 liburing]# ./examples/iouring-object 500 0
[thread_id=0] Submission complete. yield=0
Reaped submit thread_id=0
[thread_id=1] Submission complete. yield=0
[thread_id=3] Submission complete. yield=0
[thread_id=2] Submission complete. yield=0
[thread_id=4] Submission complete. yield=0
[thread_id=0] Failed IO count=0
Reaped complete thread_id=0
[thread_id=1] Failed IO count=0
[thread_id=3] Failed IO count=0
[thread_id=2] Failed IO count=0
[thread_id=4] Failed IO count=0
Reaped submit thread_id=1
Reaped complete thread_id=1
Reaped submit thread_id=2
Reaped complete thread_id=2
Reaped submit thread_id=3
Reaped complete thread_id=3
Reaped submit thread_id=4
Reaped complete thread_id=4
Success with SQPOLL
[root@ip-10-0-58-7 liburing]# ./examples/iouring-object 500 1
Initializing liburing with SQPOLL flag
[thread_id=1] Submission complete. yield=883
[thread_id=0] Submission complete. yield=989
Reaped submit thread_id=0
[thread_id=2] Submission complete. yield=1070
[thread_id=3] Submission complete. yield=963
[thread_id=4] Submission complete. yield=966
[thread_id=1] Failed IO count=0
[thread_id=0] Failed IO count=0
Reaped complete thread_id=0
[thread_id=2] Failed IO count=0
[thread_id=3] Failed IO count=0
[thread_id=4] Failed IO count=0
Reaped submit thread_id=1
Reaped complete thread_id=1
Reaped submit thread_id=2
Reaped complete thread_id=2
Reaped submit thread_id=3
Reaped complete thread_id=3
Reaped submit thread_id=4
Reaped complete thread_id=4
Failure without SQPOLL
[root@ip-10-0-58-7 liburing]# ./examples/iouring-object 2000 0
[thread_id=0] Submission complete. yield=0
[thread_id=1] Submission complete. yield=0
Reaped submit thread_id=0
[thread_id=3] Submission complete. yield=0
[thread_id=2] Submission complete. yield=0
[thread_id=4] Submission complete. yield=0
[thread_id=1] Failed IO count=0
[thread_id=2] Failed IO count=0
[thread_id=4] Failed IO count=0
^C
Failure with SQPOLL
[root@ip-10-0-58-7 liburing]# ./examples/iouring-object 5000 1
Initializing liburing with SQPOLL flag
[thread_id=4] Submission complete. yield=32130
[thread_id=1] Submission complete. yield=65518
[thread_id=3] Submission complete. yield=67878
[thread_id=0] Submission complete. yield=72433
Reaped submit thread_id=0
[thread_id=2] Submission complete. yield=73237
[thread_id=1] Failed IO count=0
[thread_id=3] Failed IO count=0
[thread_id=0] Failed IO count=0
Reaped complete thread_id=0
[thread_id=2] Failed IO count=0
Reaped submit thread_id=1
Reaped complete thread_id=1
Reaped submit thread_id=2
Reaped complete thread_id=2
Reaped submit thread_id=3
Reaped complete thread_id=3
Reaped submit thread_id=4
^C
liburing commit - a68caac
Linux kernel has been built from https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.4.1.tar.xz
Calling io_uring_setup with ...SQPOLL returns -1 with errno = 1 (EPERM)
After several failed searches of the documentation, I eventually found in the kernel code that it requires CAP_SYS_ADMIN.
Could you add a few words to liburing's comments, or maybe to a newer version of "Efficient IO with io_uring", to warn new SQPOLL enthusiasts about this possible error?
BTW, this privilege check really makes SQPOLL hard to use, since the user process has to run with escalated privileges, or do without SQPOLL at all...
IORING_OP_TIMEOUT returns -ETIME when it expires, which is considered an error and breaks the entire link. As a result, operations chained after IORING_OP_TIMEOUT with IOSQE_IO_LINK will always be canceled.
#include <unistd.h>
#include <liburing.h>

int main() {
    struct io_uring ring;
    io_uring_queue_init(8, &ring, 0);

    struct io_uring_sqe *sqe1 = io_uring_get_sqe(&ring);
    struct __kernel_timespec ts = {
        .tv_sec = 1,
        .tv_nsec = 0,
    };
    io_uring_prep_timeout(sqe1, &ts, 0, 0);
    io_uring_sqe_set_flags(sqe1, IOSQE_IO_LINK);

    struct io_uring_sqe *sqe2 = io_uring_get_sqe(&ring);
    struct iovec iov = {
        .iov_base = "OK\n",
        .iov_len = sizeof("OK\n"),
    };
    io_uring_prep_writev(sqe2, STDERR_FILENO, &iov, 1, 0);

    io_uring_submit_and_wait(&ring, 2);
    io_uring_queue_exit(&ring);
}
Expected: waits 1s then print "OK"
Actual: nothing is printed
The man pages seem to indicate that it is fine to register additional/different buffers during the lifetime of a ring. It would help to add to the documentation a statement about when a call to unregister is considered safe.
It comes down to the following question: "Is it safe to unregister and re-register buffers while an operation like IORING_OP_READ_FIXED has been submitted but not yet completed?" Or, in other words: "Would one have to wait until all scheduled operations on registered buffers are completed before unregistering?" I assume the latter, but that makes it non-trivial to register/unregister buffers at runtime based on the demand of the application. In this case, one would have to drain the ring (IOSQE_IO_DRAIN) before registering additional buffers, which would likely cause a hiccup in throughput.
Thanks for your work on io_uring, it's a really stellar interface! I'm working on wrapping liburing in a Rust library to make it accessible from Rust (as well as higher-level memory-safe integrations into our async/.await ecosystem, which haven't borne fruit yet).
First, I just want to confirm this is the best place for you to receive questions & pull requests. Let me know if not.
My main question: what is the backwards-compatibility story for liburing right now? In general, would you say you will not remove or break APIs exposed by liburing (obviously excepting `__`-prefixed functions, for example)? I noticed that you recently removed the syscall helpers from liburing, but I think that was a special case because you expect them to be upstreamed to glibc.
I'm asking to determine my own versioning for my Rust wrappers. Most Rust users use cargo to perform version resolution for them, and cargo makes strong assumptions about backwards compatibility between "semver compatible" versions, so I just need to figure out if I should prepare for possible breaking changes between updates to liburing.
Proposed change:
struct io_uring_params {
    __u32 sq_entries;
    __u32 cq_entries;
    __u32 flags;
    __u32 sq_thread_cpu;
    __u32 sq_thread_idle;
    __u32 features;
    __u32 op_last; /* IORING_OP_LAST, set by kernel */
    __u32 resv[3];
    struct io_sqring_offsets sq_off;
    struct io_cqring_offsets cq_off;
};
It's not clear if/when fallocate blocks (a notable exception being network file systems that support it), but regardless it would be nice to include it in the io_uring framework, since we could link fsyncs to commit the file metadata changes.
Hi,
I'm trying to test various aspects of io_uring and came to the fsync test in this lib.
I noticed this line: https://github.com/axboe/liburing/blob/master/test/fsync.c#L117
and added just a printf to let me know whether my kernel is OK with it or not.
Unfortunately, it doesn't work.
I've tested it on kernels 5.2.x (where this flag was introduced) and 5.3.x.
It just returns that error and I don't understand why.
When I added IOSQE_IO_LINK to the submission flags of the flush operation, it started to run OK.
But that seems to be against what is documented, and against what is actually tested with IOSQE_IO_DRAIN.
I've also searched the internet for someone who had already faced the same issue, and found only this maybe-relevant post: https://www.mail-archive.com/[email protected]/msg39033.html, but without any follow-up.
Is this an issue or some misunderstanding? Thanks!
PS: This line https://github.com/axboe/liburing/blob/master/test/fsync.c#L101 should probably be if (ret == -EINVAL), but that is unrelated to this problem.
On a 5.5 kernel, if the file size is less than the iovec size, cqe.res will be equal to 0. On a 5.4 kernel, cqe.res contains the correct number of bytes read.
Sample code:
#include <errno.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/poll.h>
#include "liburing.h"

#define BUF_SIZE 4096
#define FILE_SIZE 1024

static int create_file(const char *file)
{
    ssize_t ret;
    char *buf;
    int fd;

    buf = malloc(FILE_SIZE);
    memset(buf, 0xaa, FILE_SIZE);

    fd = open(file, O_WRONLY | O_CREAT, 0644);
    if (fd < 0) {
        perror("open file");
        return 1;
    }
    ret = write(fd, buf, FILE_SIZE);
    close(fd);
    return ret != FILE_SIZE;
}

int main(int argc, char* argv[])
{
    int ret, fd;
    struct io_uring ring;
    struct io_uring_sqe *sqe;
    struct io_uring_cqe *cqe;
    struct iovec vec;

    vec.iov_base = malloc(BUF_SIZE);
    vec.iov_len = BUF_SIZE;

    if (create_file(".basic-r")) {
        fprintf(stderr, "file creation failed\n");
        return 1;
    }
    fd = open(".basic-r", O_RDONLY);
    if (fd < 0) {
        perror("file open");
        return 1;
    }
    ret = io_uring_queue_init(32, &ring, 0);
    if (ret)
        return ret;

    sqe = io_uring_get_sqe(&ring);
    if (!sqe) {
        fprintf(stderr, "sqe get failed\n");
        return 1;
    }
    io_uring_prep_readv(sqe, fd, &vec, 1, 0);

    ret = io_uring_submit(&ring);
    if (ret != 1) {
        return 1;
    }
    ret = io_uring_wait_cqes(&ring, &cqe, 1, 0, 0);
    if (ret) {
        return 1;
    }
    fprintf(stderr, "cqe res %d", cqe->res);
    io_uring_cqe_seen(&ring, cqe);
    return 0;
}
// connect.c
#include <liburing.h>
#include <stdio.h>
#include <netdb.h>
#include <sys/socket.h>
#include <unistd.h>

int main() {
    struct addrinfo hints = {
        .ai_family = AF_UNSPEC,
        .ai_socktype = SOCK_STREAM,
    }, *addr;
    if (getaddrinfo("github.com", "http", &hints, &addr) < 0) {
        return 1;
    }
    int clientfd = socket(addr->ai_family, addr->ai_socktype, addr->ai_protocol);
    if (clientfd < 0) return 2;
#ifndef USE_PLAIN_CONNECT
    struct io_uring ring;
    io_uring_queue_init(32, &ring, 0);
    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    io_uring_prep_connect(sqe, clientfd, addr->ai_addr, addr->ai_addrlen);
    io_uring_submit(&ring);
    struct io_uring_cqe *cqe;
    io_uring_wait_cqe(&ring, &cqe);
    io_uring_cqe_seen(&ring, cqe);
    int ret = cqe->res;
    io_uring_queue_exit(&ring);
#else
    int ret = connect(clientfd, addr->ai_addr, addr->ai_addrlen);
#endif
    printf("%d\n", ret);
    close(clientfd);
    return 0;
}
$ clang connect.c -luring -o connect && ./connect
-115
$ clang connect.c -luring -o connect -DUSE_PLAIN_CONNECT && ./connect
0
$ uname -a
Linux carter-virtual-machine 5.4.0-999-generic #201911282213 SMP Fri Nov 29 03:17:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
io_uring currently only supports vectored reads and writes (except for the _fixed operations). While vectored reads and writes are in theory a superset of single reads and writes, the required indirection of the array of iovecs presents some problems.
In particular, I'm interested in creating a memory safe abstraction of io_uring's completion-based API in Rust. Realistically, the best way to do this is for the abstraction to have logical ownership of the buffers until the IO is complete. The naive solution would be to just always allocate an intermediate buffer, which would mean an extra allocation for every read or write operation. There are better solutions which avoid the allocation, but they can be tricky to implement.
It would be easier to create a safe API for unvectored read/write (the common case) if it were supported directly by the io_uring interface. Then the abstraction would only need to manage the lifetime of the actual buffer, and not the indirection array as well.
Calling io_uring_wait_cqe with nothing to wait for is a bug, but currently there is no (easy?) way to get the number of pending requests. We should detect the bug and return -EINVAL or something else.
We could also add io_uring_pending_requests to get the number of pending requests (counting completed requests still sitting in the CQ).
My toy program spins up two threads: it submits IOs from one thread and reaps completions from the other. I have set up liburing using the IORING_SETUP_SQPOLL flag.
Following is my code:
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include "liburing.h"

#define DEVICE_SIZE (512ULL << 30)
#define BS 4096
#define QD 32

static struct io_uring ring;
static int dev_fd;

static void *setup_iov_base(size_t size)
{
    void *buf;
    int fd;

    if (posix_memalign(&buf, BS, size) != 0) {
        printf("mem aligned failed\n");
        return NULL;
    }
    fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0) {
        printf("Failed to open urandom. rc=%d\n", fd);
        return NULL;
    }
    read(fd, buf, size);
    close(fd);
    return buf;
}

static int init(void)
{
    struct io_uring_params p = { 0 };
    time_t t;
    int rc;

    /* Implies no syscalls to submit IOs */
    p.flags = IORING_SETUP_SQPOLL;
    rc = io_uring_queue_init_params(QD, &ring, &p);
    if (rc != 0) {
        printf("queue_init failed. rc=%d\n", rc);
        return rc;
    }

    dev_fd = open("/dev/nvme1n1", O_RDWR | O_DIRECT);
    if (dev_fd < 0) {
        printf("Failed to open nvme device. rc=%d\n", dev_fd);
        return dev_fd;
    }

    /* SQPOLL only works with fixed files. */
    rc = io_uring_register_files(&ring, &dev_fd, 1);
    if (rc < 0) {
        printf("Failed to register files. rc=%d\n", rc);
        return rc;
    }
    srand((unsigned) time(&t));
    return 0;
}

static inline void submit_to_kernel(char *failure_message)
{
    int rc;

    rc = io_uring_submit(&ring);
    if (rc < 0) {
        printf("%s. rc=%d\n", failure_message, rc);
    }
}

static struct io_uring_sqe *get_sqe(int *yield)
{
    struct io_uring_sqe *sqe;

    while ((sqe = io_uring_get_sqe(&ring)) == NULL) {
        /* Kick kernel thread if it is taking a nap */
        submit_to_kernel("Failure to wake napping thread");
        *yield = *yield + 1;
        /* TODO: Use condition variables */
        pthread_yield();
    }
    return sqe;
}

static void *submit_io(void *input)
{
    char *buf = setup_iov_base(BS);
    off_t offset = 0;
    int total_ios = *((int *)input);
    int yield = 0;

    while (total_ios != 0) {
        struct io_uring_sqe *sqe = get_sqe(&yield);
        struct iovec iov = {
            .iov_base = buf,
            .iov_len = BS,
        };
        io_uring_prep_writev(sqe, 0, &iov, 1, offset);
        sqe->flags |= IOSQE_FIXED_FILE;
        sqe->user_data = offset;
        total_ios--;
        if (total_ios % QD == 0) {
            submit_to_kernel("Failed to submit new IO");
        }
        offset += BS;
    }
    printf("submit_io yield %d times\n", yield);
    return NULL;
}

static void *reap_io_completions(void *input)
{
    int total_ios = *((int *)input);
    int failed_ios = 0;

    while (total_ios != 0) {
        struct io_uring_cqe *cqe = NULL;

        /* This call blocks if no CQE entries are available */
        int rc = io_uring_wait_cqe(&ring, &cqe);
        if (rc < 0 || cqe->res != BS) {
            printf("rc=%d cqe->res=%d offset=%llu\n", rc, cqe->res, cqe->user_data);
            failed_ios++;
        }
        total_ios--;
        io_uring_cqe_seen(&ring, cqe);
    }
    printf("Failed IO count=%d\n", failed_ios);
    return NULL;
}

int main(int argc, char *argv[])
{
    pthread_t submit, complete;
    int total_ios;
    int rc;

    if (argc != 2) {
        printf("Expected two arguments\n");
        return -EINVAL;
    }
    total_ios = atoi(argv[1]);

    rc = init();
    if (rc != 0) {
        return rc;
    }
    rc = pthread_create(&submit, NULL, submit_io, &total_ios);
    if (rc < 0) {
        printf("Failed to create submit thread. rc=%d\n", rc);
        return rc;
    }
    rc = pthread_create(&complete, NULL, reap_io_completions, &total_ios);
    if (rc < 0) {
        printf("Failed to create complete thread. rc=%d\n", rc);
        return rc;
    }
    pthread_join(submit, NULL);
    pthread_join(complete, NULL);

    io_uring_queue_exit(&ring);
    close(dev_fd);
    return 0;
}
Here is my output for multiple runs:
[root@ip-10-0-58-7 liburing]# ./examples/iouringthread 65
submit_io yield 74 times
rc=0 cqe->res=-14 offset=258048
rc=0 cqe->res=-14 offset=262144
Failed IO count=2
[root@ip-10-0-58-7 liburing]# ./examples/iouringthread 65
submit_io yield 194 times
rc=0 cqe->res=-14 offset=204800
rc=0 cqe->res=-14 offset=208896
rc=0 cqe->res=-14 offset=212992
rc=0 cqe->res=-14 offset=217088
rc=0 cqe->res=-14 offset=221184
rc=0 cqe->res=-14 offset=225280
rc=0 cqe->res=-14 offset=229376
rc=0 cqe->res=-14 offset=233472
rc=0 cqe->res=-14 offset=237568
rc=0 cqe->res=-14 offset=241664
rc=0 cqe->res=-14 offset=245760
rc=0 cqe->res=-14 offset=249856
rc=0 cqe->res=-14 offset=253952
rc=0 cqe->res=-14 offset=258048
rc=0 cqe->res=-14 offset=262144
Failed IO count=15
[root@ip-10-0-58-7 liburing]# ./examples/iouringthread 65
submit_io yield 69 times
rc=0 cqe->res=-14 offset=196608
rc=0 cqe->res=-14 offset=200704
rc=0 cqe->res=-14 offset=204800
rc=0 cqe->res=-14 offset=208896
rc=0 cqe->res=-14 offset=212992
rc=0 cqe->res=-14 offset=217088
rc=0 cqe->res=-14 offset=221184
rc=0 cqe->res=-14 offset=225280
rc=0 cqe->res=-14 offset=229376
rc=0 cqe->res=-14 offset=233472
rc=0 cqe->res=-14 offset=237568
rc=0 cqe->res=-14 offset=241664
rc=0 cqe->res=-14 offset=245760
rc=0 cqe->res=-14 offset=249856
rc=0 cqe->res=-14 offset=253952
rc=0 cqe->res=-14 offset=258048
rc=0 cqe->res=-14 offset=262144
Failed IO count=17
If I update the code to not use SQPOLL it works just fine.
liburing commit ID - a68caac
Timed waiting is commonly used and is a necessary feature for replacing epoll_wait (for fewer syscalls and better integration with io_uring file AIO).
AFAIK the old Linux AIO does support timed waits. Please consider adding timed-wait support to io_uring too.
#include <unistd.h>
#include <sys/signalfd.h>
#include <sys/poll.h>
#include <liburing.h>

int main() {
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGINT);
    sigprocmask(SIG_BLOCK, &mask, NULL);
    int sfd = signalfd(-1, &mask, SFD_NONBLOCK);

    struct io_uring ring;
    io_uring_queue_init(32, &ring, 0);
    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    io_uring_prep_poll_add(sqe, sfd, POLLIN);
    io_uring_submit(&ring);

    struct io_uring_cqe *cqe;
    io_uring_wait_cqe(&ring, &cqe);
    io_uring_cqe_seen(&ring, cqe);

    io_uring_queue_exit(&ring);
    close(sfd);
    return 0;
}
Ctrl+C should terminate the program, but it doesn't. Similar code works with epoll: https://gist.github.com/CarterLi/b8db2fcfea689b96eeae382c38130afb
Linux Ubuntu 5.3.0-10-generic #11-Ubuntu SMP Mon Sep 9 15:12:17 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Currently we forbid flags for IORING_OP_TIMEOUT:
https://github.com/torvalds/linux/blob/63de37476ebd1e9bab6a9e17186dc5aa1da9ea99/fs/io_uring.c#L2456
I think that's reasonable for io_uring_wait_cqe_timeout. But for a pure timeout (i.e. REQ_F_TIMEOUT_NOSEQ), this operation should behave like other operations and should allow the common sqe flags.
This is a valid usage:
And this should be valid too: