pixels-reader-cxx's Issues
Multiple read in q2
In TPC-H Q2, the nation, region, partsupp and supplier tables are each loaded twice.
readBatch too slow when the batchSize is small
In duckdb, the default batch size is 2048.
The result of pixels tpch q1(tpch 1) benchmark:
The result of parquet tpch q1(tpch 1) benchmark:
I profiled pixels with perf and found that allocating and deallocating the proto objects in the function readBatch takes a lot of time. It would be better to make the proto objects member variables.
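The idea of keeping the object as a member can be sketched as follows. This is only an illustration, not the pixels code: ToyChunkIndex stands in for a protobuf-generated message, and ToyReader, readBatch and capacity are hypothetical names. The point is that a long-lived member keeps its allocated storage between calls, so a small batch size no longer pays for an allocation and deallocation on every call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a protobuf-generated message (not a real Pixels type).
struct ToyChunkIndex {
    std::vector<int> offsets;
    // Empties the contents but keeps the allocated storage for reuse.
    void Clear() { offsets.clear(); }
};

// Hypothetical reader that reuses one message object across calls.
class ToyReader {
public:
    // Refills the member instead of constructing a new object per call.
    std::size_t readBatch(std::size_t batchSize) {
        index_.Clear();
        for (std::size_t i = 0; i < batchSize; ++i) {
            index_.offsets.push_back(static_cast<int>(i));
        }
        assert(index_.offsets.size() == batchSize);
        return index_.offsets.size();
    }

    // Exposed only so the reuse of storage can be observed.
    std::size_t capacity() const { return index_.offsets.capacity(); }

private:
    ToyChunkIndex index_;  // lives as long as the reader
};
```

After one large batch, later small batches reuse the same storage, which is the effect the member-variable change is meant to achieve.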
Worker thread segmentation fault in PixelsScanInitLocal
Consider the following case: some worker threads are fast and some are slow. The fast threads finish all the tasks while the slow threads have not yet finished PixelsScanInitLocal. With the current logic of PixelsScanInitLocal, a segmentation fault occurs because the pixels reader belonging to the slow thread is never initialized.
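One possible guard is sketched below, under the assumption that files are handed out from a shared counter and that a late thread may find nothing left to claim. All names here (ToyGlobalState, ToyLocalState, initLocal, scan) are hypothetical and do not come from the pixels or DuckDB code. The sketch only shows the shape of the fix: a thread without a reader reports that it has no data instead of dereferencing a null pointer.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical reader bound to one file.
struct ToyReader {
    std::size_t fileIndex;
};

// Hypothetical shared state: files are claimed through an atomic counter.
struct ToyGlobalState {
    explicit ToyGlobalState(std::size_t n) : fileCount(n), nextFile(0) {}
    std::size_t fileCount;
    std::atomic<std::size_t> nextFile;
};

// Hypothetical per-thread state.
struct ToyLocalState {
    std::unique_ptr<ToyReader> reader;  // stays null if no file was claimed
};

// Claims the next file if one is left; otherwise leaves the reader unset.
ToyLocalState initLocal(ToyGlobalState &global) {
    ToyLocalState local;
    std::size_t idx = global.nextFile.fetch_add(1);
    if (idx < global.fileCount) {
        local.reader = std::make_unique<ToyReader>(ToyReader{idx});
    }
    return local;
}

// Returns false (nothing to scan) instead of touching a null reader.
bool scan(const ToyLocalState &local, const ToyGlobalState &global) {
    if (!local.reader) {
        return false;
    }
    assert(local.reader->fileIndex < global.fileCount);
    return true;
}
```

With a single file and two threads, the second call to initLocal leaves its reader unset, and scan returns false for that thread rather than crashing.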
readAsyncComplete Implementation
Currently the readAsyncComplete code is:
    struct io_uring_cqe *cqe;
    if (io_uring_wait_cqe_nr(ring, &cqe, 1) != 0) {
        throw InvalidArgumentException("DirectUringRandomAccessFile::readAsyncComplete: wait cqe fails");
    }
    uringData *data = (uringData *)io_uring_cqe_get_data(cqe);
    io_uring_cqe_seen(ring, cqe);
Previously we tried to write the code as:
    struct io_uring_cqe *cqe;
    if (io_uring_wait_cqe_nr(ring, &cqe, size) != 0) {
        throw InvalidArgumentException("DirectUringRandomAccessFile::readAsyncComplete: wait cqe fails");
    }
    uringData *data = (uringData *)io_uring_cqe_get_data(cqe);
But I got random errors: sometimes io_uring_prep_read did not read all of the buffer bytes. I found that io_uring_cqe_seen is needed and that the count passed to io_uring_wait_cqe_nr must be 1; otherwise the random errors occur.
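A toy model may help explain why marking each completion as seen matters. The code below is NOT liburing. ToyRing, peekHead, markSeen and drain are hypothetical names that imitate, as far as I understand the liburing behaviour, a wait call that exposes only the entry at the head of the completion queue and a seen call that advances the head by one. Under that assumption, waiting for several completions still hands back only the first one, so each entry has to be consumed and marked in turn.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Toy completion entry; `res` mimics the byte count of a finished read.
struct ToyCqe {
    int id;
    int res;
};

// Toy model of a completion queue. This is NOT liburing.
class ToyRing {
public:
    // Simulates the kernel posting a completion.
    void complete(int id, int res) { queue_.push_back({id, res}); }

    // Like a wait call: exposes the head entry without consuming it.
    const ToyCqe *peekHead() const {
        return queue_.empty() ? nullptr : &queue_.front();
    }

    // Like marking an entry as seen: advances the head by one.
    void markSeen() {
        assert(!queue_.empty());
        queue_.pop_front();
    }

    std::size_t pending() const { return queue_.size(); }

private:
    std::deque<ToyCqe> queue_;
};

// Drains up to `count` completions, marking each one as seen.
std::vector<int> drain(ToyRing &ring, std::size_t count) {
    std::vector<int> ids;
    for (std::size_t i = 0; i < count; ++i) {
        const ToyCqe *cqe = ring.peekHead();
        if (cqe == nullptr) {
            break;
        }
        ids.push_back(cqe->id);
        ring.markSeen();
    }
    return ids;
}
```

In this model, calling peekHead twice without markSeen returns the same entry both times, which matches the symptom of the earlier version: only one completion was ever handled, and the queue was never advanced. In real code the result field of each completion would also need to be checked, since a read may complete with fewer bytes than requested.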
Global Initialization for iouring
As discussed above, ring is thread local, so it has to be initialized when a thread starts to process a specific table. With the current logic in PixelsScanFunction, we have to initialize ring in the PixelsScanInitLocal function instead of in the constructor of DirectRandomAccessFile. Otherwise the following case can happen: PixelsScanInitGlobal creates a DirectRandomAccessFile instance in thread A, so the initialization function is called in thread A. When PixelsScanInitLocal runs in thread B, B might directly use the DirectRandomAccessFile instance created by thread A instead of creating its own (this happens when the instance is for the first pxl file). In this case, B would never initialize its own ring.
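The constraint can be illustrated with a thread-local object that is set up on first use by each thread. This is a sketch of an alternative pattern, not the approach taken in PixelsScanInitLocal, and ToyRing and localRing are hypothetical names. Because the initialization happens in whichever thread calls localRing, it does not matter which thread constructed the file object.

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-in for the per-thread ring state.
struct ToyRing {
    bool initialized = false;
    std::thread::id owner;
};

// Returns the calling thread's own ring, initializing it on first use
// in that thread rather than in some other thread's constructor call.
ToyRing &localRing() {
    thread_local ToyRing ring;
    if (!ring.initialized) {
        ring.owner = std::this_thread::get_id();
        ring.initialized = true;
    }
    // A thread-local object is only ever reached by the thread that owns it.
    assert(ring.owner == std::this_thread::get_id());
    return ring;
}
```

Each thread that calls localRing gets a distinct, initialized object, so thread B never ends up with an uninitialized ring even when it reuses an instance created by thread A.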
IOUring: double free to ring
A double free of ring can happen.
Result not stable
    SELECT
        p_partkey
    FROM
        '/scratch/liyu/opt/pixels_file/pixels-tpch-300/part/v-0-order/.pxl',
        '/scratch/liyu/opt/pixels_file/pixels-tpch-300/partsupp/v-0-order/.pxl'
    WHERE
        ps_supplycost = (
            SELECT
                min(ps_supplycost)
            FROM
                '/scratch/liyu/opt/pixels_file/pixels-tpch-300/partsupp/v-0-order/*.pxl'
            WHERE
                p_partkey = ps_partkey)
    LIMIT 100;
The result of this query is not stable.