
pixels-reader-cxx's People

Contributors

yuly16


pixels-reader-cxx's Issues

Multiple read in q2

In TPC-H q2, the nation, region, partsupp and supplier tables are each loaded twice.

readBatch too slow when the batchSize is small

In DuckDB, the default batch size is 2048.

The result of the pixels TPC-H q1 (tpch 1) benchmark:
[image]
[image]

The result of the parquet TPC-H q1 (tpch 1) benchmark:
[image]
[image]

I profiled pixels with perf and found that allocating and deallocating proto objects takes a large share of the time in the function readBatch. It would be better to make the proto objects member variables, so that they are reused across calls instead of being created each time.
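The suggestion above can be sketched roughly as follows. This is only an illustration, not the pixels-reader-cxx code: `FooterStandIn`, `BatchReaderPerCall` and `BatchReaderReuse` are hypothetical names, and a plain struct holding a `std::vector` stands in for a proto object. The point is that a member object which is cleared and refilled keeps its storage, while a local object is built and torn down on every call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a proto object; not a Pixels or protobuf type.
struct FooterStandIn {
    std::vector<std::int64_t> offsets;
    void clear() { offsets.clear(); }  // drops the contents but keeps capacity
};

// Variant A: the object is constructed and destroyed on every call.
class BatchReaderPerCall {
public:
    std::size_t readBatch(std::size_t batchSize) {
        FooterStandIn footer;               // fresh allocation on each call
        footer.offsets.resize(batchSize);
        return footer.offsets.size();
    }
};

// Variant B: the object is a member and is reset on every call.
class BatchReaderReuse {
public:
    std::size_t readBatch(std::size_t batchSize) {
        footer_.clear();                    // no deallocation happens here
        footer_.offsets.resize(batchSize);  // reuses storage after the first call
        return footer_.offsets.size();
    }
    std::size_t capacity() const { return footer_.offsets.capacity(); }

private:
    FooterStandIn footer_;
};
```

With a small batch size such as 2048, readBatch is called many times, so the per-call cost in variant A is paid far more often than with large batches.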

Worker thread segmentation fault in PixelsScanInitLocal

Consider the following case: some worker threads run fast and others run slowly. The fast worker threads finish all the tasks while the slow worker threads have not yet finished PixelsScanInitLocal. With the current logic of PixelsScanInitLocal, a segmentation fault then occurs, because the pixels reader belonging to the slow thread is never initialized.
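One possible way to guard against this case is sketched below. It is an assumption about how a fix could look, not the actual PixelsScanInitLocal logic: `ReaderStandIn`, `GlobalState`, `LocalState`, `initLocal` and `scan` are hypothetical names. A thread that finds no file left to claim keeps a null reader, and the scan step checks for that instead of dereferencing it.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins; these are not the real Pixels or DuckDB types.
struct ReaderStandIn {
    std::string path;
};

struct GlobalState {
    std::vector<std::string> files;
    std::atomic<std::size_t> next{0};  // index of the next unclaimed file
};

struct LocalState {
    std::unique_ptr<ReaderStandIn> reader;  // stays null if no file was claimed
};

// Try to claim a file; leave the reader null when every file is already taken.
inline void initLocal(GlobalState &g, LocalState &l) {
    std::size_t idx = g.next.fetch_add(1);
    if (idx < g.files.size()) {
        l.reader = std::make_unique<ReaderStandIn>(ReaderStandIn{g.files[idx]});
    }
}

// Report "nothing to scan" instead of dereferencing a null reader.
inline bool scan(const LocalState &l, std::string &out) {
    if (!l.reader) {
        return false;
    }
    out = l.reader->path;
    return true;
}
```

In this sketch a slow thread that arrives after all files are taken simply produces no rows.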

readAsyncComplete Implementation

Currently the readAsyncComplete code is:

struct io_uring_cqe *cqe;
// Wait for exactly one completion.
if (io_uring_wait_cqe_nr(ring, &cqe, 1) != 0) {
    throw InvalidArgumentException("DirectUringRandomAccessFile::readAsyncComplete: wait cqe fails");
}
uringData *data = (uringData *)io_uring_cqe_get_data(cqe);
// Mark the completion as consumed so the next wait returns a new one.
io_uring_cqe_seen(ring, cqe);

Previously we tried to write the code as:

struct io_uring_cqe *cqe;
if(io_uring_wait_cqe_nr(ring, &cqe, size) != 0) {
    throw InvalidArgumentException("DirectUringRandomAccessFile::readAsyncComplete: wait cqe fails");
}
uringData * data = (uringData *)io_uring_cqe_get_data(cqe);

But this produced random errors: sometimes the read prepared with io_uring_prep_read did not fill all of the buffer bytes. I found that io_uring_cqe_seen is required and that the count passed to io_uring_wait_cqe_nr must be 1; otherwise the random errors occur.
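The behaviour described above can be illustrated with a toy model. The sketch below does not use liburing; `CompletionStandIn`, `ToyCompletionQueue` and `drain` are hypothetical names that only mimic the "look at the head entry, then mark it as seen" pattern. It shows why a completion that is never marked as seen keeps being returned, and how consuming one completion per step avoids that.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Toy model only; this is not liburing and these names are hypothetical.
struct CompletionStandIn {
    int requestId;  // plays the role of the user data attached to a request
    int res;        // plays the role of the result (bytes read, or negative on error)
};

class ToyCompletionQueue {
public:
    void post(CompletionStandIn c) { entries_.push_back(c); }

    // Like waiting for a completion: returns the head entry without consuming it.
    const CompletionStandIn *peek() const {
        return entries_.empty() ? nullptr : &entries_.front();
    }

    // Like marking a completion as seen: advances past the head entry.
    void seen() { entries_.pop_front(); }

private:
    std::deque<CompletionStandIn> entries_;
};

// Consume up to n completions one at a time, marking each one as seen.
inline std::vector<int> drain(ToyCompletionQueue &cq, std::size_t n) {
    std::vector<int> ids;
    for (std::size_t i = 0; i < n; ++i) {
        const CompletionStandIn *c = cq.peek();
        if (c == nullptr) {
            break;
        }
        ids.push_back(c->requestId);
        cq.seen();  // without this, the next peek() returns the same entry
    }
    return ids;
}
```

In real code the result field of each completion would also be compared with the requested length, since a read can complete with fewer bytes than were asked for.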

Global Initialization for iouring

As discussed above, the ring is thread-local, so it must be initialized when a thread starts to process a specific table. With the current logic in PixelsScanFunction, the ring has to be initialized in the PixelsScanInitLocal function instead of in the constructor of DirectRandomAccessFile. Otherwise the following case can occur: PixelsScanInitGlobal creates a DirectRandomAccessFile instance in thread A, so the initialization function is called in thread A. When PixelsScanInitLocal runs in thread B, B might directly use thread A's DirectRandomAccessFile instance instead of creating its own (this happens when the instance is for the first pxl file). In this case, B never initializes its own ring.
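The constraint described above (initialization must happen in the thread that uses the ring, not in the thread that constructs the file object) can be sketched with lazy thread-local initialization. This is an alternative illustration, not what the project does in PixelsScanInitLocal: `RingStandIn`, `localRing`, `FileStandIn` and `g_ringInitCount` are hypothetical, and no real ring is created.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for a per-thread ring; not liburing.
struct RingStandIn {
    bool initialized = false;
};

std::atomic<int> g_ringInitCount{0};  // counts initializations, for illustration only

// Each thread gets its own ring, set up the first time that thread asks for it.
inline RingStandIn &localRing() {
    thread_local RingStandIn ring;
    if (!ring.initialized) {
        ring.initialized = true;  // real code would set up the ring here
        g_ringInitCount.fetch_add(1);
    }
    return ring;
}

// Hypothetical file object that may be created in one thread and used in another.
class FileStandIn {
public:
    // No ring setup in the constructor: the constructing thread may differ
    // from the thread that later issues reads.
    RingStandIn *read() { return &localRing(); }
};
```

Here a file object created in thread A and then used from thread B still ends up with B's own ring, because the setup is tied to the calling thread rather than to the constructor.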

Result not stable

SELECT
    p_partkey
FROM
    '/scratch/liyu/opt/pixels_file/pixels-tpch-300/part/v-0-order/*.pxl',
    '/scratch/liyu/opt/pixels_file/pixels-tpch-300/partsupp/v-0-order/*.pxl'
WHERE
    ps_supplycost = (
        SELECT
            min(ps_supplycost)
        FROM
            '/scratch/liyu/opt/pixels_file/pixels-tpch-300/partsupp/v-0-order/*.pxl'
        WHERE
            p_partkey = ps_partkey)
LIMIT 100;

The result of this query is not stable: it differs between runs.
