Comments (3)

coderex2522 commented on July 4, 2024

For the List type, the offsets array has one more entry than the number of rows, because the number of elements in row i is offsets[i + 1] - offsets[i]. The last row therefore needs a closing entry at offsets[numElements]. So your example code needs the following two lines (marked "// new line"), which write that closing offset before each batch is added.
#include <orc/OrcFile.hh>

#include <cstdint>
#include <memory>
#include <string>
#include <vector>

void write_orc()
{
    using namespace orc;

    std::unique_ptr<OutputStream> outStream = writeLocalFile("test-file.orc");
    std::unique_ptr<Type> schema(
        Type::buildTypeFromString("struct<id:int,list1:array<string>>"));
    WriterOptions options;
    std::unique_ptr<Writer> writer = createWriter(*schema, outStream.get(), options);

    // Flush every batch_size rows. Each row holds vs.size() (= 2) strings, so one
    // full batch needs batch_size * 2 = 2048 child string slots.
    uint64_t batch_size = 1024, row_count = 2048;

    std::unique_ptr<ColumnVectorBatch> batch =
        writer->createRowBatch(row_count);
    StructVectorBatch &root_batch =
        dynamic_cast<StructVectorBatch &>(*batch);
    LongVectorBatch &id_batch =
        dynamic_cast<LongVectorBatch &>(*root_batch.fields[0]);
    ListVectorBatch &list_batch =
        dynamic_cast<ListVectorBatch &>(*root_batch.fields[1]);
    StringVectorBatch &str_batch =
        dynamic_cast<StringVectorBatch &>(*list_batch.elements);

    // Every row stores the same two strings. The batch only keeps pointers,
    // so vs must outlive each writer->add() call.
    std::vector<std::string> vs{"str1", "str2"};

    char **data      = str_batch.data.data();
    int64_t *offsets = list_batch.offsets.data();
    uint64_t offset  = 0, rows = 0;
    for (uint64_t i = 0; i < row_count; ++i) {
        // offsets[rows] is where this row's strings start in str_batch.
        offsets[rows] = static_cast<int64_t>(offset);

        // Placeholder id; your example read it from articles[i]->get_id().
        id_batch.data[rows] = static_cast<int64_t>(i);

        for (auto &s : vs)
        {
            data[offset] = &s[0];
            str_batch.length[offset++] = static_cast<int64_t>(s.size());
        }

        rows++;
        if (rows == batch_size)
        {
            offsets[rows] = static_cast<int64_t>(offset); // new line: closing offset
            root_batch.numElements = rows;
            id_batch.numElements   = rows;
            list_batch.numElements = rows;
            str_batch.numElements  = offset;

            writer->add(*batch);
            rows = 0;
            offset = 0;
        }
    }

    // Flush the last, partially filled batch (if any).
    if (rows != 0)
    {
        offsets[rows] = static_cast<int64_t>(offset); // new line: closing offset
        root_batch.numElements = rows;
        id_batch.numElements   = rows;
        list_batch.numElements = rows;
        str_batch.numElements  = offset;

        writer->add(*batch);
        rows = 0;
        offset = 0;
    }

    writer->close();
}
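To make the offsets rule concrete, here is a small library-free C++ sketch. It does not use the ORC API: build_offsets and row_length are hypothetical helpers that model only the offsets array of a ListVectorBatch, assuming each row's element count is known up front.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, library-free model of a list column's offsets array:
// row i owns child elements [offsets[i], offsets[i + 1]).
std::vector<int64_t> build_offsets(const std::vector<std::size_t>& row_lengths) {
  std::vector<int64_t> offsets;
  offsets.reserve(row_lengths.size() + 1);
  int64_t offset = 0;
  for (std::size_t len : row_lengths) {
    offsets.push_back(offset);  // start of this row, like offsets[rows] = offset
    offset += static_cast<int64_t>(len);
  }
  offsets.push_back(offset);  // closing entry, like the "new line" in the fix
  return offsets;
}

// Number of elements in row i, recovered from two adjacent offsets.
int64_t row_length(const std::vector<int64_t>& offsets, std::size_t i) {
  assert(i + 1 < offsets.size());  // row i needs both offsets[i] and offsets[i + 1]
  return offsets[i + 1] - offsets[i];
}
```

For example, build_offsets({2, 0, 3}) returns {0, 2, 2, 5}: four entries for three rows, where the final 5 is the closing offset that the fix writes at offsets[rows]. An empty row simply repeats the previous offset, and without the closing entry the length of the last row could not be computed.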

dongjoon-hyun commented on July 4, 2024

Thank you, @coderex2522.

dongjoon-hyun commented on July 4, 2024

Given the above comment, I close this issue.