orc's Introduction

ORC is a self-describing type-aware columnar file format designed for Hadoop workloads. It is optimized for large streaming reads, but with integrated support for finding required rows quickly. Storing data in a columnar format lets the reader read, decompress, and process only the values that are required for the current query. Because ORC files are type-aware, the writer chooses the most appropriate encoding for the type and builds an internal index as the file is written. Predicate pushdown uses those indexes to determine which stripes in a file need to be read for a particular query, and the row indexes can narrow the search to a particular set of 10,000 rows. ORC supports the complete set of types in Hive, including the complex types: structs, lists, maps, and unions.
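
For a quick feel of what this looks like in the Java library, here is a minimal write sketch (the schema, file name, and row count are illustrative; see the documentation for complete examples):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

public class WriteExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    TypeDescription schema = TypeDescription.fromString("struct<x:bigint>");
    Writer writer = OrcFile.createWriter(new Path("example.orc"),
        OrcFile.writerOptions(conf).setSchema(schema));
    VectorizedRowBatch batch = schema.createRowBatch();
    LongColumnVector x = (LongColumnVector) batch.cols[0];
    for (int r = 0; r < 10000; ++r) {
      x.vector[batch.size++] = r;
      if (batch.size == batch.getMaxSize()) {  // flush a full batch
        writer.addRowBatch(batch);
        batch.reset();
      }
    }
    if (batch.size != 0) {                     // flush the final partial batch
      writer.addRowBatch(batch);
    }
    writer.close();
  }
}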

ORC File Library

This project includes both a Java library and a C++ library for reading and writing the Optimized Row Columnar (ORC) file format. The C++ and Java libraries are completely independent of each other and will each read all versions of ORC files.

The current build status:

  • Main branch build status

Bug tracking: Apache Jira

The subdirectories are:

  • c++ - the c++ reader and writer
  • cmake_modules - the cmake modules
  • docker - docker scripts to build and test on various linuxes
  • examples - various ORC example files that are used to test compatibility
  • java - the java reader and writer
  • site - the website and documentation
  • tools - the c++ tools for reading and inspecting ORC files

Building

  • Install java 17 or higher
  • Install maven 3.9.6 or higher
  • Install cmake 3.12 or higher

To build a release version with debug information:

% mkdir build
% cd build
% cmake ..
% make package
% make test-out

To build a debug version:

% mkdir build
% cd build
% cmake .. -DCMAKE_BUILD_TYPE=DEBUG
% make package
% make test-out

To build a release version without debug information:

% mkdir build
% cd build
% cmake .. -DCMAKE_BUILD_TYPE=RELEASE
% make package
% make test-out

To build only the Java library:

% cd java
% ./mvnw package

To build only the C++ library:

% mkdir build
% cd build
% cmake .. -DBUILD_JAVA=OFF
% make package
% make test-out

To build the C++ library with AVX512 enabled:

% export ORC_USER_SIMD_LEVEL=AVX512
% mkdir build
% cd build
% cmake .. -DBUILD_JAVA=OFF -DBUILD_ENABLE_AVX512=ON
% make package
% make test-out

The CMake option BUILD_ENABLE_AVX512 can be set to "ON" or "OFF" (the default) at compile time. It determines whether the AVX512 SIMD code paths are compiled into the binaries.

The environment variable ORC_USER_SIMD_LEVEL can be set to "AVX512" or "NONE" (the default) at run time. It determines the SIMD level used when dispatching to the code paths that can apply SIMD optimization.

Note that if ORC_USER_SIMD_LEVEL is set to "NONE" at run time, AVX512 will not take effect at run time even if BUILD_ENABLE_AVX512 was set to "ON" at compile time.
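
For example, assuming a build configured with -DBUILD_ENABLE_AVX512=ON, SIMD dispatch can be disabled for a single run of one of the tools (the binary path and file name are illustrative):

% ORC_USER_SIMD_LEVEL=NONE ./tools/src/orc-contents myfile.orc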

orc's People

Contributors

autumnust, belugabehr, boroknagyz, coderex2522, cxzl25, czxrrr, dependabot[bot], deshanxiao, dongjoon-hyun, fangzheng, ffacs, guiyanakuang, jcamachor, luffy-zh, luksan47, moresandeep, noirello, omalley, paliwalashish, pavibhai, pgaref, prasanthj, rip-nsk, sershe-apache, stiga-huang, wgtmac, williamhyun, xndai, yuokada, zhjwpku

orc's Issues

The result is strange when casting `string` to `date` in ORC reading via spark (Schema Evolution)

I created an ORC file with the following code.

val data = Seq(
    ("", "2022-01-32"),  // pay attention to this, null
    ("", "9808-02-30"),  // pay attention to this, 9808-02-29
    ("", "2022-06-31"),  // pay attention to this, 2022-06-30

)
val cols = Seq("str", "date_str")
val df=spark.createDataFrame(data).toDF(cols:_*).repartition(1)
df.printSchema()
df.show(100)
df.write.mode("overwrite").orc("/tmp/orc/data.orc")

Please note that all three of these are invalid dates.
And I read it via:

scala> var df = spark.read.schema("date_str date").orc("/tmp/orc/data.orc"); df.show()
+----------+
|  date_str|
+----------+
|      null|
|9808-02-29|
|2022-06-30|
+----------+

Why is 2022-01-32 converted to null, while 9808-02-30 is converted to 9808-02-29?

Intuitively, since all of them are invalid dates, we should get three nulls.

Tests failure on 1.7.3

While building for Arch Linux, I've encountered some test failures (after backporting ffbd341):

[ RUN      ] TestPredicateLeaf.testIntNullSafeEqualsBloomFilter
/build/apache-orc/src/orc-1.7.3/c++/test/TestPredicateLeaf.cc:635: Failure
      Expected: TruthValue::YES_NO
      Which is: 4-byte object <05-00 00-00>
To be equal to: evaluate(pred, createIntStats(10, 100), &bf)
      Which is: 4-byte object <01-00 00-00>
[  FAILED  ] TestPredicateLeaf.testIntNullSafeEqualsBloomFilter (1 ms)
[ RUN      ] TestPredicateLeaf.testIntEqualsBloomFilter
/build/apache-orc/src/orc-1.7.3/c++/test/TestPredicateLeaf.cc:652: Failure
      Expected: TruthValue::YES_NO_NULL
      Which is: 4-byte object <06-00 00-00>
To be equal to: evaluate(pred, createIntStats(10, 100, true), &bf)
      Which is: 4-byte object <04-00 00-00>
[  FAILED  ] TestPredicateLeaf.testIntEqualsBloomFilter (0 ms)
[ RUN      ] TestPredicateLeaf.testIntInBloomFilter
/build/apache-orc/src/orc-1.7.3/c++/test/TestPredicateLeaf.cc:667: Failure
      Expected: TruthValue::YES_NO_NULL
      Which is: 4-byte object <06-00 00-00>
To be equal to: evaluate(pred, createIntStats(10, 100, true), &bf)
      Which is: 4-byte object <04-00 00-00>
/build/apache-orc/src/orc-1.7.3/c++/test/TestPredicateLeaf.cc:670: Failure
      Expected: TruthValue::YES_NO_NULL
      Which is: 4-byte object <06-00 00-00>
To be equal to: evaluate(pred, createIntStats(10, 100, true), &bf)
      Which is: 4-byte object <04-00 00-00>
[  FAILED  ] TestPredicateLeaf.testIntInBloomFilter (0 ms)
[ RUN      ] TestPredicateLeaf.testDateNullSafeEqualsBloomFilter
/build/apache-orc/src/orc-1.7.3/c++/test/TestPredicateLeaf.cc:778: Failure
      Expected: TruthValue::YES_NO
      Which is: 4-byte object <05-00 00-00>
To be equal to: evaluate(pred, createDateStats(10.0, 100.0), &bf)
      Which is: 4-byte object <01-00 00-00>
[  FAILED  ] TestPredicateLeaf.testDateNullSafeEqualsBloomFilter (0 ms)
[ RUN      ] TestPredicateLeaf.testDateEqualsBloomFilter
/build/apache-orc/src/orc-1.7.3/c++/test/TestPredicateLeaf.cc:795: Failure
      Expected: TruthValue::YES_NO_NULL
      Which is: 4-byte object <06-00 00-00>
To be equal to: evaluate(pred, createDateStats(10.0, 100.0, true), &bf)
      Which is: 4-byte object <04-00 00-00>
[  FAILED  ] TestPredicateLeaf.testDateEqualsBloomFilter (0 ms)
[ RUN      ] TestPredicateLeaf.testDateInBloomFilter
/build/apache-orc/src/orc-1.7.3/c++/test/TestPredicateLeaf.cc:812: Failure
      Expected: TruthValue::YES_NO_NULL
      Which is: 4-byte object <06-00 00-00>
To be equal to: evaluate(pred, createDateStats(10.0, 100.0, true), &bf)
      Which is: 4-byte object <04-00 00-00>
/build/apache-orc/src/orc-1.7.3/c++/test/TestPredicateLeaf.cc:815: Failure
      Expected: TruthValue::YES_NO_NULL
      Which is: 4-byte object <06-00 00-00>
To be equal to: evaluate(pred, createDateStats(10.0, 100.0, true), &bf)
      Which is: 4-byte object <04-00 00-00>
[  FAILED  ] TestPredicateLeaf.testDateInBloomFilter (0 ms)
[ RUN      ] TestBloomFilter.testBloomFilterBasicOperations
/build/apache-orc/src/orc-1.7.3/c++/test/TestBloomFilter.cc:134: Failure
Value of: bloomFilter.mBitSet->get(288)
  Actual: false
Expected: true
/build/apache-orc/src/orc-1.7.3/c++/test/TestBloomFilter.cc:134: Failure
Value of: bloomFilter.mBitSet->get(246)
  Actual: false
Expected: true
/build/apache-orc/src/orc-1.7.3/c++/test/TestBloomFilter.cc:134: Failure
Value of: bloomFilter.mBitSet->get(306)
  Actual: false
Expected: true
/build/apache-orc/src/orc-1.7.3/c++/test/TestBloomFilter.cc:134: Failure
Value of: bloomFilter.mBitSet->get(228)
  Actual: false
Expected: true
/build/apache-orc/src/orc-1.7.3/c++/test/TestBloomFilter.cc:138: Failure
Value of: bloomFilter.mBitSet->get(458)
  Actual: false
Expected: true
/build/apache-orc/src/orc-1.7.3/c++/test/TestBloomFilter.cc:138: Failure
Value of: bloomFilter.mBitSet->get(545)
  Actual: false
Expected: true
/build/apache-orc/src/orc-1.7.3/c++/test/TestBloomFilter.cc:138: Failure
Value of: bloomFilter.mBitSet->get(717)
  Actual: false
Expected: true
/build/apache-orc/src/orc-1.7.3/c++/test/TestBloomFilter.cc:140: Failure
Value of: bloomFilter.mBitSet->get(526)
  Actual: false
Expected: true
/build/apache-orc/src/orc-1.7.3/c++/test/TestBloomFilter.cc:140: Failure
Value of: bloomFilter.mBitSet->get(40)
  Actual: false
Expected: true
/build/apache-orc/src/orc-1.7.3/c++/test/TestBloomFilter.cc:140: Failure
Value of: bloomFilter.mBitSet->get(480)
  Actual: false
Expected: true
/build/apache-orc/src/orc-1.7.3/c++/test/TestBloomFilter.cc:140: Failure
Value of: bloomFilter.mBitSet->get(86)
  Actual: false
Expected: true
/build/apache-orc/src/orc-1.7.3/c++/test/TestBloomFilter.cc:144: Failure
Value of: bloomFilter.mBitSet->get(308)
  Actual: false
Expected: true
/build/apache-orc/src/orc-1.7.3/c++/test/TestBloomFilter.cc:144: Failure
Value of: bloomFilter.mBitSet->get(335)
  Actual: false
Expected: true
/build/apache-orc/src/orc-1.7.3/c++/test/TestBloomFilter.cc:144: Failure
Value of: bloomFilter.mBitSet->get(108)
  Actual: false
Expected: true
/build/apache-orc/src/orc-1.7.3/c++/test/TestBloomFilter.cc:144: Failure
Value of: bloomFilter.mBitSet->get(535)
  Actual: false
Expected: true
/build/apache-orc/src/orc-1.7.3/c++/test/TestBloomFilter.cc:148: Failure
Value of: bloomFilter.mBitSet->get(279)
  Actual: false
Expected: true
/build/apache-orc/src/orc-1.7.3/c++/test/TestBloomFilter.cc:148: Failure
Value of: bloomFilter.mBitSet->get(15)
  Actual: false
Expected: true
/build/apache-orc/src/orc-1.7.3/c++/test/TestBloomFilter.cc:148: Failure
Value of: bloomFilter.mBitSet->get(54)
  Actual: false
Expected: true
/build/apache-orc/src/orc-1.7.3/c++/test/TestBloomFilter.cc:150: Failure
Value of: bloomFilter.mBitSet->get(680)
  Actual: false
Expected: true
/build/apache-orc/src/orc-1.7.3/c++/test/TestBloomFilter.cc:150: Failure
Value of: bloomFilter.mBitSet->get(818)
  Actual: false
Expected: true
/build/apache-orc/src/orc-1.7.3/c++/test/TestBloomFilter.cc:150: Failure
Value of: bloomFilter.mBitSet->get(434)
  Actual: false
Expected: true
/build/apache-orc/src/orc-1.7.3/c++/test/TestBloomFilter.cc:150: Failure
Value of: bloomFilter.mBitSet->get(232)
  Actual: false
Expected: true
/build/apache-orc/src/orc-1.7.3/c++/test/TestBloomFilter.cc:154: Failure
Value of: bloomFilter.testLong(111)
  Actual: false
Expected: true
/build/apache-orc/src/orc-1.7.3/c++/test/TestBloomFilter.cc:157: Failure
Value of: bloomFilter.testLong(-1)
  Actual: false
Expected: true
/build/apache-orc/src/orc-1.7.3/c++/test/TestBloomFilter.cc:159: Failure
Value of: bloomFilter.testLong(-111)
  Actual: false
Expected: true
[  FAILED  ] TestBloomFilter.testBloomFilterBasicOperations (0 ms)
[ RUN      ] TestBloomFilter.testBloomFilterSerialization
/build/apache-orc/src/orc-1.7.3/c++/test/TestBloomFilter.cc:262: Failure
Value of: dstBloomFilter->testLong(11)
  Actual: false
Expected: true
/build/apache-orc/src/orc-1.7.3/c++/test/TestBloomFilter.cc:263: Failure
Value of: dstBloomFilter->testLong(111)
  Actual: false
Expected: true
/build/apache-orc/src/orc-1.7.3/c++/test/TestBloomFilter.cc:267: Failure
Value of: dstBloomFilter->testLong(-11)
  Actual: false
Expected: true
/build/apache-orc/src/orc-1.7.3/c++/test/TestBloomFilter.cc:268: Failure
Value of: dstBloomFilter->testLong(-111)
  Actual: false
Expected: true
[  FAILED  ] TestBloomFilter.testBloomFilterSerialization (0 ms)

And in the second suite:

[ RUN      ] TestFileScan.testErrorHandling
/build/apache-orc/src/orc-1.7.3/tools/test/TestFileScan.cc:209: Failure
Expected: (std::string::npos) != (error.find(error_msg)), actual: 18446744073709551615 vs 18446744073709551615
[  FAILED  ] TestFileScan.testErrorHandling (27 ms)

We build with those CMake flags:

    -DCMAKE_CXX_FLAGS="${CXXFLAGS} -fPIC -ffat-lto-objects" \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX="/usr" \
    -DLZ4_HOME="/usr" \
    -DPROTOBUF_HOME="/usr" \
    -DSNAPPY_HOME="/usr" \
    -DZLIB_HOME="/usr" \
    -DZSTD_HOME="/usr" \
    -DORC_PREFER_STATIC_ZLIB=OFF \
    -DBUILD_LIBHDFSPP=OFF \
    -DBUILD_JAVA=OFF \
    -DINSTALL_VENDORED_LIBS=OFF

Please tell me what I can provide to help resolve these failures.

ORC-1172: Add row count limit config for one stripe

For query engines like Presto, the stripe is the base unit of query concurrency: one stripe can only be processed by one split.
In the current implementation of the ORC writer, the only config that can control the row count in a stripe is "orc.stripe.size".
But across different kinds of tables, that makes the row count hard to control.

  • For a table with many columns (e.g. 100 columns), 64MB may contain only 5,000 rows.
  • For a table with few columns (e.g. 5 columns), 64MB may contain 100,000 rows.

For Presto, a normal OLAP query reads only a subset of the table's columns, so the row count is the key factor in query performance. If one stripe contains too many rows, query performance can become very poor.

So, besides the config "orc.stripe.size", we need another config like "orc.stripe.row.count" to control the row count of one stripe; a sketch of its intended use follows below.
A similar config has been introduced in cudf (a GPU DataFrame library based on Apache Arrow): rapidsai/cudf#9261
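
A sketch of how the proposed option might sit next to the existing one (assuming the usual OrcFile writer API; "orc.stripe.row.count" is the name proposed above, not an existing option):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

public class StripeRowCapExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("orc.stripe.size", String.valueOf(64L * 1024 * 1024)); // existing size-based limit
    conf.set("orc.stripe.row.count", "100000");                     // proposed row-count limit
    TypeDescription schema = TypeDescription.fromString("struct<a:int>");
    Writer writer = OrcFile.createWriter(new Path("/tmp/capped.orc"),
        OrcFile.writerOptions(conf).setSchema(schema));
    writer.close();  // under the proposal, a stripe would flush at 100,000 rows even below 64MB
  }
}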

ORC Format Parsing โ€“ Out-of-Bounds Access in Protobuf Messages

What happens if type.fieldnames_size() is less than type.subtypes_size()? The call type.fieldnames(i) will then be an invalid out-of-bounds access.

File: TypeImpl.cc

case proto::Type_Kind_STRUCT: {
  TypeImpl* result = new TypeImpl(STRUCT);
  uint64_t size = static_cast<uint64_t>(type.subtypes_size());
  std::vector<Type*> typeList(size);
  std::vector<std::string> fieldList(size);
  for (int i = 0; i < type.subtypes_size(); ++i) {
    result->addStructField(type.fieldnames(i),
        convertType(footer.types(static_cast<int>(type.subtypes(i))), footer));
  }
  return std::unique_ptr<Type>(result);

I reproduced the scenario by modifying c++/test/TestType.cc and it crashed (screenshots of the modified test and of the crash output omitted).
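
One possible guard (a sketch only, not necessarily the upstream fix) is to validate the message before indexing:

case proto::Type_Kind_STRUCT: {
  // A malformed footer can declare more subtypes than field names,
  // so check before calling type.fieldnames(i).
  if (type.fieldnames_size() < type.subtypes_size()) {
    throw ParseError("malformed STRUCT type: fewer field names than subtypes");
  }
  TypeImpl* result = new TypeImpl(STRUCT);
  for (int i = 0; i < type.subtypes_size(); ++i) {
    result->addStructField(type.fieldnames(i),
        convertType(footer.types(static_cast<int>(type.subtypes(i))), footer));
  }
  return std::unique_ptr<Type>(result);
}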

Release Apache ORC 1.8.1

$ ./run-all.sh apache branch-1.8

$ tail -n2 logs/*log
==> logs/centos7-test.log <==
Built target test-out
Finished centos7 at Mon Nov 28 22:32:44 PST 2022

==> logs/debian10-test.log <==
Built target test-out
Finished debian10 at Mon Nov 28 22:38:16 PST 2022

==> logs/debian10_jdk=11-test.log <==
Built target test-out
Finished debian10_jdk=11 at Mon Nov 28 22:41:52 PST 2022

==> logs/debian11-test.log <==
Built target test-out
Finished debian11 at Mon Nov 28 22:39:43 PST 2022

==> logs/fedora37-test.log <==
Built target test-out
Finished fedora37 at Mon Nov 28 22:39:08 PST 2022

==> logs/ubuntu18-test.log <==
Built target test-out
Finished ubuntu18 at Mon Nov 28 22:37:51 PST 2022

==> logs/ubuntu20-test.log <==
Built target test-out
Finished ubuntu20 at Mon Nov 28 22:42:01 PST 2022

==> logs/ubuntu20_jdk=11-test.log <==
Built target test-out
Finished ubuntu20_jdk=11 at Mon Nov 28 22:43:10 PST 2022

==> logs/ubuntu20_jdk=11_cc=clang-test.log <==
Built target test-out
Finished ubuntu20_jdk=11_cc=clang at Mon Nov 28 22:39:06 PST 2022

==> logs/ubuntu22-test.log <==
Built target test-out
Finished ubuntu22 at Mon Nov 28 22:40:22 PST 2022

AFTER VOTE

Release Apache ORC 1.6.14

  • branch-1.6 is healthy in GitHub Action
  • Docker tests (CentOS 7, Debian 9, Debian 10, Debian 11, Ubuntu 16, Ubuntu 18, Ubuntu 20) passed except one known issue, testLzoLong.
Failed tests
[  FAILED  ] TestDecompression.testLzoLong (0 ms)
[  FAILED  ] 1 test, listed below:
[  FAILED  ] TestDecompression.testLzoLong
 1 FAILED TEST
FAILED debian11

ORC 1.7.4-SNAPSHOT fails with Iceberg Nan count tests

org.apache.iceberg.data.TestMetricsRowGroupFilter > testIsNaN[format = orc] FAILED
    java.lang.AssertionError: Should read: NaN counts are not tracked in Parquet metrics
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.assertTrue(Assert.java:42)
        at org.apache.iceberg.data.TestMetricsRowGroupFilter.testIsNaN(TestMetricsRowGroupFilter.java:308)

org.apache.iceberg.data.TestMetricsRowGroupFilter > testNotNaN[format = orc] FAILED
    java.lang.AssertionError: Should read: NaN counts are not tracked in Parquet metrics
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.assertTrue(Assert.java:42)
        at org.apache.iceberg.data.TestMetricsRowGroupFilter.testNotNaN(TestMetricsRowGroupFilter.java:320)

#1055 looks relevant to this issue.

Byte to integer conversions fail on platforms with unsigned char type

When building the C++ library on a platform where char is by default unsigned, byte-to-integer expansion is incorrect in orc::expandBytesToIntegers as well as in a few unit tests.

This can be reproduced on any CPU architecture when building with gcc by compiling with -funsigned-char.
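
A self-contained illustration of the pitfall (not code from the library; compile once normally and once with -funsigned-char to see the difference):

#include <cstdint>
#include <cstdio>

int main() {
  char c = static_cast<char>(0xFF);      // bit pattern 1111'1111
  int64_t v = static_cast<int64_t>(c);   // -1 if char is signed, 255 if unsigned
  // Casting through signed char first yields -1 regardless of the
  // platform's default char signedness.
  int64_t fixed = static_cast<int64_t>(static_cast<signed char>(c));
  std::printf("direct: %lld, via signed char: %lld\n",
              static_cast<long long>(v), static_cast<long long>(fixed));
  return 0;
}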

[C++] Unable to filter DECIMAL column from ORC file

This question is similar to one I asked before on StackOverflow, which started working after some more trials.

Previously there was an issue with the column id; now I am trying to filter a column of DECIMAL data type, but the results always give me all the data instead of the filtered rows.



The data the ORC file holds in the required columns was shown in an attached screenshot (omitted).

And this is how I am trying to filter out the DECIMAL column using orc::SearchArgument:

orc::RowReaderOptions m_RowReaderOpts;
orc::ReaderOptions m_ReaderOpts;

std::unique_ptr<orc::Reader> m_Reader;
std::unique_ptr<orc::RowReader> m_RowReader;

auto builder = orc::SearchArgumentFactory::newBuilder();
const int snapshot_time_col_id = 22;

orc::Literal ss_begin_time{34080000000000, 14, 9};
orc::Literal ss_end_time{34380000000000, 14, 9};

// I HAVE ALSO TRIED, but didn't work.
// orc::Literal ss_begin_time{34080, 5, 0};
// orc::Literal ss_end_time{34380, 5, 0};

builder->between(snapshot_time_col_id, orc::PredicateDataType::DECIMAL, ss_begin_time, ss_end_time);

m_RowReaderOpts.searchArgument(builder->build());
m_Reader = orc::createReader(orc::readFile(a_FilePath.c_str()), m_ReaderOpts);
m_RowReader = m_Reader->createRowReader(m_RowReaderOpts);

Could you give some suggestions on how to filter data of type DECIMAL?

How to set list's offsets correctly

Greetings,
I'm learning to work with ORC in C++, and I think I'm stuck: I don't quite understand how to set an array's offsets. Specifically, the following code, when executed, produces the exception "Caught exception in test-file.orc: bad read in nextBuffer":

void write_orc()
{
    using namespace orc;

    ORC_UNIQUE_PTR<OutputStream> outStream = writeLocalFile("test-file.orc");
    ORC_UNIQUE_PTR<Type> schema(
        Type::buildTypeFromString("struct<id:int,list1:array<string>>"));
    WriterOptions options;
    ORC_UNIQUE_PTR<Writer> writer = createWriter(*schema, outStream.get(), options);

    uint64_t batch_size = 1024, row_count = 2048;

    std::unique_ptr<ColumnVectorBatch> batch =
        writer->createRowBatch(row_count);
    StructVectorBatch &root_batch =
        dynamic_cast<StructVectorBatch &>(*batch.get());
    LongVectorBatch &id_batch =
        dynamic_cast<LongVectorBatch &>(*root_batch.fields[0]);
    ListVectorBatch &list_batch =
        dynamic_cast<ListVectorBatch &>(*root_batch.fields[1]);
    StringVectorBatch &str_batch =
        dynamic_cast<StringVectorBatch &>(*list_batch.elements.get());
    
    std::vector<std::string> vs{"str1", "str2"};

    char **data         = str_batch.data.data();
    int64_t *offsets    = list_batch.offsets.data();
    uint64_t offset     = 0, rows = 0;
    for (size_t i = 0; i < row_count; ++i) {
        offsets[rows] = static_cast<int64_t>(offset);

        id_batch.data[rows] = static_cast<int64_t>(i);  // stand-in for the original articles[i]->get_id()

        for (auto &s : vs)
        {
            data[offset] = &s[0];
            str_batch.length[offset++] = s.size();
        }

        rows++;
        if (rows == batch_size) 
        {
            root_batch.numElements = rows;
            id_batch.numElements   = rows;
            list_batch.numElements = rows;

            writer->add(*batch);
            rows = 0;
            offset = 0;
        }
    }

    if (rows != 0) 
    {
        root_batch.numElements = rows;
        id_batch.numElements   = rows;
        list_batch.numElements = rows;

        writer->add(*batch);
        rows = 0;
        offset = 0;
    }

    writer->close();
}

My question is: what exactly am I doing wrong when setting list's offsets?
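
For reference, a sketch of the offsets convention as I understand it, using the variables from the snippet above: offsets needs one entry more than the number of rows, with row i's elements occupying child indexes [offsets[i], offsets[i+1]). The snippet above never writes the closing offset for the last row of a batch, and never sets str_batch.numElements:

    // Assumed convention (sketch): offsets[0] = 0 and offsets[row + 1]
    // closes row `row`, so the buffer holds rows + 1 meaningful entries.
    offsets[0] = 0;
    for (uint64_t row = 0; row < rows; ++row) {
        // ... fill data[offset] / str_batch.length[offset] for each element,
        //     incrementing `offset` once per element ...
        offsets[row + 1] = static_cast<int64_t>(offset);  // closing offset
    }
    str_batch.numElements = offset;  // child batch must know its element count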

RecordReaderImpl.getValueRange() may cause incorrect results

ORC version: 1.6.11. SQL: select xxx from xxx where str is not null

Recently I found that some ORC files written by Trino didn't have complete statistics in the file metadata (maybe a Presto bug). Because of this, OrcProto.ColumnStatistics can't be deserialized into any specific ColumnStatisticsImpl such as StringStatisticsImpl; RecordReaderImpl.getValueRange() then returns a ValueRange with a null lower bound, and RecordReaderImpl.pickRowGroups() skips a row group that should not be skipped. In normal conditions everything is OK. I also found that orc-1.5.x handles the above case correctly thanks to RecordReaderImpl.UNKNOWN_VALUE, which was removed in 1.6.x. Maybe we could add it back for better compatibility. @dongjoon-hyun @omalley

ORC-1121: Predicate pushdown does not work

Hi, I have a problem. My test data is TPC-DS 1G, with Spark 3.2 and ORC version 1.6.11.
My test SQL is select count(1) from call_center_orc where cc_call_center_sk > 100;
cc_call_center_sk is the first column in call_center_orc, and predicate pushdown is effective.
But when I test select count(1) from call_center_orc where cc_company > 100;
cc_company is not the first column, and predicate pushdown does not work (screenshots omitted).

I debugged the code and found the problem is in SchemaEvolution.ppdSafeConversion: in my case, result.size is 2, but in pickRowGroups the columnIx is the column index in the ORC metadata, which is 19 for cc_company. This causes ORC to not evaluate the pushdown filters against the row-group stats, so it cannot skip the row group.

Release Apache ORC 1.7.6

  • branch-1.7 is healthy in GitHub Action
  • Apache ORC DockerHub repository request
  • Docker tests (CentOS 7, Debian 10, Debian 11, Ubuntu 18, Ubuntu 20, Ubuntu 22) passed.
  • Apache Spark master integration test passed.
  • Apache Iceberg master integration test passed.
  • Tag is created.
  • Upload source artifact
  • Publish java artifact
  • Vote started.

Release Apache ORC 1.7.5

Got orc::ParseError "bad read in nextBuffer" when using SearchArgument with nested struct

Below is code to reproduce the issue. It works when removing the empty struct column "col2", or writing a small number of rows, or changing the value to "rand() % 100".

Am I doing anything wrong?

This is on version 1.7.2.

Code

  WriterOptions options;
  auto stream = writeLocalFile("orc_file_test");
  MemoryPool* pool = getDefaultPool();
  std::unique_ptr<Type> type(Type::buildTypeFromString(
      "struct<col0:struct<col1:int>,col2:struct<col3:int>>"));

  size_t num = 50000;
  std::unique_ptr<Writer> writer = createWriter(*type, stream.get(), options);

  std::unique_ptr<ColumnVectorBatch> batch = writer->createRowBatch(num);
  StructVectorBatch* structBatch =
      dynamic_cast<StructVectorBatch*>(batch.get());
  StructVectorBatch* structBatch2 =
      dynamic_cast<StructVectorBatch*>(structBatch->fields[0]);
  LongVectorBatch* intBatch =
      dynamic_cast<LongVectorBatch*>(structBatch2->fields[0]);

  StructVectorBatch* structBatch3 =
      dynamic_cast<StructVectorBatch*>(structBatch->fields[1]);
  LongVectorBatch* intBatch2 =
      dynamic_cast<LongVectorBatch*>(structBatch3->fields[0]);

  structBatch->numElements = num;
  structBatch2->numElements = num;

  structBatch3->numElements = num;
  structBatch3->hasNulls = true;

  for (int64_t i = 0; i < num; ++i) {
    intBatch->data.data()[i] = rand() % 150000;
    intBatch->notNull[i] = 1;

    intBatch2->notNull[i] = 0;
    intBatch2->hasNulls = true;

    structBatch3->notNull[i] = 0;
  }
  intBatch->hasNulls = false;

  writer->add(*batch);
  writer->close();

  ReaderOptions readOptions;
  readOptions.setMemoryPool(*getDefaultPool());
  auto reader = createReader(readLocalFile("orc_file_test"), readOptions);
  orc::RowReaderOptions rowOptions;
  rowOptions.searchArgument(
      SearchArgumentFactory::newBuilder()
          ->startAnd()
          .equals(2, PredicateDataType::LONG, Literal((int64_t)5))
          .end()
          .build());
  std::unique_ptr<RowReader> rowReader = reader->createRowReader(rowOptions);

  batch = rowReader->createRowBatch(num);
  structBatch = dynamic_cast<StructVectorBatch*>(batch.get());
  structBatch2 = dynamic_cast<StructVectorBatch*>(structBatch->fields[0]);
  intBatch = dynamic_cast<LongVectorBatch*>(structBatch2->fields[0]);

  structBatch3 = dynamic_cast<StructVectorBatch*>(structBatch->fields[1]);

  while (rowReader->next(*batch)) {
    for (size_t i = 0; i < batch->numElements; i++) {
      
    }
  }

Stack trace:

terminate called after throwing an instance of 'orc::ParseError'
  what():  bad read in nextBuffer
*** Aborted at 1666816640 (Unix time, try 'date -d @1666816640') ***
*** Signal 6 (SIGABRT) (0x2035c0002b7ad) received by PID 178093 (pthread TID 0x7ffb12545a80) (linux TID 178093) (maybe from PID 178093, UID 131932) (code: -6), stack trace: ***
    @ 0000000000000000 (unknown)
    @ 000000000009c9d3 __GI___pthread_kill
    @ 00000000000444ec __GI_raise
    @ 000000000002c432 __GI_abort
    @ 00000000000a3fd4 __gnu_cxx::__verbose_terminate_handler()
    @ 00000000000a1b39 __cxxabiv1::__terminate(void (*)())
    @ 00000000000a1ba4 std::terminate()
    @ 00000000000a1e6f __cxa_throw
    @ 0000000001efcd55 __cxa_throw
    @ 00000000075b676c orc::BooleanRleDecoderImpl::seek(orc::PositionProvider&)
                       /home/engshare/third-party2/apache-orc/1.7.2/src/orc/c++/src/ByteRLE.cc:526
    @ 00000000075af711 orc::IntegerColumnReader::seekToRowGroup(std::unordered_map<unsigned long, orc::PositionProvider, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, orc::PositionProvider> > >&)
                       /home/engshare/third-party2/apache-orc/1.7.2/src/orc/c++/src/ColumnReader.cc:120
    @ 00000000075af67f orc::StructColumnReader::seekToRowGroup(std::unordered_map<unsigned long, orc::PositionProvider, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, orc::PositionProvider> > >&)
                       /home/engshare/third-party2/apache-orc/1.7.2/src/orc/c++/src/ColumnReader.cc:965
    @ 00000000075af67f orc::StructColumnReader::seekToRowGroup(std::unordered_map<unsigned long, orc::PositionProvider, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, orc::PositionProvider> > >&)
                       /home/engshare/third-party2/apache-orc/1.7.2/src/orc/c++/src/ColumnReader.cc:965
    @ 0000000007598179 orc::RowReaderImpl::seekToRowGroup(unsigned int)
                       /home/engshare/third-party2/apache-orc/1.7.2/src/orc/c++/src/Reader.cc:440
    @ 000000000759d700 orc::RowReaderImpl::startNextStripe()
                       /home/engshare/third-party2/apache-orc/1.7.2/src/orc/c++/src/Reader.cc:1037
    @ 000000000759daf4 orc::RowReaderImpl::next(orc::ColumnVectorBatch&)
                       /home/engshare/third-party2/apache-orc/1.7.2/src/orc/c++/src/Reader.cc:1055
    @ 0000000002fba9bc main
    @ 000000000002c656 __libc_start_call_main
    @ 000000000002c717 __libc_start_main_alias_2
    @ 0000000002fb2780 _start

ORC-1188: Fix `ORC_PREFER_STATIC_ZLIB`

if (ORC_PREFER_STATIC_ZLIB AND ${ZLIB_STATIC_LIB})
  target_link_libraries (orc_zlib INTERFACE ${ZLIB_LIBRARY})
else ()
  target_link_libraries (orc_zlib INTERFACE ${ZLIB_STATIC_LIB})
endif ()

If ORC_PREFER_STATIC_ZLIB is false, the static system zlib is used.
If ORC_PREFER_STATIC_ZLIB is true and ZLIB_STATIC_LIB is set, the shared system zlib is used.
This usage contradicts the description of ORC_PREFER_STATIC_ZLIB: 'Prefer static zlib library, if available'.
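
A sketch of the presumably intended logic (the two link targets swapped; testing the variable as ZLIB_STATIC_LIB rather than dereferencing it is my assumption about the intent, and the upstream fix may differ):

if (ORC_PREFER_STATIC_ZLIB AND ZLIB_STATIC_LIB)
  target_link_libraries (orc_zlib INTERFACE ${ZLIB_STATIC_LIB})
else ()
  target_link_libraries (orc_zlib INTERFACE ${ZLIB_LIBRARY})
endif ()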

Is it time to enable c++17 to the C++ library?

At the moment, the C++ library supports the gcc, clang, and msvc compilers with C++11 enabled. To keep up with modern C++ standards, we could enable C++17 or even C++20 by default. We should be careful with the public headers, as various downstream projects depend on them; elsewhere we can enjoy new language features internally in the library.

However, this requires us to lift the minimum supported versions of those compilers.

Thoughts? @dongjoon-hyun @williamhyun @guiyanakuang @stiga-huang @coderex2522

Release Apache ORC 1.8.2

$ ./run-all.sh apache branch-1.8
...
Test start: Sun Jan  8 19:00:36 PST 2023
End: Sun Jan 8 20:26:53 PST 2023

$ tail -n2 logs/*log
==> logs/centos7-test.log <==
Built target test-out
Finished centos7 at Sun Jan  8 20:17:09 PST 2023

==> logs/debian10-test.log <==
Built target test-out
Finished debian10 at Sun Jan  8 20:20:51 PST 2023

==> logs/debian10_jdk=11-test.log <==
Built target test-out
Finished debian10_jdk=11 at Sun Jan  8 20:22:35 PST 2023

==> logs/debian11-test.log <==
Built target test-out
Finished debian11 at Sun Jan  8 20:20:58 PST 2023

==> logs/fedora37-test.log <==
Built target test-out
Finished fedora37 at Sun Jan  8 20:22:26 PST 2023

==> logs/ubuntu18-test.log <==
Built target test-out
Finished ubuntu18 at Sun Jan  8 20:16:50 PST 2023

==> logs/ubuntu20-test.log <==
Built target test-out
Finished ubuntu20 at Sun Jan  8 20:21:25 PST 2023

==> logs/ubuntu20_jdk=11-test.log <==
Built target test-out
Finished ubuntu20_jdk=11 at Sun Jan  8 20:26:53 PST 2023

==> logs/ubuntu20_jdk=11_cc=clang-test.log <==
Built target test-out
Finished ubuntu20_jdk=11_cc=clang at Sun Jan  8 20:18:33 PST 2023

==> logs/ubuntu22-test.log <==
Built target test-out
Finished ubuntu22 at Sun Jan  8 20:23:15 PST 2023

Release Apache ORC 1.7.7

  • GitHub Action Check including Mac11/12 and Windows
  • Docker Test: CentOS7, Debian10/11, Ubuntu18/20/22.
==> logs/centos7-test.log <==
    Start 7: java-bench-spark-test
7/8 Test #7: java-bench-spark-test ............   Passed  116.46 sec
    Start 8: tool-test
8/8 Test #8: tool-test ........................   Passed   30.60 sec

100% tests passed, 0 tests failed out of 8

Total Test time (real) = 903.70 sec
Built target test-out
Finished centos7 at Sun Nov 13 20:16:56 PST 2022

==> logs/debian10-test.log <==
    Start 7: java-bench-spark-test
7/8 Test #7: java-bench-spark-test ............   Passed   94.99 sec
    Start 8: tool-test
8/8 Test #8: tool-test ........................   Passed   19.61 sec

100% tests passed, 0 tests failed out of 8

Total Test time (real) = 788.09 sec
Built target test-out
Finished debian10 at Sun Nov 13 20:19:40 PST 2022

==> logs/debian10_jdk=11-test.log <==
    Start 7: java-bench-spark-test
7/8 Test #7: java-bench-spark-test ............   Passed   90.23 sec
    Start 8: tool-test
8/8 Test #8: tool-test ........................   Passed   14.42 sec

100% tests passed, 0 tests failed out of 8

Total Test time (real) = 793.46 sec
Built target test-out
Finished debian10_jdk=11 at Sun Nov 13 20:20:59 PST 2022

==> logs/debian11-test.log <==
    Start 7: java-bench-spark-test
7/8 Test #7: java-bench-spark-test ............   Passed  115.32 sec
    Start 8: tool-test
8/8 Test #8: tool-test ........................   Passed   26.52 sec

100% tests passed, 0 tests failed out of 8

Total Test time (real) = 858.74 sec
Built target test-out
Finished debian11 at Sun Nov 13 20:16:44 PST 2022

==> logs/ubuntu18-test.log <==
    Start 7: java-bench-spark-test
7/8 Test #7: java-bench-spark-test ............   Passed  114.11 sec
    Start 8: tool-test
8/8 Test #8: tool-test ........................   Passed   27.41 sec

100% tests passed, 0 tests failed out of 8

Total Test time (real) = 881.91 sec
Built target test-out
Finished ubuntu18 at Sun Nov 13 20:16:48 PST 2022

==> logs/ubuntu20-test.log <==
    Start 7: java-bench-spark-test
7/8 Test #7: java-bench-spark-test ............   Passed   85.41 sec
    Start 8: tool-test
8/8 Test #8: tool-test ........................   Passed   12.85 sec

100% tests passed, 0 tests failed out of 8

Total Test time (real) = 680.50 sec
Built target test-out
Finished ubuntu20 at Sun Nov 13 20:21:22 PST 2022

==> logs/ubuntu20_jdk=11-test.log <==
    Start 7: java-bench-spark-test
7/8 Test #7: java-bench-spark-test ............   Passed   82.77 sec
    Start 8: tool-test
8/8 Test #8: tool-test ........................   Passed   11.02 sec

100% tests passed, 0 tests failed out of 8

Total Test time (real) = 613.05 sec
Built target test-out
Finished ubuntu20_jdk=11 at Sun Nov 13 20:22:28 PST 2022

==> logs/ubuntu20_jdk=11_cc=clang-test.log <==
    Start 7: java-bench-spark-test
7/8 Test #7: java-bench-spark-test ............   Passed  114.86 sec
    Start 8: tool-test
8/8 Test #8: tool-test ........................   Passed   26.20 sec

100% tests passed, 0 tests failed out of 8

Total Test time (real) = 925.71 sec
Built target test-out
Finished ubuntu20_jdk=11_cc=clang at Sun Nov 13 20:17:08 PST 2022

==> logs/ubuntu22-test.log <==
    Start 7: java-bench-spark-test
7/8 Test #7: java-bench-spark-test ............   Passed   96.34 sec
    Start 8: tool-test
8/8 Test #8: tool-test ........................   Passed   20.23 sec

100% tests passed, 0 tests failed out of 8

Total Test time (real) = 797.71 sec
Built target test-out
Finished ubuntu22 at Sun Nov 13 20:19:38 PST 2022

AFTER VOTE

Read orc from GCP

Hello, I am trying to read ORC data from a GCS bucket in Java. Is there any way, or an example, to read ORC data from GCP directly? Many thanks!
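
For what it's worth, a minimal sketch of one common approach (untested; it assumes the gcs-connector jar is on the classpath and that credentials are supplied via GOOGLE_APPLICATION_CREDENTIALS; the bucket and path are hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.orc.OrcFile;
import org.apache.orc.Reader;

public class ReadFromGcs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Register the GCS connector as the handler for gs:// paths.
    conf.set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem");
    Reader reader = OrcFile.createReader(
        new Path("gs://my-bucket/path/to/file.orc"),
        OrcFile.readerOptions(conf));
    System.out.println("rows: " + reader.getNumberOfRows());
  }
}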

Huge memory taken for each field when exporting

Hello,
Using the Arrow adapter, I noticed that the memory (RAM) footprint of exporting an ORC file is huge per field. For instance, exporting a table with 10,000 fields can take up to 30 GB, even if there are only 10 records (roughly 3 MB per field, suggesting about three 1 MB buffers each). Even for 100 fields, it can take 100 MB+.
The "issue" seems to be coming from here:

1 * 1024 * 1024,

When we create a writer with createWriter (orc/c++/src/Writer.cc, lines 681 to 684 at 432a7aa):

std::unique_ptr<Writer> createWriter(
    const Type& type,
    OutputStream* stream,
    const WriterOptions& options) {

a stream (compressor) is created for each field. As we allocate a buffer of 1 * 1024 * 1024 bytes, at minimum 1 MB of additional memory is taken for each field.

Is there a reason the BufferedOutputStream initial capacity is that high? I worked around my problem by lowering it to 1 KB (it didn't change performance much in my testing, but that may depend on the use case). Could a global (or static) variable be introduced to parametrize this hard-coded value?
Thanks

[JAVA] mvn package fails if test compiling was skipped

I would like to run mvn -Dmaven.test.skip=true clean package, but maven-dependency-plugin complains about "Unused declared dependencies" for some libraries used by the test code, which breaks the build. Please check the attached logs.

I'm not a Java expert, but I'm guessing the cause of the problem is a misuse of analyze-only in the package phase. According to the documentation, the analyze-only goal is meant to be used during the test-compile phase. In our case, the test classes were not compiled, so the dependency analyzer treated some libraries as unused. I don't know the right way to fix it, though; one possible workaround is sketched after the snippet below.

The setting of maven-dependency-plugin (orc/java/pom.xml, lines 373 to 388 at 8cf1047):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-dependency-plugin</artifactId>
  <version>3.1.2</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>analyze-only</goal>
      </goals>
    </execution>
  </executions>
  <configuration>
    <failOnWarning>true</failOnWarning>
  </configuration>
</plugin>
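
One possible workaround (a sketch, untested): the analyze goals honor a skip flag via the mdep.analyze.skip user property, so the analysis can be turned off on the same command line that skips the tests:

$ mvn -Dmaven.test.skip=true -Dmdep.analyze.skip=true clean package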

Logs:

$ cd orc/java

$ mvn -Dmaven.test.skip=true clean package
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] Apache ORC                                                         [pom]
[INFO] ORC Shims                                                          [jar]
[INFO] ORC Core                                                           [jar]
[INFO] ORC MapReduce                                                      [jar]
[INFO] ORC Tools                                                          [jar]
[INFO] ORC Examples                                                       [jar]
[INFO]
[INFO] -------------------------< org.apache.orc:orc >-------------------------
[INFO] Building Apache ORC 1.9.0-SNAPSHOT                                 [1/6]
[INFO] --------------------------------[ pom ]---------------------------------
[INFO]
[INFO] --- maven-clean-plugin:3.2.0:clean (default-clean) @ orc ---
[INFO] Deleting /Users/x/Documents/playground/orc/java/target
[INFO]
[INFO] --- maven-enforcer-plugin:3.1.0:enforce (enforce-maven-version) @ orc ---
[INFO]
[INFO] --- maven-enforcer-plugin:3.1.0:enforce (enforce-java-version) @ orc ---
[INFO]
[INFO] --- maven-enforcer-plugin:3.1.0:enforce (enforce-maven) @ orc ---
[INFO]
[INFO] --- maven-remote-resources-plugin:1.7.0:process (process-resource-bundles) @ orc ---
[INFO] Preparing remote bundle org.apache:apache-jar-resource-bundle:1.4
[INFO] Copying 3 resources from 1 bundle.
[INFO]
[INFO] --- maven-antrun-plugin:3.1.0:run (setup-test-dirs) @ orc ---
[INFO] Executing tasks
[INFO]     [mkdir] Created dir: /Users/x/Documents/playground/orc/java/target/testing-tmp
[INFO] Executed tasks
[INFO]
[INFO] --- maven-site-plugin:3.12.0:attach-descriptor (attach-descriptor) @ orc ---
[INFO] No site descriptor found: nothing to attach.
[INFO]
[INFO] --- maven-source-plugin:3.2.1:jar-no-fork (create-source-jar) @ orc ---
[INFO]
[INFO] --- maven-source-plugin:3.2.1:test-jar-no-fork (create-source-jar) @ orc ---
[INFO]
[INFO] --- reproducible-build-maven-plugin:0.15:strip-jar (default) @ orc ---
[INFO]
[INFO] ----------------------< org.apache.orc:orc-shims >----------------------
[INFO] Building ORC Shims 1.9.0-SNAPSHOT                                  [2/6]
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- maven-clean-plugin:3.2.0:clean (default-clean) @ orc-shims ---
[INFO] Deleting /Users/x/Documents/playground/orc/java/shims/target
[INFO]
[INFO] --- maven-enforcer-plugin:3.1.0:enforce (enforce-maven-version) @ orc-shims ---
[INFO]
[INFO] --- maven-enforcer-plugin:3.1.0:enforce (enforce-java-version) @ orc-shims ---
[INFO]
[INFO] --- maven-enforcer-plugin:3.1.0:enforce (enforce-maven) @ orc-shims ---
[INFO]
[INFO] --- build-helper-maven-plugin:3.3.0:add-source (add-source) @ orc-shims ---
[INFO] Source directory: /Users/x/Documents/playground/orc/java/shims/target/generated-sources added.
[INFO]
[INFO] --- maven-remote-resources-plugin:1.7.0:process (process-resource-bundles) @ orc-shims ---
[INFO] Preparing remote bundle org.apache:apache-jar-resource-bundle:1.4
[INFO] Copying 3 resources from 1 bundle.
[INFO]
[INFO] --- maven-resources-plugin:3.2.0:resources (default-resources) @ orc-shims ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Using 'UTF-8' encoding to copy filtered properties files.
[INFO] skip non existing resourceDirectory /Users/x/Documents/playground/orc/java/shims/src/main/resources
[INFO] Copying 3 resources
[INFO]
[INFO] --- maven-compiler-plugin:3.10.1:compile (default-compile) @ orc-shims ---
[INFO] Compiling 13 source files to /Users/x/Documents/playground/orc/java/shims/target/classes
[INFO]
[INFO] --- maven-resources-plugin:3.2.0:testResources (default-testResources) @ orc-shims ---
[INFO] Not copying test resources
[INFO]
[INFO] --- maven-antrun-plugin:3.1.0:run (setup-test-dirs) @ orc-shims ---
[INFO] Executing tasks
[INFO]     [mkdir] Created dir: /Users/x/Documents/playground/orc/java/shims/target/testing-tmp
[INFO] Executed tasks
[INFO]
[INFO] --- maven-compiler-plugin:3.10.1:testCompile (default-testCompile) @ orc-shims ---
[INFO] Not compiling test sources
[INFO]
[INFO] --- maven-surefire-plugin:3.0.0-M5:test (default-test) @ orc-shims ---
[INFO] Tests are skipped.
[INFO]
[INFO] --- maven-jar-plugin:3.3.0:jar (default-jar) @ orc-shims ---
[INFO] Building jar: /Users/x/Documents/playground/orc/java/shims/target/orc-shims-1.9.0-SNAPSHOT.jar
[INFO]
[INFO] --- maven-site-plugin:3.12.0:attach-descriptor (attach-descriptor) @ orc-shims ---
[INFO] Skipping because packaging 'jar' is not pom.
[INFO]
[INFO] --- maven-source-plugin:3.2.1:jar-no-fork (create-source-jar) @ orc-shims ---
[INFO] Building jar: /Users/x/Documents/playground/orc/java/shims/target/orc-shims-1.9.0-SNAPSHOT-sources.jar
[INFO]
[INFO] --- maven-source-plugin:3.2.1:test-jar-no-fork (create-source-jar) @ orc-shims ---
[INFO] Building jar: /Users/x/Documents/playground/orc/java/shims/target/orc-shims-1.9.0-SNAPSHOT-test-sources.jar
[INFO]
[INFO] --- reproducible-build-maven-plugin:0.15:strip-jar (default) @ orc-shims ---
[INFO] Stripping /Users/x/Documents/playground/orc/java/shims/target/orc-shims-1.9.0-SNAPSHOT-test-sources.jar
[INFO] Stripping /Users/x/Documents/playground/orc/java/shims/target/orc-shims-1.9.0-SNAPSHOT-sources.jar
[INFO] Stripping /Users/x/Documents/playground/orc/java/shims/target/orc-shims-1.9.0-SNAPSHOT.jar
[INFO]
[INFO] --- maven-dependency-plugin:3.1.2:analyze-only (default) @ orc-shims ---
[WARNING] Unused declared dependencies found:
[WARNING]    org.junit.jupiter:junit-jupiter-api:jar:5.9.0:test
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Apache ORC 1.9.0-SNAPSHOT:
[INFO]
[INFO] Apache ORC ......................................... SUCCESS [  4.108 s]
[INFO] ORC Shims .......................................... FAILURE [  4.817 s]
[INFO] ORC Core ........................................... SKIPPED
[INFO] ORC MapReduce ...................................... SKIPPED
[INFO] ORC Tools .......................................... SKIPPED
[INFO] ORC Examples ....................................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  9.189 s
[INFO] Finished at: 2022-10-13T15:07:06+08:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-dependency-plugin:3.1.2:analyze-only (default) on project orc-shims: Dependency problems found -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <args> -rf :orc-shims

Release Apache ORC 1.7.4

Please vote on releasing the following candidate as Apache ORC version
1.7.4.

[ ] +1 Release this package as Apache ORC 1.7.4
[ ] -1 Do not release this package because ...

TAG:
https://github.com/apache/orc/releases/tag/v1.7.4-rc0

RELEASE FILES:
https://dist.apache.org/repos/dist/dev/orc/v1.7.4-rc0

STAGING REPOSITORY:
https://repository.apache.org/content/repositories/orgapacheorc-1056

LIST OF ISSUES:
https://issues.apache.org/jira/projects/ORC/versions/12351349
https://github.com/apache/orc/milestone/7?closed=1

This vote will be open for 72 hours.

Regards,
William
