Comments (6)

liangrui1988 commented on July 4, 2024

Mine is a private cluster deployment, with the data on disk. Will this matter? Does it affect the application's ability to read the file?

from orc.

dongjoon-hyun commented on July 4, 2024

Hi, @liangrui1988. Please refer to the ORC spec on our official website.

Here is the formula we use. Historically, padding exists simply to align stripes with the underlying file system's block size. If you are using S3, your program will not read that part at all, so there is no impact on modern cloud infrastructure.

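import java.io.IOException;
import java.util.List;
import org.apache.orc.Reader;
import org.apache.orc.StripeInformation;

// Sums the gaps between the end of each stripe and the start of the next one.
public static long getTotalPaddingSize(Reader reader) throws IOException {
    long paddedBytes = 0;
    List<StripeInformation> stripes = reader.getStripes();
    for (int i = 1; i < stripes.size(); i++) {
        long prevStripeOffset = stripes.get(i - 1).getOffset();
        long prevStripeLen = stripes.get(i - 1).getLength();
        // Bytes between the previous stripe's end and this stripe's start are padding.
        paddedBytes += stripes.get(i).getOffset() - (prevStripeOffset + prevStripeLen);
    }
    return paddedBytes;
}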
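
To make the arithmetic concrete, here is a minimal sketch that applies the same formula to plain offset and length arrays instead of an ORC `Reader`. The class name `PaddingSketch`, the method `totalPadding`, and the sample numbers are made up for illustration; they are not part of the ORC API, and real stripe offsets will look different.

```java
// Hypothetical stand-in for the formula above: no ORC dependency, sample numbers only.
class PaddingSketch {

    // Sums the gaps between the end of each stripe and the start of the next one.
    static long totalPadding(long[] offsets, long[] lengths) {
        if (offsets.length != lengths.length) {
            throw new IllegalArgumentException("offsets and lengths must have the same size");
        }
        long paddedBytes = 0;
        for (int i = 1; i < offsets.length; i++) {
            long prevEnd = offsets[i - 1] + lengths[i - 1];
            paddedBytes += offsets[i] - prevEnd;
        }
        return paddedBytes;
    }

    public static void main(String[] args) {
        long[] offsets = {3, 128, 256};
        long[] lengths = {100, 120, 50};
        // Gaps: 128 - (3 + 100) = 25, and 256 - (128 + 120) = 8.
        System.out.println(totalPadding(offsets, lengths)); // prints 33
    }
}
```

A file whose stripes sit back to back gives 0 here, which is what you would expect when no padding was written.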

liangrui1988 commented on July 4, 2024

Does it matter if different versions of ORC are used for writing and reading? For example, writing files with 1.5.12 and reading them with 1.5.5? Thank you.

dongjoon-hyun commented on July 4, 2024

Sorry, but why don't you test on your actual private cluster? In unknown cases, it's really up to you. We have no recommendations for private cluster deployments because we don't know what you are using or what you are talking about, @liangrui1988.

liangrui1988 commented on July 4, 2024

This is caused by an ORC parameter problem, not a cluster problem. The writer reduces the ORC stripe size and restores it to 256MB.

dongjoon-hyun commented on July 4, 2024

Could you provide a reproducible example with the latest Apache ORC 1.7.x, @liangrui1988?
FYI, Apache ORC 1.5.x is EOL and 1.6.x will reach EOL soon.
