Comments (14)
@dongjoon-hyun @wgtmac @LouisClt I will follow up on this issue (ORC-1280) and implement much smarter memory management.
from orc.
I will work on it.
cc @wgtmac , @stiga-huang , @coderex2522
@LouisClt To support the zero-copy mechanism, the BufferedOutputStream class will have an internal data buffer whose default capacity is 1MB. This default capacity should be configurable, but note that if the capacity is set too small, the buffer may have to expand often, and every expansion triggers a memcpy.
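To make the trade-off above concrete, here is a minimal C++ sketch of a buffer that doubles its capacity when it runs out of room and counts how many bytes it has to re-copy. `GrowableBuffer` and `writeInChunks` are hypothetical names used only for illustration; this is not ORC's `DataBuffer` or `BufferedOutputStream` code, and the doubling policy is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical growable buffer for illustration; not ORC's DataBuffer API.
class GrowableBuffer {
 public:
  explicit GrowableBuffer(std::size_t initialCapacity)
      : data_(new char[initialCapacity]), capacity_(initialCapacity) {}

  // Append len bytes, doubling the capacity until the new data fits.
  void append(const char* src, std::size_t len) {
    if (size_ + len > capacity_) {
      std::size_t newCapacity = capacity_ == 0 ? 1 : capacity_;
      while (newCapacity < size_ + len) newCapacity *= 2;
      std::unique_ptr<char[]> grown(new char[newCapacity]);
      // Every expansion re-copies everything written so far.
      std::memcpy(grown.get(), data_.get(), size_);
      bytesCopiedOnGrowth_ += size_;
      ++growthCount_;
      data_ = std::move(grown);
      capacity_ = newCapacity;
    }
    std::memcpy(data_.get() + size_, src, len);
    size_ += len;
  }

  std::size_t growthCount() const { return growthCount_; }
  std::size_t bytesCopiedOnGrowth() const { return bytesCopiedOnGrowth_; }

 private:
  std::unique_ptr<char[]> data_;
  std::size_t capacity_;
  std::size_t size_ = 0;
  std::size_t growthCount_ = 0;
  std::size_t bytesCopiedOnGrowth_ = 0;
};

// Write totalBytes in chunkBytes-sized pieces and return the filled buffer
// so that its expansion statistics can be inspected.
GrowableBuffer writeInChunks(std::size_t initialCapacity,
                             std::size_t totalBytes,
                             std::size_t chunkBytes) {
  assert(chunkBytes > 0);
  GrowableBuffer buffer(initialCapacity);
  std::vector<char> chunk(chunkBytes, 'x');
  for (std::size_t written = 0; written < totalBytes; written += chunkBytes) {
    buffer.append(chunk.data(), chunkBytes);
  }
  return buffer;
}
```

With these illustrative numbers, `writeInChunks(4096, 1024 * 1024, 4096)` expands 8 times, while `writeInChunks(1024 * 1024, 1024 * 1024, 4096)` never expands. Whether the extra copies matter in practice depends on the workload.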
We may replace the DataBuffer with a new Buffer implementation that has much smarter memory management and automatically grows and shrinks according to actual usage. This management can happen on a per-column basis.
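One possible shape for such a buffer, as a rough C++ sketch: it doubles when data does not fit, and halves on reset when the last round used less than a quarter of the capacity. The class name `AdaptiveBuffer`, the quarter threshold, and the minimum-capacity floor are all assumptions made for this example; none of them come from ORC or from ORC-1280.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical adaptive buffer; the grow/shrink policy is an assumption.
class AdaptiveBuffer {
 public:
  explicit AdaptiveBuffer(std::size_t minCapacity)
      : minCapacity_(minCapacity), storage_(minCapacity) {
    assert(minCapacity > 0);
  }

  // Append len bytes, doubling the capacity until the new data fits.
  void append(const char* src, std::size_t len) {
    if (size_ + len > storage_.size()) {
      std::size_t newCapacity = storage_.size();
      while (newCapacity < size_ + len) newCapacity *= 2;
      storage_.resize(newCapacity);
    }
    std::memcpy(storage_.data() + size_, src, len);
    size_ += len;
  }

  // Call after the buffered data has been flushed. If the last round used
  // less than a quarter of the capacity, release half of it, but never go
  // below the configured minimum.
  void reset() {
    std::size_t half = storage_.size() / 2;
    if (size_ < storage_.size() / 4 && half >= minCapacity_) {
      std::vector<char>(half).swap(storage_);  // swap releases the old memory
    }
    size_ = 0;
  }

  std::size_t capacity() const { return storage_.size(); }
  std::size_t size() const { return size_; }

 private:
  std::size_t minCapacity_;
  std::vector<char> storage_;
  std::size_t size_ = 0;
};
```

One instance per column would let a column that is briefly large give memory back once its usage drops, at the cost of some reallocation when usage fluctuates.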
Thanks everyone for your answers. I understand the possible performance issues that come with making the buffer too small (though it was fine in my own testing).
I think the solution proposed by @wgtmac would work for me, and would be better than going through global variables, if it is feasible.
I have created a JIRA to track the progress: https://issues.apache.org/jira/browse/ORC-1280
Thank you, @coderex2522 .
Hello, it seems there were commits referencing this issue. Is this issue now fixed?
> Hello, it seems there were commits referencing this issue. Is this issue now fixed?

@LouisClt Thanks for your follow-up.
We have implemented a block-based buffer called `BlockBuffer` (by @coderex2522) and used it to replace the output buffer in the `CompressionStream`. It can decrease the memory footprint to some extent.
IMO, the next step is to use it to replace the input buffer of the `CompressionStream`, which has the size of `compressionBlockSize` per stream.
To be precise, the `rawInputBuffer` of every `CompressionStream` is fixed to the compression block size, which is 1M by default. A writer with many columns will suffer from a large memory footprint, and nothing can be done to alleviate it.
I have created a JIRA to track it: https://issues.apache.org/jira/browse/ORC-1365
cc @coderex2522
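As a rough illustration of why a block-based buffer helps here, the C++ sketch below compares a fixed per-stream buffer with one that allocates small blocks on demand. `ChunkedBuffer`, `fixedFootprint`, and `chunkedFootprint` are hypothetical names, and the stream counts and block sizes are made up for the example; this is not the actual `BlockBuffer` implementation in ORC.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical block-based buffer; illustrative only, not ORC's BlockBuffer.
class ChunkedBuffer {
 public:
  explicit ChunkedBuffer(std::size_t blockSize) : blockSize_(blockSize) {
    assert(blockSize > 0);
  }

  // Append len bytes, allocating a new block only when the last one is full.
  void append(const char* src, std::size_t len) {
    while (len > 0) {
      if (blocks_.empty() || usedInLast_ == blockSize_) {
        blocks_.push_back(std::unique_ptr<char[]>(new char[blockSize_]));
        usedInLast_ = 0;
      }
      std::size_t n = std::min(len, blockSize_ - usedInLast_);
      std::memcpy(blocks_.back().get() + usedInLast_, src, n);
      usedInLast_ += n;
      src += n;
      len -= n;
    }
  }

  // Bytes of payload stored so far.
  std::size_t size() const {
    return blocks_.empty() ? 0
                           : (blocks_.size() - 1) * blockSize_ + usedInLast_;
  }

  // Bytes actually allocated, rounded up to whole blocks.
  std::size_t reservedBytes() const { return blocks_.size() * blockSize_; }

 private:
  std::size_t blockSize_;
  std::size_t usedInLast_ = 0;
  std::vector<std::unique_ptr<char[]>> blocks_;
};

// Memory held when every stream owns one fixed buffer of the full size.
std::size_t fixedFootprint(std::size_t streams,
                           std::size_t compressionBlockSize) {
  return streams * compressionBlockSize;
}

// Memory that would be held if every stream used a ChunkedBuffer and had
// bytesPerStream bytes pending. Buffers are built one at a time and only
// their reserved sizes are summed.
std::size_t chunkedFootprint(std::size_t streams,
                             std::size_t bytesPerStream,
                             std::size_t blockSize) {
  std::vector<char> payload(bytesPerStream, 'x');
  std::size_t total = 0;
  for (std::size_t i = 0; i < streams; ++i) {
    ChunkedBuffer buffer(blockSize);
    buffer.append(payload.data(), payload.size());
    total += buffer.reservedBytes();
  }
  return total;
}
```

For example, with 1000 streams (an arbitrary number) and the default 1M block size, `fixedFootprint(1000, 1024 * 1024)` is about 1 GB regardless of how much data is pending, whereas the chunked variant only reserves whole blocks for the bytes actually buffered.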
Thanks for your reply @wgtmac and the implementation of the `BlockBuffer`.
I'll wait for the replacement of the `rawInputBuffer` by the `BlockBuffer` in every compression stream then. Do you think it will take long?
Hi, @LouisClt. FYI, according to the Apache ORC release cycle, newly developed features will be delivered in v1.9.0 in September 2023 (if they are merged into Apache ORC before then).
Understood, and thanks for your answer!