Comments (6)
DataFile and Metrics are the classes that contain metrics and are good candidates for where truncation could happen. I think we would want truncation to be configurable using settings in TableProperties. Metrics are scraped from Parquet metadata in ParquetMetrics, which is called by ParquetWriter.
You might want to explore passing a truncate length option to ParquetWriter. The writer would pass it to ParquetMetrics to truncate values right away. The setting would come from the table when creating a writer. For that, I think you'd update the write builder in Parquet.
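To make the suggestion above a bit more concrete, here is a minimal, self-contained sketch of how a truncate length might be read from table properties and applied to string lower/upper bounds. It is not the Iceberg implementation: the class name, the property key, the default of 16, and the helper methods are all hypothetical and exist only for illustration. A truncated lower bound is just a prefix; a truncated upper bound has its last code point incremented so it still sorts at or after the original value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from the Iceberg code base.
public class MetricsTruncationSketch {
    // Assumed property key and default, chosen for illustration only.
    static final String TRUNCATE_LENGTH_PROP = "example.metrics.truncate-length";
    static final int DEFAULT_TRUNCATE_LENGTH = 16;

    // Reads the truncate length from a plain map standing in for table properties.
    static int truncateLength(Map<String, String> props) {
        String value = props.get(TRUNCATE_LENGTH_PROP);
        return value == null ? DEFAULT_TRUNCATE_LENGTH : Integer.parseInt(value);
    }

    // Returns the first `length` code points of `value`, or `value` if it is short enough.
    private static String prefix(String value, int length) {
        if (length <= 0) {
            throw new IllegalArgumentException("length must be positive: " + length);
        }
        if (value.codePointCount(0, value.length()) <= length) {
            return value;
        }
        return value.substring(0, value.offsetByCodePoints(0, length));
    }

    // A prefix always sorts at or before the original, so it is a valid lower bound.
    static String truncateLowerBound(String value, int length) {
        return prefix(value, length);
    }

    // A bare prefix sorts before the original, so bump the last code point that can
    // be incremented (ordering here is by code point). Returns null if none can be.
    static String truncateUpperBound(String value, int length) {
        String truncated = prefix(value, length);
        if (truncated.equals(value)) {
            return value; // nothing was cut off, the value is its own upper bound
        }
        int[] codePoints = truncated.codePoints().toArray();
        for (int i = codePoints.length - 1; i >= 0; i--) {
            int next = codePoints[i] + 1;
            if (next == Character.MIN_SURROGATE) {
                next = Character.MAX_SURROGATE + 1; // skip the surrogate range
            }
            if (next <= Character.MAX_CODE_POINT) {
                codePoints[i] = next;
                return new String(codePoints, 0, i + 1);
            }
        }
        return null; // caller would store no upper bound in this case
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put(TRUNCATE_LENGTH_PROP, "3");
        int length = truncateLength(props);
        System.out.println(truncateLowerBound("iceberg", length)); // prints "ice"
        System.out.println(truncateUpperBound("iceberg", length)); // prints "icf"
    }
}
```

In a real change, the length would presumably be resolved once when the writer is built and then handed down to the code that collects metrics, so each bound is truncated before it is stored.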
from iceberg.
@aokolnychyi, #78 reminded me about this item as well.
@rdblue, is this a good issue for a newcomer? If so, I'd be interested in taking it :)
@feng-tao, I think this could be a good first issue. Let us know if you need any help or context.
@rdblue, I am still reading the code base. It would be great if you could give me some guidance on the context or the related code path. Thanks a lot :)
Thanks @rdblue, will take a look.