Comments (8)
Hi @seaguest,
Looking at the snippet you shared, it doesn't look like you are using segmentio/parquet-go.
Using the following:
package main

import (
	"log"
	"os"

	segmentparquet "github.com/segmentio/parquet-go"
)

func write() {
	f, err := os.OpenFile("outputs.parquet", os.O_APPEND|os.O_WRONLY|os.O_CREATE|os.O_TRUNC, os.ModePerm)
	if err != nil {
		log.Println(err)
		return
	}
	defer f.Close()

	writer := segmentparquet.NewWriter(f)

	// Each field opts into snappy compression through its struct tag.
	type record struct {
		Format   string `parquet:"format,snappy"`
		DataType int32  `parquet:"data_type,snappy"`
		Country  string `parquet:"country,snappy"`
	}

	num := 1000
	for i := 0; i < num; i++ {
		stu := record{
			Format:   "Test",
			DataType: 1,
			Country:  "IN",
		}
		// Write takes the struct value directly; no manual []byte encoding is needed.
		if err := writer.Write(stu); err != nil {
			log.Println(err)
			return
		}
	}

	// Closing the writer is necessary to flush buffers and write the file footer.
	if err := writer.Close(); err != nil {
		log.Println(err)
	}
}

func main() {
	write()
}
ends up creating a file of 1060 bytes, so about 1 KB.
from parquet-go.
I tried to use
type record struct {
	Format   string `parquet:"format,snappy"`
	DataType int32  `parquet:"data_type,snappy"`
	Country  string `parquet:"country,snappy"`
}
but the output file is still very large, about 34 MB, while with the other 2 libraries it is only 1 KB.
Indeed it works now.
But I am curious why we should put a snappy annotation on each field; usually we won't have different compression types for different fields in one struct.
I saw that another library has an option like this:
pw.CompressionType = parquet2.CompressionCodec_GZIP
Why doesn't this library have such an option?
> usually we won't have different compression type for different fields in one struct.
We're planning to use different compression types for different fields in one struct (tracing data), which is why we thought that choice was a good fit.
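To make the per-field idea concrete, here is a minimal sketch that only uses the standard library. It is not how segmentio/parquet-go parses tags internally. The `span` struct, its field names, and the `codecs` helper are hypothetical, and the sketch assumes the codec is always the second element of the tag.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// span is a hypothetical tracing record; the field names and the codec
// chosen for each field are illustrative only.
type span struct {
	TraceID    string `parquet:"trace_id,snappy"`
	Attributes string `parquet:"attributes,zstd"`
	DurationNs int64  `parquet:"duration_ns"`
}

// codecs returns the codec named in each field's parquet tag, keyed by
// column name. It assumes the tag has the form "name,codec"; fields
// without a second element map to the empty string.
func codecs(v interface{}) map[string]string {
	out := make(map[string]string)
	t := reflect.TypeOf(v)
	for i := 0; i < t.NumField(); i++ {
		parts := strings.Split(t.Field(i).Tag.Get("parquet"), ",")
		codec := ""
		if len(parts) > 1 {
			codec = parts[1]
		}
		out[parts[0]] = codec
	}
	return out
}

func main() {
	// Map iteration order is not deterministic, so the lines may print
	// in any order.
	for column, codec := range codecs(span{}) {
		fmt.Printf("%s -> %q\n", column, codec)
	}
}
```

With tags like these, a small high-cardinality column and a large text column can each get the codec that suits them, which is the case a single writer-wide setting cannot express.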
I'm going to close this - thanks for the issue report and glad you got it working!
Are you planning to provide an option to set the compression type for all fields in the future?
If we have hundreds of fields, it would be a disaster to add "snappy" to each one, and it is meaningless when we need only one compression type.
Thanks for your quick reply.
+1 on this. It would be great to provide an alternative way to pass compression config while initialising a writer.
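One shape such an option could take is sketched below, purely as an illustration of the request. Every name here (`writerConfig`, `writerOption`, `withDefaultCompression`, `codecFor`) is hypothetical and not part of segmentio/parquet-go. The idea is a writer-wide default that a per-field tag can still override.

```go
package main

import "fmt"

// writerConfig is a hypothetical configuration holder; it is not a type
// from segmentio/parquet-go.
type writerConfig struct {
	defaultCodec string
}

// writerOption is a hypothetical functional option applied at writer
// construction time.
type writerOption func(*writerConfig)

// withDefaultCompression sets the codec used for fields whose tag does
// not name one.
func withDefaultCompression(codec string) writerOption {
	return func(c *writerConfig) { c.defaultCodec = codec }
}

// newWriterConfig applies the options on top of an uncompressed default.
func newWriterConfig(opts ...writerOption) *writerConfig {
	c := &writerConfig{defaultCodec: "uncompressed"}
	for _, opt := range opts {
		opt(c)
	}
	return c
}

// codecFor prefers the codec taken from a field's tag and falls back to
// the writer-wide default when the tag names none.
func (c *writerConfig) codecFor(fieldCodec string) string {
	if fieldCodec != "" {
		return fieldCodec
	}
	return c.defaultCodec
}

func main() {
	cfg := newWriterConfig(withDefaultCompression("snappy"))
	fmt.Println(cfg.codecFor(""))     // prints "snappy"
	fmt.Println(cfg.codecFor("zstd")) // prints "zstd"
}
```

A design like this would keep per-field tags working for mixed-codec schemas while removing the need to annotate hundreds of fields that all share one codec.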
Created #124 as a follow-up.