I had been using lz4 at commit 623b5a2 in our production environment since January 2019 to compress messages written to Kafka with the Sarama Kafka client library. Performance was very good, both in terms of CPU and memory usage.
I upgraded our lz4 dependency to commit 057d66e (tag v2.2.4) and saw dramatically worse performance in the HTTP response times for our service that receives beacon data and writes to Kafka:
Running a debug trace on the old and new versions of the application showed that the newer application was performing a whole lot of garbage collection:
So, 20% of the program execution time was spent performing garbage collection. Garbage collection didn't run even once during the 5 second trace I captured for the older app.
I ran a heap dump of the new application and identified the top two nodes:
→ go tool pprof heap.out
File: app
Build ID: 1532f41779e0d517df7a17dd48010b6e0f2f8061
Type: inuse_space
Time: Aug 29, 2019 at 6:01pm (PDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top 2
Showing nodes accounting for 17.59MB, 83.18% of 21.15MB total
Showing top 2 nodes out of 68
flat flat% sum% cum cum%
16MB 75.65% 75.65% 16MB 75.65% github.com/pierrec/lz4.(*Writer).writeHeader
1.59MB 7.53% 83.18% 1.59MB 7.53% github.com/pierrec/lz4.NewWriter
Over 75% of the in-use heap memory was attributed to the lz4 writeHeader function.
Benchmarks
I wrote a benchmark to see what the performance of the Write/Close/Reset functions would be when reusing a writer instance:
func BenchmarkWrite(b *testing.B) {
	writer := lz4.NewWriter(nil)
	src, err := ioutil.ReadFile("testdata/gettysburg.txt")
	if err != nil {
		b.Fatal(err)
	}
	for n := 0; n < b.N; n++ {
		var buf bytes.Buffer
		writer.Reset(&buf)
		if _, err := writer.Write(src); err != nil {
			b.Fatal(err)
		}
		if err := writer.Close(); err != nil {
			b.Fatal(err)
		}
	}
}
Before the upgrade (commit 623b5a2):

goos: darwin
goarch: amd64
pkg: github.com/pierrec/lz4
BenchmarkWrite-4 200000 6366 ns/op 1564 B/op 3 allocs/op
PASS
ok github.com/pierrec/lz4 1.356s
After the upgrade (commit 057d66e):

goos: darwin
goarch: amd64
pkg: github.com/pierrec/lz4
BenchmarkWrite-4 1000 1282000 ns/op 8390668 B/op 4 allocs/op
PASS
ok github.com/pierrec/lz4 1.456s
That's a roughly 200x slowdown, and more than a 5,000x increase in the number of bytes allocated per operation.
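The allocs/op and B/op columns come from running the benchmark with -benchmem. The same kind of allocation behavior can be spot-checked outside a benchmark with testing.AllocsPerRun. This is a generic sketch using a bytes.Buffer as a stand-in workload, not lz4 itself, to show how reusing a buffer amortizes per-operation allocations to zero:

```go
package main

import (
	"bytes"
	"fmt"
	"testing"
)

// measure compares allocations per call when writing src into a fresh
// buffer each time versus a single reused buffer.
func measure(src []byte) (fresh, reused float64) {
	// A fresh bytes.Buffer every call must grow its backing array each time.
	fresh = testing.AllocsPerRun(100, func() {
		var buf bytes.Buffer
		buf.Write(src)
	})

	// A reused buffer keeps its capacity after the warm-up call, so
	// subsequent writes allocate nothing.
	var buf bytes.Buffer
	reused = testing.AllocsPerRun(100, func() {
		buf.Reset()
		buf.Write(src)
	})
	return fresh, reused
}

func main() {
	fresh, reused := measure(make([]byte, 64*1024))
	fmt.Printf("fresh buffer: %.0f allocs/op, reused buffer: %.0f allocs/op\n", fresh, reused)
}
```

This is exactly the property the Reset-based benchmark above relies on, and the property the new lz4 commit broke.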
I narrowed down the commit where this performance issue began: 6749706. This commit guarantees that the buffer for compressed data will be reallocated on every invocation of writeHeader: the z.zdata slice is always left with a capacity less than 2x the z.Header.BlockMaxSize value, so the slice is reallocated on every call. This is extremely wasteful and unnecessary. I'm still trying to understand why this change was even introduced.
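Distilled down, the realloc condition looks like this (a simplified sketch with hypothetical names mirroring the writer's fields, not the library's exact code):

```go
package main

import "fmt"

// needsRealloc models the capacity check in writeHeader: reallocate
// z.zdata whenever it cannot hold 2*bSize bytes.
func needsRealloc(zdataCap, bSize int) bool {
	return zdataCap < 2*bSize
}

func main() {
	bSize := 4 << 20 // z.Header.BlockMaxSize = 4 MiB

	// After commit 6749706 the buffer juggling leaves z.zdata with only
	// bSize of capacity, so the condition holds on every call and a
	// fresh 8 MiB slice is allocated each time.
	fmt.Println(needsRealloc(bSize, bSize))   // true: realloc every call
	fmt.Println(needsRealloc(2*bSize, bSize)) // false: buffer is reused
}
```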
Printf Debugging
I instrumented this code with some logging to show the slice capacities at each step in both code revisions:
Before commit 6749706:

Pass 1
Checking whether to allocate zdata memory, cap(z.data)=0 cap(z.zdata)=0 n=8388608
Allocating zdata memory, cap(z.data)=0 cap(z.zdata)=0 n=8388608
bSize: 4194304
before: z.data len=0 cap=0
before: z.zdata len=8388608 cap=8388608
after: z.data len=4194304 cap=4194304
after: z.zdata len=4194304 cap=8388608
Pass 2
Checking whether to allocate zdata memory, cap(z.data)=4194304 cap(z.zdata)=8388608 n=8388608
bSize: 4194304
before: z.data len=0 cap=4194304
before: z.zdata len=0 cap=8388608
after: z.data len=4194304 cap=4194304
after: z.zdata len=4194304 cap=8388608
After commit 6749706:

Pass 1
Checking whether to allocate zdata memory, cap(z.data)=0 cap(z.zdata)=0 n=8388608
Allocating zdata memory, cap(z.data)=0 cap(z.zdata)=0 n=8388608
bSize: 4194304
before: z.data len=0 cap=0
before: z.zdata len=8388608 cap=8388608
after: z.data len=4194304 cap=8388608
after: z.zdata len=4194304 cap=4194304
Pass 2
Checking whether to allocate zdata memory, cap(z.data)=8388608 cap(z.zdata)=4194304 n=8388608
Allocating zdata memory, cap(z.data)=8388608 cap(z.zdata)=4194304 n=8388608
bSize: 4194304
before: z.data len=0 cap=8388608
before: z.zdata len=8388608 cap=8388608
after: z.data len=4194304 cap=8388608
after: z.zdata len=4194304 cap=4194304
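Both traces can be reproduced with a small model of the writer. This is a hypothetical reconstruction that matches the logged capacities, not the library's actual code: the only difference between the two revisions is which slice keeps the full-capacity front view of the backing array after the reslice.

```go
package main

import "fmt"

// pass models one writeHeader-style call. It returns the data and zdata
// slices as they stand after the call, and whether a fresh backing
// array was allocated.
func pass(zdata []byte, bSize int, newRevision bool) (data, out []byte, allocated bool) {
	n := 2 * bSize
	if cap(zdata) < n {
		zdata = make([]byte, n)
		allocated = true
	}
	zdata = zdata[:cap(zdata)]
	if newRevision {
		// post-6749706: data keeps the front view (cap 2*bSize) and
		// zdata is resliced to the back half, shrinking its capacity to
		// bSize, so the next pass's capacity check always fails.
		data, out = zdata[:bSize], zdata[bSize:]
	} else {
		// pre-6749706: zdata keeps the front view (cap 2*bSize), so the
		// next pass sees enough capacity and skips the allocation.
		data, out = zdata[bSize:], zdata[:bSize]
	}
	return data, out, allocated
}

func main() {
	const bSize = 4 << 20 // 4194304, matching the traces above
	for _, rev := range []bool{false, true} {
		var zdata []byte
		for p := 1; p <= 2; p++ {
			var alloc bool
			_, zdata, alloc = pass(zdata, bSize, rev)
			fmt.Printf("newRevision=%v pass=%d allocated=%v cap(zdata)=%d\n",
				rev, p, alloc, cap(zdata))
		}
	}
}
```

The old revision allocates only on the first pass; the new revision allocates an 8 MiB buffer on every pass, which is the churn the heap profile and benchmark exposed.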