Comments (31)
I have synched with R&D team and we have plan to use logstash with gelf docker logging driver as interim solution (long term is kafka driver moby/moby#18604).
So I will stick with gelf driver for now.
Hope you or other contributor will find time to fix this production critical bug.
Otherwise people will go the sam route I did.
from fluent-logger-golang.
Can you create minimum code and environment to reproduce it?
Using docker containers for reproduction is too heavy for both of identifying core problem and fixing it.
from fluent-logger-golang.
Can not disagree. I will try and update you.
from fluent-logger-golang.
While I am looking to easier method to reproduce panic let us discuss possible solutions.
It seems we need to check net object is not nil before calling method. Is it not main reason of panic ?
from fluent-logger-golang.
@nikolaiderzhak your point looks correct. IMO it's particularly problematic under the situation with many communication errors.
Options are:
- add a check for
f.conn
existence just beforeWrite
- check and fix exclusive control for
f.conn
especially forsend
andreconnect
The former will work well, but it makes situation a bit more complex (What should we do when f.conn
is missing?).
from fluent-logger-golang.
The former will work well, but it makes situation a bit more complex (What should we do when f.conn is missing?).
Something safe. For me as DevOps engineer goal is make sure docker daemon will not die :)
I am trying staging setup to reproduce it now. Will try to emulate slow aggregator responses to trigger that chain. So we could try some fixes.
If you propose one I cloud quickly test if it works.
from fluent-logger-golang.
I'm sorry that I don't have enough time to work with this right now... but will merge (and try to merge it into Docker) ASAP when reproduction steps and patches are proposed.
from fluent-logger-golang.
Thanks !
I was not able to reproduce it locally, including docker containers with chatty service and lower buffer limits. So next I will try is to replicate production setup with smaller nodes in AWS. We need it to be done anyway for testing purposes.
Question to you. I see f.conn.Write() is wrapped to if f.conn == nil already (since 2013) .
How it happened we got this exception then ? Sorry - not familiar enough with code.
@ypjin @rdong Can you please check it too ?
from fluent-logger-golang.
@nikolaiderzhak checking f.conn
must be done in exclusive process (in other words, in locking)
from fluent-logger-golang.
I see. It can be accessed before it initialised....
from fluent-logger-golang.
Got critical things on plate. Did not create pre-prod to reproduce it yet. Seems as will start working on his next week. Will update you.
from fluent-logger-golang.
Any updates on this?
from fluent-logger-golang.
Nothing yet. I couldn't create tests or scripts to reproduce it.
@rcsavage do you have it?
from fluent-logger-golang.
@tagomoris we have reproduced . We will provide the data soon.
from fluent-logger-golang.
Great to hear that! Then I'll fix it.
from fluent-logger-golang.
That is good news. How soon for a fix?
from fluent-logger-golang.
This is a show-stopper for our fluentd adoption. Any word here on progress?
from fluent-logger-golang.
Please give me any code to show how to reproduce it...
from fluent-logger-golang.
I managed to reproduce it since we have been hitting this more and more recently. I am not able to reproduce it on my machine but I can make it crash every time on an AWS instance running Amazon Linux and Docker 1.11.2. The time it takes to crash is variable though. My test setup is under Docker however since I am not that good in Go(and ruby) but I guess it will be easy enough for someone on the project to replicate the setup outside of Docker.
fluent.conf
<source>
type forward
port 24224
bind 0.0.0.0
</source>
<match docker.*>
type down
buffer_type memory
buffer_chunk_limit 5k
buffer_queue_limit 32
</match>
out_down.rb plugin
module Fluent
class DownOutput < BufferedOutput
Fluent::Plugin.register_output('down', self)
unless method_defined?(:log)
define_method(:log) { $log }
end
def initialize
super
end
def configure(conf)
# override default buffer_chunk_limit so it overflows very fast.
conf['buffer_chunk_limit'] ||= '10k'
super
end
def start
super
end
def shutdown
super
end
def format(tag, time, record)
[tag, time, record].to_msgpack
end
def write(chunk)
count = 0
chunk.msgpack_each { |tag, time, record|
count += 1
}
log.info "Processed #{count} entries in chunk."
end
end
end
I spin one container running Fluentd and another container with this command:
docker run -d --log-driver fluentd --log-opt fluentd-async-connect=true --name logger alpine:3.4 sh -c 'while true; do echo {\"time\":\"2016-09-13T14:49:32.082Z\", \"mylog\":\"This is my log!\"}; done'
I tried reproducing on CentOS 7 with Docker 1.11.2 as well but didn't have luck. I tried CentOS because installing Docker 1.12 seems impossible on Amazon Linux and I wanted to compare both version.
Update: I also managed to crash docker 1.11.2 on CentOS 7.
Update#2: I also managed to crash Docker 1.12.
I found a workaround for people hitting the problem that may acceptable. In fluentd-0.12.26, a config option named buffer_queue_full_action has been introduced. Setting it to drop_oldest_chunk avoid the problem for us so far but with the obvious consequence of dropping some logs. I let you decide if this is acceptable.
from fluent-logger-golang.
@tagomoris I gave a try to writing a program in Go to reproduce and succeeded. I used the same fluent.conf as previously. Here's the Go source code:
package main
import (
"time"
"github.com/fluent/fluent-logger-golang/fluent"
)
func main() {
f, err := fluent.New(fluent.Config{})
if err != nil {
panic(err)
}
data := map[string]string{
"mylog": "This is my json log!",
}
tm := time.Now()
for {
if err := f.PostWithTime("docker.logs", tm, data); err != nil {
panic(err)
}
}
}
It generated the same callstack as the crash within docker:
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x0 pc=0x4e785c]
goroutine 1 [running]:
panic(0x5a95c0, 0xc82000a110)
/usr/lib/go-1.6/src/runtime/panic.go:481 +0x3e6
net.(*TCPConn).Write(0x0, 0xc82037a000, 0x34e0, 0x78000, 0x0, 0x0, 0x0)
<autogenerated>:69 +0x2c
github.com/fluent/fluent-logger-golang/fluent.(*Fluent).send(0xc82007e000, 0x0, 0x0)
/home/eric/go/src/github.com/fluent/fluent-logger-golang/fluent/fluent.go:287 +0x1ec
github.com/fluent/fluent-logger-golang/fluent.(*Fluent).PostRawData(0xc82007e000, 0xc8202ddc70, 0x2f, 0x47)
/home/eric/go/src/github.com/fluent/fluent-logger-golang/fluent/fluent.go:172 +0x189
github.com/fluent/fluent-logger-golang/fluent.(*Fluent).EncodeAndPostData(0xc82007e000, 0x5e7640, 0xb, 0xecf6cd5f0, 0xc80c58991b, 0x6d8bc0, 0x567f00, 0xc8202deed0, 0x0, 0x0)
/home/eric/go/src/github.com/fluent/fluent-logger-golang/fluent/fluent.go:163 +0x257
github.com/fluent/fluent-logger-golang/fluent.(*Fluent).PostWithTime(0xc82007e000, 0x5e7640, 0xb, 0xecf6cd5f0, 0xc80c58991b, 0x6d8bc0, 0x568000, 0xc820082060, 0x0, 0x0)
/home/eric/go/src/github.com/fluent/fluent-logger-golang/fluent/fluent.go:155 +0x998
main.main()
/home/eric/go/src/github.com/EricFortin/fluentdCrash/logger.go:20 +0x1ea
from fluent-logger-golang.
I'm trying just same code & configuration on my Mac, but it's not reproduced yet (in 10 minutes).
Is it Linux specific problem?
@EricFortin How long time you need to reproduce it?
from fluent-logger-golang.
@tagomoris So far, I have reproduced it only in VM on AWS. I run Linux in a VM on my local machine and weren't able to reproduce it either. Since we only experienced it in AWS, where we run our stuff, I developed the test case there. It usually takes between 2-10 minutes to crash.
from fluent-logger-golang.
@EricFortin Thank you for great report! I'll try it on AWS too and some patches next week.
from fluent-logger-golang.
@EricFortin Finally, I made it! (w/ Amazon Linux 2016.03.3)
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x0 pc=0x4e959c]
goroutine 1 [running]:
net.(*TCPConn).Write(0x0, 0xc820500000, 0x242b, 0x60000, 0x0, 0x0, 0x0)
<autogenerated>:75 +0x2c
github.com/fluent/fluent-logger-golang/fluent.(*Fluent).send(0xc82007e000, 0x0, 0x0)
/home/ec2-user/go/src/github.com/fluent/fluent-logger-golang/fluent/fluent.go:287 +0x1ec
github.com/fluent/fluent-logger-golang/fluent.(*Fluent).PostRawData(0xc82007e000, 0xc8203efea0, 0x2f, 0x47)
/home/ec2-user/go/src/github.com/fluent/fluent-logger-golang/fluent/fluent.go:172 +0x189
github.com/fluent/fluent-logger-golang/fluent.(*Fluent).EncodeAndPostData(0xc82007e000, 0x5ea680, 0xb, 0xecf72d1cc, 0xc8336109e0, 0x6dea00, 0x568880, 0xc8203e7b30, 0x0, 0x0)
/home/ec2-user/go/src/github.com/fluent/fluent-logger-golang/fluent/fluent.go:163 +0x257
github.com/fluent/fluent-logger-golang/fluent.(*Fluent).PostWithTime(0xc82007e000, 0x5ea680, 0xb, 0xecf72d1cc, 0xc8336109e0, 0x6dea00, 0x568980, 0xc8200106f0, 0x0, 0x0)
/home/ec2-user/go/src/github.com/fluent/fluent-logger-golang/fluent/fluent.go:155 +0x996
main.main()
/home/ec2-user/go/src/github.com/fluent/fluent-logger-golang/test/main.go:19 +0x1e9
goroutine 17 [syscall, locked to thread]:
runtime.goexit()
/usr/lib/golang/src/runtime/asm_amd64.s:1721 +0x1
goroutine 32359 [semacquire]:
sync.runtime_Semacquire(0xc82007e0a0)
/usr/lib/golang/src/runtime/sema.go:43 +0x26
sync.(*Mutex).Lock(0xc82007e09c)
/usr/lib/golang/src/sync/mutex.go:82 +0x1c4
github.com/fluent/fluent-logger-golang/fluent.(*Fluent).reconnect.func1(0xc82007e000)
/home/ec2-user/go/src/github.com/fluent/fluent-logger-golang/fluent/fluent.go:255 +0x6a
created by github.com/fluent/fluent-logger-golang/fluent.(*Fluent).reconnect
/home/ec2-user/go/src/github.com/fluent/fluent-logger-golang/fluent/fluent.go:267 +0x35
from fluent-logger-golang.
👍
from fluent-logger-golang.
I updated the code not to panic but retries to send message, with showing err content.
package main
import (
"fmt"
"time"
"github.com/fluent/fluent-logger-golang/fluent"
)
func main() {
f, err := fluent.New(fluent.Config{})
if err != nil {
panic(err)
}
data := map[string]string{
"mylog": "This is my json log!",
}
for {
tm := time.Now()
if err := f.PostWithTime("docker.logs", tm, data); err != nil {
fmt.Printf("error: %s\n", err);
continue;
}
}
}
With the change of #30, I got:
- "write tcp 127.0.0.1:36746->127.0.0.1:24224: write: connection reset by peer" at first
- "fluent#send: can't send logs, client is reconnecting" for a while
- "Buffer full" at last
With this behavior, fluent-logger-golang (and fluentd logging driver of docker) doesn't stop Docket itself.
from fluent-logger-golang.
Nice !
from fluent-logger-golang.
I created a pull-request for docker to solve this problem!
moby/moby#26740
from fluent-logger-golang.
That was merged.
Look forward to Docker 1.13.0 release!
from fluent-logger-golang.
@EricFortin can you give your configuration to use buffer_queue_full_action.
Currently i am using below configuration in fluentd setting
<match docker.**>
type elasticsearch
logstash_format true
include_tag_key true
tag_key image
flush_interval 10s
do you mean adding below paramter ?
buffer_queue_full_action drop_oldest_chunk
from fluent-logger-golang.
@hridyeshpant Yes you add that to your elasticsearch plugin config. It is available because this plugin inherits BufferedOutput. I am using flume on my side but since it also inherits from BufferedOutput, I can use this settings. Here's my config:
<match docker.*>
type flume
host FLUME_HOST
port FLUME_PORT
remove_prefix docker
buffer_queue_full_action drop_oldest_chunk
</match>
from fluent-logger-golang.
Related Issues (20)
- tagomoris will step down from the maintainer HOT 4
- Disordered Ack messages will be missed HOT 3
- Calls to Write() after calling Close() in sync mode reopen the connection HOT 5
- Migrate testing to Go's testing package HOT 1
- Release v1.8.0 HOT 5
- Feature: periodically reconnect to fluentd-address HOT 1
- Add a new option TLSCertPath
- Option to skip loggertag and timestamp
- Support for fluentd `shared_key` and `user_auth`
- Panic if TLS fluentd server cannot be connected
- Support to take TLS certificate in case of tls enabled
- Fluent-bit Supported
- RequestAck usage
- Sent payloads not valid JSON, wrapped in function call
- Send JSON encoded payloads without re-marshalling them?
- Add a separate `BufferLimitBytes` option to limit memory usage
- Unknown writing error: MaxRetry shows zero HOT 3
- Update README to describe Config parameters clearly HOT 1
- How to forward output plugin records HOT 3
- Passing struct pointers to `Post` method HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from fluent-logger-golang.