Git Product home page Git Product logo

Comments (31)

nikolai-derzhak-distillery avatar nikolai-derzhak-distillery commented on July 28, 2024 1

I have synched with R&D team and we have plan to use logstash with gelf docker logging driver as interim solution (long term is kafka driver moby/moby#18604).

So I will stick with gelf driver for now.

Hope you or other contributor will find time to fix this production critical bug.

Otherwise people will go the sam route I did.

from fluent-logger-golang.

tagomoris avatar tagomoris commented on July 28, 2024

Can you create minimum code and environment to reproduce it?
Using docker containers for reproduction is too heavy for both of identifying core problem and fixing it.

from fluent-logger-golang.

nikolaiderzhak avatar nikolaiderzhak commented on July 28, 2024

Can not disagree. I will try and update you.

from fluent-logger-golang.

nikolaiderzhak avatar nikolaiderzhak commented on July 28, 2024

While I am looking to easier method to reproduce panic let us discuss possible solutions.

It seems we need to check net object is not nil before calling method. Is it not main reason of panic ?

from fluent-logger-golang.

tagomoris avatar tagomoris commented on July 28, 2024

@nikolaiderzhak your point looks correct. IMO it's particularly problematic under the situation with many communication errors.
Options are:

  • add a check for f.conn existence just before Write
  • check and fix exclusive control for f.conn especially for send and reconnect

The former will work well, but it makes situation a bit more complex (What should we do when f.conn is missing?).

from fluent-logger-golang.

nikolaiderzhak avatar nikolaiderzhak commented on July 28, 2024

The former will work well, but it makes situation a bit more complex (What should we do when f.conn is missing?).

Something safe. For me as DevOps engineer goal is make sure docker daemon will not die :)

I am trying staging setup to reproduce it now. Will try to emulate slow aggregator responses to trigger that chain. So we could try some fixes.

If you propose one I cloud quickly test if it works.

from fluent-logger-golang.

tagomoris avatar tagomoris commented on July 28, 2024

I'm sorry that I don't have enough time to work with this right now... but will merge (and try to merge it into Docker) ASAP when reproduction steps and patches are proposed.

from fluent-logger-golang.

nikolaiderzhak avatar nikolaiderzhak commented on July 28, 2024

Thanks !

I was not able to reproduce it locally, including docker containers with chatty service and lower buffer limits. So next I will try is to replicate production setup with smaller nodes in AWS. We need it to be done anyway for testing purposes.

Question to you. I see f.conn.Write() is wrapped to if f.conn == nil already (since 2013) .
How it happened we got this exception then ? Sorry - not familiar enough with code.

@ypjin @rdong Can you please check it too ?

from fluent-logger-golang.

tagomoris avatar tagomoris commented on July 28, 2024

@nikolaiderzhak checking f.conn must be done in exclusive process (in other words, in locking)

from fluent-logger-golang.

nikolaiderzhak avatar nikolaiderzhak commented on July 28, 2024

I see. It can be accessed before it initialised....

from fluent-logger-golang.

nikolai-derzhak-distillery avatar nikolai-derzhak-distillery commented on July 28, 2024

Got critical things on plate. Did not create pre-prod to reproduce it yet. Seems as will start working on his next week. Will update you.

from fluent-logger-golang.

rorysavage77 avatar rorysavage77 commented on July 28, 2024

Any updates on this?

from fluent-logger-golang.

tagomoris avatar tagomoris commented on July 28, 2024

Nothing yet. I couldn't create tests or scripts to reproduce it.
@rcsavage do you have it?

from fluent-logger-golang.

toddlers avatar toddlers commented on July 28, 2024

@tagomoris we have reproduced . We will provide the data soon.

from fluent-logger-golang.

tagomoris avatar tagomoris commented on July 28, 2024

Great to hear that! Then I'll fix it.

from fluent-logger-golang.

rorysavage77 avatar rorysavage77 commented on July 28, 2024

That is good news. How soon for a fix?

from fluent-logger-golang.

spanktar avatar spanktar commented on July 28, 2024

This is a show-stopper for our fluentd adoption. Any word here on progress?

from fluent-logger-golang.

tagomoris avatar tagomoris commented on July 28, 2024

Please give me any code to show how to reproduce it...

from fluent-logger-golang.

EricFortin avatar EricFortin commented on July 28, 2024

@tagomoris

I managed to reproduce it since we have been hitting this more and more recently. I am not able to reproduce it on my machine but I can make it crash every time on an AWS instance running Amazon Linux and Docker 1.11.2. The time it takes to crash is variable though. My test setup is under Docker however since I am not that good in Go(and ruby) but I guess it will be easy enough for someone on the project to replicate the setup outside of Docker.

fluent.conf

<source>
  type forward
  port 24224
  bind 0.0.0.0
</source>

<match docker.*>
  type down

  buffer_type memory
  buffer_chunk_limit 5k
  buffer_queue_limit 32
</match>

out_down.rb plugin

module Fluent

class DownOutput < BufferedOutput
  Fluent::Plugin.register_output('down', self)

  unless method_defined?(:log)
    define_method(:log) { $log }
  end

  def initialize
    super
  end

  def configure(conf)
    # override default buffer_chunk_limit so it overflows very fast.
    conf['buffer_chunk_limit'] ||= '10k'
    super
  end

  def start
    super
  end

  def shutdown
    super
  end

  def format(tag, time, record)
    [tag, time, record].to_msgpack
  end

  def write(chunk)
    count = 0
    chunk.msgpack_each { |tag, time, record|

        count += 1
    }
    log.info "Processed #{count} entries in chunk."
  end
end

end

I spin one container running Fluentd and another container with this command:

docker run -d --log-driver fluentd --log-opt fluentd-async-connect=true --name logger alpine:3.4 sh -c 'while true; do echo {\"time\":\"2016-09-13T14:49:32.082Z\", \"mylog\":\"This is my log!\"}; done'

I tried reproducing on CentOS 7 with Docker 1.11.2 as well but didn't have luck. I tried CentOS because installing Docker 1.12 seems impossible on Amazon Linux and I wanted to compare both version.

Update: I also managed to crash docker 1.11.2 on CentOS 7.
Update#2: I also managed to crash Docker 1.12.

I found a workaround for people hitting the problem that may acceptable. In fluentd-0.12.26, a config option named buffer_queue_full_action has been introduced. Setting it to drop_oldest_chunk avoid the problem for us so far but with the obvious consequence of dropping some logs. I let you decide if this is acceptable.

from fluent-logger-golang.

EricFortin avatar EricFortin commented on July 28, 2024

@tagomoris I gave a try to writing a program in Go to reproduce and succeeded. I used the same fluent.conf as previously. Here's the Go source code:

package main

import (
    "time"
    "github.com/fluent/fluent-logger-golang/fluent"
)

func main() {
    f, err := fluent.New(fluent.Config{})
    if err != nil {
        panic(err)
    }

    data := map[string]string{
        "mylog": "This is my json log!",
    }
    tm := time.Now()
    for {
        if err := f.PostWithTime("docker.logs", tm, data); err != nil {
            panic(err)
        }
    }
}

It generated the same callstack as the crash within docker:

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x0 pc=0x4e785c]

goroutine 1 [running]:
panic(0x5a95c0, 0xc82000a110)
    /usr/lib/go-1.6/src/runtime/panic.go:481 +0x3e6
net.(*TCPConn).Write(0x0, 0xc82037a000, 0x34e0, 0x78000, 0x0, 0x0, 0x0)
    <autogenerated>:69 +0x2c
github.com/fluent/fluent-logger-golang/fluent.(*Fluent).send(0xc82007e000, 0x0, 0x0)
    /home/eric/go/src/github.com/fluent/fluent-logger-golang/fluent/fluent.go:287 +0x1ec
github.com/fluent/fluent-logger-golang/fluent.(*Fluent).PostRawData(0xc82007e000, 0xc8202ddc70, 0x2f, 0x47)
    /home/eric/go/src/github.com/fluent/fluent-logger-golang/fluent/fluent.go:172 +0x189
github.com/fluent/fluent-logger-golang/fluent.(*Fluent).EncodeAndPostData(0xc82007e000, 0x5e7640, 0xb, 0xecf6cd5f0, 0xc80c58991b, 0x6d8bc0, 0x567f00, 0xc8202deed0, 0x0, 0x0)
    /home/eric/go/src/github.com/fluent/fluent-logger-golang/fluent/fluent.go:163 +0x257
github.com/fluent/fluent-logger-golang/fluent.(*Fluent).PostWithTime(0xc82007e000, 0x5e7640, 0xb, 0xecf6cd5f0, 0xc80c58991b, 0x6d8bc0, 0x568000, 0xc820082060, 0x0, 0x0)
    /home/eric/go/src/github.com/fluent/fluent-logger-golang/fluent/fluent.go:155 +0x998
main.main()
    /home/eric/go/src/github.com/EricFortin/fluentdCrash/logger.go:20 +0x1ea

from fluent-logger-golang.

tagomoris avatar tagomoris commented on July 28, 2024

I'm trying just same code & configuration on my Mac, but it's not reproduced yet (in 10 minutes).
Is it Linux specific problem?
@EricFortin How long time you need to reproduce it?

from fluent-logger-golang.

EricFortin avatar EricFortin commented on July 28, 2024

@tagomoris So far, I have reproduced it only in VM on AWS. I run Linux in a VM on my local machine and weren't able to reproduce it either. Since we only experienced it in AWS, where we run our stuff, I developed the test case there. It usually takes between 2-10 minutes to crash.

from fluent-logger-golang.

tagomoris avatar tagomoris commented on July 28, 2024

@EricFortin Thank you for great report! I'll try it on AWS too and some patches next week.

from fluent-logger-golang.

tagomoris avatar tagomoris commented on July 28, 2024

@EricFortin Finally, I made it! (w/ Amazon Linux 2016.03.3)

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x0 pc=0x4e959c]

goroutine 1 [running]:
net.(*TCPConn).Write(0x0, 0xc820500000, 0x242b, 0x60000, 0x0, 0x0, 0x0)
    <autogenerated>:75 +0x2c
github.com/fluent/fluent-logger-golang/fluent.(*Fluent).send(0xc82007e000, 0x0, 0x0)
    /home/ec2-user/go/src/github.com/fluent/fluent-logger-golang/fluent/fluent.go:287 +0x1ec
github.com/fluent/fluent-logger-golang/fluent.(*Fluent).PostRawData(0xc82007e000, 0xc8203efea0, 0x2f, 0x47)
    /home/ec2-user/go/src/github.com/fluent/fluent-logger-golang/fluent/fluent.go:172 +0x189
github.com/fluent/fluent-logger-golang/fluent.(*Fluent).EncodeAndPostData(0xc82007e000, 0x5ea680, 0xb, 0xecf72d1cc, 0xc8336109e0, 0x6dea00, 0x568880, 0xc8203e7b30, 0x0, 0x0)
    /home/ec2-user/go/src/github.com/fluent/fluent-logger-golang/fluent/fluent.go:163 +0x257
github.com/fluent/fluent-logger-golang/fluent.(*Fluent).PostWithTime(0xc82007e000, 0x5ea680, 0xb, 0xecf72d1cc, 0xc8336109e0, 0x6dea00, 0x568980, 0xc8200106f0, 0x0, 0x0)
    /home/ec2-user/go/src/github.com/fluent/fluent-logger-golang/fluent/fluent.go:155 +0x996
main.main()
    /home/ec2-user/go/src/github.com/fluent/fluent-logger-golang/test/main.go:19 +0x1e9

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
    /usr/lib/golang/src/runtime/asm_amd64.s:1721 +0x1

goroutine 32359 [semacquire]:
sync.runtime_Semacquire(0xc82007e0a0)
    /usr/lib/golang/src/runtime/sema.go:43 +0x26
sync.(*Mutex).Lock(0xc82007e09c)
    /usr/lib/golang/src/sync/mutex.go:82 +0x1c4
github.com/fluent/fluent-logger-golang/fluent.(*Fluent).reconnect.func1(0xc82007e000)
    /home/ec2-user/go/src/github.com/fluent/fluent-logger-golang/fluent/fluent.go:255 +0x6a
created by github.com/fluent/fluent-logger-golang/fluent.(*Fluent).reconnect
    /home/ec2-user/go/src/github.com/fluent/fluent-logger-golang/fluent/fluent.go:267 +0x35

from fluent-logger-golang.

nikolai-derzhak-distillery avatar nikolai-derzhak-distillery commented on July 28, 2024

👍

from fluent-logger-golang.

tagomoris avatar tagomoris commented on July 28, 2024

I updated the code not to panic but retries to send message, with showing err content.

package main

import (
    "fmt"
    "time"
    "github.com/fluent/fluent-logger-golang/fluent"
)

func main() {
    f, err := fluent.New(fluent.Config{})
    if err != nil {
        panic(err)
    }

    data := map[string]string{
        "mylog": "This is my json log!",
    }
    for {
        tm := time.Now()
        if err := f.PostWithTime("docker.logs", tm, data); err != nil {
            fmt.Printf("error: %s\n", err);
            continue;
        }
    }
}

With the change of #30, I got:

  1. "write tcp 127.0.0.1:36746->127.0.0.1:24224: write: connection reset by peer" at first
  2. "fluent#send: can't send logs, client is reconnecting" for a while
  3. "Buffer full" at last

With this behavior, fluent-logger-golang (and fluentd logging driver of docker) doesn't stop Docket itself.

from fluent-logger-golang.

EricFortin avatar EricFortin commented on July 28, 2024

Nice !

from fluent-logger-golang.

tagomoris avatar tagomoris commented on July 28, 2024

I created a pull-request for docker to solve this problem!
moby/moby#26740

from fluent-logger-golang.

tagomoris avatar tagomoris commented on July 28, 2024

That was merged.
Look forward to Docker 1.13.0 release!

from fluent-logger-golang.

hridyeshpant avatar hridyeshpant commented on July 28, 2024

@EricFortin can you give your configuration to use buffer_queue_full_action.
Currently i am using below configuration in fluentd setting
<match docker.**>
type elasticsearch
logstash_format true
include_tag_key true
tag_key image
flush_interval 10s

do you mean adding below paramter ?
buffer_queue_full_action drop_oldest_chunk

from fluent-logger-golang.

EricFortin avatar EricFortin commented on July 28, 2024

@hridyeshpant Yes you add that to your elasticsearch plugin config. It is available because this plugin inherits BufferedOutput. I am using flume on my side but since it also inherits from BufferedOutput, I can use this settings. Here's my config:

<match docker.*>
  type flume
  host FLUME_HOST
  port FLUME_PORT
  remove_prefix docker

  buffer_queue_full_action drop_oldest_chunk
</match>

from fluent-logger-golang.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.