Comments (12)

okkez commented on August 24, 2024

Could you try #84?

Is this easy to reproduce?
I could not reproduce this issue in my local environment.

vgoloviznin commented on August 24, 2024

@okkez yeah, it's not reproducible locally for me either; it happens from time to time in production...

I did a quick check of the patch, but I don't think it will work properly: when you go through _pendingPackets and try to write everything inside, on error you add the packet to the array again, so there will be duplicate packets. I also don't see where the array is flushed.
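
For illustration only, here is a rough sketch of the concern in plain JavaScript (the sender object and the _pendingPackets / _doWrite(packet, cb) names are assumptions mirroring the discussion, not the actual patch code): re-queueing a failed packet into the array that is still being iterated leaves duplicate entries behind, and nothing ever empties the array. Swapping the array out before iterating is one way to make the flush explicit.

// Hypothetical illustration of the concern, not the real patch.
function flushPending(sender) {
  // The error path pushes into the array that is still being iterated,
  // so a failed packet now exists twice, and packets that were written
  // successfully are never removed either.
  sender._pendingPackets.forEach((packet) => {
    sender._doWrite(packet, (err) => {
      if (err) {
        sender._pendingPackets.push(packet); // duplicate on every failure
      }
      // no splice/shift anywhere, so the array only ever grows
    });
  });
}

// A safer pattern: swap the array out first, then re-queue only failures.
function flushPendingSafely(sender) {
  const toSend = sender._pendingPackets;
  sender._pendingPackets = []; // the old queue is now flushed
  toSend.forEach((packet) => {
    sender._doWrite(packet, (err) => {
      if (err) sender._pendingPackets.push(packet); // retried on the next flush
    });
  });
}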

okkez commented on August 24, 2024

Thank you for checking the patch.
I will make time for this issue next week.

Could you tell me the following?

  • how many logs per second?
  • log size
  • and more details about your production environment

okkez commented on August 24, 2024

@vgoloviznin

I also don't see where the array is flushed

_pendingPackets is flushed here.

vgoloviznin commented on August 24, 2024

@okkez I'm working on answering your questions!

As for flushing, I don't see where items are removed during or after the forEach.

shuttie commented on August 24, 2024

how many logs per second?

~100 per second

log size

Quite heavy, ~1-2 KB each

and more details about your production environment

We observed this issue at least 3 times, in the following scenario:

  1. Fluentd has ES as a back-end store.
  2. ES dies with an OOM due to capacity/throughput issues.
  3. Fluentd starts buffering log events in RAM, waiting for ES to come back.
  4. Fluentd's RAM buffer is exhausted, so it starts emitting a ton of exceptions in its log (https://gist.github.com/shuttie/fa467d80425887a4e81987dafa6c0271)
  5. After that, fluent-logger-node starts throwing errors as described earlier, breaking our prod.

okkez commented on August 24, 2024

Thank you for describing the scenario, @shuttie!

okkez commented on August 24, 2024

Solution (improve Fluentd robustness):

mcuelenaere commented on August 24, 2024

I'm also seeing this stack trace when the Fluentd server is unavailable. Shouldn't the code at https://github.com/fluent/fluent-logger-node/blob/master/lib/sender.js#L360 check whether this._socket is set before trying to dereference it?

Console output:

Fluentd will reconnect after 60 seconds
Fluentd error { Error: write EPIPE
     at _errnoException (util.js:1024:11)
     at WriteWrap.afterWrite (net.js:867:14) code: 'EPIPE', errno: 'EPIPE', syscall: 'write' }
TypeError: Cannot read property 'write' of null
    at FluentSender._doWrite (/code/node_modules/fluent-logger/lib/sender.js:360:16)
    at FluentSender._doFlushSendQueue (/code/node_modules/fluent-logger/lib/sender.js:334:10)
    at process.nextTick (/code/node_modules/fluent-logger/lib/sender.js:390:14)
    at _combinedTickCallback (internal/process/next_tick.js:131:7)
    at process._tickDomainCallback (internal/process/next_tick.js:218:9)
[nodemon] app crashed - waiting for file changes before starting...
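
For what it's worth, a minimal sketch of the kind of guard being suggested, written as a free-standing helper (hypothetical; the real _doWrite in sender.js takes different arguments, this only shows the null/writable check):

// Hypothetical guard, not the library's actual implementation: the point
// is simply to avoid dereferencing a socket that an asynchronous
// 'error'/'close' handler may already have set to null.
function safeWrite(sender, data, callback) {
  if (!sender._socket || !sender._socket.writable) {
    // The socket went away between queueing and writing; report an error
    // (or re-queue the data) instead of crashing with a TypeError.
    return callback(new Error('Fluentd socket is not available'));
  }
  sender._socket.write(data, callback);
}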

fujimotos commented on August 24, 2024

I can confirm the issue reported by @mcuelenaere in HEAD. Stopping the
fluentd server under a heavy load will result in an uncaught exception
in the client.

Currently, fluent-logger-node consumes the message queue as follows:

1: _connect()
2: _flushSendQueue()    // Check if the socket is available
3: _doFlushSendQueue()  // Take the first message from the queue
4: _doWrite()           // Send it.
5: _doFlushSendQueue()  // Take the second message from the queue.
6: _doWrite()           // Send it.
7: ...                  // Repeat until the queue is empty...

The problem is that, while the socket might be closed (asynchronously)
after the first write, _doFlushSendQueue and _doWrite naively assume
that the socket is always available. So if the socket gets closed
somewhere after step 4 above, the logger client simply crashes.

Pull request #90 is my attempt to fix this issue by adding checks to the
relevant code paths.
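
To make the race concrete, here is a standalone sketch that uses only Node's built-in net module (it assumes a Fluentd forward port on localhost:24224 and is not fluent-logger-node code). An asynchronous 'error'/'close' handler invalidates the socket while a drain loop still expects it to exist, which is the same shape of failure as the crash after step 4 above; re-checking availability before every write, in the spirit of #90, avoids it.

const net = require('net');

// A tiny client with a queue of pending messages.
const client = { socket: null, queue: ['msg1\n', 'msg2\n', 'msg3\n'] };
client.socket = net.connect(24224, 'localhost');

// These handlers fire asynchronously and drop the socket reference...
client.socket.on('error', () => { client.socket = null; });
client.socket.on('close', () => { client.socket = null; });

// ...so a naive drain loop that assumes the socket is still there can hit
// "Cannot read property 'write' of null" on a later iteration.
function drainNaively() {
  const msg = client.queue.shift();
  if (!msg) return;
  client.socket.write(msg, () => process.nextTick(drainNaively)); // may throw
}

// Re-checking availability before every write avoids the crash and leaves
// the message queued for a retry after the client reconnects.
function drainSafely() {
  if (client.queue.length === 0) return;
  if (!client.socket || !client.socket.writable) return; // retry later
  const msg = client.queue.shift();
  client.socket.write(msg, () => process.nextTick(drainSafely));
}

drainSafely();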

okkez commented on August 24, 2024

@mcuelenaere Could you try PR #90?

okkez commented on August 24, 2024

Merged #90 and released v2.6.2.
