
Comments (5)

karimra commented on September 13, 2024

The bottleneck could be on the output side.

Do you see all the metrics being written to InfluxDB?
You can lower batch-size (currently 1000) and flush-timer (currently 10s) so that metrics are flushed to the DB sooner instead of being held in memory.
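To make the effect of those two settings concrete, here is a minimal sketch of a size-or-time batching buffer. It is a generic illustration, not gNMIc's actual implementation: the class name `BatchBuffer`, its parameters, and the `write` callback are all hypothetical, and a real output would normally flush from a background timer rather than only when a new point arrives.

```python
import time


class BatchBuffer:
    """Hypothetical size-or-time batching buffer (not gNMIc's real code)."""

    def __init__(self, batch_size, flush_timer, write, clock=time.monotonic):
        self.batch_size = batch_size    # flush once this many points are buffered
        self.flush_timer = flush_timer  # ...or once this many seconds have elapsed
        self.write = write              # callback that sends one batch to the DB
        self.clock = clock
        self.points = []
        self.last_flush = clock()

    def add(self, point):
        self.points.append(point)
        # Simplification: the timer is only checked when a point is added.
        if (len(self.points) >= self.batch_size
                or self.clock() - self.last_flush >= self.flush_timer):
            self.flush()

    def flush(self):
        if self.points:
            self.write(self.points)
        self.points = []
        self.last_flush = self.clock()


batches = []
buf = BatchBuffer(batch_size=3, flush_timer=10.0, write=batches.append)
for i in range(7):
    buf.add(i)
buf.flush()  # drain whatever is left
print(batches)
```

In this sketch, at most `batch_size` points sit in the buffer between flushes, which is why lowering either value reduces how much is held in memory, provided the writes themselves succeed.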

from gnmic.

ebinachan commented on September 13, 2024

Hi @karimra,

I've just tested two settings: batch-size: 200 with flush-timer: 2s, and batch-size: 100 with flush-timer: 1s. Neither seemed to make a difference. RAM usage jumps every minute, in sync with all the IXRs sending metrics to gNMIc.
When the VM RAM gets full, some of the metrics aren't sent to InfluxDB anymore. Influx data for a specific node might be missing "/state/port/statistics/out-octets", while ".../in-octets" are present.

EC


karimra commented on September 13, 2024

Would you be able to test without processors, just to see if the behavior is different?


ebinachan commented on September 13, 2024

Yes! I've disabled the event-processors for the influxdb output and RAM usage stays low.
This is the template I've been using, output1 was already disabled:
https://github.com/ebinachan/telemetry/blob/master/sros/gnmic.yaml

Edit: RAM usage is low even with source_trim processor. The moment I enable either convert_int, convert_float, or both, I'm seeing the gradual RAM increase and loss of metrics.
It seems to be related to these messages:

Mar 31 13:35:02 linux bash[24818]: 2023/03/31 13:35:02 influxdb2client E! Write error: 400 Bad Request: partial write: field type conflict: input field "/state/port/statistics/in-octets" on measurement "PS_SROS_PORTSTATS" is type string, already exists as type integer dropped=1
Mar 31 13:35:57 linux bash[24818]: 2023/03/31 13:35:57 influxdb2client E! Write error: 400 Bad Request: partial write: field type conflict: input field "/state/port/statistics/in-octets" on measurement "PS_SROS_PORTSTATS" is type string, already exists as type integer dropped=1
Mar 31 13:36:57 linux bash[24818]: 2023/03/31 13:36:57 influxdb2client E! Write error: 400 Bad Request: partial write: field type conflict: input field "/state/port/statistics/in-octets" on measurement "PS_SROS_PORTSTATS" is type string, already exists as type integer dropped=1
Mar 31 13:37:59 linux bash[24818]: 2023/03/31 13:37:59 influxdb2client E! Write error: 400 Bad Request: partial write: field type conflict: input field "/state/port/statistics/in-octets" on measurement "PS_SROS_PORTSTATS" is type string, already exists as type integer dropped=1

Then, I changed the config to match on exact field names as in:
https://github.com/ebinachan/telemetry/blob/master/sros/20230331_gnmic.yaml
I then dropped all SROS telemetry measurements from InfluxDB and started gNMIc again. I'm still seeing the logs above, and a gradual loss in metrics along with the RAM increase.

I should have pasted the "in-octets" write error initially, but I honestly thought it was a one-off unrelated error. Now I'm thinking this might cause some sort of soft-lockup and then RAM usage starts inflating.

Second Edit:
It could be that one of the SROS nodes was reporting an "in-octets" value too large for the "int" converter.
I moved the in-octets/out-octets fields to the float converter and dropped the InfluxDB measurements before starting the process. The write errors are gone, and RAM usage is low for now.
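The hypothesis above can be illustrated with a small sketch. It assumes, without confirmation from the thread, that the counter arrives as a string holding an unsigned 64-bit value and that an integer conversion limited to the signed 64-bit range leaves the value unconverted when it does not fit, which would match the "is type string" write error. The helpers `to_int64` and `to_float` are hypothetical and are not gNMIc functions.

```python
INT64_MIN = -2**63
INT64_MAX = 2**63 - 1


def to_int64(text):
    """Parse text as a signed 64-bit integer; return None if it does not fit."""
    value = int(text)
    if not INT64_MIN <= value <= INT64_MAX:
        return None  # out of range: the caller would keep the original string
    return value


def to_float(text):
    """Parse text as a 64-bit float; large counters fit but lose precision."""
    return float(text)


small = "1234567890"
huge = str(2**64 - 1)  # largest value an unsigned 64-bit counter can hold

print(to_int64(small))  # 1234567890
print(to_int64(huge))   # None
print(to_float(huge))   # 1.8446744073709552e+19

# Trade-off of the float workaround: integers above 2**53 are not all exact.
print(int(to_float(str(2**53 + 1))) == 2**53 + 1)  # False
```

Under these assumptions, a float conversion always succeeds for such counters, at the cost of exact values once a counter exceeds 2**53.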

EC


karimra commented on September 13, 2024

Thanks for the update! Glad the RAM issues are gone.

