collectd-cloudhealth-memory-aws-plugin's Introduction

collectd CloudHealth memory plugin for AWS

Upload EC2 memory metrics to CloudHealth for better cost optimization!

Scope

We are aware that there is an official agent to upload metrics. However, the agent did not fit our environment for various reasons:

  • Runs its own embedded collectd daemon (in addition to the one we're already using)
  • Misc. compliance requirements (where each component needs to be validated):
    • Installation process fetches a lot of resources from the outside world via wget (in contrast to a single package from a signed repository)
    • Agent code runs in embedded Ruby

Within AWS, CloudHealth already has knowledge about CPU and network metrics, so we just need to add the memory metrics on our own by using the API.

CloudHealth API Limitation

  • You can only post CPU, memory, and file system metrics.
  • You can only post up to 8 days of historical metrics data.
  • Metrics must have an hourly resolution.
  • An active AWS Instance associated with the metrics must already be present and active in the CloudHealth Platform and not be Chef-managed.
  • Metric retrieval is for individual assets only, that is, for AWS EC2 Instances or file systems of AWS EC2 Instances.
  • The payload can contain a max of 1000 data points. If there are more than 1000 data points, the entire request is rejected with a 422 response.
  • When posting to file systems, the associated instance must be present and active. However, if a file system object does not currently exist, a new one is automatically created and linked to the instance.
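The 8-day and 1000-data-point limits above suggest filtering and batching rows before posting. The following is a minimal sketch, not the plugin's actual code: the function name `plan_requests`, the `(asset_id, timestamp, value)` row layout, and the assumption that one row counts as one data point are all hypothetical.

```python
from datetime import datetime, timedelta

MAX_POINTS = 1000            # per-request limit quoted in the list above
MAX_AGE = timedelta(days=8)  # historical limit quoted in the list above


def plan_requests(rows, now, max_points=MAX_POINTS, max_age=MAX_AGE):
    """Drop rows that are too old, then split the rest into batches.

    rows: list of (asset_id, timestamp, value) tuples (hypothetical layout).
    Assumption: every row counts as one data point towards the limit.
    """
    cutoff = now - max_age
    fresh = [row for row in rows if row[1] >= cutoff]
    # Slice into consecutive batches of at most max_points rows each.
    return [fresh[i:i + max_points] for i in range(0, len(fresh), max_points)]
```

Each returned batch would then be sent as a separate request, so that one oversized payload does not cause the whole upload to be rejected.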

In addition to the limitations documented by CloudHealth, the API expects each timestamp to be aligned to the full hour (e.g. 2020-12-04T17:00:00). Otherwise, the API rejects the row with a response like this:

{
  "errors": [],
  "succeeded": 0,
  "failed": 1,
  "datasets": [
    {
      "errors": [],
      "succeeded": 0,
      "failures": [
        {
          "error": "Date/time value '2020-12-04T17:43:35' cannot have a non-zero minute value.",
          "row": [
            "<region>:<aws-account-id>:<instance-id>",
            "2020-12-04T17:43:35",
            37.088733582900176,
            52.81394681853567,
            33.08149301429918
          ]
        }
      ]
    }
  ]
}
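To avoid the error shown above, a timestamp can be truncated to the start of its hour before it is sent. This is a minimal sketch under assumptions: the helper names are hypothetical, the output format is copied from the example timestamp in the error response, and time zone handling is left out.

```python
from datetime import datetime


def floor_to_hour(ts):
    """Return ts truncated to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def api_timestamp(ts):
    """Format an hour-aligned timestamp like the one in the error response."""
    return floor_to_hour(ts).strftime("%Y-%m-%dT%H:%M:%S")
```

For the rejected value above, `api_timestamp(datetime(2020, 12, 4, 17, 43, 35))` returns `"2020-12-04T17:00:00"`.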

Design

  • Runs as a plain collectd Python plugin that only receives memory.used.percent metrics from a filter chain.
  • Collects the following metrics for a period of 1 hour (see CloudHealth API Limitation):
    • memory:used:percent.avg
    • memory:used:percent.max
    • memory:used:percent.min
  • Stores metrics in memory by default (see cloudhealthmemory.conf for persistence)
  • Uses a background thread for uploading the metrics after 1 hour
    • including retry on connection issues
    • if upload does not work, the next upload cycle will include all missing metrics
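The hourly aggregation and the "keep what was not uploaded" behaviour described above could look roughly like the sketch below. It is illustrative only and not taken from cloudhealthmemory.py: the class `HourlyMemoryStats` and its methods are hypothetical, and the background thread, retry logic, and persistence are left out.

```python
from datetime import datetime


class HourlyMemoryStats(object):
    """Aggregate memory-used-percent samples into per-hour avg/max/min."""

    def __init__(self):
        # Maps the start of an hour to [sum, count, max, min].
        self._buckets = {}

    def add(self, ts, percent):
        """Record one sample taken at datetime ts."""
        percent = float(percent)
        hour = ts.replace(minute=0, second=0, microsecond=0)
        bucket = self._buckets.get(hour)
        if bucket is None:
            self._buckets[hour] = [percent, 1, percent, percent]
        else:
            bucket[0] += percent
            bucket[1] += 1
            bucket[2] = max(bucket[2], percent)
            bucket[3] = min(bucket[3], percent)

    def completed(self, now):
        """Return (hour, avg, max, min) for every hour that ended before now."""
        current = now.replace(minute=0, second=0, microsecond=0)
        return [(hour, b[0] / b[1], b[2], b[3])
                for hour, b in sorted(self._buckets.items())
                if hour < current]

    def discard(self, hours):
        """Forget hours after a successful upload; all others stay queued."""
        for hour in hours:
            self._buckets.pop(hour, None)
```

An uploader would call `completed()` once per cycle and `discard()` only after the API accepted the rows, so hours from a failed cycle are returned again on the next call.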

[Figure: design diagram; can be edited with draw.io]

Requirements

  • collectd-core
  • python-requests
  • python-yaml

Installation

  • Install requirements
  • Copy cloudhealthmemory.py into your collectd plugin path (typically /var/lib/collectd)
  • Copy cloudhealthmemory.conf into your collectd configuration directory (e.g. /etc/collectd/collectd.conf.d/)
  • Make sure the config files are being loaded by collectd
  • Restart collectd daemon

Example:

# Install collectd and the Python dependencies (Debian/Ubuntu, Python 2.7)
apt-get install -y collectd-core python libpython2.7 python-yaml python-requests
mkdir -p /etc/collectd/collectd.conf.d
# Fetch the plugin and its configuration (-f makes curl fail on HTTP errors)
curl -fLo /var/lib/collectd/cloudhealthmemory.py https://raw.githubusercontent.com/root360/collectd-cloudhealth-memory-aws-plugin/master/cloudhealthmemory.py
curl -fLo /etc/collectd/collectd.conf.d/cloudhealthmemory.conf https://raw.githubusercontent.com/root360/collectd-cloudhealth-memory-aws-plugin/master/cloudhealthmemory.conf
# Add the Include directive only if collectd.conf does not already load the directory
grep -q "collectd.conf.d/\*\.conf" /etc/collectd/collectd.conf || echo 'Include "/etc/collectd/collectd.conf.d/*.conf"' >> /etc/collectd/collectd.conf
systemctl restart collectd

Contributors

  • root360-stefanheitmueller
