carbonapi's Introduction

Carbonapi: high-performance Graphite front-end

Carbonapi is a Go-based Graphite frontend. It provides two binaries, carbonapi and carbonzipper, that unify responses from multiple Graphite backends and provide math and graphing functions.

This project is run in production at Booking.com. We are in the process of documenting its installation and setup, but can answer any questions that interested persons have.

CarbonAPI supports a significant subset of Graphite functions; see COMPATIBILITY. In our testing it has been shown to be 5x-10x faster than requesting data from graphite-web.

Build

To build both the carbonapi and carbonzipper binaries, run:

make

To build the binaries with debug symbols, run:

make debug

Note: the build process might require pkg-config to be installed:

Mac OS X

Find the pkg-config version you want to install and run the script below, replacing "VERSION_TO_INSTALL" with it:

PKG_CONFIG_VERSION="VERSION_TO_INSTALL" bash -c 'curl https://pkgconfig.freedesktop.org/releases/pkg-config-$PKG_CONFIG_VERSION.tar.gz -o pkgconfig.tgz'
mkdir pkg-config && tar -zxf pkgconfig.tgz -C pkg-config --strip-components 1 && cd pkg-config

There is a circular dependency between pkg-config and glib. To break the cycle, pkg-config bundles its own copy of glib; pass the --with-internal-glib flag to configure to build against it:

env LDFLAGS="-framework CoreFoundation -framework Carbon" ./configure --with-internal-glib && make install

We do not provide packages for install at this time. Contact us if you're interested in those.

Run

Run the full stack carbonapi -> zipper -> go-carbon with:

docker-compose up

You can feed in sample data with:

echo "test.test 5 `date +%s`" | nc -c localhost 2003

and get it back with:

curl 'http://localhost:8081/render?target=test.test&format=json&from=-10m'

Requirements

We officially support go 1.19.

OSX Build Notes

Some additional steps may be needed to build carbonapi with cairo rendering on Mac OS X.

Install cairo:

brew install Caskroom/cask/xquartz

brew install cairo

Xquartz is a required dependency for cairo.

Backend support

go-carbon

The main supported backend is go-carbon Graphite store.

Acknowledgement and history

This program was originally developed for Booking.com. With approval from Booking.com, the code was generalised and published as Open Source on GitHub, for which the author would like to express his gratitude.

This is Booking.com's fork of go-graphite/carbonapi. That project's current performance characteristics are not sufficient for our production needs, and we decided it had moved too far ahead for us to be able to improve them effectively. We thus reverted back to versions 0.9.2 of carbonapi and 0.74 of carbonzipper, and are moving more slowly in the same direction as the original project.

License

This code is licensed under the BSD-2 license.

carbonapi's People

Contributors

arodland, auguzun, avereha, azhiltsov, bom-d-van, borovskyav, cashlo, civil, cldellow, deniszh, dgryski, dieterbe, emadolsky, gksinghjsr, grlvrl, grzkv, gysinghb, ibuclaw, jaderdias, kamaev, kostty, kozzykoder, ksurent, lomik, nnuss, nvbn, oriordan, paskal, spacefreak86, szibis


carbonapi's Issues

Unlucky requests can get needlessly dropped

Our request limiter currently works like a semaphore. Requests are not processed in FIFO order but are picked randomly from an unordered pool, and each request has a timeout. This means that some requests get unlucky: they are not picked up for longer than necessary and time out.

Example

Say, we have 10 requests to be processed, each takes 1 second, but every second we get a new request in. The number of requests to be processed remains constant i.e. 10. Requests are processed one-at-a-time.

Intuitively, the waiting time for a request should be 9 seconds.

What we have now

Requests are picked randomly. In this case, the probability that a given request is not picked in a processing cycle is 0.9. The chance that a request stays in the queue for more than 30 seconds is therefore 0.9^29 ~ 5%. This is much longer than needed, and chances are that the request will time out and be dropped.
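The 5% figure can be checked numerically. The sketch below assumes that exactly one of the waiting requests is picked uniformly at random in each cycle; the function name is made up for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// probStillWaiting returns the probability that one particular request has
// not been picked after the given number of processing cycles, assuming one
// request is picked uniformly at random from poolSize waiting requests in
// each cycle.
func probStillWaiting(poolSize, cycles int) float64 {
	skip := 1 - 1/float64(poolSize) // chance of being skipped in one cycle
	return math.Pow(skip, float64(cycles))
}

func main() {
	// 10 waiting requests, 29 consecutive cycles without being picked.
	fmt.Printf("%.3f\n", probStillWaiting(10, 29)) // prints 0.047
}
```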

What would be nice to have

Requests go into a FIFO queue. This way, each request waits for 9 seconds, then it is processed. The waiting time is more predictable and fair.
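One possible shape for such a limiter is sketched below. It is an illustration only, not carbonapi code: FIFOLimiter, its methods and serveOrder are hypothetical names. Waiters are kept in a list and a released slot is always handed to the oldest waiter.

```go
package main

import (
	"container/list"
	"context"
	"fmt"
	"runtime"
	"sync"
)

// FIFOLimiter admits a fixed number of concurrent holders and wakes waiters
// in arrival order. Hypothetical type, not part of carbonapi.
type FIFOLimiter struct {
	mu      sync.Mutex
	free    int
	waiters list.List // elements are chan struct{}, oldest first
}

func NewFIFOLimiter(n int) *FIFOLimiter { return &FIFOLimiter{free: n} }

// Acquire blocks until a slot is handed over or ctx is done.
func (l *FIFOLimiter) Acquire(ctx context.Context) error {
	l.mu.Lock()
	if l.free > 0 && l.waiters.Len() == 0 {
		l.free--
		l.mu.Unlock()
		return nil
	}
	ready := make(chan struct{})
	elem := l.waiters.PushBack(ready)
	l.mu.Unlock()

	select {
	case <-ready:
		return nil
	case <-ctx.Done():
		l.mu.Lock()
		select {
		case <-ready:
			// The slot was granted just before cancellation: pass it on.
			l.mu.Unlock()
			l.Release()
		default:
			l.waiters.Remove(elem)
			l.mu.Unlock()
		}
		return ctx.Err()
	}
}

// Release hands the slot to the oldest waiter, or frees it if nobody waits.
func (l *FIFOLimiter) Release() {
	l.mu.Lock()
	defer l.mu.Unlock()
	if front := l.waiters.Front(); front != nil {
		l.waiters.Remove(front)
		close(front.Value.(chan struct{}))
		return
	}
	l.free++
}

// Waiting reports the current queue length.
func (l *FIFOLimiter) Waiting() int {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.waiters.Len()
}

// serveOrder queues n requests behind a busy slot and returns the order in
// which they are admitted.
func serveOrder(n int) []int {
	l := NewFIFOLimiter(1)
	ctx := context.Background()
	_ = l.Acquire(ctx) // occupy the only slot

	order := make(chan int, n)
	for id := 0; id < n; id++ {
		go func(id int) {
			if err := l.Acquire(ctx); err != nil {
				return
			}
			order <- id
			l.Release()
		}(id)
		for l.Waiting() != id+1 { // wait until request id is enqueued
			runtime.Gosched()
		}
	}
	l.Release() // free the slot; waiters are now admitted one by one

	got := make([]int, 0, n)
	for i := 0; i < n; i++ {
		got = append(got, <-order)
	}
	return got
}

func main() {
	fmt.Println(serveOrder(3)) // prints [0 1 2]
}
```

With this scheme a request that times out while waiting simply leaves the queue, and the requests behind it keep their relative order.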

find cache is not used when sendGlobAsIs is true

The logic for making find requests before render requests has evolved over time, but the caching logic has not kept pace. Even when sendGlobAsIs is true, we make a find request in order to decide whether to send the query metric by metric or in one go. Caching the find result would avoid this request.

Refactor logging to make it more controlled

Currently, there are the following problems with logging that need to be fixed:

  • loggers are global variables
  • there are many loggers and it is hard to know how many
  • new loggers can be spawned randomly
  • it is hard to control log format
  • it is hard to flush and sync loggers

We need to make the following changes to resolve this:

  • have one logger
  • eliminate global logger variables
  • use logger levels correctly (as suggested by @emadolsky here)
  • #359
  • provide good logger settings
    • unify logging format
    • setup throttling
    • set proper flushing and buffering policy

Have clear errors metric for carbonapi

The majority of errors are suppressed in carbonapi, remaining invisible. There is no clear metric of error rate.

We are limited in what HTTP codes we can return by compatibility concerns. So, it's best to expose this as a metric and think about response codes later.

Errors characteristics:

  • Errors include context cancellations and timeouts. Relates to #64
  • Bad requests are not errors
  • Not found metrics are not errors
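The classification above could be expressed roughly as follows. This is a sketch under assumptions: ErrBadRequest and ErrNotFound are hypothetical sentinel errors, and carbonapi's real error types may look different.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Hypothetical sentinel errors; carbonapi's real error types may differ.
var (
	ErrBadRequest = errors.New("bad request")
	ErrNotFound   = errors.New("metric not found")
)

// countsAsError reports whether err should increase the error-rate metric.
func countsAsError(err error) bool {
	switch {
	case err == nil:
		return false
	case errors.Is(err, ErrBadRequest), errors.Is(err, ErrNotFound):
		return false // client mistakes and missing metrics are not errors
	default:
		// Everything else counts, including context.Canceled and
		// context.DeadlineExceeded.
		return true
	}
}

// countErrors returns how many of the given results count as errors.
func countErrors(results []error) int {
	n := 0
	for _, err := range results {
		if countsAsError(err) {
			n++
		}
	}
	return n
}

// sampleResults is illustrative data only.
var sampleResults = []error{
	nil,
	ErrBadRequest,
	fmt.Errorf("render: %w", ErrNotFound),
	fmt.Errorf("backend: %w", context.DeadlineExceeded),
	context.Canceled,
}

func main() {
	fmt.Println(countErrors(sampleResults)) // prints 2
}
```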

Refactor sendGlobAsIs ?

We have 3 parameters:
sendGlobsAsIs: true|false
alwaysSendGlobsAsIs: true|false
maxBatchSize: int

Their logic is convoluted.

With sendGlobsAsIs: true, the setting works together with maxBatchSize. carbonapi sends a find request and acts differently depending on the number of metrics returned from the stores:

  • if the number of metrics is lower than maxBatchSize, it sends the request as is, with globs
  • if the number of metrics is higher, it uses the list of metrics returned by find and queries them one by one

With sendGlobsAsIs: false, carbonapi always sends a find request and ignores maxBatchSize.

alwaysSendGlobsAsIs, I suppose, overrides the sendGlobsAsIs setting and always sends a render request without a find.

I propose deprecating sendGlobsAsIs and alwaysSendGlobsAsIs in favor of
resolveGlobs: true|false in conjunction with maxBatchSize:

resolveGlobs: false -> send the render query as is
resolveGlobs: true -> send a find query and, if:
count of metrics < maxBatchSize - send it as is
count of metrics > maxBatchSize - group the metrics into batches of maxBatchSize and send them in batches
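A minimal sketch of the proposed decision logic follows. The function planRender and the stub fakeFind are hypothetical. The proposal does not say what happens when the count equals maxBatchSize or when maxBatchSize is not positive; the sketch assumes the query is sent as is in both cases.

```go
package main

import "fmt"

// planRender returns the render requests to send, each as a list of targets.
// find stands in for the find request to the stores.
func planRender(glob string, resolveGlobs bool, maxBatchSize int, find func(string) []string) [][]string {
	if !resolveGlobs || maxBatchSize <= 0 {
		return [][]string{{glob}} // no find request at all
	}
	metrics := find(glob)
	if len(metrics) <= maxBatchSize { // assumption: equality is sent as is
		return [][]string{{glob}}
	}
	var batches [][]string
	for start := 0; start < len(metrics); start += maxBatchSize {
		end := start + maxBatchSize
		if end > len(metrics) {
			end = len(metrics)
		}
		batches = append(batches, metrics[start:end])
	}
	return batches
}

// fakeFind is a stub that pretends five metrics match any glob.
func fakeFind(glob string) []string {
	return []string{"a.1", "a.2", "a.3", "a.4", "a.5"}
}

func main() {
	fmt.Println(planRender("a.*", false, 2, fakeFind)) // prints [[a.*]]
	fmt.Println(planRender("a.*", true, 10, fakeFind)) // prints [[a.*]]
	fmt.Println(planRender("a.*", true, 2, fakeFind))  // prints [[a.1 a.2] [a.3 a.4] [a.5]]
}
```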

Mirror prometheus metrics into graphite

We currently mirror expvars into graphite. We need to mirror the Prometheus metrics as well.

After this is done, we could clean up some of the expvars and graphite metrics, since the Prometheus ones will replace them.

First guess where to start is here

Use a path cache for find requests?

Looking at the go-carbon logs, I see a lot of find ERRORs for metrics that are not stored on this host:

{"level":"ERROR","timestamp":"2019-02-27T10:46:33.115+0100","logger":"access","message":"find failed","handler":"find","url":"/metrics/find/?format=protobuf&query=metric.name.","peer":"10.1.8.2:60779","carbonapi_uuid":"01134b54-cc10-4bb5-8451-f297a42a299a","query":["metric.name."],"format":"protobuf","runtime_seconds":0.000046948,"reason":"Not Found","error":"Not Found","http_code":404}

We should be smarter about which servers receive our find requests, and use the path cache to limit the list of target hosts instead of fanning out to all of them.
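As a rough illustration of the idea, the sketch below keeps a map from query to the backends known to hold matching metrics and falls back to full fan-out on a miss. PathCache and its methods are hypothetical names, not carbonapi's actual path cache, and a real version would need expiry so that newly created metrics are not hidden.

```go
package main

import (
	"fmt"
	"sync"
)

// PathCache remembers which backends answered a find for a given query.
// Hypothetical type for illustration; entries never expire in this sketch.
type PathCache struct {
	mu       sync.RWMutex
	backends map[string][]string
}

func NewPathCache() *PathCache {
	return &PathCache{backends: make(map[string][]string)}
}

// Set records the backends that returned results for query.
func (c *PathCache) Set(query string, backends []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.backends[query] = append([]string(nil), backends...) // store a copy
}

// Targets returns the cached backends for query, or all backends on a miss.
func (c *PathCache) Targets(query string, all []string) []string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	if known, ok := c.backends[query]; ok {
		return known
	}
	return all
}

func main() {
	all := []string{"store-1:8080", "store-2:8080", "store-3:8080"}
	cache := NewPathCache()

	fmt.Println(cache.Targets("metric.name.*", all)) // miss: fan out to all
	cache.Set("metric.name.*", []string{"store-2:8080"})
	fmt.Println(cache.Targets("metric.name.*", all)) // hit: prints [store-2:8080]
}
```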

metric render fails if new metric was not indexed by trigram

Steps to reproduce:

  1. Create a new metric and make sure that it is on disk but has not yet been indexed by go-carbon.
  2. Send a render request. It fails, because we send a find request in advance, and find does not see the metric until go-carbon has indexed it.
  3. Wait ~5 minutes and send the render request again. This time it returns data.

This is in contrast to #73, where the metric does not actually exist on a given host out of many.
In this case, the metric in question was stored on a single host.

Dockerize carbonapi setup

Make a Docker config that would include the following:

  1. carbonapi
  2. carbonzipper
  3. go-carbon
  4. Some data pushed to storage to play with

This will:

  • allow other people to try carbonapi easily, and contribute as well
  • make system testing possible, including in CI
  • make manual testing easier

Unknown content type 'application/octet-stream'

On the latest commit I get the error "Unknown content type 'application/octet-stream'" when receiving a response from the backend.
With commit 25bf30e everything works fine.
This is strange, because there are no changes in pkg/backend/net/net.go, and only "application/x-protobuf" and "application/protobuf" are allowed.
As a quick fix I added "application/octet-stream" to the Render and Find functions in pkg/backend/net/net.go.

Stacktrace:

{"level":"ERROR","timestamp":"2019-01-16T15:06:30.313+0300","logger":"render","message":"find error","carbonapi_uuid":"714f6d17-1a35-4a14-8aaa-02d0d7468250","username":"devops","metric":"test","error":"All backend requests failed: 1 backends: Unknown content type 'application/octet-stream'",
"errorVerbose":"Unknown content type 'application/octet-stream'
github.com/bookingcom/carbonapi/pkg/backend/net.Backend.Find
/go/src/github.com/bookingcom/carbonapi/pkg/backend/net/net.go:462
github.com/bookingcom/carbonapi/pkg/backend/net.(*Backend).Find
:1
github.com/bookingcom/carbonapi/pkg/backend.Finds.func1
/go/src/github.com/bookingcom/carbonapi/pkg/backend/rpc.go:134
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:23611 backends\nAll backend requests failed"}

moving* functions shifting results to the left query boundary

When a query crosses the left boundary of a whisper archive, the results of
movingMedian, movingSum, movingMin, movingMax, movingAverage (and possibly others)
are shifted to the left query boundary.
Let's say you have a whisper file with a single archive storing 2 days of per-second data:

{
  "10.10.10.10:8080": {
    "name": "secondly.views_sum",
    "aggregationMethod": "Sum",
    "maxRetention": 172800,
    "retentions": [
      {
        "secondsPerPoint": 1,
        "numberOfPoints": 172800
      }
    ]
  }
}

You query it over the range from=now-3d until=now.
You get 2 days of data, padded with nulls on the LEFT.
Applying a moving* function on top of it shifts the result to the left, padding it with nulls on the RIGHT.

This issue does not occur with the ewma function, for instance.

Setup full-system benchmark that allows to see performance impact of a code change

This should be a follow-up to #70

We can use a docker-compose setup for benchmarking as long as long-term benchmark comparison is left out of scope. Docker is not suited to tracking benchmark performance over the long term, because the environment cannot always be reproduced.

If we manage to make the environment reproducible, this harness can also be used for long-term benchmark bookkeeping and analysis.

Improve telemetry and monitoring

This includes several points:

  • Expose more metrics via Prometheus
  • Add separate request timing stats for cached and uncached responses
  • Make graphite push optional
  • Add saturation and load metrics
  • General visibility improvement

Separate request statistics for different endpoints

Gather request statistics separately for requests on different endpoints:

  • histograms separately for /render and /find
  • remove request histogram for all requests
  • separate counting for cached responses, see #66

Do this for:

  • carbonapi
  • zipper

Memory consumption drops drastically after service restart

When either the carbonapi or the carbonzipper service is restarted, RAM consumption drops roughly fivefold.

We need to investigate why this happens, because it may be a sign of a memory leak. Another explanation would be caching.

Separate metric requests counting served from cache

The requests served from cache have drastically different characteristics than ones served with remote requests. It makes sense to keep separate statistics for them.

Hence, we need three histograms:

  • all
  • cache
  • no cache
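A minimal standard-library sketch of the three histograms is shown below. The type names and bucket bounds are made up for illustration; a real implementation would more likely use the Prometheus client and would need to be safe for concurrent use, which this sketch is not.

```go
package main

import "fmt"

// Histogram counts observations per bucket. Not safe for concurrent use.
type Histogram struct {
	bounds []float64 // upper bounds in seconds, ascending
	counts []int     // len(bounds)+1 entries; the last one is overflow
}

func NewHistogram(bounds []float64) *Histogram {
	return &Histogram{bounds: bounds, counts: make([]int, len(bounds)+1)}
}

// Observe puts v into the first bucket whose upper bound is >= v.
func (h *Histogram) Observe(v float64) {
	i := 0
	for i < len(h.bounds) && v > h.bounds[i] {
		i++
	}
	h.counts[i]++
}

// Total returns the number of observations.
func (h *Histogram) Total() int {
	sum := 0
	for _, c := range h.counts {
		sum += c
	}
	return sum
}

// RequestStats keeps the three histograms: all, cache, no cache.
type RequestStats struct {
	All, Cache, NoCache *Histogram
}

func NewRequestStats(bounds []float64) *RequestStats {
	return &RequestStats{
		All:     NewHistogram(bounds),
		Cache:   NewHistogram(bounds),
		NoCache: NewHistogram(bounds),
	}
}

// Observe records a request duration in "all" and in exactly one of the
// cache / no-cache histograms.
func (s *RequestStats) Observe(seconds float64, fromCache bool) {
	s.All.Observe(seconds)
	if fromCache {
		s.Cache.Observe(seconds)
	} else {
		s.NoCache.Observe(seconds)
	}
}

func main() {
	stats := NewRequestStats([]float64{0.01, 0.1, 1})
	stats.Observe(0.002, true) // fast cache hit
	stats.Observe(0.4, false)  // remote request
	stats.Observe(2.5, false)  // slow remote request
	fmt.Println(stats.All.counts, stats.Cache.counts, stats.NoCache.counts)
	// prints [1 0 1 1] [1 0 0 0] [0 0 1 1]
}
```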

Don't send a render request to a backend that returned 'no metrics found' on find

From the go-carbon logs:

{
  "level": "ERROR",
  "timestamp": "2019-02-05T14:27:17.576+0100",
  "logger": "access",
  "message": "find failed",
  "handler": "find",
  "url": "/metrics/find/?format=protobuf&query=bla.%2A.clusters.%2A.foo.active",
  "peer": "10.10.10.10:47549",
  "carbonapi_uuid": "7652d624-c031-4384-8bc2-c96039beb7be",
  "query": [
    "bla.*.clusters.*.foo.active"
  ],
  "format": "protobuf",
  "runtime_seconds": 5.7953e-05,
  "reason": "Not Found",
  "error": "Not Found",
  "http_code": 404
}

and immediately after that

{
  "level": "ERROR",
  "timestamp": "2019-02-05T14:27:18.903+0100",
  "logger": "access",
  "message": "fetch failed",
  "handler": "render",
  "url": "/render/?format=protobuf&from=1549344432&target=bla.%2A.clusters.%2A.foo.active&until=1549373232",
  "peer": "10.10.10.10:50115",
  "carbonapi_uuid": "7652d624-c031-4384-8bc2-c96039beb7be",
  "format": "carbonapi_v2_pb",
  "targets": [
    "bla.*.clusters.*.foo.active"
  ],
  "runtime_seconds": 5.4372e-05,
  "reason": "no metrics found",
  "http_code": 404
}

Add graceful SIGINT handling

Currently, we don't handle termination signals gracefully.

The graceful shutdown should include

  • the flushes
  • stop receiving new requests
  • finish hanging requests
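The three steps map onto the standard library fairly directly. The sketch below is one way to do it under assumptions: serveUntilDone and run are hypothetical helpers, the flush callback stands in for whatever needs flushing, and maxLifetime exists only so that the demo terminates on its own.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// serveUntilDone serves on ln until ctx is done, then stops accepting new
// requests, waits up to grace for in-flight requests, and calls flush.
func serveUntilDone(ctx context.Context, srv *http.Server, ln net.Listener, grace time.Duration, flush func()) error {
	errc := make(chan error, 1)
	go func() { errc <- srv.Serve(ln) }()

	select {
	case err := <-errc:
		return err // the server failed before any signal arrived
	case <-ctx.Done():
	}

	shutdownCtx, cancel := context.WithTimeout(context.Background(), grace)
	defer cancel()
	shutdownErr := srv.Shutdown(shutdownCtx) // stop accepting, drain requests
	flush()                                  // e.g. sync loggers, push metrics

	if serveErr := <-errc; !errors.Is(serveErr, http.ErrServerClosed) {
		return serveErr
	}
	return shutdownErr
}

// run serves until SIGINT/SIGTERM arrives or maxLifetime elapses.
func run(maxLifetime time.Duration) (flushed bool, err error) {
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()
	ctx, cancel := context.WithTimeout(ctx, maxLifetime) // demo only
	defer cancel()

	ln, err := net.Listen("tcp", "127.0.0.1:0") // any free local port
	if err != nil {
		return false, err
	}
	srv := &http.Server{Handler: http.NotFoundHandler()}
	err = serveUntilDone(ctx, srv, ln, 5*time.Second, func() { flushed = true })
	return flushed, err
}

func main() {
	flushed, err := run(time.Second)
	fmt.Println("flushed:", flushed, "error:", err)
}
```

Requests that are still running after the grace period are abandoned; whether that is acceptable depends on the deployment.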

Update config examples, remove unused options, add missing ones

Rename https://github.com/bookingcom/carbonapi/tree/master/config/carbonzipper.conf to yaml

Is this being used?

concurrencyLimitPerServer: 1025

Why is this in the upstreams section?

buckets: 10

What is this? Some comments with an explanation are needed.

start: 0.05
bucketsNum: 25
bucketSize: 2

Keepalive every 30s? Really?

keepAliveInterval: "30s"

What do we cache exactly?

# If not zero, enabled cache for find requests

Dead:

carbonsearch:

What is this?
https://github.com/bookingcom/carbonapi/blob/133e554781d1dfd46c1675479eef362dca99ac90/config/graphiteWeb.yaml

Do we use it? Where is the config parameter for this file?
https://github.com/bookingcom/carbonapi/blob/133e554781d1dfd46c1675479eef362dca99ac90/config/graphTemplates.yaml

Request timeouts are broken

Global and per-backend timeouts don't work in the general case. They are applied only while the request waits in the "queue".
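One common way to make a timeout cover the whole request, not only the time spent waiting, is to derive a single context with a deadline and pass it through every stage. The sketch below illustrates that pattern; doWithTimeout and timedOut are hypothetical helpers, not carbonapi functions.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// doWithTimeout runs work under a deadline that covers the whole request:
// waiting for a slot, the backend call and any post-processing all receive
// the same ctx.
func doWithTimeout(parent context.Context, timeout time.Duration, work func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()

	done := make(chan error, 1) // buffered: the goroutine can always finish
	go func() { done <- work(ctx) }()

	select {
	case err := <-done:
		return err
	case <-ctx.Done():
		return ctx.Err()
	}
}

// timedOut is a demo helper: it runs a task of the given duration under the
// given timeout and reports whether the deadline was hit.
func timedOut(task, timeout time.Duration) bool {
	err := doWithTimeout(context.Background(), timeout, func(ctx context.Context) error {
		select {
		case <-time.After(task): // simulated work
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	})
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println(timedOut(10*time.Millisecond, time.Second)) // prints false
	fmt.Println(timedOut(time.Second, 10*time.Millisecond)) // prints true
}
```

The work function has to honour ctx itself; otherwise the caller returns on time but the work keeps running in the background.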

Refactor code to decrease cyclomatic complexity to be <20

A lot of functions in the code are way too complex and long. Here's the gometalinter log:

expr/functions/cairo/png/cairo.go:1289::warning: cyclomatic complexity 65 of function setupTwoYAxes() is high (> 10) (gocyclo)
expr/functions/cairo/png/cairo.go:1025::warning: cyclomatic complexity 65 of function drawGraph() is high (> 10) (gocyclo)
expr/functions/asPercent/function.go:34::warning: cyclomatic complexity 55 of function (*asPercent).Do() is high (> 10) (gocyclo)
app/carbonapi/http_handlers.go:181::warning: cyclomatic complexity 42 of function (*App).renderHandler() is high (> 10) (gocyclo)
expr/functions/cairo/png/cairo.go:2227::warning: cyclomatic complexity 40 of function drawLines() is high (> 10) (gocyclo)
expr/functions/cairo/png/cairo.go:1616::warning: cyclomatic complexity 38 of function setupYAxis() is high (> 10) (gocyclo)
expr/functions/cairo/png/cairo.go:673::warning: cyclomatic complexity 34 of function EvalExprGraph() is high (> 10) (gocyclo)
expr/functions/tukey/function.go:33::warning: cyclomatic complexity 29 of function (*tukey).Do() is high (> 10) (gocyclo)
app/carbonapi/app.go:282::warning: cyclomatic complexity 29 of function setUpConfig() is high (> 10) (gocyclo)
expr/functions/nonNegativeDerivative/function.go:31::warning: cyclomatic complexity 27 of function (*nonNegativeDerivative).Do() is high (> 10) (gocyclo)
expr/functions/perSecond/function.go:32::warning: cyclomatic complexity 27 of function (*perSecond).Do() is high (> 10) (gocyclo)
expr/functions/pearsonClosest/function.go:33::warning: cyclomatic complexity 25 of function (*pearsonClosest).Do() is high (> 10) (gocyclo)
pkg/parser/parser.go:433::warning: cyclomatic complexity 23 of function parseArgList() is high (> 10) (gocyclo)
expr/functions/graphiteWeb/function.go:78::warning: cyclomatic complexity 21 of function New() is high (> 10) (gocyclo)
expr/functions/moving/function.go:32::warning: cyclomatic complexity 21 of function (*moving).Do() is high (> 10) (gocyclo)
expr/functions/summarize/function.go:33::warning: cyclomatic complexity 21 of function (*summarize).Do() is high (> 10) (gocyclo)
pkg/parser/parser.go:119::warning: cyclomatic complexity 20 of function (*expr).Metrics() is high (> 10) (gocyclo)
date/date.go:49::warning: cyclomatic complexity 20 of function DateParamToEpoch() is high (> 10) (gocyclo)
expr/functions/cairo/png/cairo.go:2025::warning: cyclomatic complexity 20 of function drawGridLines() is high (> 10) (gocyclo)
pkg/parser/parser.go:415::warning: cyclomatic complexity 20 of function IsNameChar() is high (> 10) (gocyclo)

This makes the code convoluted. It makes sense to refactor until every function has a complexity below 20.

Add full-stack system tests

Add automatic system test that:

  1. Spins-up the setup from #58
  2. Feeds some mock data into the storage
  3. Retrieves data via the go-carbon→zipper→carbonapi chain
  4. Verifies correctness

Fix config

1. Do .toml configs work?

When trying to load a .toml config, I get an error:

2019-01-13T13:50:55.626+0100    FATAL   main    Failed to parse config file     {"error": "yaml: unmarshal errors:\n  line 5: cannot unmarshal !!str `concure...` into cfg.preAPI"}

Also, I see only yaml annotations everywhere in config code.

If .toml configs don't work, we should consider removing them.

2. The current config does not work out of the box

Probably needs to be tweaked.
