go-sundheit's Introduction

go-sundheit

A library for defining service health in Go services. It allows you to register asynchronous health checks for your dependencies and for the service itself, and provides a health endpoint that exposes their status.

What's go-sundheit?

The project is named after the German word Gesundheit which means ‘health’, and it is pronounced /ɡəˈzʊntˌhaɪ̯t/.

Installation

Using go modules:

go get github.com/AppsFlyer/go-sundheit

Usage

import (
	"fmt"
	"log"
	"net/http"
	"time"

	gosundheit "github.com/AppsFlyer/go-sundheit"
	"github.com/AppsFlyer/go-sundheit/checks"
	healthhttp "github.com/AppsFlyer/go-sundheit/http"
)

func main() {
	// create a new health instance
	h := gosundheit.New()
	
	// define an HTTP dependency check
	httpCheckConf := checks.HTTPCheckConfig{
		CheckName: "httpbin.url.check",
		Timeout:   1 * time.Second,
		// dependency you're checking - use your own URL here...
		// this URL will fail about 50% of the time
		URL:       "http://httpbin.org/status/200,300",
	}
	// create the HTTP check for the dependency
	// fail fast when you misconfigured the URL. Don't ignore errors!!!
	httpCheck, err := checks.NewHTTPCheck(httpCheckConf)
	if err != nil {
		fmt.Println(err)
		return // your call...
	}

	// Alternatively panic when creating a check fails
	httpCheck = checks.Must(checks.NewHTTPCheck(httpCheckConf))

	err = h.RegisterCheck(
		httpCheck,
		gosundheit.InitialDelay(time.Second),         // the check will run once after 1 sec
		gosundheit.ExecutionPeriod(10 * time.Second), // the check will be executed every 10 sec
	)
	
	if err != nil {
		fmt.Println("Failed to register check: ", err)
		return // or whatever
	}

	// define more checks...
	
	// register a health endpoint
	http.Handle("/admin/health.json", healthhttp.HandleHealthJSON(h))
	
	// serve HTTP
	log.Fatal(http.ListenAndServe(":8080", nil))
}

Using Option to Configure Health Service

Creating a health service is as simple as calling the following code:

gosundheit.New(options ...Option)

The optional options parameters allow you to configure the health service by passing configuration functions (implementing the Option signature).
All options are named with the prefix WithX. Available options:

  • WithCheckListeners - enables you to act on check registration, start and completed events
  • WithHealthListeners - enables you to act on changes in the health service results
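
To illustrate how configuration functions of this kind are typically wired up, here is a minimal, self-contained sketch of the functional options pattern. It is not the library's code: the service struct, the single-method CheckListener interface, and printListener are hypothetical stand-ins invented for this example.

```go
package main

import "fmt"

// CheckListener is a simplified, hypothetical listener interface for this sketch.
type CheckListener interface {
	OnCheckCompleted(name string)
}

// service is a hypothetical health service that stores its configuration.
type service struct {
	checkListeners []CheckListener
}

// Option is a configuration function applied while the service is constructed.
type Option func(*service)

// WithCheckListeners returns an Option that appends the given listeners.
func WithCheckListeners(listeners ...CheckListener) Option {
	return func(s *service) {
		s.checkListeners = append(s.checkListeners, listeners...)
	}
}

// New builds a service and applies each option in the order given.
func New(opts ...Option) *service {
	s := &service{}
	for _, opt := range opts {
		opt(s)
	}
	return s
}

// printListener is a trivial listener used only to exercise the option.
type printListener struct{}

func (printListener) OnCheckCompleted(name string) { fmt.Println("completed:", name) }

func main() {
	s := New(WithCheckListeners(printListener{}, printListener{}))
	fmt.Println("listeners configured:", len(s.checkListeners))
}
```

Because New is variadic, calling it with no arguments yields a service with default settings, which is why the options can stay optional.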

Built-in Checks

The library comes with a set of built-in checks. Currently implemented checks are as follows:

HTTP built-in check

The HTTP check allows you to trigger an HTTP request to one of your dependencies and verify the response status and, optionally, the content of the response body. An example is given above in the Usage section.

DNS built-in check(s)

The DNS checks allow you to perform a lookup of a given hostname / domain name / CNAME / etc., and validate that it resolves to at least the minimum number of required results.

Creating a host lookup check is easy:

// Schedule a host resolution check for `example.com`, requiring at least one result, and running every 10 sec
h.RegisterCheck(
	checks.NewHostResolveCheck("example.com", 1),
	gosundheit.ExecutionPeriod(10 * time.Second),
)

You may also use the low-level checks.NewResolveCheck, specifying a custom LookupFunc, if you want to perform other kinds of lookups. For example, you may register a reverse DNS lookup check like so:

func ReverseDNLookup(ctx context.Context, addr string) (resolvedCount int, err error) {
	names, err := net.DefaultResolver.LookupAddr(ctx, addr)
	resolvedCount = len(names)
	return
}

//...

h.RegisterCheck(
	checks.NewResolveCheck(ReverseDNLookup, "127.0.0.1", 3),
	gosundheit.ExecutionPeriod(10 * time.Second),
	gosundheit.ExecutionTimeout(1*time.Second),
)

Ping built-in check(s)

The ping checks allow you to verify that a resource is still alive and reachable. For example, you can use it as a DB ping check (sql.DB implements the Pinger interface):

	db, err := sql.Open(...)
	// handle err...
	dbCheck, err := checks.NewPingCheck("db.check", db)
	// handle err...
	_ = h.RegisterCheck(
		dbCheck,
		// options...
	)

You can also use the ping check to test a generic connection like so:

	pinger := checks.NewDialPinger("tcp", "example.com:80") // TCP addresses take the host:port form used by net.Dial
	pingCheck, err := checks.NewPingCheck("example.com.reachable", pinger)
	h.RegisterCheck(pingCheck)

The NewDialPinger function supports all the network/address parameters supported by the net.Dial() function(s).

Custom Checks

The library provides 2 means of defining a custom check. The bottom line is that you need an implementation of the Check interface:

// Check is the API for defining health checks.
// A valid check has a non empty Name() and a check (Execute()) function.
type Check interface {
	// Name is the name of the check.
	// Check names must be metric compatible.
	Name() string
	// Execute runs a single time check, and returns an error when the check fails, and an optional details object.
	Execute() (details interface{}, err error)
}

See examples in the following two sections.

Use the CustomCheck struct

The checks.CustomCheck struct implements the checks.Check interface, and is the simplest way to implement a check if all you need is to define a check function.

Let's define a check function that fails about 50% of the time:

func lotteryCheck() (details interface{}, err error) {
	lottery := rand.Float32()
	details = fmt.Sprintf("lottery=%f", lottery)
	if lottery < 0.5 {
		err = errors.New("Sorry, I failed")
	}
	return
}

Now we register the check to start running right away, and execute once every 2 minutes with a timeout of 5 seconds:

h := gosundheit.New()
// ...

h.RegisterCheck(
	&checks.CustomCheck{
		CheckName: "lottery.check",
		CheckFunc: lotteryCheck,
	},
	gosundheit.InitialDelay(0),
	gosundheit.ExecutionPeriod(2*time.Minute),
	gosundheit.ExecutionTimeout(5*time.Second),
)

Implement the Check interface

Sometimes you need to define a more elaborate custom check, for example when you need to manage state. In these cases it's best to implement the Check interface yourself.

Let's define a more flexible version of the lottery check that lets you set the failure probability:

type Lottery struct {
	myname      string
	probability float32
}

func (l Lottery) Execute() (details interface{}, err error) {
	lottery := rand.Float32()
	details = fmt.Sprintf("lottery=%f", lottery)
	if lottery < l.probability {
		err = errors.New("Sorry, I failed")
	}
	return
}

func (l Lottery) Name() string {
	return l.myname
}

And register our custom check, scheduling it to run every 30 seconds (after a 1-second initial delay) with a 5-second timeout:

h := gosundheit.New()
// ...

h.RegisterCheck(
	Lottery{myname: "custom.lottery.check", probability: 0.3},
	gosundheit.InitialDelay(1*time.Second),
	gosundheit.ExecutionPeriod(30*time.Second),
	gosundheit.ExecutionTimeout(5*time.Second),
)

Custom Checks Notes

  1. If a check takes longer than the specified execution period, the next execution will be delayed, but checks will not be executed concurrently.
  2. Checks must complete within a reasonable time. If a check doesn't complete or hangs, the next check execution will be delayed. Use proper timeouts.
  3. Checks must respect the provided context. Specifically, a check must abort its execution, and return an error, if the context has been cancelled.
  4. A health-check name must be a metric-name-compatible string (i.e. no funky characters and no spaces; just keep it simple, like clicks-db-check). See here: https://help.datadoghq.com/hc/en-us/articles/203764705-What-are-valid-metric-names-

Expose Health Endpoint

The library provides an HTTP handler function for serving health stats in JSON format. You can register it using your favorite HTTP implementation like so:

http.Handle("/admin/health.json", healthhttp.HandleHealthJSON(h))

The endpoint can be called like so:

~ $ curl -i http://localhost:8080/admin/health.json
HTTP/1.1 503 Service Unavailable
Content-Type: application/json
Date: Tue, 22 Jan 2019 09:31:46 GMT
Content-Length: 701

{
	"custom.lottery.check": {
		"message": "lottery=0.206583",
		"error": {
			"message": "Sorry, I failed"
		},
		"timestamp": "2019-01-22T11:31:44.632415432+02:00",
		"num_failures": 2,
		"first_failure_time": "2019-01-22T11:31:41.632400256+02:00"
	},
	"lottery.check": {
		"message": "lottery=0.865335",
		"timestamp": "2019-01-22T11:31:44.63244047+02:00",
		"num_failures": 0,
		"first_failure_time": null
	},
	"url.check": {
		"message": "http://httpbin.org/status/200,300",
		"error": {
			"message": "unexpected status code: '300' expected: '200'"
		},
		"timestamp": "2019-01-22T11:31:44.632442937+02:00",
		"num_failures": 4,
		"first_failure_time": "2019-01-22T11:31:38.632485339+02:00"
	}
}

Or for the shorter version:

~ $ curl -i http://localhost:8080/admin/health.json?type=short
HTTP/1.1 503 Service Unavailable
Content-Type: application/json
Date: Tue, 22 Jan 2019 09:40:19 GMT
Content-Length: 105

{
	"custom.lottery.check": "PASS",
	"lottery.check": "PASS",
	"my.check": "FAIL",
	"url.check": "PASS"
}

The short response type is suitable for Consul health checks / LB health checks.

The response code is 200 when the tests pass, and 503 when they fail.

CheckListener

It is sometimes desirable to keep track of check executions and apply custom logic. For example, you may want to add logging or external metrics to your checks, or trigger some recovery logic when a check fails 3 consecutive times.

The gosundheit.CheckListener interface allows you to hook this custom logic.

For example, let's add a logging listener to our health repository:

type checkEventsLogger struct{}

func (l checkEventsLogger) OnCheckRegistered(name string, res gosundheit.Result) {
	log.Printf("Check %q registered with initial result: %v\n", name, res)
}

func (l checkEventsLogger) OnCheckStarted(name string) {
	log.Printf("Check %q started...\n", name)
}

func (l checkEventsLogger) OnCheckCompleted(name string, res gosundheit.Result) {
	log.Printf("Check %q completed with result: %v\n", name, res)
}

To register your listener:

h := gosundheit.New(gosundheit.WithCheckListeners(&checkEventsLogger{}))

Please note that your CheckListener implementation must not block!
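
One common way to keep a listener from blocking is to hand events to a buffered channel and let another goroutine do the slow work. The sketch below shows that idea in isolation. The event type, asyncListener, and the simplified OnCheckCompleted(name, passed) signature are assumptions of this example and do not match the CheckListener interface exactly.

```go
package main

import "fmt"

// event is a hypothetical record of a completed check.
type event struct {
	name   string
	passed bool
}

// asyncListener queues events on a buffered channel so callbacks return immediately.
type asyncListener struct {
	events  chan event
	dropped int // only touched from the calling goroutine in this sketch
}

func newAsyncListener(buffer int) *asyncListener {
	return &asyncListener{events: make(chan event, buffer)}
}

// OnCheckCompleted never blocks: when the buffer is full the event is
// dropped and counted instead of stalling the caller.
func (l *asyncListener) OnCheckCompleted(name string, passed bool) {
	select {
	case l.events <- event{name: name, passed: passed}:
	default:
		l.dropped++
	}
}

func main() {
	l := newAsyncListener(2)
	for i := 0; i < 3; i++ {
		l.OnCheckCompleted(fmt.Sprintf("check-%d", i), true)
	}
	fmt.Println("queued:", len(l.events), "dropped:", l.dropped) // queued: 2 dropped: 1

	// A separate goroutine would normally drain l.events and do the slow work
	// (logging, metrics, recovery); draining inline keeps the sketch short.
	close(l.events)
	for e := range l.events {
		fmt.Println(e.name, e.passed)
	}
}
```

Dropping events under pressure is a trade-off: it protects the check scheduler at the cost of losing some notifications, so the buffer size should match how bursty your checks are.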

HealthListener

It is sometimes desirable to track changes in the results of registered checks. For example, you may want to log the number of results monitored, or send metrics on these results.

The gosundheit.HealthListener interface allows you to hook this custom logic.

For example, let's add a logging listener:

type healthLogger struct{}

func (l healthLogger) OnResultsUpdated(results map[string]gosundheit.Result) {
	// allHealthy is assumed to be a helper you define yourself
	log.Printf("There are %d results, general health is %t\n", len(results), allHealthy(results))
}

To register your listener:

h := gosundheit.New(gosundheit.WithHealthListeners(&healthLogger{}))

Metrics

The library can expose metrics using a CheckListener. At the moment, OpenCensus is available and exposes the following metrics:

  • health/check_status_by_name - An aggregated health status gauge (0/1 for fail/pass) at the time of sampling. The aggregation uses the following tags:
    • check=allChecks - all checks aggregation
    • check=<check-name> - specific check aggregation
  • health/check_count_by_name_and_status - Aggregated pass/fail counts for checks, with the following tags:
    • check=allChecks - all checks aggregation
    • check=<check-name> - specific check aggregation
    • check-passing=[true|false]
  • health/executeTime - The time it took to execute a check, with the following tag:
    • check=<check-name> - specific check aggregation

The views can be registered like so:

import (
	"github.com/AppsFlyer/go-sundheit"
	"github.com/AppsFlyer/go-sundheit/opencensus"
	"go.opencensus.io/stats/view"
)
// This listener can act both as check and health listener for reporting metrics
oc := opencensus.NewMetricsListener()
h := gosundheit.New(gosundheit.WithCheckListeners(oc), gosundheit.WithHealthListeners(oc))
// ...
view.Register(opencensus.DefaultHealthViews...)
// or register individual views. For example:
view.Register(opencensus.ViewCheckExecutionTime, opencensus.ViewCheckStatusByName, ...)

Classification

It is sometimes required to report metrics for different check types (e.g. startup, liveness, readiness). To report metrics with a classification tag, initialize the OpenCensus listener with a classification:

// startup
opencensus.NewMetricsListener(opencensus.WithStartupClassification())
// liveness
opencensus.NewMetricsListener(opencensus.WithLivenessClassification())
// readiness
opencensus.NewMetricsListener(opencensus.WithReadinessClassification())
// custom
opencensus.NewMetricsListener(opencensus.WithClassification("custom"))

go-sundheit's People

Contributors

bivas, dcaba, dmarkhas, dyhex, eranharel, kirilldanshin, moshederri, qjcg, rantav, sagikazarmark

go-sundheit's Issues

Allow for one-shot checks

Nice little package :-)

In my experience, it's fairly common to expose alongside the health check values some static information about the service, typically the version.

Problem is, the only way to expose this information with go-sundheit is to define a check, which means having a ticker refreshing the value. A ticker is not a performance issue but it's still a bit silly. It would be nice if go-sundheit allowed doing that in a less hacky way.

Allow hooking on health state change

It would be nice if the library offered a way to trigger a function when the service's health status changes. This could be used for logging, or for stopping/starting some components when the service becomes unhealthy/healthy.

Can't understand what is wrong in POST request

I used example in README like this:


import (
  "bytes"
  "fmt"
  health "github.com/AppsFlyer/go-sundheit"
  //"io/ioutil"
  "net/http"
  "time"
  "log"

  healthhttp "github.com/AppsFlyer/go-sundheit/http"
  "github.com/AppsFlyer/go-sundheit/checks"
)

func main() {
  // create a new health instance
  h := health.New()
  s := `{"test":"json"}`
  r := bytes.NewBuffer([]byte(s))
  httpCheckConf := checks.HTTPCheckConfig{
    CheckName: "reverse",
    Timeout:   1 * time.Second,
    Body:	r,
    // dependency you're checking - use your own URL here...
    // this URL will fail 50% of the times
    URL:       "http://127.0.0.1:9991/g",
  }
  // create the HTTP check for the dependency
  // fail fast when you misconfigured the URL. Don't ignore errors!!!
  httpCheck, err := checks.NewHTTPCheck(httpCheckConf)
  if err != nil {
    fmt.Println(err)
    return // your call...
  }

  // Alternatively panic when creating a check fails
  httpCheck = checks.Must(checks.NewHTTPCheck(httpCheckConf))

  err = h.RegisterCheck(&health.Config{
    Check:           httpCheck, 
    InitialDelay:    time.Second,      // the check will run once after 1 sec
    ExecutionPeriod: 10 * time.Second, // the check will be executed every 10 sec
  })
  
  if err != nil {
    fmt.Println("Failed to register check: ", err)
    return // or whatever
  }

  // define more checks...
  
  // register a health endpoint
  http.Handle("/admin/health.json", healthhttp.HandleHealthJSON(h))
  
  // serve HTTP
  log.Fatal(http.ListenAndServe(":8080", nil))
}

In this code we POST a simple JSON body:

s := `{"test":"json"}`
r := bytes.NewBuffer([]byte(s))

When the app starts everything is OK: the first request is sent as expected. But on the second iteration the request has an empty body, because Body is an io.Reader and cannot be read a second time.
tcpdump:

GET /g HTTP/1.1
Host: 127.0.0.1:9991
User-Agent: Go-http-client/1.1
Content-Length: 15
Accept-Encoding: gzip

{"test":"json"}HTTP/1.1 404 Not Found
Content-Type: text/plain
Date: Wed, 29 Apr 2020 13:46:40 GMT
Content-Length: 18

404 page not found
-----

GET /g HTTP/1.1
Host: 127.0.0.1:9991
User-Agent: Go-http-client/1.1
Accept-Encoding: gzip

HTTP/1.1 404 Not Found
Content-Type: text/plain
Date: Wed, 29 Apr 2020 12:36:04 GMT
Content-Length: 18

404 page not found

I am a newbie in Go and maybe I don't understand how to send the body on each check.
In my fork I changed the type of Body to string and build the request via NewReader:

- req, err := http.NewRequest(check.config.Method, check.config.URL, check.config.Body)
+ req, err := http.NewRequest(check.config.Method, check.config.URL, bytes.NewReader([]byte(check.config.Body)))

After that change, the example sends the body on every tick.
diff master...revverse:master

Replace OpenCensus with a metrics interface

Now that OpenTelemetry is becoming a thing, adding support for it would be nice.

However, both OpenTelemetry and OpenCensus are quite heavy dependencies and one might not want to pull in both.

I propose adding an interface for recording metrics that's independent from any metrics collection library. It would allow users to use any implementation (one might just want to use the Prometheus client library, go-kit users could use the go-kit metrics abstraction, etc).

go-sundheit could still ship an OpenCensus implementation, but preferably in a separate module, so that its dependencies are not pulled in by default.

I'd be happy to provide a PR that first relocates OpenCensus integration to a separate module behind a Metrics interface, then an OpenTelemetry implementation in a second PR.

WDYT?

Migrate to OpenTelemetry

Hey,

Very cool package, I'm planning to add it to my toolchain but unfortunately you only have OpenCensus ("deprecated"). Did you ever think about migrating to OpenTelemetry?

LMK, perhaps I can contribute a new tracer.

Best
Ori

Different probe types

Hey,
is it possible to register different checks / conditions for the Kubernetes probe types (readiness, liveness & startup probe) and expose these different probe types on different endpoints?

For instance I may want to have a single Database check (which checks DB connectivity). If this check fails this should make the readiness & startup probe/endpoint fail but not the liveness probe.

Soft Failure of Individual Health Checks

This might be a bit of an anti-pattern and a bit of a quirk of our environment rather than something that is widely beneficial. We're currently using this package to provide a health check that is consumed via an ALB within AWS. It works well, but we have a dependency in there (SOLR) whose failure is degrading, not catastrophic. Therefore, I would like the ability to have a health check that doesn't return a 503 if it fails, as a soft check. That way, if we are seeing issues, we can still diagnose quickly as we can see the issue from an application's perspective, but we are in a degraded state that is not fully customer impacting.

Right now, a 503 error will cause our instances to restart, causing us to have operational issues for customers instead. A soft failure here would give us the best of both worlds: visibility without impacting our state. We would obviously monitor this application via another mechanism.

Is this something I can already achieve? Alternatively, is this something you think makes sense and would be interested in having as part of go-sundheit?

API improvements

A couple ideas to improve the API of the package:

RegisterCheck should accept a check and a config separately: RegisterCheck(check Check, cfg *Config) error

This would allow making the config optional (given the health checker accepts a default config applied to all checks). I realize this might not make sense in all cases, but it would give the user a cleaner API when they get to learn the package.

I could even imagine

RegisterCheck(check Check) error
RegisterCheckWithConfig(check Check, cfg Config) error

Or with functional options:

type CheckOption interface {
    applyCheck(check *check)
}

RegisterCheck(check Check, ...CheckOption) error

(Note: #25 implements functional options, this would work on top of that)


Consider dropping deregistration: is it really a useful feature? Making the health checker mutable this way is confusing and prone to error in my opinion. Obviously, if there is a legitimate use case it should stay.

As a consequence, I'd also add a Start (and Stop?) method to the health checker and make RegisterCheck return an error for a started health checker.

Alternatively, create a separate HealthBuilder that's mutable and build a HealthChecker with the builder that's immutable.


I realize these are huge changes, but in the long run these could improve the package API by making it more obvious and compact.
