go-marathon's Introduction

Go-Marathon

Go-marathon is an API library for working with Marathon. It currently supports:

  • Application and group deployment
  • Helper filters for pulling the status, configuration and tasks
  • Multiple Endpoint support for HA deployments
  • Marathon Event Subscriptions and Event Streams
  • Pods

Note: the library is still under active development; users should expect frequent (possibly breaking) API changes for the time being.

It requires Go version 1.6 or higher.

Code Examples

The examples directory in the source contains hints and code snippets showing how to use the library; it is probably the best place to start.

You can use examples/docker-compose.yml in order to start a test cluster.

Creating a client

import (
	marathon "github.com/gambol99/go-marathon"
)

marathonURL := "http://10.241.1.71:8080"
config := marathon.NewDefaultConfig()
config.URL = marathonURL
client, err := marathon.NewClient(config)
if err != nil {
	log.Fatalf("Failed to create a client for marathon, error: %s", err)
}

applications, err := client.Applications(nil)
...

Note that you can also specify multiple endpoints for Marathon (i.e., you have set up Marathon in HA mode with multiple instances running):

marathonURL := "http://10.241.1.71:8080,10.241.1.72:8080,10.241.1.73:8080"

The first endpoint specified is used. If it goes offline, the member is marked as "unavailable" and a background process continues to ping it until it is back online.
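The endpoint list is a single comma-separated string in which only the first member carries the scheme. As a rough illustration, the sketch below expands such a string into full URLs. `splitEndpoints` is a hypothetical helper, not go-marathon's own parser, and it assumes that later members inherit the scheme of the first (and that `http` applies when no scheme is given).

```go
package main

import (
	"fmt"
	"strings"
)

// splitEndpoints expands "http://a:8080,b:8080" into one URL per member.
// Hypothetical helper: later members are assumed to inherit the first scheme.
func splitEndpoints(raw string) ([]string, error) {
	scheme := "http" // assumption: default when no scheme is present
	rest := raw
	if i := strings.Index(raw, "://"); i >= 0 {
		scheme, rest = raw[:i], raw[i+3:]
	}
	var endpoints []string
	for _, member := range strings.Split(rest, ",") {
		member = strings.TrimSpace(member)
		if member == "" {
			return nil, fmt.Errorf("empty endpoint in %q", raw)
		}
		endpoints = append(endpoints, scheme+"://"+member)
	}
	return endpoints, nil
}

func main() {
	eps, err := splitEndpoints("http://10.241.1.71:8080,10.241.1.72:8080,10.241.1.73:8080")
	if err != nil {
		panic(err)
	}
	for _, ep := range eps {
		fmt.Println(ep)
	}
}
```

A client could then try the members in order and skip those currently marked unavailable.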

You can also pass a custom path to the URL, which is especially needed in case of DCOS:

marathonURL := "http://10.241.1.71:8080/cluster,10.241.1.72:8080/cluster,10.241.1.73:8080/cluster"

If you specify a DCOSToken in the configuration file but do not pass a custom URL path, /marathon will be used.

Customizing the HTTP Clients

HTTP clients with reasonable timeouts are used by default. It is possible to pass custom clients to the configuration though if the behavior should be customized (e.g., to bypass TLS verification, load root CAs, or change timeouts).

Two clients can be given independently of each other:

  • HTTPClient used only for (non-SSE) HTTP API requests. By default, an http.Client with 10 seconds timeout for the entire request is used.
  • HTTPSSEClient used only for SSE-based subscription requests. Note that HTTPSSEClient cannot have a response read timeout set as this breaks SSE communication; trying to do so will lead to an error during the SSE connection setup. By default, an http.Client with 5 seconds timeout for dial and TLS handshake, and 10 seconds timeout for response headers received is used.

If no HTTPSSEClient is given but an HTTPClient is, it will be used for SSE subscriptions as well (thereby overriding the default SSE HTTP client).

marathonURL := "http://10.241.1.71:8080"
config := marathon.NewDefaultConfig()
config.URL = marathonURL
config.HTTPClient = &http.Client{
    Timeout: (time.Duration(10) * time.Second),
    Transport: &http.Transport{
        Dial: (&net.Dialer{
            Timeout:   10 * time.Second,
            KeepAlive: 10 * time.Second,
        }).Dial,
        TLSClientConfig: &tls.Config{
            InsecureSkipVerify: true,
        },
    },
}
config.HTTPSSEClient = &http.Client{
    // Invalid to set Timeout as it contains timeout for reading a response body
    Transport: &http.Transport{
        Dial: (&net.Dialer{
            Timeout:   10 * time.Second,
            KeepAlive: 10 * time.Second,
        }).Dial,
        TLSClientConfig: &tls.Config{
            InsecureSkipVerify: true,
        },
    },
}

Listing the applications

applications, err := client.Applications(nil)
if err != nil {
	log.Fatalf("Failed to list applications: %s", err)
}

log.Printf("Found %d application(s) running", len(applications.Apps))
for _, application := range applications.Apps {
	log.Printf("Application: %s", application)
	appID := application.ID

	details, err := client.Application(appID)
	if err != nil {
		log.Fatalf("Failed to get application %s: %s", appID, err)
	}
	if details.Tasks != nil {
		for _, task := range details.Tasks {
			log.Printf("application %s has task: %s", appID, task)
		}
	}
}

Creating a new application

log.Printf("Deploying a new application")
application := marathon.NewDockerApplication().
  Name(applicationName).
  CPU(0.1).
  Memory(64).
  Storage(0.0).
  Count(2).
  AddArgs("/usr/sbin/apache2ctl", "-D", "FOREGROUND").
  AddEnv("NAME", "frontend_http").
  AddEnv("SERVICE_80_NAME", "test_http").
  CheckHTTP("/health", 10, 5)

application.
  Container.Docker.Container("quay.io/gambol99/apache-php:latest").
  Bridged().
  Expose(80).
  Expose(443)

if _, err := client.CreateApplication(application); err != nil {
	log.Fatalf("Failed to create application: %s, error: %s", application, err)
} else {
	log.Printf("Created the application: %s", application)
}

Note: Applications may also be defined by means of initializing a marathon.Application struct instance directly. However, go-marathon's DSL as shown above provides a more concise way to achieve the same.

Scaling application

Change the number of application instances to 4

log.Printf("Scale to 4 instances")
if err := client.ScaleApplicationInstances(application.ID, 4); err != nil {
	log.Fatalf("Failed to scale the application: %s, error: %s", application, err)
} else {
	client.WaitOnApplication(application.ID, 30 * time.Second)
	log.Printf("Successfully scaled the application")
}

Pods

Pods allow you to deploy groups of tasks as a unit. All tasks in a single instance of a pod share networking and storage. View the Marathon documentation for more details on this feature.

Examples of their usage can be seen in the examples/pods directory, and a smaller snippet is below.

// Initialize a single-container pod running nginx
pod := marathon.NewPod()

image := marathon.NewDockerPodContainerImage().SetID("nginx")

container := marathon.NewPodContainer().
	SetName("container").
	CPUs(0.1).
	Memory(128).
	SetImage(image)

pod.Name("mypod").AddContainer(container)

// Create it and wait for it to start up
pod, err := client.CreatePod(pod)
err = client.WaitOnPod(pod.ID, time.Minute*1)

// Scale it
pod.Count(5)
pod, err = client.UpdatePod(pod, true)

// Delete it
id, err := client.DeletePod(pod.ID, true)

Subscription & Events

Request to listen to events related to applications, namely status updates, health check changes and failures. There are two event transports, controlled by the EventsTransport setting: EventsTransportSSE and EventsTransportCallback (the default). See Event Stream and Event Subscriptions for details.

Event subscriptions can also be individually controlled with the Subscribe and Unsubscribe functions. See Controlling subscriptions for more details.

Event Stream

Only available in Marathon >= 0.9.0. Does not require any special configuration or prerequisites.

// Configure client
config := marathon.NewDefaultConfig()
config.URL = marathonURL
config.EventsTransport = marathon.EventsTransportSSE

client, err := marathon.NewClient(config)
if err != nil {
	log.Fatalf("Failed to create a client for marathon, error: %s", err)
}

// Register for events
events, err := client.AddEventsListener(marathon.EventIDApplications)
if err != nil {
	log.Fatalf("Failed to register for events, %s", err)
}

timer := time.After(60 * time.Second)
done := false

// Receive events from channel for 60 seconds
for {
	if done {
		break
	}
	select {
	case <-timer:
		log.Printf("Exiting the loop")
		done = true
	case event := <-events:
		log.Printf("Received event: %s", event)
	}
}

// Unsubscribe from Marathon events
client.RemoveEventsListener(events)

Event Subscriptions

This transport requires starting a built-in web server that Marathon can reach in order to connect and push events. Consider the following additional settings:

  • EventsInterface: the interface we should be listening on for events. Default "eth0".
  • EventsPort: built-in web server port. Default 10001.
  • CallbackURL: custom callback URL. Default "".

// Configure client
config := marathon.NewDefaultConfig()
config.URL = marathonURL
config.EventsInterface = marathonInterface
config.EventsPort = marathonPort

client, err := marathon.NewClient(config)
if err != nil {
	log.Fatalf("Failed to create a client for marathon, error: %s", err)
}

// Register for events
events, err := client.AddEventsListener(marathon.EventIDApplications)
if err != nil {
	log.Fatalf("Failed to register for events, %s", err)
}

timer := time.After(60 * time.Second)
done := false

// Receive events from channel for 60 seconds
for {
	if done {
		break
	}
	select {
	case <-timer:
		log.Printf("Exiting the loop")
		done = true
	case event := <-events:
		log.Printf("Received event: %s", event)
	}
}

// Unsubscribe from Marathon events
client.RemoveEventsListener(events)

See events.go for a full list of event IDs.

Controlling subscriptions

If you simply want to (de)register event subscribers (i.e. without starting an internal web server) you can use the Subscribe and Unsubscribe methods.

// Configure client
config := marathon.NewDefaultConfig()
config.URL = marathonURL

client, err := marathon.NewClient(config)
if err != nil {
	log.Fatalf("Failed to create a client for marathon, error: %s", err)
}

// Register an event subscriber via a callback URL
callbackURL := "http://10.241.1.71:9494"
if err := client.Subscribe(callbackURL); err != nil {
	log.Fatalf("Unable to register the callbackURL [%s], error: %s", callbackURL, err)
}

// Deregister the same subscriber
if err := client.Unsubscribe(callbackURL); err != nil {
	log.Fatalf("Unable to deregister the callbackURL [%s], error: %s", callbackURL, err)
}

Contributing

See the contribution guidelines.

Development

Marathon Fake

go-marathon employs a fake Marathon implementation for testing purposes. It maintains a YML-encoded list of HTTP response messages which are returned upon a successful match based upon a number of attributes, the so-called message identifier:

  • HTTP URI (without the protocol and the hostname, e.g., /v2/apps)
  • HTTP method (e.g., GET)
  • response content (i.e., the message returned)
  • scope (see below)

Response Content

The response content can be provided in one of two forms:

  • static: A pure response message returned on every match, including repeated queries.
  • index: A list of response messages associated to a particular (indexed) sequence order. A message will be returned iff it matches and its zero-based index equals the current request count.

An example for a trivial static response content is

- uri: /v2/apps
  method: POST
  content: |
    {
      "app": {
      }
    }

which would be returned for every POST request targeting /v2/apps.

An indexed response content would look like:

- uri: /v2/apps
  method: POST
  contentSequence:
    - index: 1
      content: |
        {
          "app": {
            "id": "foo"
          }
        }
    - index: 3
      content: |
        {
          "app": {
            "id": "bar"
          }
        }

What this means is that the first POST request to /v2/apps would yield a 404, the second one the foo app, the third one 404 again, the fourth one bar, and every following request thereafter a 404 again. Indexed responses enable more flexible testing required by some use cases.
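The matching rule for indexed responses can be sketched in a few lines of Go. This is an illustration of the rule as described here, not the fake server's actual code: `indexedResponse` and `respond` are hypothetical names, and the 200 status for a match is an assumption.

```go
package main

import "fmt"

// indexedResponse pairs a zero-based request index with a response body.
// Hypothetical type, used for illustration only.
type indexedResponse struct {
	Index   int
	Content string
}

// respond returns the status and body for the request with the given
// zero-based count: the entry whose index equals the count, else a 404.
func respond(seq []indexedResponse, requestCount int) (int, string) {
	for _, r := range seq {
		if r.Index == requestCount {
			return 200, r.Content // assumption: matches are served with 200
		}
	}
	return 404, ""
}

func main() {
	seq := []indexedResponse{
		{Index: 1, Content: `{"app":{"id":"foo"}}`},
		{Index: 3, Content: `{"app":{"id":"bar"}}`},
	}
	// Requests 0, 2 and 4 yield 404; requests 1 and 3 yield foo and bar.
	for n := 0; n < 5; n++ {
		status, body := respond(seq, n)
		fmt.Println(n, status, body)
	}
}
```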

Trying to define both a static and indexed response content constitutes an error and leads to panic.

Scope

By default, all responses are defined globally: every message can be queried by any request across all tests. This enables reuse and keeps the YAML definition fairly short. For certain cases, however, it is desirable to define a set of responses that are delivered exclusively for a particular test. Scopes offer a means to do so by representing a concept similar to namespaces. Combined with indexed responses, they make it possible to return different responses for message identifiers already defined at the global level.

Scopes do not have a particular format; they are plain strings without any structure. A scope must be defined in two places: the message specification and the server configuration. Given the message specification

- uri: /v2/apps
  method: GET
  # Note: no scope defined.
  content: |
    {
      "app": {
        "id": "foo"
      }
    }
- uri: /v2/apps
  method: GET
  scope: v1.1.1  # This one does have a scope.
  contentSequence:
    - index: 1
      content: |
        {
          "app": {
            "id": "bar"
          }
        }

and the tests

func TestFoo(t *testing.T) {
	endpoint := newFakeMarathonEndpoint(t, nil)  // No custom configs given.
	defer endpoint.Close()
	app, err := endpoint.Client.Applications(nil)
	// Do something with "foo"
}

func TestBar(t *testing.T) {
	endpoint := newFakeMarathonEndpoint(t, &configContainer{
		server: &serverConfig{
			scope: "v1.1.1",		// Matches the message spec's scope.
		},
	})
	defer endpoint.Close()
	app, err := endpoint.Client.Applications(nil)
	// Do something with "bar"
}

The "foo" response can be used by all tests using the default fake endpoint (such as TestFoo), while the "bar" response is only visible to tests that explicitly set the scope to v1.1.1 (as TestBar does) and query the endpoint twice.

go-marathon's People

Contributors

andrewburnell, atheatos, blahhhhh, bobrik, br-lewis, ch3lo, derek-elliott, eicca, elliot, gambol99, h-marvin, iandyh, jhaals, kesselborn, lamdor, makkes, mariomarin, matt-deboer, mattes, mike-solomon, mjeri, ondrej-smola, raffo, timoreimann, wu8685, x-cray, xinxian0458, xperimental, zaneobrien, zyfdegh

go-marathon's Issues

Generalize and enrich REST API error representations

HTTP errors coming in from Marathon's REST API have a fairly limited representation in the library. In particular, the following two shortcomings exist:

  1. HTTP error codes are not included, making fault analysis more difficult for clients than necessary.
  2. For errors not represented by dedicated error structs, an attempt is made to retrieve a top-level message property from the HTTP response. Unfortunately, Marathon does not always provide such a property, causing go-marathon to fall back to a rather non-informative unknown error description. Or Marathon provides properties in addition to message which are missed out. An example for the former is a 409 returned after a POST /v2/apps/{appId} (which also contains the deployment ID in a deployments array); a case for the latter is a 422 response (Unprocessable Entity) when an invalid app ID is passed to Marathon. (The resulting response will repeat the violating ID in an attribute property along with error details in an error property, both wrapped in an errors object.)

It'd be great if go-marathon could be more verbose regarding the error details it exposes.
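As a starting point for discussion, here is a minimal sketch of an error type that carries the HTTP status code and keeps the raw response body, so that properties other than message are not lost. `APIError` and `newAPIError` as written here are hypothetical and do not reflect the library's current types; the sample response body is made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// APIError is a hypothetical, richer error representation.
type APIError struct {
	StatusCode int    // HTTP status code returned by Marathon
	Message    string // top-level "message" property, if present
	Details    string // raw response body, kept so no property is lost
}

func (e *APIError) Error() string {
	return fmt.Sprintf("Marathon API error (HTTP %d): %s", e.StatusCode, e.Message)
}

// newAPIError builds an APIError from a status code and response body.
// It falls back to "unknown error" only for the message, never the details.
func newAPIError(status int, body []byte) *APIError {
	var parsed struct {
		Message string `json:"message"`
	}
	msg := "unknown error"
	if err := json.Unmarshal(body, &parsed); err == nil && parsed.Message != "" {
		msg = parsed.Message
	}
	return &APIError{StatusCode: status, Message: msg, Details: string(body)}
}

func main() {
	// Made-up 422 body without a top-level message property.
	err := newAPIError(422, []byte(`{"errors":[{"attribute":"id","error":"invalid"}]}`))
	fmt.Println(err)
	fmt.Println(err.Details)
}
```

Callers that need the deployment IDs of a 409 or the errors object of a 422 could then decode Details themselves.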

createApplication with tested config fails

I tried this config directly in my marathon instance and it works.
But using the API it returns:

error: Marathon API error: Object is not valid (attribute '': )

{
  "id": "/appname",
  "container": {
    "type": "DOCKER",
    "docker": {
      "forcePullImage": true,
      "image": "dockerimagelocation",
      "network": "BRIDGE",
      "parameters": [],
      "portMappings": [
        {
          "containerPort": 8080,
          "hostPort": 0,
          "servicePort": 10004,
          "protocol": "tcp",
          "name": "app-port",
          "labels": {}
        },
        {
          "containerPort": 8000,
          "hostPort": 0,
          "servicePort": 10005,
          "protocol": "tcp",
          "name": "debug-port",
          "labels": {}
        },
        {
          "containerPort": 6300,
          "hostPort": 0,
          "servicePort": 10006,
          "protocol": "tcp",
          "name": "code-coverage-port",
          "labels": {}
        }
      ],
      "privileged": true
    },
    "volumes": []
  },
  "cpus": 1,
  "env": {
    "JAVA_OPTS": "-Xmx512m",
    "environment": "Production",
  },
  "healthChecks": [
    {
      "portIndex": 0,
      "path": "/health",
      "maxConsecutiveFailures": 3,
      "protocol": "HTTP",
      "gracePeriodSeconds": 300,
      "intervalSeconds": 60,
      "timeoutSeconds": 20
    }
  ],
  "instances": 1,
  "mem": 1000,
  "ports": null,
  "dependencies": null,
  "uris": [],
  "labels": {
    "HAPROXY_10_VHOST": "app.marathon-lb.dcos.dev.zooz.co",
    "HAPROXY_GROUP": "external"
  },
  "fetch": null
}

Authenticate http

Is there a way to set authentication for the client?
I need the equivalent of curl -u.
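One way to approximate curl -u with nothing but the standard library is a custom http.RoundTripper that adds a basic auth header to every request; a client built this way could be assigned to config.HTTPClient as described under "Customizing the HTTP Clients". The sketch below is an illustration, not part of go-marathon: `basicAuthTransport`, `newBasicAuthClient`, `roundTripperFunc` and `authHeaderFor` are hypothetical names, and `req.Clone` requires Go 1.13 or newer.

```go
package main

import (
	"fmt"
	"net/http"
)

// basicAuthTransport adds HTTP basic auth to every request (hypothetical helper).
type basicAuthTransport struct {
	user, pass string
	base       http.RoundTripper
}

func (t *basicAuthTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	// A RoundTripper must not modify the caller's request, so work on a clone.
	clone := req.Clone(req.Context())
	clone.SetBasicAuth(t.user, t.pass)
	return t.base.RoundTrip(clone)
}

// newBasicAuthClient returns a client that could be used as config.HTTPClient.
func newBasicAuthClient(user, pass string) *http.Client {
	return &http.Client{
		Transport: &basicAuthTransport{user: user, pass: pass, base: http.DefaultTransport},
	}
}

// roundTripperFunc adapts a plain function to http.RoundTripper.
type roundTripperFunc func(*http.Request) (*http.Response, error)

func (f roundTripperFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// authHeaderFor reports the Authorization header the transport would send.
// It uses a capturing base transport, so no network access takes place.
func authHeaderFor(user, pass string) (string, error) {
	var seen string
	capture := roundTripperFunc(func(r *http.Request) (*http.Response, error) {
		seen = r.Header.Get("Authorization")
		return &http.Response{StatusCode: http.StatusOK, Body: http.NoBody, Header: http.Header{}}, nil
	})
	client := &http.Client{Transport: &basicAuthTransport{user: user, pass: pass, base: capture}}
	resp, err := client.Get("http://marathon.invalid/v2/apps")
	if err != nil {
		return "", err
	}
	resp.Body.Close()
	return seen, nil
}

func main() {
	header, err := authHeaderFor("admin", "secret")
	if err != nil {
		panic(err)
	}
	fmt.Println(header) // Basic YWRtaW46c2VjcmV0
}
```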

Improved error handling

I wanted to improve the current implementation of the error handling. To do this, I had to get rid of the simple handling that was in place till now. I'm working on a separate branch, we can discuss this in the following PR: #77

PLEASE DO NOT MERGE THE PR! I want to discuss this with the community and understand how to proceed. We can maybe also think to schedule it for a major (breaking) release.

Application and Group GET API should support embed options.

For App:
embed: one of (app.tasks, app.count, app.deployments, app.lastTaskFailure, app.failures, app.taskStats), repeatable
Embeds nested resources that match the supplied path. You can specify this parameter multiple times with different values.
• app.tasks embeds tasks. Note: if this embed is defined, it automatically sets apps.deployments, but this will change in a future release. Please define all embeds explicitly.
• app.counts embeds all task counts (tasksStaged, tasksRunning, tasksHealthy, tasksUnhealthy). Note: currently embedded by default, but this will change in a future release. Please define all embeds explicitly.
• app.deployments embeds all deployment identifiers if the related app is currently in deployment.
• app.readiness embeds all readiness check results.
• app.lastTaskFailure embeds the lastTaskFailure for the application if there is one.
• app.failures is shorthand for apps.lastTaskFailure, apps.tasks, apps.counts and apps.deployments. Note: deprecated and will be removed in future versions. Please define all embeds explicitly.
• app.taskStats exposes task statistics in the JSON.
Example: embed=app.deployments&embed=app.lastTaskFailure

For Group:
Query Parameters
embed: one of (group.groups, group.apps, group.apps.tasks, group.apps.count, group.apps.deployments, group.apps.lastTaskFailure, group.apps.failures, group.apps.taskStats), repeatable

Embeds nested resources that match the supplied path. You can specify this parameter multiple times with different values. Unknown embed parameters are ignored. If you omit this parameter, it defaults to group.groups, group.apps.

• group.groups embeds all child groups of each group.
• group.apps embeds all apps of each group.
• group.apps.tasks embeds all tasks of each application.
• group.apps.counts embeds all task counts (tasksStaged, tasksRunning, tasksHealthy, tasksUnhealthy).
• group.apps.deployments embeds all deployment identifiers if the related app is currently in deployment.
• group.apps.readiness embeds all readiness check results.
• group.apps.lastTaskFailure embeds the lastTaskFailure for the application if there is one.
• group.apps.taskStats exposes task statistics in the JSON.

Example: group.apps

Marathon API Doc: https://mesosphere.github.io/marathon/docs/rest-api.html
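Because embed is repeatable, a request URI has to carry one embed parameter per value. A minimal sketch of how such a URI could be assembled with net/url follows; `embedURI` is a hypothetical helper and the app ID in the example is made up.

```go
package main

import (
	"fmt"
	"net/url"
)

// embedURI appends one "embed" query parameter per value to the given path.
// Hypothetical helper, not part of go-marathon.
func embedURI(path string, embeds ...string) string {
	if len(embeds) == 0 {
		return path
	}
	q := url.Values{}
	for _, e := range embeds {
		q.Add("embed", e) // Add (not Set) keeps repeated values
	}
	return path + "?" + q.Encode()
}

func main() {
	fmt.Println(embedURI("/v2/apps/my-app", "app.deployments", "app.lastTaskFailure"))
	// /v2/apps/my-app?embed=app.deployments&embed=app.lastTaskFailure
}
```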

go-marathon does not work with marathon 1.1.1 due to changed deployments/steps schema

go-marathon no longer works correctly with Marathon 1.1.1 because the deployments/steps schema was changed.
So far this looks more like a Marathon bug, which is currently marked as fixed although the fix does not address the real problem (see the comment on the issue).
This issue tracks the solution; should the new schema turn out to be correct, a pull request is needed. Happy to provide one once the situation is clearer.

Golint conformance

I'm opening this issue here for discussion on the subject. @eicca suggested (#79 (comment)) that we might consider making project code meet golint rules. However, there are many places which are not just simply missing doc comments but require renaming of exported names. This will definitely be a breaking change. So, two questions here:

  • Do we introduce a versioning strategy? (Also to track Marathon API changes)
  • Do we try to make code meet golint rules?

TestApplications failing on Go 1.7

As can be observed on this Travis CI build run, running all tests under Go 1.7 produces an error:

=== RUN   TestApplications
--- FAIL: TestApplications (0.00s)
    Error Trace:    application_test.go:265
    Error:      Received unexpected error "Marathon API error: not found"

    Error Trace:    application_test.go:266
    Error:      Expected value not to be nil.

panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x48e641]

goroutine 44 [running]:
panic(0x7083c0, 0xc420014140)
    /home/travis/.gimme/versions/go1.7.linux.amd64/src/runtime/panic.go:500 +0x1a1
testing.tRunner.func1(0xc420158240)
    /home/travis/.gimme/versions/go1.7.linux.amd64/src/testing/testing.go:579 +0x25d
panic(0x7083c0, 0xc420014140)
    /home/travis/.gimme/versions/go1.7.linux.amd64/src/runtime/panic.go:458 +0x243
github.com/gambol99/go-marathon.TestApplications(0xc420158240)
    /home/travis/gopath/src/github.com/gambol99/go-marathon/application_test.go:267 +0x141
testing.tRunner(0xc420158240, 0x790d40)
    /home/travis/.gimme/versions/go1.7.linux.amd64/src/testing/testing.go:610 +0x81
created by testing.(*T).Run
    /home/travis/.gimme/versions/go1.7.linux.amd64/src/testing/testing.go:646 +0x2ec
exit status 2
FAIL    github.com/gambol99/go-marathon 0.012s
make: *** [test] Error 1

Under Go 1.5 and 1.6, everything is working fine.

Listing apps with labels

Marathon can filter apps with specific labels: mesosphere/marathon#1416. You can also embed tasks into apps to avoid extra requests with embed=apps.tasks.

I propose adding an argument to Marathon.Applications(), like this:

type ApplicationsOptions struct {
    Embed string
    Cmd   string
    Label string
}

I can come up with a PR.

Odd Error object issue

My app was having Marathon connectivity issues and unexpectedly panicked when using the err object returned by client.CreateApplication.

My code:

_, err := m.Client.CreateApplication(application)
if err != nil {
    log.Error(err)
    if err.Error() == ErrClusterConnectivity.Error() {
        m.resetClient()
    }
}

The call to log.Error(err) prints '<nil>' and the call to err.Error() panics with a null pointer exception. Since this is all wrapped by if err != nil, it feels like the err object returned by .CreateApplication (https://github.com/gambol99/go-marathon/blob/master/application.go#L538) might be improperly initialized.

This is the actual line that causes the panic: https://github.com/gambol99/go-marathon/blob/master/error.go#L55

Any idea what's going on here?
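One possible explanation, which is a guess and not confirmed anywhere in this report, is Go's typed-nil pitfall: a nil pointer of a concrete error type stored in an error interface compares unequal to nil, prints as <nil>, and panics when its Error method dereferences the receiver. The sketch below reproduces those symptoms with a hypothetical `apiError` type; it says nothing about what the library's code actually does.

```go
package main

import "fmt"

// apiError stands in for a concrete error type (hypothetical).
type apiError struct{ message string }

// Error dereferences the receiver, so it panics on a nil *apiError.
func (e *apiError) Error() string { return e.message }

// create returns a nil *apiError wrapped in the error interface.
// The interface holds (type=*apiError, value=nil) and is therefore not nil.
func create() error {
	var apiErr *apiError
	return apiErr
}

// createFixed returns an untyped nil when there is no error.
func createFixed() error {
	var apiErr *apiError
	if apiErr == nil {
		return nil
	}
	return apiErr
}

func main() {
	err := create()
	fmt.Println(err != nil)           // true
	fmt.Println(err)                  // <nil>
	fmt.Println(createFixed() != nil) // false
	// Calling err.Error() here would panic with a nil pointer dereference.
}
```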

Can not unmarshal the attribute [CurrentStep] in Event [Deployment Info] to object EventDeploymentInfo

The same applies to EventDeploymentStepSuccess and EventDeploymentStepFailure.

Taking EventDeploymentInfo for example, the event response message is:

....
    "currentStep": {
        "actions": [
            {
                "type": "ScaleApplication",
                "app": "/testorg/testspace/test.app"
            }
        ]
    },
....

But the definition of EventDeploymentInfo is:

// EventDeploymentInfo describes a 'deployment_info' event.
type EventDeploymentInfo struct {
    EventType   string          `json:"eventType"`
    CurrentStep *DeploymentStep `json:"currentStep"`
    Timestamp   string          `json:"timestamp"`
    Plan        *DeploymentPlan `json:"plan"`
}

// DeploymentStep is a step in the application deployment plan
type DeploymentStep struct {
    Action string `json:"action"`
    App    string `json:"app"`
}

They do not match.
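For comparison, a struct layout that does decode the payload shown above could look like the following sketch. The type names `deploymentAction` and `deploymentStep` and the helper `parseStep` are hypothetical and only mirror the JSON in this report; they are not a proposal for the final API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// deploymentAction mirrors one entry of the "actions" array (hypothetical name).
type deploymentAction struct {
	Type string `json:"type"`
	App  string `json:"app"`
}

// deploymentStep mirrors the "currentStep" object (hypothetical name).
type deploymentStep struct {
	Actions []deploymentAction `json:"actions"`
}

// parseStep extracts currentStep from an event payload.
func parseStep(data []byte) (*deploymentStep, error) {
	var wrapper struct {
		CurrentStep *deploymentStep `json:"currentStep"`
	}
	if err := json.Unmarshal(data, &wrapper); err != nil {
		return nil, err
	}
	return wrapper.CurrentStep, nil
}

func main() {
	payload := []byte(`{"currentStep":{"actions":[{"type":"ScaleApplication","app":"/testorg/testspace/test.app"}]}}`)
	step, err := parseStep(payload)
	if err != nil {
		panic(err)
	}
	for _, a := range step.Actions {
		fmt.Println(a.Type, a.App)
	}
}
```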

Tasks endpoint does not return JSON

For some reason, Marathon does not return JSON for its v2/tasks endpoint. This library expects a JSON response and fails with "failed to unmarshall the response from marathon," when calling client.GetAllTasks().

Unreliable IP retrieval for Event Subscription

client.RegisterSubscription in subscription.go assumes that the interface passed in from the config is unique. On my machine (and, I'm sure, most others), there are IPv6 and IPv4 interfaces with the same name. The matching in getInterfaceAddress in utils.go will most likely return an unusable address. I'm not sure what the best way to handle this is, though :(. Perhaps optionally allow an IP to be explicitly set in the config, which puts the onus on the developer to retrieve and use the correct IP?
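Another option might be to prefer the first IPv4 address found on the interface. The sketch below shows that selection on the string form of interface addresses (what net.Addr.String() returns, e.g. "10.241.1.71/24"). `firstIPv4` is a hypothetical helper, not the library's getInterfaceAddress, and whether IPv4 is the right preference is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// firstIPv4 returns the first IPv4 address among the given CIDR strings.
// Hypothetical helper; entries that fail to parse are skipped.
func firstIPv4(cidrs []string) (string, error) {
	for _, cidr := range cidrs {
		ip, _, err := net.ParseCIDR(cidr)
		if err != nil {
			continue
		}
		if ip4 := ip.To4(); ip4 != nil {
			return ip4.String(), nil
		}
	}
	return "", errors.New("no IPv4 address found")
}

func main() {
	// In real use, these strings would come from (*net.Interface).Addrs().
	addrs := []string{"fe80::1/64", "10.241.1.71/24"}
	ip, err := firstIPv4(addrs)
	if err != nil {
		panic(err)
	}
	fmt.Println(ip) // 10.241.1.71
}
```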

Potential goroutine leak when sending events to listeners (subscription.go)

This is different from #199 and was partially discussed in #198 .

The section of code in question currently looks like this:

// step: check if anyone is listening for this event
for channel, filter := range r.listeners {
    // step: check if this listener wants this event type
    if event.ID&filter != 0 {
        go func(ch EventsChannel, e *Event) {
            ch <- e
        }(channel, event)
    }
}

For each event, a new goroutine is created for the sole purpose of sending the event to a channel that a consumer is listening on. #198 describes a race condition that will be fixed, but there is still the issue that, if the consumer is slow or stops consuming, idle goroutines pile up and consume resources. While this may not be a major issue, it should be prevented if possible, as it is poor practice to spawn goroutines that may never exit.

An initial attempt to fix #198 in a local project was to remove the use of a goroutine and simply rely on the consumer to ensure that the channel is buffered and properly drained. The current code using this library looks similar to this:

update := make(marathon.EventsChannel, 5)
go func(stop chan bool, client marathon.Marathon, update marathon.EventsChannel) {
    <-stop
    client.RemoveEventsListener(update)
    close(update)
}(stop, client, update)
for event := range update {
    // Handle the event here.
}

After running for several days, this is working very well. Unfortunately, it suffers from a significant problem: if a consumer is very slow or stops consuming, all listeners are blocked.

One thought to resolve this is to create one goroutine per listener. Each of these goroutines would have an internal buffer that would contain unsent events. Some management would be needed to keep the buffer in check. E.g. when full, drop the newest or oldest message. Now, we're down to 1 goroutine per listener vs. N goroutines per event.
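To make the per-listener idea concrete, here is a minimal sketch of one forwarding goroutine with a bounded buffer that drops the oldest event when full. It uses plain strings in place of events, and `push` and `forward` are hypothetical names; this is one way the idea could look, not a tested design for the library.

```go
package main

import "fmt"

// push appends e to buf, dropping the oldest entry once size is reached.
func push(buf []string, e string, size int) []string {
	if size <= 0 {
		return buf
	}
	if len(buf) >= size {
		buf = buf[1:] // drop the oldest event
	}
	return append(buf, e)
}

// forward runs as one goroutine per listener. It never blocks the producer
// for longer than a receive, and it exits when in is closed.
func forward(in <-chan string, out chan<- string, size int) {
	defer close(out)
	var buf []string
	for {
		var next string
		var send chan<- string // nil while buf is empty, which disables the send case
		if len(buf) > 0 {
			next, send = buf[0], out
		}
		select {
		case e, ok := <-in:
			if !ok {
				return // producer is gone; unsent events are discarded
			}
			buf = push(buf, e, size)
		case send <- next:
			buf = buf[1:]
		}
	}
}

func main() {
	in, out := make(chan string), make(chan string)
	go forward(in, out, 3)
	// Nothing reads from out while these five events arrive, so only the
	// newest three are kept.
	for i := 1; i <= 5; i++ {
		in <- fmt.Sprintf("event-%d", i)
	}
	for i := 0; i < 3; i++ {
		fmt.Println(<-out) // event-3, event-4, event-5
	}
	close(in)
}
```

Dropping the newest event instead would only require changing push.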

Another solution that can work whether goroutines are used or not is to use a timeout when sending. This would ensure that goroutines are not permanent, but this can still be problematic because if the timeout is too short, events will be dropped if the consumer is a bit slow, but still consuming. If it's too long, then everything can slow down significantly if the events are not being consumed. Below is a snippet illustrating this idea:

select {
case eventChan <- event:
    fmt.Println("Event sent!")
case <-time.After(time.Second):
    fmt.Println("Event wasn't sent due to timeout sending the message.")
}

I've also toyed with the idea of having the consumer pass a function/closure to a method that manages reading the events and calling the function, but I'm not sure where I sit on that. While I like that it simplifies things for the consumer, it may be overkill, and I have a nagging feeling that it may result in consumers jumping through hoops under some circumstances.

In my case, the first solution is working very well, but not knowing all use cases, I don't believe it's ideal. The second solution that uses goroutines with buffers is probably the best all-around solution that I can think of right now, but it does introduce some complexity and may be overkill for many cases. Perhaps there's a better solution. I hope this sparks a discussion for the best solution possible.

One last note: in #198, I believe both @timoreimann and myself agree that perhaps the user should not pass in a channel but instead the subscription code should create the channel that will be used. This would help ensure that regardless of the solution used, the channel is correctly configured.

create and update application not returning the response

I noticed that in the current version the response from UpdateApplication or CreateApplication is only the error. For my use case, I would need these 2 operations to also return the deserialized Application (which is already available in those methods) so as to know the deployment ids.

generic logger

Implement a generic logger interface, defaulting to glog but allowing the consumer to override it and supply their own.

Ping returns a false if marathon sends a pong back

// Pings the current marathon endpoint (note, this is not a ICMP ping, but a rest api call)
func (client *Client) Ping() (bool, error) {
    if err := client.apiGet(MARATHON_API_PING, nil, nil); err != nil {
        return false, err
    } else {
        return true, nil
    }
}

To me this looks like it returns false if it can ping, but it should return true, or am I wrong?

By design? ApplicationOk returns true for health checks without results

Is it by design that ApplicationOk returns true if an application has a health check defined but the health check result is nil?

We're talking about this section of code here:

    // step: iterate the application checks and look for false
    for _, task := range application.Tasks {
        if task.HealthCheckResult != nil {
            for _, check := range task.HealthCheckResult {
                //When a task is flapping in Marathon, this is sometimes nil
                if check == nil {
                    return false, nil
                }
                if !check.Alive {
                    return false, nil
                }
            }
        }
    }

What I am trying to accomplish is to implement a zero-downtime deployment where an older version of an app (e.g. /my/app/v1) should continue to run until the new one (e.g. /my/app/v2) has been deployed successfully with all health checks passing.
Polling ApplicationOk until that returns true does not work, because of the behaviour stated above: For the longest time during a deployment of a new application version, all HealthCheckResults of its Tasks are nil - and therefore ApplicationOk returns true before the new app version is initialised or even up.

My question is: Am I holding it wrong? ;)
I mean is there a different and better way you designed to do this?
Or is the behaviour described above simply a corner case not accounted for just yet?
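For the zero-downtime use case described above, a stricter check might treat missing health check results as "not ready yet". The sketch below illustrates that idea on minimal stand-in types; `application`, `task`, `healthCheckResult` and `strictlyHealthy` are hypothetical and trimmed to the fields needed here, so this is not a drop-in replacement for ApplicationOk.

```go
package main

import "fmt"

// Minimal stand-ins for the library's types (hypothetical).
type healthCheckResult struct{ Alive bool }

type task struct{ HealthCheckResults []*healthCheckResult }

type application struct {
	HasHealthChecks bool // whether the app defines any health check
	Instances       int  // desired number of instances
	Tasks           []*task
}

// strictlyHealthy reports true only if every desired instance has a task and,
// when health checks are defined, every task has results that are all alive.
func strictlyHealthy(app *application) bool {
	if len(app.Tasks) < app.Instances {
		return false
	}
	for _, t := range app.Tasks {
		if app.HasHealthChecks && len(t.HealthCheckResults) == 0 {
			return false // no result yet: treat as not ready
		}
		for _, r := range t.HealthCheckResults {
			if r == nil || !r.Alive {
				return false
			}
		}
	}
	return true
}

func main() {
	pending := &application{HasHealthChecks: true, Instances: 1, Tasks: []*task{{}}}
	fmt.Println(strictlyHealthy(pending)) // false
}
```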

Docker Container PortMapping Labels

Currently, go-marathon doesn't support labels for Docker container portmappings:

"container": {
    "docker": {
      "forcePullImage": false,
      "image": "mesosphere:marathon/latest",
      "network": "BRIDGE",
      "parameters": [
        {
          "key": "name",
          "value": "kdc"
        }
      ],
      "portMappings": [
        {
          "containerPort": 80,
          "hostPort": 0,
          "protocol": "tcp",
          "servicePort": 10019,
          "name": "http",
          "labels": { "vip": "192.168.0.1:80" }
        }
      ],
      "privileged": false
    }
}

I think a map should be added to the PortMapping struct to allow for labels to be passed in. These labels are useful for setting VIPs in DCOS.

If you guys find this useful, I already have a fork of go-marathon with these changes.
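
For illustration, the proposed change could look roughly like the following. The struct here is a reduced, hypothetical version written for this comment (field set and tags are assumptions, not the library's actual definition); the only point is the extra Labels map.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PortMapping is a hypothetical, reduced version of the struct discussed
// above. The Labels field is the proposed addition.
type PortMapping struct {
	ContainerPort int               `json:"containerPort"`
	HostPort      int               `json:"hostPort"`
	ServicePort   int               `json:"servicePort,omitempty"`
	Protocol      string            `json:"protocol,omitempty"`
	Name          string            `json:"name,omitempty"`
	Labels        map[string]string `json:"labels,omitempty"`
}

// parsePortMapping decodes a single port mapping from JSON.
func parsePortMapping(data []byte) (PortMapping, error) {
	var pm PortMapping
	err := json.Unmarshal(data, &pm)
	return pm, err
}

func main() {
	raw := []byte(`{
		"containerPort": 80,
		"hostPort": 0,
		"protocol": "tcp",
		"servicePort": 10019,
		"name": "http",
		"labels": { "vip": "192.168.0.1:80" }
	}`)

	pm, err := parsePortMapping(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println("vip label:", pm.Labels["vip"])
}
```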

Improve unit tests

I noticed that there are a lot of unit tests missing for some of the operations. I'm opening this issue to keep track of this, so that some of us will eventually start working on it.

kill task does not work if app id has hierarchical path

func (r *marathonClient) KillTask(taskID string, scale bool) (*Task, error) {
  ...
  // Everything before the last "." is the app ID with "/" encoded as "_".
  appNameWithUnderscore := taskID[0:strings.LastIndex(taskID, ".")]

  // The following line needs to be added to restore the hierarchical path.
  appName := strings.Replace(appNameWithUnderscore, "_", "/", -1)
  ...
}
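
As a standalone sketch of the same idea: appIDFromTaskID is a hypothetical helper (not a library function), and it assumes that underscores in the task ID only ever stand for path separators.

```go
package main

import (
	"fmt"
	"strings"
)

// appIDFromTaskID is a hypothetical helper that derives the application name
// from a task ID by cutting at the last "." and turning "_" back into "/".
// Assumption: "_" never appears in the app ID for any other reason.
func appIDFromTaskID(taskID string) (string, error) {
	idx := strings.LastIndex(taskID, ".")
	if idx < 0 {
		return "", fmt.Errorf("task ID %q contains no \".\" separator", taskID)
	}
	return strings.Replace(taskID[:idx], "_", "/", -1), nil
}

func main() {
	appName, err := appIDFromTaskID("my_app_v1.0123abcd")
	if err != nil {
		panic(err)
	}
	fmt.Println(appName)
}
```

Checking the index first also avoids the slice panic that taskID[0:-1] would cause for a task ID without a dot.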

Marathon application schema changes.

The taskKillGracePeriodSeconds field was added to the Marathon application definition.
Documentation about it can be found here - https://github.com/mesosphere/marathon/blob/master/docs/docs/rest-api/mesosphere/marathon/api/v2/AppsResource_create.md

A secrets field was also added, but I can't find any documentation about it right now; it's only mentioned in the schema - https://github.com/mesosphere/marathon/blob/master/docs/docs/rest-api/public/api/v2/schema/AppDefinition.json

Basic Auth/TLS broken in marathonClient.AddEventsListener()

When we create a marathon client with basic auth and/or TLS, the call to marathonClient.AddEventsListener() returns an error:

401: <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 401 </title>
</head>
<body>
<h2>HTTP ERROR: 401</h2>
<p>Problem accessing /v2/events. Reason:
<pre>    Unauthorized</pre></p>
<hr /><a href="http://eclipse.org/jetty">Powered by Jetty:// 9.3.z-SNAPSHOT</a><hr/>
</body>
</html>

with TLS:

x509: certificate is valid for *.mydomain.net, mydomain.net, not ***.***.***.***.mydomain.net

Here is the code used to register listeners: https://github.com/containous/traefik/blob/master/provider/marathon.go#L67

It seems marathonClient.AddEventsListener() doesn't use TLS/basic auth client config.

HasHealthCheckResults false is not the same as unhealthy

There is a misunderstanding about how HealthCheckResults should be used, as I saw in tasks.go allHealthChecksAlive() bool:
// check: does the task have a health check result, if NOT, it's because the
// health of the task hasn't yet been performed, hence we assume it as DOWN

This assumption is incorrect. In some situations Marathon has trouble performing health checks, so there are no results, but that does not necessarily mean the app is unhealthy. For example, when a new leader is elected, Marathon runs health checks on all tasks, so you start with 0 tasks with HealthCheckResults even though your apps are healthy.

wait_on_running in CreateApplication immediately times out

This

if wait_on_running {
        return client.WaitOnApplication(application.ID, 0)
}

immediately times out. We should be able to pass a timeout to WaitOnApplication. Perhaps changing wait_on_running to an int is easiest, and then:

if wait_on_running != 0 {
        return client.WaitOnApplication(application.ID, wait_on_running)
}

If you want to do this, I can submit a PR.

Optional Marathon app spec attributes causing fatal error

When I try to read the following Marathon app spec from a file and then launch it:

"docker": {
    "image": "python:3",
    "network": "BRIDGE",
    "portMappings": [
        {
            "containerPort": 8080,
            "hostPort": 0
        }
    ]
}

then I get a:

FATA[0000] Failed to create application null. Error: Marathon API error: Object is not valid (attribute '': )

I need to add: "servicePort": 0, "protocol": "tcp" there to make it work. This shouldn't be necessary since the latter two attributes are not required.

I'm using Marathon version 1.1.1 on DC/OS 1.7.

Cluster members may have URL prefix

Hello,

First of all, thanks for your work!

It looks like #178, but I'm not sure it's the same use case.
My Mesos hosts sit behind a reverse proxy (https://<my-host>/marathon/<foo>).
The newCluster function (https://github.com/gambol99/go-marathon/blob/master/cluster.go#L109) ignores the path.

So I fixed it with this:

Let me know if you want me to fix it a different way.

Regards
François

Application.ApplicationOK doesn't check for expected number of tasks

In https://github.com/gambol99/go-marathon/blob/master/application.go#L337 the Application is marked OK if 0 tasks are running. There are times when a bad container is deployed and the application is marked OK because all of its tasks have been removed at the same time (while Marathon retries them).

Would it be possible to add a check to ensure that the number of running tasks matches the expected number of tasks? I'm not sure where to get that expected number of tasks from or I'd open the PR myself.

HTTP error code 422 (Unprocessable Entity) not deserialized correctly

At least for Marathon 1.1.1, go-marathon does not deserialize the error message from a 422 response properly: As found out in #162, the JSON body

{
  "message": "Object is not valid",
  "details": [
    {
      "path": "\/value",
      "errors": [
        "Requested service port 10000 conflicts with a service port in app \/myapp-green2",
        "Requested service port 10000 conflicts with a service port in app \/myapp"
      ]
    }
  ]
}

produces the APIError

Marathon API error: Object is not valid (attribute '': )

We need to update the parsing logic.
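
A possible shape for the updated parsing, modelled only on the JSON body quoted above. The type and function names are hypothetical, and the output format merely imitates the existing error string.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// errorDetail and apiErrorBody are hypothetical types that mirror the 422
// response body shown in this issue.
type errorDetail struct {
	Path   string   `json:"path"`
	Errors []string `json:"errors"`
}

type apiErrorBody struct {
	Message string        `json:"message"`
	Details []errorDetail `json:"details"`
}

// formatAPIError decodes a 422 body and renders every path together with its
// error messages instead of an empty attribute.
func formatAPIError(body []byte) (string, error) {
	var e apiErrorBody
	if err := json.Unmarshal(body, &e); err != nil {
		return "", err
	}
	parts := make([]string, 0, len(e.Details))
	for _, d := range e.Details {
		parts = append(parts, fmt.Sprintf("attribute '%s': %s", d.Path, strings.Join(d.Errors, "; ")))
	}
	return fmt.Sprintf("Marathon API error: %s (%s)", e.Message, strings.Join(parts, ", ")), nil
}

func main() {
	body := []byte(`{
	  "message": "Object is not valid",
	  "details": [
	    {
	      "path": "\/value",
	      "errors": [
	        "Requested service port 10000 conflicts with a service port in app \/myapp-green2",
	        "Requested service port 10000 conflicts with a service port in app \/myapp"
	      ]
	    }
	  ]
	}`)

	msg, err := formatAPIError(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(msg)
}
```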

Potential goroutine leak when using SSE subscriptions

The goroutine here listens for events on two channels. However, the goroutine has no way to know when it should stop. If I create an application that wants to close the marathon connection, this goroutine will continue to run in the background. It is highly suggested, and a best practice, that this goroutine be modified so that it can be stopped if the user wishes to close the connection to Marathon.

Vendor dependencies

We currently use go get to pull in dependencies and thus rely on HEAD-level compatibility for all third party packages. With vendoring becoming the default mechanism in Go to manage and pin dependencies, I suggest introducing it to provide some protection from breaking and more subtle, non-breaking upstream changes.

FWIW, my organization happened to have reviewed a number of tools fairly recently, looking at godep, govendor, and glide in detail. We didn't like glide for a few reasons and couldn't use godep due to the way our repository is structured, and eventually went with govendor which was both fairly simple to use and provided most useful features. Personally, I'd be happy to go either with that or godep, possibly preferring the latter slightly more since it seems to be more wide-spread, comes with all necessary features, and might be easier to use by contributors.

Assuming we agree on the usefulness of vendoring our dependencies, do people have opinions on what tool we should use?

Marathon in HA mode

HA -- run any number of Marathon schedulers, but only one gets elected as leader; if you access a non-leader, your request gets proxied to the current leader

As proxying adds latency, we should try to find the leader (/v2/leader endpoint) in the cluster logic and then send all further requests to the master/leader directly.

Whenever an HTTP request fails:

  1. find the new leader.
  2. re-send the request if step 1 only took x ms? (to make it more fault tolerant)

Also check for the latest leader/master in the background every minute or so, because the currently used leader/master might transition to a non-leader in the meantime.

Add missing deserialization tests

There's a fairly large discrepancy between the number of fields we populated the test YML file with and the amount of test checks we have. Effectively, this boils down to a lack of test coverage with regards to deserialization. At the same time, we have lots of duplication in the test messages. While this can be reasonable at times, in our case it mostly seems superfluous.

We should attempt to close the coverage gap and simultaneously reduce the amount of spec duplication. Hopefully this can be accomplished by distinguishing better between YML content that pertains to deserialization testing and other that affects code logic testing.

Discovered along #140.

Race condition when sending events to listeners

Some background: I modified Traefik, which uses this library, to periodically shut down the Marathon client and create a new one. This is due to a bug in Marathon where Traefik will stop receiving events when an instance crashes. AFAIK, this hasn't been fixed yet and I don't know when it will be, hence the workaround.

So, after getting this code working better, I started seeing a panic from the code below, where it tries to send on a closed channel:

for channel, filter := range r.listeners {
    // step: check if this listener wants this event type
    if event.ID&filter != 0 {
        go func(ch EventsChannel, e *Event) {
            ch <- e
        }(channel, event)
    }
}

When the code I'm working on is told to stop, it removes the listener and then closes the channel it was using for updates. Looking at your code, this would appear to work fine, until I realized that because you're sending the events to the channel within a goroutine, the following must be happening:

  1. An event comes in and a goroutine is spawned to send the event to a channel.
    • The goroutine does not run at this time.
  2. My code gets a stop message.
  3. The listener is removed.
  4. The channel is closed.
  5. The goroutine in step one now runs and panics.

Now, I do understand why the code is designed this way: you don't want to block other listeners from receiving events if one channel is full which will force the goroutine to block. However, I'm thinking it may be best to not use a goroutine in this case and document that buffered channels should be used instead. Perhaps a function could be added to create the correct channel vs. relying on the user of the API to make sure it's buffered?

I understand this may be an atypical use case, but I wanted to report this bug so that it was known before this affects production systems.
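
To make the suggestion concrete, here is a sketch of a dispatcher that sends synchronously to buffered channels and drops an event when a listener's buffer is full. All names are hypothetical and simplified (Event, EventsChannel and NewEventsChannel here are stand-ins written for this example, not the library's definitions). Because sends happen under a read lock and removal takes the write lock, no send can be in flight once remove returns, so closing the channel afterwards is safe.

```go
package main

import (
	"fmt"
	"sync"
)

// Event and EventsChannel are simplified stand-ins for this sketch.
type Event struct {
	ID   int
	Name string
}

type EventsChannel chan *Event

// NewEventsChannel creates a buffered channel so callers cannot forget the
// buffer.
func NewEventsChannel(size int) EventsChannel {
	return make(EventsChannel, size)
}

type dispatcher struct {
	mu        sync.RWMutex
	listeners map[EventsChannel]int
}

func newDispatcher() *dispatcher {
	return &dispatcher{listeners: make(map[EventsChannel]int)}
}

func (d *dispatcher) add(ch EventsChannel, filter int) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.listeners[ch] = filter
}

// remove unregisters ch. Once it returns, dispatch no longer sends to ch.
func (d *dispatcher) remove(ch EventsChannel) {
	d.mu.Lock()
	defer d.mu.Unlock()
	delete(d.listeners, ch)
}

// dispatch delivers e to every interested listener without blocking and
// returns how many listeners had a full buffer and missed the event.
func (d *dispatcher) dispatch(e *Event) int {
	d.mu.RLock()
	defer d.mu.RUnlock()
	dropped := 0
	for ch, filter := range d.listeners {
		if e.ID&filter == 0 {
			continue
		}
		select {
		case ch <- e:
		default:
			dropped++
		}
	}
	return dropped
}

func main() {
	d := newDispatcher()
	ch := NewEventsChannel(1)
	d.add(ch, 1)

	fmt.Println("dropped:", d.dispatch(&Event{ID: 1, Name: "first"}))
	fmt.Println("dropped:", d.dispatch(&Event{ID: 1, Name: "second"})) // buffer is full

	d.remove(ch)
	close(ch) // safe: no goroutine is left holding a pending send
	fmt.Println("dropped:", d.dispatch(&Event{ID: 1, Name: "third"}))
	fmt.Println("received:", (<-ch).Name)
}
```

The trade-off is that a slow listener loses events instead of delaying the others, which would need to be documented.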

HasApplication uses ListApplications and retrieves entire state of marathon

When using the HasApplication function go-marathon queries through to marathon with the apps endpoint. In deployments with large numbers of applications this causes marathon to return a significant amount of data. Functionally, HasApplication seems nearly equivalent to Application (it just handles the ErrDoesNotExist error by returning a falsey value).

Is there any difference between the approach I describe and the current implementation with ListApplications?

If not, I'm happy to submit a PR to improve this. Thanks!

Publish new version tag and define future tag policy

go-marathon has a single git tag v0.0.1 which is fairly outdated. I believe we should push another tag now and make it become a habit to do so periodically.

My reasoning is that the Go ecosystem has produced tooling by now which is capable of leveraging and even detecting tag-based semantic versioning. In fact, I noticed the single tag version by running glide over our internal code base and seeing how it would offer me to use go-marathon 0.0.1. This gives the impression that this is a fairly recent release, which it isn't.

Although I can't recall clearly anymore, it seems it was me who pushed the single tag back in January, since it's related to one of my features. Thus, one option we have is to just delete the tag and continue running without any versions. Following the recent discussions around package management and versioning, however, there seems to be consensus to bless semantic versioning by the Go community in the future. Thus, I think it would be advisable to get into the habit of versioning go-marathon releases regularly.

If we can find agreement on this point, the next question would be whether we should go with a 1.0.0 release or continue working on 0.x.y versions. The former would require a certain sense of stability whereas the latter would allow us to make additional breaking changes until we are more confident about the API. Given that the majority of changes done over the last few months were of an additive nature (mostly to account for extensions to the Marathon REST API), I believe we are pretty close to a stable 1.0.0. What I would suggest though is to spend a bit of dedicated time going over the code base looking for design deficiencies that we would need to fix through breaking API changes. We could start this effort by tagging the current HEAD with v0.1.0, bumping the minor version as we fix things (as recommended), and finalize with the first major release. I don't expect this process to take too many minor releases, but I believe it would be good to have before making any stability promises.

Opinions on this one? @gambol99, what do you think?

newCluster() broken code

https://github.com/gambol99/go-marathon/blob/master/cluster.go#L90

newCluster() should first split by commas and then do the url.Parse(). I'm surprised that url.Parse isn't failing anyway. The marathon variable holds:

&url.URL{Scheme:"http", Opaque:"", User:(*url.Userinfo)(nil), Host:"10.241.1.71:8080,10.241.1.72:8080,10.241.1.73:8080", Path:"", RawPath:"", RawQuery:"", Fragment:""}

... and Host doesn't look like a valid host to me.
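
A sketch of the split-then-parse order suggested above. parseMembers is a hypothetical helper, and letting later entries inherit the scheme of the previous one (defaulting to http) is an assumption about the intended URL format, not confirmed behaviour.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseMembers splits the comma-separated endpoint list first and parses
// each entry on its own, so every member gets a proper Host and Path.
// Entries without a scheme inherit the scheme of the previous entry.
func parseMembers(marathonURL string) ([]*url.URL, error) {
	var members []*url.URL
	scheme := "http" // assumed default when the first entry has no scheme
	for _, raw := range strings.Split(marathonURL, ",") {
		raw = strings.TrimSpace(raw)
		if !strings.Contains(raw, "://") {
			raw = scheme + "://" + raw
		}
		u, err := url.Parse(raw)
		if err != nil {
			return nil, fmt.Errorf("invalid member %q: %v", raw, err)
		}
		if u.Host == "" {
			return nil, fmt.Errorf("member %q has no host", raw)
		}
		scheme = u.Scheme
		members = append(members, u)
	}
	return members, nil
}

func main() {
	members, err := parseMembers("http://10.241.1.71:8080/cluster,10.241.1.72:8080/cluster,10.241.1.73:8080/cluster")
	if err != nil {
		panic(err)
	}
	for _, m := range members {
		fmt.Printf("scheme=%s host=%s path=%s\n", m.Scheme, m.Host, m.Path)
	}
}
```

Parsing each entry separately also keeps a per-member path such as /cluster intact instead of discarding it.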
