Is it by design that ApplicationOk returns <code clas

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

By design? ApplicationOk returns true for health checks without results about go-marathon HOT 10 CLOSED

gambol99 commented on May 30, 2024

By design? ApplicationOk returns true for health checks without results

from go-marathon.

Comments (10)

commented on May 30, 2024 1

I also encountered this issue few days ago, will check again today.

from go-marathon.

timoreimann commented on May 30, 2024

I don't know exactly why the decision to consider tasks with missing health checks as healthy was made. It took place along this commit introduced via PR #18, so hopefully @iderdik remembers.

My guess is that it's related to mesosphere/marathon#1270. If that's the case, then I'd opt for dropping the logic in question as the issue got fixed in 0.8.1 RC2 (read: a long long time ago).

If that's not the case, then my suggestion would be to change the logic such that we expose a ternary state to the caller: health state is good, bad, or unknown.

from go-marathon.

oolongbrothers commented on May 30, 2024

Thank you for the insight, timoreimann. I realise that I was maybe a bit imprecise with my description though.

The code from 1a00a84 that you reference, is this one:

                //When a task is flapping in Marathon, this is sometimes nil
                if check == nil {
                    return false, nil
                }

This is not causing the false positives (unknown == ok) I'm talking about. It's rather the condition on line 372 in combination with the final line of the function:

if task.HealthCheckResult != nil { ... }
...
return true, nil

This causes the function to return true when there are health checks defined but no health check results known (yet). This was the behaviour of ApplicationOk form the start, as added in 1efcfa4.

The code from 1a00a84 you refer to above is just a recheck if task.HealthCheckResult has maybe turned nil between lines 372 and 375. I have no idea if this extra check is made obsolete by an upstream bugfix, but it does not do any harm either.

However, both solutions you propose make sense to me. I could do pull request with a proposed solution, but first I would want to come to an agreement as to which of the two fixes should be implemented.
Can a user do anything useful with the information that the health check result in not known yet? Maybe?
Also a factor, how concerned are you about API backwards compatibility?

from go-marathon.

timoreimann commented on May 30, 2024

@oolongbrothers, major apologies for going silent on this one. I can only suspect that I forgot about the issue over last year's Christmas holidays.

I noticed the "offending" nil check has been refactored away by now. Is the existing go-marathon behavior still causing troubles for you? In case it does, I'd be happy to follow you through any required changes -- promised this time. :)

Thanks.

from go-marathon.

iderdik commented on May 30, 2024

I'm pretty sure I'm running into this again as well. Several applications that never came up properly (and never passed healthchecks) were still deemed "ok" and allowed the deploy to complete. I can't replicate reliably but it seems to be happening often enough to indicate a real issue.

from go-marathon.

iderdik commented on May 30, 2024

@timoreimann Sorry for missing your comment from December! PR #18 actually made a missing healthcheck indicate failure, not success. Perhaps this issue has resurfaced due to removal of that code?

from go-marathon.

timoreimann commented on May 30, 2024

@iderdik could be. Since several people have run into this problem by now, I'll try to find some time to dig deeper.

If anyone happens to have a reproducible use case (@jacob-koren maybe?), I'd be happy to hear. :-)

from go-marathon.

timoreimann commented on May 30, 2024

I think I found the root cause. Will report back later with a thorough analysis and some data points. :-)

from go-marathon.

timoreimann commented on May 30, 2024

curling the Marathon API endpoint continuously while deploying a container, I can see that the healthChecks and tasks properties get populated progressively. Following are their respective values at various distinct steps:

Step 1: both properties yet missing

healthChecks:

<nothing>

tasks:

<nothing>

Step 2: values turn null

healthChecks:

null

tasks:

null

Step 3: both values available with the task not started yet and healthCheckResults missing

healthChecks:

[
  {
    "path": "/health",
    "protocol": "HTTP",
    "portIndex": 0,
    "gracePeriodSeconds": 3,
    "intervalSeconds": 5,
    "timeoutSeconds": 5,
    "maxConsecutiveFailures": 2,
    "ignoreHttp1xx": false
  }
]

tasks:

[
  {
    "id": "app.0bc5c4d4-4f5a-11e6-9956-461ac47997d0",
    "host": "192.168.99.100",
    "ipAddresses": [],
    "ports": [
      31198
    ],
    "startedAt": null,
    "stagedAt": "2016-07-21T15:44:44.514Z",
    "version": "2016-07-21T15:44:43.908Z",
    "slaveId": "ec3708b4-ea0d-4437-b402-d571b212e6ad-S0",
    "appId": "/app"
  }
]

Step 4: task started (startedAt != null) but healthCheckResults still missing

healthChecks:

[
  {
    "path": "/health",
    "protocol": "HTTP",
    "portIndex": 0,
    "gracePeriodSeconds": 3,
    "intervalSeconds": 5,
    "timeoutSeconds": 5,
    "maxConsecutiveFailures": 2,
    "ignoreHttp1xx": false
  }
]

tasks:

[
  {
    "id": "app.0bc5c4d4-4f5a-11e6-9956-461ac47997d0",
    "host": "192.168.99.100",
    "ipAddresses": [],
    "ports": [
      31198
    ],
    "startedAt": "2016-07-21T15:44:45.463Z",
    "stagedAt": "2016-07-21T15:44:44.514Z",
    "version": "2016-07-21T15:44:43.908Z",
    "slaveId": "ec3708b4-ea0d-4437-b402-d571b212e6ad-S0",
    "appId": "/app"
  }
]

Step 5: healthCheckResults finally becomes available

healthChecks:

[
  {
    "path": "/health",
    "protocol": "HTTP",
    "portIndex": 0,
    "gracePeriodSeconds": 3,
    "intervalSeconds": 5,
    "timeoutSeconds": 5,
    "maxConsecutiveFailures": 2,
    "ignoreHttp1xx": false
  }
]

tasks:

[
  {
    "id": "app.0bc5c4d4-4f5a-11e6-9956-461ac47997d0",
    "host": "192.168.99.100",
    "ipAddresses": [],
    "ports": [
      31198
    ],
    "startedAt": "2016-07-21T15:44:45.463Z",
    "stagedAt": "2016-07-21T15:44:44.514Z",
    "version": "2016-07-21T15:44:43.908Z",
    "slaveId": "ec3708b4-ea0d-4437-b402-d571b212e6ad-S0",
    "appId": "/app",
    "healthCheckResults": [
      {
        "alive": true,
        "consecutiveFailures": 0,
        "firstSuccess": "2016-07-21T15:44:49.573Z",
        "lastFailure": null,
        "lastSuccess": "2016-07-21T15:44:49.573Z",
        "taskId": "app.0bc5c4d4-4f5a-11e6-9956-461ac47997d0"
      }
    ]
  }
]

If we execute ApplicationOK during either step 3 or 4, we are already pass the first three checks and subsequently try to iterate through all task's health check results. The latter yet missing in the JSON payload, we short-circuit and fall through to the return statement (as pointed out by @oolongbrothers originally).

Given that we already know about the existence of custom health checks at the time we start iterating (because the related check would have returned otherwise), I believe the right thing to do here is to add another check for task.HealthCheckResults being nil/empty and return false if that's the case.

I'll do some further testing to confirm my findings. In the meantime, I'd appreciate if someone could take a second verifying look. Thanks!

/cc @gambol99

from go-marathon.

timoreimann commented on May 30, 2024

The issue should hopefully be fixed now. If you experience the problem again with the latest version of go-marathon, please post to this ticket again and / or reopen it.

Thanks!

from go-marathon.

By design? ApplicationOk returns true for health checks without results about go-marathon HOT 10 CLOSED

Comments (10)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent