Comments (11)

jpmondet commented on August 19, 2024

That's a very good idea! 👍
This could even go further by colorizing the edges depending on the latency.
I don't think it's heavy/redundant since it's not necessarily the same guys that use the UI and the ones that leverage the metrics :-)

from goldpinger.

thegedge commented on August 19, 2024

We've been working on some network observability tooling @Shopify, inspired by Microsoft's PingMesh paper. We've come up with this so far:

[Image: latency grid]

  • Each row represents a node sending out a ping, each column a node receiving a ping.
  • The dark cross is a failing node.
  • Green, yellow, and red indicate low, medium, and high round-trip times, respectively. These are currently bucketed with fixed thresholds.

We were wondering if this visualization (plus the round trip additions) would be of interest to contribute upstream to resolve this issue?

[EDIT]
I should mention we're actually interested in emitting more than just round trips. We'll probably want to output TLS handshake, connection open, and DNS resolution times.

seeker89 commented on August 19, 2024

Hey @thegedge thanks a lot for putting this forward.

TL;DR: We really like this idea, and would absolutely welcome the contribution.

A few thoughts, in no particular order:

  • I like how this representation allows a more compact view of larger clusters than can easily be shown as a graph,
  • there are probably a lot of tweaks that could be made to it, and those could be configurable to suit various use cases. Some ideas that spring to mind:
    • using a continuous scale, instead of bucketing (it would look more like a height map)
    • using grayscale for the whole thing (to allow exporting very small images)
    • wondering if it would be interesting to produce animations that show evolution over time
  • I assume that the image is an artist's impression, otherwise there should be a repeating green on the diagonal?
  • measuring TLS handshake, connection open, and DNS resolution times would all broaden goldpinger's usefulness, so they are a great idea.

So to answer your question: yes please. What can we do to help with that?

thegedge commented on August 19, 2024

> using a continuous scale

I've discussed this with my team, and it's definitely another possibility. The hard thing about it (EDIT: "hard" here meaning configurability, in case someone wanted bucketing and someone else wanted continuous) is that these are just static files, so the best I think we'll be able to do is perhaps put some JS constants at the top of the file that people could tweak for their own preferences. Another option would be to set up a make target to compile the static files from templates.

We have lots of clusters, so we're actually planning on having a side service to persist/aggregate all of the data, and present a global view.

> wondering if it would be interesting to produce animations that show evolution over time

Unfortunately, that would mean persisting the data, or having the JS keep some of it in memory. I'm sure this wouldn't be terribly difficult, but likely outside of the scope of what we'll be doing in goldpinger.

> I assume that the image is an artist's impression, otherwise there should be a repeating green on the diagonal?

Actually, this is live data from one of our own clusters (minus the black cross, which was an artificially introduced failure). I was also pretty surprised that the diagonal wasn't green. FYI green in that image would mean <100 ms round trip.

> So to answer your question: yes please. What can we do to help with that?

We already have this running internally, with our own fork of goldpinger :)

I'll polish it up a bit, and then get some PRs rolling.

seeker89 commented on August 19, 2024

> I've discussed this with my team, and it's definitely another possibility. The hard thing about it (EDIT: "hard" here meaning configurability, in case someone wanted bucketing and someone else wanted continuous) is that these are just static files, so the best I think we'll be able to do is perhaps put some JS constants at the top of the file that people could tweak for their own preferences. Another option would be to set up a make target to compile the static files from templates.

I'm not sure I understand that bit. I initially thought it was an actual image being produced - do you mean that's a dynamically built HTML + CSS? Or do you mean SVG or equivalent?

We could probably just have a dropdown at the top bar of the UI that allows you to pick some options? Or something along these lines?

> Actually, this is live data from one of our own clusters (minus the black cross, which was an artificially introduced failure). I was pretty surprised also that the diagonal wasn't green. FYI green in that image would mean <100 ms round trip.

This is intriguing. I'd probably assume that something's seriously wrong, if a ping to localhost takes >100ms.

> We already have this running internally, with our own fork of goldpinger :)
>
> I'll polish it up a bit, and then get some PRs rolling.

Very sweet, looking forward to taking it for a spin!

thegedge commented on August 19, 2024

> I'm not sure I understand that bit. I initially thought it was an actual image being produced - do you mean that's a dynamically built HTML + CSS? Or do you mean SVG or equivalent?

Yep, it's an <svg> element that d3.js populates with data fetched from the /check_all endpoint.

> We could probably just have a dropdown at the top bar of the UI that allows you to pick some options? Or something along these lines?

I'll just make these settings be JS variables with hardcoded values for the first PR, with (eventually) a follow-up PR that adds in some UI elements to configure them. How does that sound?

> This is intriguing. I'd probably assume that something's seriously wrong, if a ping to localhost takes >100ms.

Agreed, and I'm looking into that. I'm thinking the problem could potentially be the use of wall-clock time with so many goroutines running, although I've seen ~100ms timings from something as simple as echo 'test' | nc localhost 8080 within the container. Maybe a combination of scheduling and some general slowness in the server. Definitely needs some more digging.
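
As a point of comparison for this kind of investigation, the sketch below times a localhost TCP echo round trip, roughly the Go equivalent of the nc test above. It does not show how goldpinger measures time; the helper names startEchoServer and echoRoundTrip are hypothetical. In Go, time.Now() carries a monotonic clock reading and time.Since uses it, so a measurement taken this way is not affected by wall-clock adjustments, although it still includes any scheduling delay.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
	"time"
)

// startEchoServer listens on a random localhost port and echoes back
// whatever each client sends (hypothetical helper for this sketch).
func startEchoServer() (net.Listener, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return // listener closed
			}
			go func(c net.Conn) {
				defer c.Close()
				io.Copy(c, c)
			}(c)
		}
	}()
	return ln, nil
}

// echoRoundTrip connects to addr, sends one line, waits for the echo,
// and returns the elapsed time including connection setup.
func echoRoundTrip(addr string) (time.Duration, error) {
	start := time.Now() // includes a monotonic clock reading
	conn, err := net.DialTimeout("tcp", addr, time.Second)
	if err != nil {
		return 0, err
	}
	defer conn.Close()
	conn.SetDeadline(time.Now().Add(time.Second))
	if _, err := fmt.Fprintln(conn, "test"); err != nil {
		return 0, err
	}
	if _, err := bufio.NewReader(conn).ReadString('\n'); err != nil {
		return 0, err
	}
	return time.Since(start), nil // computed from the monotonic reading
}

func main() {
	ln, err := startEchoServer()
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	defer ln.Close()

	rtt, err := echoRoundTrip(ln.Addr().String())
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	fmt.Println("localhost round trip:", rtt)
}
```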

seeker89 commented on August 19, 2024

Hey @thegedge, just checking in on how that's going. Do you need any assistance?

thegedge commented on August 19, 2024

Sorry for the lack of communication, @seeker89. I've been caught up in some changes for our own internal project, which have been keeping me busy.

Unfortunately this means we are no longer using goldpinger, but I did want to make this visualization available to the project. You can find it here: master...Shopify:add-latency. There's still some work to be done, so I'll hand it off there to someone who would like to take it to the finish line.

seeker89 commented on August 19, 2024

That's a real shame. What did you decide to build instead?

thegedge commented on August 19, 2024

We ended up rebuilding a stripped-down pinger, without all the bells and whistles (no API, no Swagger, no static file serving). Now we're focused on a federated Prometheus cluster, with a central dashboard to combine all of this data across clusters in a useful visualization (likely something similar to the screenshot I posted above).

Honestly, the primary reason for us making this move is the ability to move faster. Maintaining a fork with internal, experimental, and public work would be too much friction right now.

One other finding I can share: we had very low CPU requirements set up in k8s, so our round-trip times were way off (a combination of goroutine scheduling + cgroups throttling). A simple change that dramatically improved our timing was to do the pings serially instead of spawning goroutines for all of the pings at once. Eventually I plan on staggering the pings, but for now doing everything in serial mostly results in good timings.

seeker89 commented on August 19, 2024

> We ended up rebuilding a stripped down pinger, without all the bells and whistles (no API, no swagger, no static file serving).

I would be curious to know why the bells and whistles were a problem. They don't really add any meaningful overhead, do they?

> Now we're focused on a federated Prometheus cluster, with a central dashboard to combine all of this data across clusters in a useful visualization (likely something similar to the screenshot I posted above).

That's something the community could definitely benefit from. If you keep the Prometheus metrics compatible with goldpinger's, maybe we could reuse the same dashboard!

Either way, good luck, and keep rocking!
