
Comments (13)

elasticmachine commented on July 3, 2024

Original comment by @urso:

The need sounds more like an input/config status and less like logging. For the purpose of logging, it was discussed that we will use filebeat, right?

As a configuration is split into a number of blocks, I assume we will have 2 IDs: one per block to be split, and the overall configuration ID. These IDs should be added via structured logging to all configured inputs.

Besides inputs, we should also consider adding some kind of 'Context' to beat.Event, allowing us to correlate events, and logs about events, to inputs/configurations. If, for example, the ES output drops a JSON event due to mapping conflicts, then we want to be able to correlate this log message/failure with the config the event originated from.

Back to status, taking filebeat as an example. Currently the Run method has no return value, so we have nothing we could report. In my ongoing refactoring I change the signature of Run to also return an error, which we will be able to report. But inputs should not just fail. Many 'failures' can be recovered from by updating the remote system we collect data from. For example, the kafka input might fail to read a topic because it doesn't exist yet. But the moment some other process decides to publish events, the topic will be there and the input recovers. Network issues can also turn an input into 'failure' mode. In this case Run will not return, but retry.
We could augment inputs/modules by reporting some status (similar to systemd/windows services report status) via:

type InputService interface {
  Starting()
  Running()
  Failing(err error)
  Stopping()
  Stopped()
  Fatal(err error)
}

The CM component integrating with agent would create events based on these callbacks, adding the config IDs. The logs themselves would still be shipped asynchronously via filebeat, and might therefore arrive at fleet/ES much later than the status update.
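
A minimal sketch of how such callbacks could be turned into status events carrying the config IDs. Only the InputService interface comes from the proposal above; StatusEvent, reporter, its field names, and the state strings are hypothetical names chosen for illustration, not an existing beats or agent API.

```go
package main

import (
	"errors"
	"fmt"
)

// InputService mirrors the interface proposed above.
type InputService interface {
	Starting()
	Running()
	Failing(err error)
	Stopping()
	Stopped()
	Fatal(err error)
}

// StatusEvent is a hypothetical record the CM component could report.
type StatusEvent struct {
	ConfigID string // overall configuration ID
	BlockID  string // ID of the configuration block the input belongs to
	State    string
	Err      string // empty when the callback carried no error
}

// reporter is a hypothetical InputService that converts callbacks into events.
type reporter struct {
	configID string
	blockID  string
	emit     func(StatusEvent)
}

// send builds one event, attaching both IDs so it can be correlated later.
func (r *reporter) send(state string, err error) {
	ev := StatusEvent{ConfigID: r.configID, BlockID: r.blockID, State: state}
	if err != nil {
		ev.Err = err.Error()
	}
	r.emit(ev)
}

func (r *reporter) Starting()         { r.send("starting", nil) }
func (r *reporter) Running()          { r.send("running", nil) }
func (r *reporter) Failing(err error) { r.send("failing", err) }
func (r *reporter) Stopping()         { r.send("stopping", nil) }
func (r *reporter) Stopped()          { r.send("stopped", nil) }
func (r *reporter) Fatal(err error)   { r.send("fatal", err) }

// Compile-time check that reporter satisfies the proposed interface.
var _ InputService = (*reporter)(nil)

func main() {
	r := &reporter{configID: "cfg-1", blockID: "block-a", emit: func(e StatusEvent) {
		fmt.Printf("%+v\n", e)
	}}
	// An input that fails while the topic is missing and later recovers.
	r.Starting()
	r.Failing(errors.New("kafka topic does not exist yet"))
	r.Running()
}
```

In this sketch Failing is used for the recoverable case (Run keeps retrying), which leaves Fatal for the case where Run returns an error; that split is an interpretation of the proposal, not something it spells out.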

For fleet UI we might consider data frames to compute a current status.

from elastic-agent.

elasticmachine commented on July 3, 2024

Original comment by @mattapperson:

Fleet will be sending what amounts to 3 IDs used to identify each configuration:

  • ID: A unique ID of this exact configuration version. It will change every time a configuration changes. This is what all errors relating to configurations need to be tied to.
  • version: An auto-incrementing ID; each change to a configuration bumps it. This number will never go down unless the shared_id changes. If a lower version number is returned but the shared_id is the same, the "new" configuration is a bad cache and should be ignored.
  • shared_id: This ID persists across configuration changes, but changes if the agent gets moved to a new configuration (not just to a new version of a configuration).
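
The stale-cache rule in the version bullet can be sketched as a small check. ConfigRef, its field names, and IsStale are hypothetical names for illustration; only the rule itself (same shared_id, lower version means ignore) comes from the description above.

```go
package main

import "fmt"

// ConfigRef holds the three identifiers described above.
// The type and field names are assumptions, not Fleet's actual schema.
type ConfigRef struct {
	ID       string // unique per exact configuration version
	Version  int    // auto-incrementing within one shared_id
	SharedID string // stable across versions of the same configuration
}

// IsStale reports whether incoming should be ignored as a bad cache:
// it has the same shared_id as current but a lower version.
func IsStale(current, incoming ConfigRef) bool {
	return incoming.SharedID == current.SharedID && incoming.Version < current.Version
}

func main() {
	cur := ConfigRef{ID: "a5", Version: 5, SharedID: "s1"}

	// Same shared_id, lower version: bad cache.
	fmt.Println(IsStale(cur, ConfigRef{ID: "a4", Version: 4, SharedID: "s1"})) // true
	// Same shared_id, higher version: a genuine update.
	fmt.Println(IsStale(cur, ConfigRef{ID: "a6", Version: 6, SharedID: "s1"})) // false
	// Different shared_id: the agent moved, so a lower version is fine.
	fmt.Println(IsStale(cur, ConfigRef{ID: "b1", Version: 1, SharedID: "s2"})) // false
}
```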


elasticmachine commented on July 3, 2024

Original comment by @michalpristas:

@ph I think what Matt said is important for the stateresolver.


elasticmachine commented on July 3, 2024

Original comment by @ph:

@urso I love your proposal here, and nice input from your filebeat refactoring.


elasticmachine commented on July 3, 2024

Original comment by @ph:

Looking at the IDs:

ID: This will need to be propagated down to the stateresolver.
version: This is only needed by the fetcher of the configuration.
shared_id: This is only needed by the fetcher of the configuration.

@michalpristas Now if we move to a sync flow as defined in LINK REDACTED we should be fine if we do this. (pseudo code incoming)

  • Receive a configRequest.
  • newState, steps := Converge(currentState, configRequest)
  • Send steps to the operator.
  • Check for errors and call report on the configRequest.
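
The pseudo code above can be sketched as runnable Go. Everything here is a stand-in for illustration: the State, ConfigRequest, and Step types, the Operator and Reporter interfaces, and the body of Converge are assumptions, not the agent's actual types. Keeping the old state when a step fails is also an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical minimal types for the flow.
type State struct{ ConfigID string }
type ConfigRequest struct{ ID string }
type Step struct{ Name string }

// Converge is a stand-in: it returns the target state and the steps to reach it.
func Converge(current State, req ConfigRequest) (State, []Step) {
	if current.ConfigID == req.ID {
		return current, nil // already converged, nothing to do
	}
	return State{ConfigID: req.ID}, []Step{{Name: "apply " + req.ID}}
}

// Operator executes steps; Reporter records the outcome per config ID.
type Operator interface{ Execute(steps []Step) error }
type Reporter interface{ Report(configID string, err error) }

// Function adapters so closures can serve as Operator and Reporter.
type opFunc func([]Step) error

func (f opFunc) Execute(steps []Step) error { return f(steps) }

type reportFunc func(string, error)

func (f reportFunc) Report(configID string, err error) { f(configID, err) }

var errExec = errors.New("step failed")

// Handle runs the four steps synchronously, with no event bus in between.
func Handle(current State, req ConfigRequest, op Operator, rep Reporter) State {
	newState, steps := Converge(current, req)
	err := op.Execute(steps)
	rep.Report(req.ID, err) // the result is tied to the exact config ID
	if err != nil {
		return current // assumption: keep the old state on failure
	}
	return newState
}

func main() {
	rep := reportFunc(func(id string, err error) {
		fmt.Printf("config %s: err=%v\n", id, err)
	})
	ok := opFunc(func([]Step) error { return nil })
	failing := opFunc(func([]Step) error { return errExec })

	s := Handle(State{ConfigID: "v1"}, ConfigRequest{ID: "v2"}, ok, rep)
	s = Handle(s, ConfigRequest{ID: "v3"}, failing, rep)
	fmt.Println("final state:", s.ConfigID)
}
```

Because each request is handled in a single call, the error returned by the operator is reported against the request that caused it, without any aggregation step.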


elasticmachine commented on July 3, 2024

Original comment by @ph:

Note that the above removes the need for the event bus, and we do not have aggregation or discarding of events in that flow.


elasticmachine commented on July 3, 2024

Original comment by @michalpristas:

We will need to either remove all queues (pubsubs) and replace them with direct calls, or keep the queues and introduce an ACK (success/error) for commands.
The benefit of ACKs is that they work with a sync flow as well as an async flow.
The sync flow without a pubsub is easier to read.

Both of these will require some work for sure.
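
A minimal sketch of the ACK option, assuming a channel-based queue. Command, NewCommand, Ack, Wait, and worker are hypothetical names for illustration. The same command type serves both flows: a sync caller waits right after enqueueing, an async caller collects the ACK later.

```go
package main

import (
	"errors"
	"fmt"
)

var errEmpty = errors.New("empty command")

// Command carries its own ACK channel so the sender learns the outcome.
type Command struct {
	Name string
	ack  chan error // buffered so the worker never blocks on Ack
}

func NewCommand(name string) *Command {
	return &Command{Name: name, ack: make(chan error, 1)}
}

// Ack must be called exactly once by whoever processes the command.
func (c *Command) Ack(err error) { c.ack <- err }

// Wait blocks until the command has been acknowledged.
func (c *Command) Wait() error { return <-c.ack }

// worker drains the queue and acknowledges every command with success or error.
func worker(queue <-chan *Command) {
	for cmd := range queue {
		if cmd.Name == "" {
			cmd.Ack(errEmpty)
			continue
		}
		cmd.Ack(nil)
	}
}

func main() {
	queue := make(chan *Command, 8)
	go worker(queue)

	// Sync flow: enqueue, then wait for the ACK immediately.
	c := NewCommand("restart-input")
	queue <- c
	fmt.Println("sync ack:", c.Wait())

	// Async flow: enqueue several commands, collect the ACKs afterwards.
	batch := []*Command{NewCommand("start-a"), NewCommand(""), NewCommand("start-b")}
	for _, cmd := range batch {
		queue <- cmd
	}
	for _, cmd := range batch {
		fmt.Printf("async ack for %q: %v\n", cmd.Name, cmd.Wait())
	}
	close(queue)
}
```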


elasticmachine commented on July 3, 2024

Original comment by @ph:

@michalpristas I've created the proposal as a Google Doc here and added a task list with a tentative split.

LINK REDACTED


ph commented on July 3, 2024

We need to keep this open.


elasticmachine commented on July 3, 2024

Pinging @elastic/ingest-management (Team:ingest-management)


botelastic commented on July 3, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.


elasticmachine commented on July 3, 2024

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)


ruflin commented on July 3, 2024

@jlind23 This also partially ties into reporting of input status.

