Original comment by @urso:
The need sounds more like an input/config status than logging. For logging, it was discussed that we will use filebeat, right?
As a configuration is split into a number of blocks, I assume we will have 2 IDs: one per block and one for the overall configuration. These IDs should be added via structured logging to all configured inputs.
Besides inputs, we should also consider adding some kind of 'Context' to beat.Event, allowing us to correlate events, and logs about events, with inputs/configurations. If, for example, the ES output drops a JSON event due to mapping conflicts, we want to be able to correlate this log message/failure with the original config the event originated from.
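To illustrate the idea, here is a sketch using a simplified stand-in for `beat.Event` that carries a hypothetical `Context` field. `EventContext`, `dropMessage`, and the log format are assumptions for illustration, not libbeat's actual API.

```go
package main

import (
	"errors"
	"fmt"
)

// EventContext is a hypothetical correlation context attached by the input
// that produced the event.
type EventContext struct {
	ConfigID string // overall configuration ID
	BlockID  string // ID of the config block the input was created from
}

// Event is a simplified stand-in for beat.Event, extended with the proposed Context.
type Event struct {
	Fields  map[string]any
	Context EventContext
}

// errMappingConflict simulates the ES output rejecting a document.
var errMappingConflict = errors.New("mapping conflict")

// dropMessage builds the log line an output would emit when dropping an
// event, including the IDs needed to trace it back to its configuration.
func dropMessage(ev Event, cause error) string {
	return fmt.Sprintf("dropping event: %v (config_id=%s block_id=%s)",
		cause, ev.Context.ConfigID, ev.Context.BlockID)
}

func main() {
	ev := Event{
		Fields:  map[string]any{"message": "hello"},
		Context: EventContext{ConfigID: "cfg-abc", BlockID: "block-1"},
	}
	fmt.Println(dropMessage(ev, errMappingConflict))
}
```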
Back to status, taking filebeat as an example. Currently the `Run` method has no return value, so there is nothing we could report. In my ongoing refactoring I change the signature of `Run` to also return an error, which we will be able to report. But inputs should not simply fail. Many 'failures' can be recovered from once the remote system we collect data from is updated. For example, the kafka input might fail to read a topic because it doesn't exist yet, but the moment some other process starts publishing events, the topic will be there and the input recovers. Network issues can also put an input into 'failure' mode. In these cases `Run` will not return, but retry.
We could augment inputs/modules to report their status (similar to how systemd/Windows services report status) via:
type InputService interface {
	Starting()
	Running()
	Failing(err error) // recoverable error; the input keeps retrying
	Stopping()
	Stopped()
	Fatal(err error) // unrecoverable error; the input gives up
}
The CM component integrating with the agent would create events based on these callbacks, adding the config IDs. The logs themselves would still be shipped asynchronously via filebeat, and might therefore arrive in Fleet/ES much later than the status update.
For the Fleet UI we might consider using data frames to compute the current status.
from elastic-agent.
Original comment by @mattapperson:
Fleet will be sending what amounts to 3 IDs used to identify each configuration:
- ID: a unique ID of this exact configuration version. It changes every time a configuration changes. This is what all errors relating to configurations need to be tied to.
- version: an auto-incrementing number; each change to a configuration bumps it. This number never goes down unless the shared_id changes. If a lower version is returned but the shared_id is the same, the "new" configuration comes from a stale cache and should be ignored.
- shared_id: this ID persists across configuration changes, but changes if the agent gets moved to a different configuration (not just to a new version of the same configuration).
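The version rule above can be sketched as a single comparison. `PolicyRef` and `shouldApply` are hypothetical names, the `int` type for version is an assumption, and treating an equal version as "nothing new to apply" is also an assumption, since only the lower-version case is specified.

```go
package main

import "fmt"

// PolicyRef holds the identifiers Fleet sends with each configuration.
type PolicyRef struct {
	ID       string // unique per configuration version
	Version  int    // auto-incrementing within a shared_id (type assumed)
	SharedID string // stable until the agent moves to another configuration
}

// shouldApply reports whether an incoming configuration should replace the
// current one.
func shouldApply(current, incoming PolicyRef) bool {
	if incoming.SharedID != current.SharedID {
		// Agent moved to a different configuration: versions are not comparable.
		return true
	}
	// Same shared_id: a lower version is a stale cache. Treating an equal
	// version as "nothing new" is an assumption.
	return incoming.Version > current.Version
}

func main() {
	cur := PolicyRef{ID: "a1", Version: 7, SharedID: "policy-x"}
	fmt.Println(shouldApply(cur, PolicyRef{ID: "a0", Version: 6, SharedID: "policy-x"}))
	fmt.Println(shouldApply(cur, PolicyRef{ID: "b1", Version: 1, SharedID: "policy-y"}))
}
```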
Original comment by @michalpristas:
@ph I think what Matt said is important for the stateresolver.
Original comment by @ph:
@urso I love your proposal here, and nice input from your filebeat refactoring.
Original comment by @ph:
Looking at the IDs:
- ID: this will need to be propagated down to the stateresolver.
- version: this is only needed by the configuration fetcher.
- shared_id: this is only needed by the configuration fetcher.
@michalpristas If we move to a sync flow as defined in LINK REDACTED, we should be fine doing the following (pseudocode):
- Receive a configRequest.
- newState, steps := Converge(currentState, configRequest)
- Send the steps to the operator.
- Check for errors and call report on the configRequest.
Original comment by @ph:
Note that the above removes the need for the event bus, and there is no aggregation or discarding of events in that flow.
Original comment by @michalpristas:
We will need to either remove all queues (pubsubs) and replace them with direct calls, or keep the queues and introduce ACKs (success/error) for commands.
The benefit of ACKs is that they work with the sync flow as well as the async flow.
The sync flow without a pubsub is easier to read.
Both of these will require some work for sure.
Original comment by @ph:
@michalpristas I've created the proposal as a Google Doc here and added a task list with a tentative split:
LINK REDACTED
We need to keep this open.
Pinging @elastic/ingest-management (Team:ingest-management)
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
@jlind23 This also partially ties into reporting of input status.