alertmanager / alert_manager

Splunk Alert Manager with advanced reporting on alerts, workflows (modify assignee, status, severity) and auto-resolve features

License: Other

Python 66.11% JavaScript 15.74% CSS 2.25% Shell 0.03% HTML 15.09% C 0.57% Ruby 0.22%

alert_manager's Issues

Documentation improvement

Use Case documentation
Detailed installation instructions
Add documentation to the app

Saving to index of large result sets is somehow limited or broken

sourcetype = alert_results
There seems to be a limitation around 10000 characters per result event produced by alert_handler.py
If there is more data the result does not get written (indexed) completely and therefore breaks the JSON envelope.
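
One possible workaround, sketched below, is to split a large result set into several smaller JSON events so that each one stays below the limit and remains a complete JSON document. This is only an illustration: the function name chunk_results, the "results" wrapper key and the 9000-character budget are assumptions, not what alert_handler.py does. The limit itself may be related to the props.conf TRUNCATE setting, which defaults to 10000 bytes, but that is not confirmed here.

```python
import json


def chunk_results(results, max_chars=9000):
    """Yield JSON strings, each wrapping a slice of `results`.

    Every yielded string is a complete JSON document. A chunk only exceeds
    max_chars if a single row is larger than max_chars on its own.
    """
    chunk = []
    for row in results:
        candidate = chunk + [row]
        # Close the current chunk once adding this row would exceed the budget.
        if chunk and len(json.dumps({"results": candidate})) > max_chars:
            yield json.dumps({"results": chunk})
            chunk = [row]
        else:
            chunk = candidate
    if chunk:
        yield json.dumps({"results": chunk})
```

Each yielded string could then be written as its own alert_results event.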

Should we deliver the urgencies as a sample file?

Probably we should deliver the app with an alert_urgencies.csv.sample and point transforms.conf to that file.

This allows users to have their own urgency definitions and we don't overwrite the csv when updating the app.

Idea blatantly stolen from ES.

eventtypes.conf missing from default

Please check in eventtypes.conf

Single value trend timerange is wrong

Now: -24h to -24h
Should be: 2x outer timerange
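
A minimal sketch of the intended calculation, assuming the outer timerange is available as epoch seconds: the trend search would cover twice the outer range, split into a previous and a current window of equal length. The function name trend_windows and the returned dictionary layout are illustrative only.

```python
def trend_windows(earliest, latest):
    """Return the previous and current windows for a trend comparison.

    Both windows have the same length as the outer timerange, so together
    they span 2x the outer timerange.
    """
    if latest <= earliest:
        raise ValueError("latest must be greater than earliest")
    span = latest - earliest
    return {
        "previous": (earliest - span, earliest),
        "current": (earliest, latest),
    }
```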

Deleting an alert in settings not working

Deleting an Alert under settings does not seem to work.

Change incident drilldown to a pulldown menu

Currently, there are two drilldowns:

Click on magnifier to open a new window and run the alert search
Click on incident row to load alert job results in inpage-drilldown panel

But there should be a third option, to run the event search of the alert (without statistic command).

To better organize the drilldown, there should be a pulldown menu integrated to the incident row.

Setting auto_ttl_resolve afterwards resolves all previous incidents

If this is the preferred behavior, maybe we should warn about this?

Or maybe we should keep the old incidents open, and only start closing incidents from the time of checkboxing onwards?

Adding new alert_users or alert_settings manually doesn't work

When adding a new row to the alert_users or alert_settings handsontable and trying to save, the backend throws a 500

Add installation verification

Check if TA is installed
Check if "alerts" index exists, warn user
Check if "custom" index already exists, warn if so

Usability: alert_manager.conf - disable_save_results

The double negative is irritating. How about replacing it with a positive setting, e.g. save_to_index [0|1]?
Typo in alert_manager.conf.spec: Wheter to save results to alerts index or not

Status change to auto_resolve_* should log previous status

For reporting, the previous_status should be logged for the auto_resolve_* statuses.

Trend indicator should use white color for numbers

I think the trend numbers should be white, as red might be hard to read (e.g. on a monitoring screen).

App should provide an auto_subsequent_resolve option

Currently we have an auto_previous_resolve option that closes existing alerts in state new when a new alert with the same name arrives.

This works for use cases where we don't care about the previous alert and just want to know which alerts are in an active state.

An alternative use case is when we want to know how long an alert has persisted. For that we need an option to preserve the first incident and close all subsequent alerts.
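
The requested behaviour could look roughly like the sketch below. It assumes incidents are dictionaries with "alert" and "status" keys, and the status name "auto_subsequent_resolved" is made up for illustration; none of these names are taken from the app.

```python
def apply_auto_subsequent_resolve(open_incidents, new_incident):
    """Return a copy of new_incident with its status decided.

    If an incident for the same alert is still in state "new", the first
    incident is preserved and the arriving one is closed right away.
    """
    already_open = any(
        i["alert"] == new_incident["alert"] and i["status"] == "new"
        for i in open_incidents
    )
    result = dict(new_incident)
    # Hypothetical status name for incidents closed by this option.
    result["status"] = "auto_subsequent_resolved" if already_open else "new"
    return result
```

With this approach the duration of an alert can be read from the first incident, while later duplicates never show up as open.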

KPI Reporting: No events are found

The search string looks a bit odd:
[...] |search status="new" status="resolved" [...]

Can that be right?

The alert_handler.py script doesn't work with alert mode 'once per result'

The alert_handler.py script breaks trying to get the job at
serverResponse, serverContent = rest.simpleRequest(uri, sessionKey=sessionKey, getargs={'output_mode': 'json'})
(line 115).

When changing incident status the previous owner should be kept

Currently, opening the dialog shows the default owner (?)

alert.severity 6 (fatal) not considered

According to http://docs.splunk.com/Documentation/Splunk/latest/Admin/savedsearchesconf, alert.severity may have a value from 1 to 6. The alert manager only knows severities from 1 to 5.
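
A minimal lookup that covers all six values might look like this. The labels follow the savedsearches.conf documentation (1=debug, 2=info, 3=warn, 4=error, 5=severe, 6=fatal); the function name severity_label and the "unknown" fallback are assumptions for illustration.

```python
# Labels as listed for alert.severity in savedsearches.conf.
SEVERITY_LABELS = {
    1: "debug",
    2: "info",
    3: "warn",
    4: "error",
    5: "severe",
    6: "fatal",
}


def severity_label(value):
    """Map an alert.severity value (int or numeric string) to its label."""
    try:
        level = int(value)
    except (TypeError, ValueError):
        return "unknown"
    return SEVERITY_LABELS.get(level, "unknown")
```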

It's not clear what's behind "saving alert results to index"

Create separate block in setup.xml
Add explanation
Document in wiki

Specification for custom hooks/integration of external systems [CustomIncidentHandler]

The plan is to provide an interface for building custom Python classes that catch events/hooks of the alert handler, in order to extend the incident workflow with custom actions.

List of current known hooks:

create incident
auto previous resolve incident
auto ttl resolve incident
change incident detail
- assign
- change status
- change priority

List of information needed on each hook:
[ to be defined ]

@my2ndhead: Let's discuss on Friday how to proceed.
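
As a starting point for that discussion, such an interface could be sketched as follows. Everything here is hypothetical: the hook identifiers, the HookDispatcher class and the handle(hook, context) signature are illustrations, and the content of context is exactly the "information needed on each hook" that is still to be defined.

```python
# Hypothetical identifiers for the hooks listed above.
HOOKS = (
    "create_incident",
    "auto_previous_resolve",
    "auto_ttl_resolve",
    "change_incident",  # assign, change status, change priority
)


class CustomIncidentHandler(object):
    """Base class; custom integrations override handle()."""

    def handle(self, hook, context):
        raise NotImplementedError


class HookDispatcher(object):
    """Keeps registered handlers and forwards every fired hook to them."""

    def __init__(self):
        self._handlers = []

    def register(self, handler):
        self._handlers.append(handler)

    def fire(self, hook, context):
        if hook not in HOOKS:
            raise ValueError("unknown hook: %s" % hook)
        for handler in self._handlers:
            # Pass a copy so one handler cannot alter what the next one sees.
            handler.handle(hook, dict(context))


class RecordingHandler(CustomIncidentHandler):
    """Example handler that only remembers what it was called with."""

    def __init__(self):
        self.seen = []

    def handle(self, hook, context):
        self.seen.append((hook, context))
```

The alert handler would call fire() at each point in the workflow, and an external integration would only need to subclass CustomIncidentHandler.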

Edit Status Dialog should not show "Assigned (Auto)" as an option

See title.

Move dashboards to the default folder for release

Status change to auto_previous_resolve does not log...

Looks like auto_previous_resolve does not log status changes at all...

Add alert description

Use saved search description, parse HTML

Create alert_manager add-on for distributed environments

Split the index-time configuration of the alert_manager into a separate add-on:

TA-alert_manager
- props.conf: index time configurations for alert_metadata, alert_results, incident_change
- indexes.conf: alerts index

The open question is whether the add-on is also required on an all-in-one instance (indexing and searching on the same box), or whether the main alert_manager app should keep containing the necessary configuration, as it does today.

Wrong filter used to catch incidents for auto_previous_resolve

The alert handler grabs incidents from the collection when auto_previous_resolve is active for a certain alert. The filter used with the query returns all incidents related to the alert instead of only new ones.
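
A corrected filter would need to match on the status as well as the alert name. The sketch below builds such a query parameter for the KV store REST endpoint, which accepts a JSON query; the field names "alert" and "status" and the function name are assumptions about the collection layout.

```python
import json
import urllib.parse


def build_incident_query(alert_name, status="new"):
    """Build a KV store query parameter matching one alert in one status."""
    # Both conditions must hold, so only incidents still in `status` match.
    query = {"alert": alert_name, "status": status}
    return "query=" + urllib.parse.quote(json.dumps(query, sort_keys=True))
```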

Add trend indicators to the incident posture dashboard

Add trend indicators for single values (number of incidents by urgency) to the incident posture dashboard.
The information should include:

Up/down arrow indicating increase/decrease of nr of incidents
Increase/decrease as number

Integrate e-mail notifications

There should be an option to activate email notification for each hook/action per alert (alert settings).
Also it should be possible to define an HTML template (or RTF?) per notification type and alert.

Current hooks on which an email notification could be sent:

create incident
auto previous resolve incident
auto ttl resolve incident
change incident detail
- assign
- change status
- change priority

List of information needed on each hook:
[ to be defined ]

Could maybe be integrated as a CustomIncidentHandler (see issue #7)

Incident Posture not updated after editing Urgency

After editing the urgency of an alert, the corresponding single value in Incident Posture does not get updated.

Wrong TTL used

Unfortunately we can't use the job TTL this way, as it counts down towards 0:

entry['ttl'] = job['entry'][0]['content']['ttl']

We could use alert.expires

http://docs.splunk.com/Documentation/Splunk/6.2.0/RESTREF/RESTsearch#saved.2Fsearches

Valid values: [number][time-unit]

Sets the period of time to show the alert in the dashboard. Defaults to 24h.

Use [number][time-unit] to specify a time. For example: 60 = 60 seconds, 1m = 1 minute, 1h = 60 minutes = 1 hour.
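
If alert.expires is used, its value has to be converted to seconds first. A minimal parser for the [number][time-unit] format could look like this; the quoted text only shows a bare number, m and h, so support for s and d is an assumption, as is the function name parse_expires.

```python
import re

# Seconds per unit; an empty unit means plain seconds.
_UNIT_SECONDS = {"": 1, "s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_expires(value):
    """Convert an alert.expires value such as '60', '1m' or '24h' to seconds."""
    match = re.match(r"^\s*(\d+)\s*([smhd]?)\s*$", str(value))
    if not match:
        raise ValueError("unsupported alert.expires value: %r" % (value,))
    number, unit = match.groups()
    return int(number) * _UNIT_SECONDS[unit]
```

The resulting number of seconds could be stored with the incident at creation time instead of the job's remaining TTL.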

Add alert manager capabilities

Edit incidents
Edit / remove Incident settings
Edit / remove Alert Manager users

Previous status logged wrong

previous_status has the same value as status:

time=2014-12-28T11:20:30.044053 severity=INFO origin="alert_manager_scheduler" event_id="b4725db3c989febc75aa76382f27d432" user="splunk-system-user" action="auto_ttl_resolve" previous_status="auto_ttl_resolved" status="auto_ttl_resolved" job_id="scheduler__admin__testapp__RMD5384fa1812534b4c4_at_1419761700_97"
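
One plausible cause (an assumption, not verified against the scheduler code) is that the status is read after the incident has already been updated. The sketch below shows the intended order: remember the old status first, then change it, then log both. The function name and the incident dictionary are illustrative.

```python
def resolve_incident(incident, new_status, action):
    """Update the incident status and return the key=value log fragment."""
    # Capture the old value before the incident is modified.
    previous_status = incident["status"]
    incident["status"] = new_status
    return 'action="%s" previous_status="%s" status="%s"' % (
        action,
        previous_status,
        new_status,
    )
```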

Rewrite incident_settings and user_settings to not base on searches

Use bootstrap views instead

HTML incident_posture dashboard has wrong label in nav

After conversion to HTML, the dashboard shows the filename as label in the App nav

Improve incident details table expansion

Add severity and priority
Fix table
Make it pretty :)

Enrichment of incidents with results

It would be good, if incidents could be enriched with data from results.

The detail section could show per alert selected (configurable) fields and their values. This would help incident investigators, who have to decide about further actions, without looking at the full results.

Either fields are fully configurable, or we provide a set of often used fields from CIM, like src, host, user, action.

When there are multiple rows we have to think about presentation of the fields.

curl --get -k https://localhost:8089/servicesNS/admin/testapp/search/jobs/1418908235.66/results -d "output_mode=csv" -d "f=host f=action f=user" -u

alert_handler.py doesn't work on windows

Wrong path for stdout / stderr
Wrong parsing for job_id

Docs + Readme: Typos and Optimization

README.md

old: Due the usage of the App Key Value Store
new: Due to the usage of the App Key Value Store

old: Disable saving Alert results to index: Wheter to
new: Disable saving Alert results to index: Whether to

old: configured as globally visible are showed in the list
new: configured as globally visible are shown in the list

old: E-mail notifications on incident assignement
new: E-mail notifications on incident assignment

old: cd $SPLUNK_HOME/bin/script && ln -s ../..
new: cd $SPLUNK_HOME/bin/scripts && ln -s ../..

Alert settings do not show unconfigured alerts that aren't visible globally

They only show up if the alert is set to export = global or if it's placed in the alert manager app

alert_manager_scheduler is not working on Windows

The Splunk scripted input for the alert_manager_scheduler (e.g. for the auto_ttl_resolve alert scenario) is currently only available on Linux, since a shell script is used to wrap the python script.

For the final submission, the scheduler should also work on Windows.

alert_handler.py needs to be app aware

This only gets searches from the search app or searches that are shared globally.

Get savedsearch settings

uri = '/servicesNS/nobody/search/admin/savedsearch/%s' % alert

The token after "nobody" should change according to the alert's app.
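
A small helper along these lines could build the URI from the alert's app instead of hard-coding "search". The endpoint path is the one quoted above; the function name, the parameter names and the URL-encoding of each segment are assumptions.

```python
from urllib.parse import quote


def savedsearch_uri(alert, app="search", owner="nobody"):
    """Build the saved search URI inside the namespace of the alert's app."""
    # Encode each path segment so names with spaces or slashes stay intact.
    return "/servicesNS/%s/%s/admin/savedsearch/%s" % (
        quote(owner, safe=""),
        quote(app, safe=""),
        quote(alert, safe=""),
    )
```

For example, savedsearch_uri("My Alert", app="testapp") gives /servicesNS/nobody/testapp/admin/savedsearch/My%20Alert.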

Create an eventtype for the datamodel including the index name
Adjust dashboards containing the index name, refer to the eventtype
Add instructions how to change the index name manually
Let the app setup configure the index

User Settings User Table too small. Save button not visible.

I already have 7 users and can't see the save button...

Add usage instructions to a help dashboard

According to the rules, we should add a help dashboard containing usage instructions:

How to configure alert and incident settings
How to use the incident posture and drilldowns
How to use the reporting dashboard
How to use the KPI dashboards

Add alert manager as new Splunk alert action

Configure the alert manager as a new type of alert action, make it activatable as an option in savedsearches.conf and, if possible, in the settings UI

Provide demo data

The app should provide demo / sample data, to work without full configuration of alerts
The demo data should cover all features
If possible, activate/deactivate demo data through the UI

Object name refactoring

alert_settings should be incident_settings
alert_users should be ???
alert_users: "user" should be "name"

wrong permissions of py scripts

.py scripts in ../bin/ must be committed to git with executable permissions.
