jamesoff / simplemonitor

A Python-based network and host monitor

Home Page: https://simplemonitor.readthedocs.io/en/latest/

License: BSD 3-Clause "New" or "Revised" License

Python 87.63% CSS 2.72% HTML 6.17% Shell 1.93% Dockerfile 0.61% Makefile 0.45% Ruby 0.05% JavaScript 0.44%

python monitor monitoring

simplemonitor's Issues

MonitorHost not working on non english Windows OS

On non-English Windows, MonitorHost does not work because the output of ping is language-dependent.
For example, in French:

C:\Users\tlegras>ping -n 1 -w 5000 127.0.0.1

Envoi d'une requête 'Ping'  127.0.0.1 avec 32 octets de données :
Réponse de 127.0.0.1 : octets=32 temps<1ms TTL=128

Statistiques Ping pour 127.0.0.1:
    Paquets : envoyés = 1, reçus = 1, perdus = 0 (perte 0%),
Durée approximative des boucles en millisecondes :
    Minimum = 0ms, Maximum = 0ms, Moyenne = 0ms

C:\Users\tlegras>ping -n 1 -w 5000 128.0.0.1

Envoi d'une requête 'Ping'  128.0.0.1 avec 32 octets de données :
Réponse de 192.168.169.254 : Impossible de joindre le réseau de destination.

Statistiques Ping pour 128.0.0.1:
    Paquets : envoyés = 1, reçus = 1, perdus = 0 (perte 0%),

Unfortunately, I don't have a solution that would work on every Windows workstation. One option would be to make MonitorHost.ping_regexp configurable. In my case, in network.py line 172, I changed:
self.ping_regexp = "Reply from "
to
self.ping_regexp = "Réponse de "
(take care with the file encoding...)
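
One way to make the pattern configurable, as suggested above, is to read it from the monitor's options and fall back to the English default. This is only a sketch: the `ping_regexp` config key and both helper names are assumptions for illustration, not SimpleMonitor's actual code.

```python
import re

DEFAULT_PING_REGEXP = r"Reply from "  # the English default quoted above


def get_ping_regexp(config_options):
    """Compile the success pattern, honouring a hypothetical
    'ping_regexp' key in the monitor's config section."""
    return re.compile(config_options.get("ping_regexp", DEFAULT_PING_REGEXP))


def ping_succeeded(output, pattern):
    """True if any line of the (already decoded) ping output matches."""
    return any(pattern.search(line) for line in output.splitlines())


french_output = "Réponse de 127.0.0.1 : octets=32 temps<1ms TTL=128"
pattern = get_ping_regexp({"ping_regexp": "Réponse de "})
print(ping_succeeded(french_output, pattern))
```

Keeping the default in one constant means existing English setups would need no config change.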

service status monitor flavors

There are various ways to check a service's status on *nix, but currently only the /usr/local/etc/rc.d/ script is run. For example, on an Ubuntu-based distro I can check with service * status or /etc/init.d/* status ...
I would suggest having a list of commands that are tried in turn, raising an error only if none of them succeeds.
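
The "try a list of commands" idea could look roughly like the sketch below. The function name, the `{name}` placeholder, and the injectable `run` parameter are all hypothetical; the candidate commands are the ones mentioned in this issue.

```python
import subprocess


def check_service(name, command_templates, run=subprocess.call):
    """Try each status command in order; return the first that exits 0.

    Raises RuntimeError if every candidate fails or cannot be executed.
    """
    for template in command_templates:
        argv = [part.format(name=name) for part in template]
        try:
            if run(argv) == 0:
                return argv
        except OSError:
            continue  # command not present on this system; try the next
    raise RuntimeError("no status command succeeded for %s" % name)


# Candidate commands taken from the flavors mentioned above.
CANDIDATES = [
    ["service", "{name}", "status"],
    ["/etc/init.d/{name}", "status"],
    ["/usr/local/etc/rc.d/{name}", "status"],
]
```

Passing `run` as a parameter keeps the logic testable without touching real services.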

Make the remote alerter stuff more secure

gap ignored

gap doesn't seem to be honoured. I'm not a Python coder at all, but this seems to fix it (sorry, I know this should be a pull request, but I'm also not a GitHub guy).

*** Monitors/monitor.py-orig    Thu Sep  1 15:06:41 2016
--- Monitors/monitor.py Thu Sep  1 15:07:34 2016
***************
*** 77,82 ****
--- 77,84 ----
              self.set_remote_alerting(int(config_options["remote_alerts"]))
          if 'recover_command' in config_options:
              self.set_recover_command(config_options["recover_command"])
+         if 'gap' in config_options:
+             self.set_gap(config_options["gap"])
          self.running_on = self.short_hostname()
          self.name = name

(Opt) enhancement: allow alerters to repeat their message ...

I'm a fan of getting flooded with alarms when things go wrong, just to make sure they are not overlooked (e.g. if the mailbox is filled up with other stuff).

This quick hack adds a "repeat" option to the alerter. If it is non-zero, the alerter keeps sending alarms (though not out of hours) while honouring the configured limit: an alarm is sent whenever the virtual failure count is an integer multiple of the limit.

Markus

*** Alerters/alerter.py-orig    Fri Sep  2 11:36:53 2016
--- Alerters/alerter.py Fri Sep  2 12:17:53 2016
***************
*** 12,17 ****
--- 12,18 ----
      hostname = gethostname()
      available = False
      limit = 1
+     repeat = 0

      days = range(0, 7)
      times_type = "always"
***************
*** 34,39 ****
--- 35,42 ----
              self.set_dependencies([x.strip() for x in config_options["depend"].split(",")])
          if 'limit' in config_options:
              self.limit = int(config_options["limit"])
+         if 'repeat' in config_options:
+             self.repeat = int(config_options["repeat"])
          if 'times_type' in config_options:
              times_type = config_options["times_type"]
              if times_type == "always":
***************
*** 133,140 ****
                              return "catchup"
                          else:
                              return "failure"
!             if monitor.virtual_fail_count() == self.limit:
!                 # This is the first time we've failed
                  if out_of_hours:
                      if monitor.name not in self.ooh_failures:
                          self.ooh_failures.append(monitor.name)
--- 136,143 ----
                              return "catchup"
                          else:
                              return "failure"
!             if monitor.virtual_fail_count() == self.limit or (self.repeat and (monitor.virtual_fail_count() % self.limit == 0)):
!                 # This is the first time or nth time we've failed
                  if out_of_hours:
                      if monitor.name not in self.ooh_failures:
                          self.ooh_failures.append(monitor.name)
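
The patched condition can be read as a small predicate. The sketch below restates it as a standalone function; `should_alert` and its parameter names are illustrative, and the `fail_count > 0` guard is an addition that is not in the patch.

```python
def should_alert(fail_count, limit=1, repeat=0):
    """Alert when the limit is first reached and, if repeat is enabled,
    on every later multiple of the limit."""
    if fail_count == limit:
        return True
    # Added guard (not in the patch): never alert on a zero failure count.
    return bool(repeat) and fail_count > 0 and fail_count % limit == 0


# With limit=3 and repeat enabled, alerts fire at counts 3, 6, 9, ...
print([n for n in range(1, 10) if should_alert(n, limit=3, repeat=1)])
```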

docker support

Hi there

I just wanted to let you know that I created a Docker image for this project. It's available on Docker Hub or on GitHub. If you make your own build, you can connect Docker Hub with GitHub so that a new image is built automatically when you push or tag a commit.

cheers

Restarting Application with fail_command

I am trying to have the monitor watch a TCP connection and, if it fails, restart the application that initially opens the connection. Currently I do this by running the .py file from the command line via fail_command in the monitor.ini file, but once it does this, the monitor gets stuck in that application, so it never goes back to watching the reconnected connection.

Any recommendations for getting the monitor to watch the reconnected connection? My first thought is threading, but I wonder if there is another way.
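
If the monitor blocks because it waits for the launched application to exit (an assumption; the issue does not confirm the cause), one option that avoids threading is a small wrapper script that starts the application in the background and returns immediately. A minimal sketch, with `launch_in_background` as a hypothetical helper name:

```python
import subprocess
import sys


def launch_in_background(argv):
    """Start argv without waiting for it, so the caller can carry on."""
    return subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


# Stand-in for the real application: a child that sleeps for two seconds.
child = launch_in_background([sys.executable, "-c", "import time; time.sleep(2)"])
still_running = child.poll() is None  # control came back before the child exited
child.kill()
child.wait()
```

The fail_command would then point at the wrapper rather than at the long-running application itself.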

New user-submitted class

http://jamesoff.net/site/code/simplemonitor/planned-features/#comment-164361

(http monitor) threw exception during run_test(): local variable 'status' referenced before assignment

http monitor's describe() function isn't up-to-date

MonitorHTTP.describe() doesn't accurately reflect all the options configured on the instance (e.g. acceptable HTTP status codes)

Add timeout option to tests

See #33. Add an option to make a test fail if it's taking too long.

Use shared logic to read and validate configuration options

{virtual_fail_count} is always 0

Hi,
I've been using simplemonitor for a few weeks, and I noticed that in fail_command and success_command (in monitors.ini) the variable {virtual_fail_count} is always zero.
Do you have any idea how to debug that?
I'm on the feature/python3 branch (but it was the same on master).

Obtain and log public IP address

I have a situation, where an internet connection has an automatic failover to a 4G/SIM based connection, which takes about a minute to establish a connection, and will provide another public IP address, which may cause connection issues to certain services.

Therefore, I would like to monitor and log such a change of public IP address.

It would be nice to add an option that allows the public IP address to be obtained and logged.

This requires an external web service to be accessed, whose URL needs to be made a parameter.

To extract the public IP address from the service response, you could either simply use a regex to extract the first IPv4-formatted string in the response ('([0-9]{1,3}\.){3}[0-9]{1,3}'), or allow a regex parameter to be supplied.
Possible public ip address services:

https://wtfismyip.com/text
http://api.ipify.org/?format=txt
http://whatismyip.org/ (HTML formatted response)
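
The extraction step could be sketched as follows. It covers only the parsing of a response body that has already been fetched; `extract_ipv4` is a hypothetical helper, and the pattern does not check that each octet is at most 255.

```python
import re

# First dotted-quad in the text, e.g. "203.0.113.7".
IPV4_RE = re.compile(r"(?:[0-9]{1,3}\.){3}[0-9]{1,3}")


def extract_ipv4(text, pattern=IPV4_RE):
    """Return the first match of pattern in text, or None if there is none."""
    match = pattern.search(text)
    return match.group(0) if match else None


# Works for plain-text and HTML-formatted responses alike.
print(extract_ipv4("<p>Your address is 203.0.113.7</p>"))
```

Accepting `pattern` as an argument leaves room for the user-supplied regex parameter proposed above.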

Command Monitor

I think what is missing is a generic monitor that launches a preconfigured command (in monitors.ini) and checks the result using a regexp. This would make it possible to monitor virtually anything not covered by the existing monitors. This monitor could also check that the output is below a maximum value (to monitor memory, disk, CPU ...).

Ex1: monitor whether some process is running

command="ps auxww | grep mysoftware"
result_regexp="mysoftware -myparam"

Ex2: if the command returns a value, test whether it is below a given maximum. In this example we count the number of postgres connections and check that it is below the maximum number of connections.

[postgresql-connectionmax-monitor]
command="ps auxww | grep ^postgres | wc -l"
result_max=100

I have already coded such a monitor and can send it to you if you are interested (it is not under git, but it is just one file, plus 2 lines in monitor.py).
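
The two checks described in this issue could be evaluated along these lines. This is not the submitter's code; it is a sketch that reuses the `result_regexp` and `result_max` option names from the examples and assumes the command's output has already been captured as text.

```python
import re


def evaluate_output(output, result_regexp=None, result_max=None):
    """Decide pass/fail for captured command output.

    With result_regexp, pass if the pattern is found in the output.
    With result_max, pass if the output parses as a number below the maximum.
    """
    if result_regexp is not None:
        return re.search(result_regexp, output) is not None
    if result_max is not None:
        try:
            return float(output.strip()) < float(result_max)
        except ValueError:
            return False  # non-numeric output counts as a failure
    return True


print(evaluate_output("42\n", result_max=100))
```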

https with Python 3

With Python 3, you need to add pyOpenSSL to your requirements.txt.

Without this library, requests is much slower with websites that use strong ciphers.

Error in startup

Hello,
I get this when I run monitor.py:
SimpleMonitor v1.7
--> Loading main config from monitor.ini
--> Loading monitor config from monitors.ini
Unable to trap SIGHUP... maybe it doesn't exist on this platform.
No monitors loaded :(

My monitor.ini looks like this:
[monitor]
interval=60

[reporting]
loggers=logfile

[logfile]
type=logfile
filename=monitor.log
only_failures=1

[dummyhost-ping]
type=host
host=192.168.1.3
tolerance=2

I am currently using Python 2.7.13.

(svc monitor) Exception while executing svok: argument of type 'NoneType' is not iterable

Over-enthusiastic splitting breaks MonitorCommand on Windows

As per #15, the correct solution to this is to build the list of arguments separately to the command name, so I guess the command will need to be a separate parameter in the config

Replace urllib2 with requests for http monitor

Update docs to describe installing requirements

From #48

Update docs to suggest use of bash -c for commands if e.g. piping is required

Reported by @poblabs on IRC, a fail_command such as

/usr/bin/mailx -A gmail -s "Failure: monitor {name} has failed." email@address <<<"The simplemonitor for {name} has failed.\n\nTime: {failed_at}\nInfo: {info}\n"

does not work correctly.
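
The `<<<` here-string is interpreted by bash, so it only works when a shell actually runs the command line. If the command is split into arguments and executed directly (an assumption about how fail_command is handled), the redirection arrives as a literal argument. The sketch below shows the difference; `wrap_for_bash` is a hypothetical helper.

```python
import shlex


def wrap_for_bash(command_line):
    """Build an argv that hands the whole line to bash for interpretation."""
    return ["/bin/bash", "-c", command_line]


line = 'mailx -s "Failure" user@example.com <<<"body text"'

# Split without a shell: the here-string becomes an ordinary argument.
print(shlex.split(line))

# Wrapped: bash receives the untouched line and handles <<< itself.
print(wrap_for_bash(line))
```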

Write an alerter class which executes a configured external command

[Windows] UnboundLocalError: local variable 'certfile' referenced before assignment

When running on Windows 7, I get the following error log on startup:

>python monitor.py
SimpleMonitor v1.7
--> Loading main config from monitor.ini
--> Loading monitor config from monitors.ini
Unable to trap SIGHUP... maybe it doesn't exist on this platform.
Traceback (most recent call last):
  File "monitor.py", line 407, in <module>
    main()
  File "monitor.py", line 307, in main
    m = load_monitors(m, monitors_file, options.quiet)
  File "monitor.py", line 118, in load_monitors
    new_monitor = Monitors.network.MonitorHTTP(monitor, config_options)
  File "C:\Users\me\Downloads\jamesoff-simplemonitor-v1.6-27-g8e5249f\jamesoff-simplemonitor-8e5249f\Monitors\network.py", line 82, in __init__
    self.certfile = certfile
UnboundLocalError: local variable 'certfile' referenced before assignment

I fixed it by moving lines 82-83 in Monitors/network.py to the end of the if 'certfile' in config_options: block starting on line 66.

This allows my non-https monitor to work properly, but I haven't tested it to see if that change breaks https monitors.
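
An `UnboundLocalError` like this arises when a local name is assigned only inside a conditional block and then read unconditionally. An alternative to moving the lines is to give the name a default first. A minimal sketch, not the actual contents of Monitors/network.py:

```python
def read_certfile(config_options):
    """Return the configured certfile, or None when the option is absent."""
    certfile = None  # defined on every path, so it can never be unbound
    if "certfile" in config_options:
        certfile = config_options["certfile"]
    return certfile


print(read_certfile({}))
print(read_certfile({"certfile": "client.pem"}))
```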

ping command is wrong for non-windows systems

For windows you are doing a "ping -n 1 -w 5000 %2" which sends 1 ping with a timeout of 5000ms or 5 seconds.

If it is not a windows system it basically uses "ping -c1 -t5 %s" which sets the TTL to 5, which is NOT the same as timeout. "-t" is the number of hops. I think you probably mean to use -w5 or -W5.
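
The meaning of `-t` differs between ping implementations, so one approach is to choose the flags per platform. The sketch below reflects my understanding that `-t` is a timeout in BSD-derived ping (FreeBSD, macOS) but the TTL in Linux iputils, where `-W` is the per-reply timeout; check the local man page before relying on it. `ping_command` is a hypothetical helper.

```python
import platform


def ping_command(host, timeout_seconds=5, system=None):
    """Return an argv that sends one ping with the given timeout."""
    system = system or platform.system()
    if system == "Windows":
        # -w takes milliseconds on Windows.
        return ["ping", "-n", "1", "-w", str(timeout_seconds * 1000), host]
    if system == "Linux":
        # iputils: -W is the time to wait for a reply, in seconds.
        return ["ping", "-c", "1", "-W", str(timeout_seconds), host]
    # Assumed BSD-style ping, where -t is a timeout in seconds.
    return ["ping", "-c", "1", "-t", str(timeout_seconds), host]


print(ping_command("127.0.0.1", system="Linux"))
```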

Python 3 support

No idea how compatible with Python 3 this is, but it would be nice if it worked.

HTML logger should use XHR to update page

Instead of reloading the whole page, the HTML logger should write a JSON file which the HTML page fetches with XHR to refresh the page.

support for SSL certificate client authentication with HTTPMonitor

Hi,
Today I needed to add support for client authentication in HTTPMonitor. I made the modifications in a local copy of the git repo (basic support; password-protected keyfiles are not supported). I am ready to contribute it if you are interested...
If so, just note that I am a beginner with GitHub :)
I took a look at the pull request button: it seems a branch must be created first? I have worked in a branch locally on my workstation; is there some kind of sync to propagate the branch creation to GitHub, or is it the owner of the project who does that?
(I am using gitgui, not GitHub Desktop.)

DNS lookup monitor

Hi, is it somehow possible to check the result of a TXT DNS lookup?

Switch Monitors to use decorated properties

Instead of the ad-hoc mess in place now

Multiple hosts with same monitor definition and network logging

I have an issue with a setup where a central monitor host generates a status page and multiple hosts monitor disk space (the monitor is called "diskspace" on all the hosts).
It seems that the central monitor keeps only the last information it receives from the individual hosts, so it displays a single line with the diskspace status even though there are about 20 hosts. The host shown on the status page changes from time to time, but several hosts are never shown at the same time.

Do I need a different monitor name for every host? If so, would it be possible to use environment variables in the configuration? The motivation is that the hosts are automatically provisioned from the simplemonitor Docker container, and I'm not able to prepare a unique image for every host.

Timestamps in the log file

The current log file contains integer-type timestamps at the beginning of each line:

monitor.log:

1469763249 dr-http: ok
1469763249 dr-ping: ok
1469763260 dr-http: ok
1469763260 dr-ping: ok
1469763270 dr-http: ok
1469763270 dr-ping: ok
1469763280 dr-http: ok
1469763280 dr-ping: ok
...

When using this log file for trouble-shooting events, it is difficult to interpret these timestamps. Would it not be possible to add an option to have the timestamps formatted in a human readable format, such as ISO 8601, like so...

2016-07-29T21:12:49 dr-http: ok
2016-07-29T21:12:49 dr-ping: ok
2016-07-29T21:13:00 dr-http: ok
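
Such an option could be sketched as below. The `iso` flag and the function name are hypothetical, and the sketch formats in UTC to keep the output independent of the machine's timezone; a real option might prefer local time.

```python
from datetime import datetime, timezone


def format_timestamp(epoch_seconds, iso=True):
    """Render a log timestamp as ISO 8601 (UTC) or as the raw integer."""
    if not iso:
        return str(int(epoch_seconds))
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%S")


print("%s dr-http: ok" % format_timestamp(1469763249))
```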

Provide better feedback if libraries are missing

e.g. in #48

hierarchical tree organisation in HTMLLogger

Improvement: I am working on a modified version of HTMLLogger where monitors are organised in a tree (using CompoundMonitor). The idea is to expand/collapse top-level CompoundMonitors. If you are interested, I could propose a pull request once I have tested it for some time. The changes are mainly in the JavaScript, with a few in HTMLLogger. The state of the tree is preserved even when the page reloads each minute :)

Downtime reports and/or failure time are mismatched by (e.g.) an hour

UTC/localtime?

Notification via external SMTP server with password ?

Hello!

How can I configure this script to notify me with email using external SMTP server?

I found these options:
[email]
type=email
host=mailserver.domain.local
from=[email protected]
to=[email protected]

but my SMTP server requires login, password for sending email.

success_command is running in excess

As mentioned in #58 where I am using /bin/bash to email me on fail_command and success_command, I am noticing that I get 1 fail command email which is expected, but on success command email I am receiving 4 of them.

I think the expected behavior should be to receive only 1? Am I missing something?

Thanks!

Slack webhook payload configuration

I've used Slack webhooks a bit and, from what I understand, a webhook expects a POST whose data contains the message you want posted. However, I don't see how I can configure that in my Slack alerter.

Slack alerter exception on missing channel

I get this exception while trying to get Slack alerting working:
exception caught while alerting for mymonitor: SlackAlerter instance has no attribute 'channel'

The same exception is raised with this config (I tried this one because the channel attribute is not mandatory):

[slackalert]
type=slack
url=https://hooks.slack.com/services/my/generated/webhook

but for this one as well:

[slackalert]
type=slack
url=https://hooks.slack.com/services/my/generated/webhook
channel=alerts
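
An "instance has no attribute" error of this kind usually means the attribute is only set on some code paths. The sketch below is not the real SlackAlerter; it shows one way to give `channel` a class-level default and to build a JSON webhook body that includes the channel only when one is configured. The class and function names are hypothetical.

```python
import json


class SketchSlackAlerter:
    """Illustrative stand-in, not SimpleMonitor's SlackAlerter."""

    channel = None  # class-level default, so the attribute always exists

    def __init__(self, config_options):
        self.url = config_options["url"]
        if "channel" in config_options:
            self.channel = config_options["channel"]

    def build_payload(self, text):
        """Return the JSON body to POST to the webhook URL."""
        payload = {"text": text}
        if self.channel:
            payload["channel"] = self.channel
        return json.dumps(payload)


alerter = SketchSlackAlerter({"url": "https://hooks.slack.com/services/my/generated/webhook"})
print(alerter.build_payload("Monitor mymonitor failed"))
```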

Python 3 Support

Any interest in me porting the code to run on Python 3 as well? Is there a preferred way to do that? I see a branch for Python3 but it looks like it is just a travis.yml change.

Should I wait for the subprocess branch to merge?

File "monitor.py", line 295 with open(pidfile, "w") as file_handle:

Hello!

I am trying to launch this script on CentOS.
I get this error:

File "monitor.py", line 295
with open(pidfile, "w") as file_handle:
^
SyntaxError: invalid syntax

Startup error

esoff-simplemonitor-9a863a6>python monitor.py
SimpleMonitor v1.7
--> Loading main config from monitor.ini
Traceback (most recent call last):
  File "monitor.py", line 413, in <module>
    main()
  File "monitor.py", line 284, in main
    interval = config.getint("monitor", "interval")
  File "C:\Python27\lib\ConfigParser.py", line 359,
    return self._get(section, int, option)
  File "C:\Python27\lib\ConfigParser.py", line 356,
    return conv(self.get(section, option))
  File "C:\Python27\lib\ConfigParser.py", line 607,
    raise NoSectionError(section)
ConfigParser.NoSectionError: No section: 'monitor'

I get this error every time I run monitor.py. I couldn't figure out what's wrong. Please provide a solution as soon as possible.
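
One possible cause (not confirmed in this report) is that monitor.ini was not found in the working directory: the config parser's `read()` silently skips files it cannot open, so the problem only surfaces later as `NoSectionError`. A sketch of an explicit check, written for Python 3's `configparser` (the module is named `ConfigParser` on Python 2); `load_config` is a hypothetical helper.

```python
import configparser


def load_config(path):
    """Load an INI file, failing loudly if it could not be read."""
    config = configparser.ConfigParser()
    # read() returns the list of files it parsed successfully.
    if not config.read(path):
        raise IOError("could not read config file: %s" % path)
    return config
```

With a check like this, a missing or unreadable file is reported by name instead of as a missing section.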

Elapsed time in log file

When performing internet-based requests (HTTP, Ping, DNS), it can be important not only to be able to detect failures, but also be able to determine the duration (milliseconds) of a request.

As a minimum, it would be nice to be able to append the duration onto each log line, e.g.

monitor.log:

1469763249 dr-http: ok (150ms)
1469763249 dr-ping: ok (42ms)
1469763260 dr-http: ok (142ms)
...

An additional option would be to be able to set a maximum acceptable duration on the completion of a monitor request, and have a failure reported if this duration is exceeded.
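
Both ideas, logging the duration and failing when it exceeds a maximum, could be sketched with a small wrapper. `timed_check` and `max_duration_ms` are hypothetical names, and the check itself is passed in as a callable.

```python
import time


def timed_check(check, max_duration_ms=None):
    """Run check() and return (ok, elapsed_ms).

    A check that succeeds but takes longer than max_duration_ms is
    reported as a failure.
    """
    start = time.monotonic()
    ok = bool(check())
    elapsed_ms = (time.monotonic() - start) * 1000.0
    if ok and max_duration_ms is not None and elapsed_ms > max_duration_ms:
        ok = False
    return ok, elapsed_ms


ok, elapsed = timed_check(lambda: True, max_duration_ms=500)
print("dr-http: %s (%dms)" % ("ok" if ok else "failed", elapsed))
```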

Add recover_command to documentation

Loggers don't understand timezones?

Reported by email; the logger.format_datetime() function and all the things it calls may ignore timezone info, and log in UTC.

Add group support to loggers

As discussed in #78 it would be nice to have the recently-added group feature work for loggers as well as alerters.

Unable to Run simplemonitor

I recently recloned the project and then typed
python2 monitor.py
and it gave me this:

harkishen@harkishen-Aspire-A515-51G:~/Desktop/git works/simplemonitor$ python2 monitor.py 
SimpleMonitor v1.7
--> Loading main config from monitor.ini
--> Loading monitor config from tests/monitors.ini
Adding host monitor test1
Adding fail monitor test2
Adding command monitor command1
Adding command monitor command2
Adding command monitor command3
Adding command monitor command4
Adding http monitor http
--> Loaded 7 monitors.

Adding slack alerter slack

--> Starting... (loop runs every 5s) Hit ^C to stop
error_count = 0, interval = 5 --> 0
1
error_count = 1, interval = 5 --> 1
.error_count = 2, interval = 5 --> 2
error_count = 3, interval = 5 --> 3
.error_count = 4, interval = 5 --> 4
error_count = 5, interval = 5 --> 0
.error_count = 0, interval = 5 --> 0
error_count = 1, interval = 5 --> 1
.error_count = 2, interval = 5 --> 2
error_count = 3, interval = 5 --> 3
.error_count = 4, interval = 5 --> 4
^C
--> Quitting.
--> Finished.

It failed to run. Can anyone tell me what's going on?
This is my Python version:

Python 2.7.14 (default, Sep 23 2017, 22:06:14) 
[GCC 7.2.0] on linux2

ping times not displayed

I am testing on win7 with python 3.6.

last_run_duration is logged but the actual ping time is not. I looked into this and may have found something.

in simplemonitor/Monitors/network.py

within MonitorHost class, run_test function is present

    def run_test(self):
        r = re.compile(self.ping_regexp)
        r2 = re.compile(self.time_regexp)
        success = False
        pingtime = 0.0
        try:
            cmd = (self.ping_command % self.host).split(' ')
            output = subprocess.check_output(cmd)
            for line in str(output).split("\n"):
                matches = r.search(line)
                if matches:
                    success = True
                else:
                    matches = r2.search(line)
                    if matches:
                        pingtime = matches.group("ms")
        except Exception as e:
            self.record_fail(e)
            return False
        if success:
            if pingtime > 0:
                self.record_success("%sms" % pingtime)
            else:
                self.record_success()
            return True
        self.record_fail()
        return False

~~If the first regex (r) is matched, the next one is not run. The next regex is the one that grabs the actual ping time and stores it in last_result via self.record_success.~~
Edit: As it happens, I overlooked the for loop. ~~Still looking.~~
Edit: Problem is with:

for line in str(output).split("\n"):

output needs to be decoded like this:

for line in output.decode('utf-8').split("\n"):
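
The difference is easy to reproduce on Python 3, where `str()` of a bytes object yields its printable representation rather than the decoded text. A small sketch with made-up sample output; the helper names are hypothetical, and UTF-8 with `errors="replace"` is an assumption, since the Windows console may use another encoding.

```python
def split_lines_naive(raw):
    """What the current code does: str() on bytes, then split."""
    return str(raw).split("\n")


def split_lines_decoded(raw, encoding="utf-8"):
    """Decode first, then split into real lines."""
    return raw.decode(encoding, errors="replace").splitlines()


sample = b"Pinging 127.0.0.1\r\nReply from 127.0.0.1: bytes=32 time<1ms TTL=128\r\n"

# One element: the repr contains a backslash and an 'n', not a newline.
print(len(split_lines_naive(sample)))
# Two elements: one per line of ping output.
print(len(split_lines_decoded(sample)))
```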

Ultimately, I'd like the actual ping time to be logged. If the above is fixed, is modifying save_result2 in FileLogger the only way to print it? Or is there an option that I overlooked?

PS: How do I quote code here without literally pasting it?

Use subprocess.call() and friends instead of Popen()

For safer command execution.

recover_command has Shell=True, too :/

Ping is showing as 'passed' when the host is 'unreachable'

I'm testing this on a Windows machine on my home network. I assume it's user error.

I set monitors.ini to ping my Chromecast's IP

[chromecast-ping]
type=host
host="ip Address of my chromecast"
tolerance=1

and my monitor.ini to

[monitor]
interval=30

[reporting]
loggers=logfile

[logfile]
type=logfile
filename=monitor.log
only_failures=0

When I run it, the log file says

1518975287 chromecast-ping: ok (2.751s)
1518975320 chromecast-ping: ok (2.996s)

But if I were to ping the IP from the cmd line, I get a 'destination host unreachable.'
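
A possible explanation (an assumption, not confirmed here) is that Windows prints the unreachable message as a line beginning with "Reply from", sent by the local machine or a gateway, so a check that matches only that prefix counts it as a success. A stricter sketch is to require the `TTL=` field, which appears only in genuine echo replies; `host_replied` is a hypothetical helper.

```python
import re

# Genuine echo replies on Windows end with a TTL field, e.g. "TTL=128".
REPLY_WITH_TTL = re.compile(r"TTL=\d+")


def host_replied(ping_output):
    """True only if some line of the output carries a TTL field."""
    return any(REPLY_WITH_TTL.search(line) for line in ping_output.splitlines())


reachable = "Reply from 192.168.1.20: bytes=32 time=3ms TTL=64"
unreachable = "Reply from 192.168.1.10: Destination host unreachable."
print(host_replied(reachable), host_replied(unreachable))
```

Because `TTL=` also appears in the French output quoted in the first issue on this page, the same check would not depend on the Windows display language.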