Comments (13)

vernesong avatar vernesong commented on July 23, 2024

For now, some scripts hit a Permission denied error after the firmware is built.
I will fix it as soon as possible. Meanwhile, you can find the script at /usr/share/openclash/openclash_watchdog.sh; don't forget to make sure the file has executable permission.

from openclash.

vernesong avatar vernesong commented on July 23, 2024

Would you mind testing the latest release and giving feedback?

hsghost avatar hsghost commented on July 23, 2024

On the new release, v0.34.5-beta, I checked the file you mentioned above. It seems the script is actually running, even though the Luci UI reports it as NOT RUNNING.

Screenshot 2019-09-23 05 12 39

Screenshot 2019-09-23 05 17 38

Both top and Luci Status > Processes show it running, too. Whether it's running NORMALLY is unknown, though. Is there any way to check that?

BTW, the sh: bad number message during installation persists, too. I don't know where it comes from.

vernesong avatar vernesong commented on July 23, 2024

I use this command ps |grep openclash_watchdog.sh |grep -v grep 2>/dev/null to confirm whether the script is working; the Luci page does the same thing. Maybe you should first confirm that this command works on your router.
Also, the watchdog clears the log file when its size reaches the limit (90 KB).

hsghost avatar hsghost commented on July 23, 2024

root@GL-B1300:~# ps |grep openclash_watchdog.sh |grep -v grep 2>/dev/null
1008 root 1420 S {openclash_watch} /bin/sh /usr/share/openclash/openclash_watchdog.sh
15268 root 1420 S {openclash_watch} /bin/sh /usr/share/openclash/openclash_watchdog.sh
17771 root 1420 S {openclash_watch} /bin/sh /usr/share/openclash/openclash_watchdog.sh
26653 root 1420 S {openclash_watch} /bin/sh /usr/share/openclash/openclash_watchdog.sh
28697 root 1420 S {openclash_watch} /bin/sh /usr/share/openclash/openclash_watchdog.sh
30043 root 1420 S {openclash_watch} /bin/sh /usr/share/openclash/openclash_watchdog.sh
31233 root 1420 S {openclash_watch} /bin/sh /usr/share/openclash/openclash_watchdog.sh

Is it normal to have so many watchdog instances running?

vernesong avatar vernesong commented on July 23, 2024

This is abnormal. The script should have only one process running.

hsghost avatar hsghost commented on July 23, 2024

After a reboot and several hours of continuous running, it seems normal now.

root@GL-B1300:~# ps |grep openclash_watchdog.sh |grep -v grep 2>/dev/null
4544 root 1420 S {openclash_watch} /bin/sh /usr/share/openclash/openclash_watchdog.sh

I will continue to test its reliability and report any issue I encounter. Thanks for the work @vernesong !

hsghost avatar hsghost commented on July 23, 2024

After an extended run, multiple watchdog instances are appearing again.

root@GL-B1300:~# ps |grep openclash_watchdog.sh |grep -v grep 2>/dev/null
4544 root 1420 S {openclash_watch} /bin/sh /usr/share/openclash/openclash_watchdog.sh
6972 root 1420 S {openclash_watch} /bin/sh /usr/share/openclash/openclash_watchdog.sh

Also, two new problems arose.

  1. The log is not showing any recent entries. The latest entry is from 8 hours ago. The cleanup notice at the top DOES show a recent time, while everything below it is from 8+ hours ago. I don't know whether it's a logging service malfunction or an erroneous truncation by the watchdog daemon.

  2. The server selection I made in the Clash Dashboard hours ago was gone. When I checked, the selection was on Fallback, which is also the startup default. I'm not sure whether this was caused by a restart of the Clash daemon during those hours, as the log is missing and I have nowhere to check. I also suspect it is worth inspecting whether this is related to the multiple-watchdog symptom.

A quick peek at the Luci System Log:

Screenshot 2019-09-24 02 04 38

It seems that there WAS a restart during the previous hours, yet I'm not sure.

Also, I'm curious: is it safe to keep running with multiple watchdog processes coexisting? I'm restarting my router now to avoid potential trouble.

hsghost avatar hsghost commented on July 23, 2024

root@GL-B1300:~# ps |grep openclash_watchdog.sh |grep -v grep 2>/dev/null
4550 root 1420 S {openclash_watch} /bin/sh /usr/share/openclash/openclash_watchdog.sh
22603 root 1420 S {openclash_watch} /bin/sh /usr/share/openclash/openclash_watchdog.sh

Confirmed: on my router (GL.iNet GL-B1300), the watchdog process does not exit correctly when the main service restarts. Again, from my System Log:

Screenshot 2019-09-24 23 41 29

My observations and suspicions are as follows.

I have set an automatic subscription update at 21:00 every day, so at that time today and yesterday the service was restarted for the config changes to take effect. Yet the restart only kills the previous core process correctly. When it comes to the watchdog process, it fails to keep it a singleton.

From my earlier manual start test, I could see that the design is to discover the previously running watchdog process and exit on a second run, so that the old one remains the only one up and running.

root@GL-B1300:/usr/share/openclash# ./openclash_watchdog.sh
another clash_watchdog.sh is running,exit

Yet my suspicion is that this mechanism does not work as planned when it comes to the automatic restart during a config update. A service restart starts a new watchdog process, yet for some reason it keeps running. So, on the day when I made several changes and committed the configs, the service got restarted several times. Then, when I checked back, there were 7 running instances of the watchdog.

A peek into the watchdog script reveals that it checks the running status of the core. I suspect that your service design is to have the watchdog process run first; then, when the watchdog discovers that no core process is running, it creates a new one. If that's the case, on a service restart there's a chance that forking a new core process makes the watchdog a parent process. And whether a parent process is allowed to quit before all its children terminate is an OS-specific design. My suggestion is to have non-watchdog code in the service start the core process before the watchdog runs, so that the watchdog processes become standalone and can terminate normally when appropriate.

Meanwhile, I'm not seeing any significant side effects from having multiple watchdogs running. Still, I'm turning off the automatic update for now, just in case of a malfunction.

And still, why the Luci UI on my router constantly shows NOT RUNNING for the watchdog is another mystery to be probed.

vernesong avatar vernesong commented on July 23, 2024

So far I cannot find the reason why openclash_watchdog.sh started other processes while the commands below exist, which means only one process is allowed to start up.
status=$(ps|grep -c openclash_watchdog.sh)
[ "$status" -gt "3" ] && echo "another OpenClash_watchdog.sh is running, exit "
[ "$status" -gt "3" ] && exit 0
Maybe the multiple processes are the reason why the script cannot stop the watchdog and the Luci state is abnormal, so I have changed some commands in the latest commit to avoid this situation.
You can first confirm that the command below works for stopping the watchdog on your router.
kill -9 "$(ps |grep openclash_watchdog.sh |grep -v grep |awk '{print $1}' 2>/dev/null)" >/dev/null 2>&1
By the way, running multiple watchdogs has few serious side effects other than instability.
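
A note on the kill command above: because the command substitution is inside double quotes, several matching PIDs reach kill as one multi-line argument, which kill rejects as an invalid PID. That would fit the observation that the watchdog cannot be stopped once more than one instance exists. A minimal sketch of a per-PID loop follows; kill_pids is a hypothetical helper name, not part of OpenClash.

```shell
#!/bin/sh
# Sketch: send SIGKILL to every PID read from standard input, one per line.
# kill_pids is a hypothetical helper name, not part of OpenClash.
kill_pids() {
    while read -r pid; do
        # Skip blank or non-numeric lines defensively.
        case "$pid" in
            ''|*[!0-9]*) continue ;;
        esac
        kill -9 "$pid" 2>/dev/null
    done
    return 0
}

# Intended use on the router (not executed here):
# ps | grep openclash_watchdog.sh | grep -v grep | awk '{print $1}' | kill_pids
```

Each PID is passed to kill as its own argument, so the number of running instances no longer matters.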

hsghost avatar hsghost commented on July 23, 2024

Well, can you explain to me why the number you're testing against is 3? I did a test with a small modification to your code:

ps |grep openclash_watchdog.sh
status=$(ps|grep -c openclash_watchdog.sh)
echo $status
[ "$status" -gt "3" ] && ...

And the output was like

root@GL-B1300:~# /usr/share/openclash/openclash_watchdog.sh
578 root 1420 S {openclash_watch} /bin/sh /usr/share/openclash/openclash_watchdog.sh
6651 root 1412 S {openclash_watch} /bin/sh /usr/share/openclash/openclash_watchdog.sh
6653 root 1412 S grep openclash_watchdog.sh
4
another clash_watchdog.sh is running,exit

There were 3 instances showing yet the returned number is 4. Was it because of the parentheses? A sub-bash process?
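
A likely reading of the extra count (an assumption, not confirmed in this thread): grep -c counts its own line in the ps listing, and the $(...) substitution forks a subshell that ps lists as one more copy of the script while the count is taken. A lone watchdog would then already see itself, the subshell, and grep, which may be where the threshold of 3 comes from. The sketch below replays the three lines quoted above as static text, so it shows only the grep part; the short-lived subshell cannot appear in a static sample.

```shell
#!/bin/sh
# The three lines printed by `ps | grep openclash_watchdog.sh` in the output above.
sample='578 root 1420 S {openclash_watch} /bin/sh /usr/share/openclash/openclash_watchdog.sh
6651 root 1412 S {openclash_watch} /bin/sh /usr/share/openclash/openclash_watchdog.sh
6653 root 1412 S grep openclash_watchdog.sh'

# Counting every match includes the grep process itself.
with_grep=$(printf '%s\n' "$sample" | grep -c 'openclash_watchdog.sh')

# Dropping grep's own line leaves only the script processes.
without_grep=$(printf '%s\n' "$sample" | grep -v 'grep' | grep -c 'openclash_watchdog.sh')

echo "with grep: $with_grep, without grep: $without_grep"
# prints "with grep: 3, without grep: 2"
```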

Another discovery is that when running in background, $status always gets a number of 1:

root@GL-B1300:# /usr/share/openclash/openclash_watchdog.sh &
8028
root@GL-B1300:# 8030 root 1412 S grep openclash_watchdog.sh
1

The output got mixed up with the prompt, yet we can still see that a single process was detected, which was the grep inside the testing code. None of the other watchdog processes, including the original one run by the service and the newly created watchdog itself, were detected. So the [ "$status" -gt "3" ] test never passes, and you can create as many watchdog processes on the system as you like with & or nohup. I don't know what causes this, but if you're using a similar method to start the watchdog during a service restart, this might be the cause of the multiple-watchdog issue. The detection code in the script seems to work only with non-background executions.

hsghost avatar hsghost commented on July 23, 2024

I did a further test and found the reason. I created a test.sh and put a simple ps command in it. I then tested running it with and without &:

root@GL-B1300:/tmp# ./test.sh
PID USER VSZ STAT COMMAND
1 root 1388 S /sbin/procd
...
30163 root 1420 S {openclash_watch} /bin/sh /usr/share/openclash/openclash_watchdog.sh
...

root@GL-B1300:/tmp# ./test.sh &
17803
root@GL-B1300:/tmp# PID USER VSZ STAT COMMAND
1 root 1388 S /sbin/procd
...
30163 root 1420 S {openclash_watch} /bin/sh /usr/share/openclash/openc
...

The output of the background run was truncated after character 78 on each line, so all the long command lines get cut off. I think this is related to some environment variable in my router firmware.

Meanwhile, I've changed the code to:

status=$(ps|grep -c {openclash_watch})

And it solved the problem. The truncation no longer affects the detection. Confirmed with a config update: no more parallel watchdogs haunting the system. If using {openclash_watch} could have a side effect, please let me know.
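
To illustrate why the shorter pattern survives the truncation, here is a small sketch that replays the two ps lines quoted above (foreground and background) as static text. count_matches is a hypothetical helper used only for this illustration.

```shell
#!/bin/sh
# ps lines quoted above: the foreground run (full) and the background run (truncated).
full='30163 root 1420 S {openclash_watch} /bin/sh /usr/share/openclash/openclash_watchdog.sh'
truncated='30163 root 1420 S {openclash_watch} /bin/sh /usr/share/openclash/openc'

# count_matches PATTERN TEXT: print the number of lines in TEXT containing PATTERN.
# -F treats the pattern as a fixed string, so the braces are never special.
# `|| true` keeps the exit status at 0 even when grep finds nothing.
count_matches() {
    printf '%s\n' "$2" | grep -c -F -- "$1" || true
}

count_matches 'openclash_watchdog.sh' "$full"        # prints 1
count_matches 'openclash_watchdog.sh' "$truncated"   # prints 0: the name was cut off
count_matches '{openclash_watch}' "$full"            # prints 1
count_matches '{openclash_watch}' "$truncated"       # prints 1: still before the cut
```

One possible side effect, stated as an assumption: the braces hold the kernel's short process name, which is limited to 15 characters, so any other script whose name starts with openclash_watch would match as well. The truncation itself may come from ps falling back to a default line width when it cannot query a terminal, though this was not verified on this firmware.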

And if you're using the same mechanism for the Luci UI, the same explanation could apply to the erroneous running status report. This is a minor bug. Yet I feel it is always a better solution to register the watchdog as a separate service and have it report its own running status upon inquiry (like openclash_watch status). A service can record the PID when it creates the process, and thus ensure a singleton without relying on dynamic detection.
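
The PID-recording idea could look roughly like the sketch below. PIDFILE, watchdog_running, and claim_pidfile are hypothetical names chosen for illustration; nothing here is taken from the OpenClash code.

```shell
#!/bin/sh
# Sketch of a PID-file singleton check. PIDFILE and the helper names are
# hypothetical; OpenClash does not necessarily use them.
PIDFILE="${PIDFILE:-/tmp/openclash_watchdog.pid}"

# Succeeds if PIDFILE holds the PID of a process that can still be signalled.
watchdog_running() {
    [ -f "$PIDFILE" ] || return 1
    pid=$(cat "$PIDFILE" 2>/dev/null)
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;   # empty or not a number: treat as stale
    esac
    kill -0 "$pid" 2>/dev/null     # signal 0 only tests for existence
}

# Records PID $1 unless a live instance is already registered.
claim_pidfile() {
    if watchdog_running; then
        return 1
    fi
    echo "$1" > "$PIDFILE"
}

# Intended use at the top of the watchdog script (not executed here):
# claim_pidfile "$$" || { echo "another watchdog is running, exit"; exit 0; }
```

This does not depend on how ps formats its output, and the same watchdog_running check could back a status query. The check and the write are not atomic, so two instances started at the same moment could still both pass, and a recycled PID could be mistaken for a live watchdog.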

hsghost avatar hsghost commented on July 23, 2024

Another (better) way to ensure a singleton is to use flock: How to implement singleton in shell script, FYI.
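
A minimal sketch of the flock approach, assuming the flock utility is installed on the router. LOCKFILE and acquire_lock are hypothetical names, not OpenClash code.

```shell
#!/bin/sh
# Sketch of a flock-based singleton guard. LOCKFILE and acquire_lock are
# hypothetical names; the flock utility must be available.
LOCKFILE="${LOCKFILE:-/tmp/openclash_watchdog.lock}"

# Opens LOCKFILE on file descriptor 9 and tries to take an exclusive lock
# without blocking. Fails immediately if another process holds the lock.
acquire_lock() {
    exec 9>"$LOCKFILE"
    flock -n 9
}

# Intended use at the top of the watchdog script (not executed here):
# acquire_lock || { echo "another watchdog is running, exit"; exit 0; }
```

The kernel drops the lock once every copy of the descriptor is closed, so no stale state survives a crash or kill -9. One caveat: child processes inherit descriptor 9, so a long-lived child started by the watchdog would keep the lock held unless it is started with 9>&-.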

Closing the issue.
