pilight 5.0 Watchdog - General discussion
#1
Hi,
I'm creating this thread as a general place to discuss the new watchdog feature, and for people to get help when they need it (me :) ).
If you haven't read about it yet, pilight 5.0 includes a new feature, a so-called watchdog. Basically, it constantly monitors pilight's CPU and RAM usage, and
Quote:Whenever pilight uses more than 70% CPU or RAM, it will check again in 10 seconds. If the CPU or RAM usage is still larger than 70%, pilight will gracefully stop. If the CPU usage is above 90% at the second check, pilight will terminate immediately.
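A minimal sketch of the rule quoted above, written in Python purely for illustration (pilight itself is written in C). The function name and the return strings are made up for this example. The thresholds and the 10-second delay are taken from the quote; the actual source may use different values.

```python
CPU_RAM_LIMIT = 70.0   # percent, from the quoted description
HARD_CPU_LIMIT = 90.0  # percent, from the quoted description
RECHECK_SECONDS = 10   # delay between the two checks, from the quote


def watchdog_action(first_cpu, first_ram, second_cpu, second_ram):
    """Return 'ok', 'stop' (graceful) or 'terminate' (immediate).

    Hypothetical helper: the second pair of readings is assumed to be
    taken RECHECK_SECONDS after the first pair.
    """
    # First check: nothing happens unless CPU or RAM exceeds the limit.
    if first_cpu <= CPU_RAM_LIMIT and first_ram <= CPU_RAM_LIMIT:
        return "ok"
    # Second check: very high CPU means an immediate exit.
    if second_cpu > HARD_CPU_LIMIT:
        return "terminate"
    # Still above the limit: stop gracefully.
    if second_cpu > CPU_RAM_LIMIT or second_ram > CPU_RAM_LIMIT:
        return "stop"
    # Usage dropped again in the meantime: keep running.
    return "ok"
```

With readings like 99.6% and then 99.4% CPU, this sketch returns 'terminate'.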

But now I have encountered this problem:
When I view the web GUI over a VPN and the VPN disconnects for whatever reason, the webserver broadcast thread starts maxing out the CPU. The watchdog notices this and terminates pilight, as described above. The corresponding log section looks like this:
Code:
[Aug 13 00:33:05:385851] pilight-daemon: ERROR: ----- Thread Profiling -----
[Aug 13 00:33:05:386426] pilight-daemon: ERROR: - thread lirc: 0.010513%
[Aug 13 00:33:05:386875] pilight-daemon: ERROR: - thread datetime: 0.100148%
[Aug 13 00:33:05:387299] pilight-daemon: ERROR: - thread openweathermap: 0.000669%
[Aug 13 00:33:05:387713] pilight-daemon: ERROR: - thread dht22: 0.032978%
[Aug 13 00:33:05:388125] pilight-daemon: ERROR: - thread rpi_temp: 0.007540%
[Aug 13 00:33:05:397830] pilight-daemon: ERROR: - thread socket: 0.000000%
[Aug 13 00:33:05:398275] pilight-daemon: ERROR: - thread ssdp: 0.000000%
[Aug 13 00:33:05:398669] pilight-daemon: ERROR: - thread sender: 0.000000%
[Aug 13 00:33:05:399062] pilight-daemon: ERROR: - thread broadcaster: 0.173898%
[Aug 13 00:33:05:399483] pilight-daemon: ERROR: - thread updater: 0.074581%
[Aug 13 00:33:05:400004] pilight-daemon: ERROR: - thread 433gpio: 0.803634%
[Aug 13 00:33:05:400482] pilight-daemon: ERROR: - thread receive parser: 0.000000%
[Aug 13 00:33:05:400871] pilight-daemon: ERROR: - thread webserver worker #0: 0.031680%
[Aug 13 00:33:05:401455] pilight-daemon: ERROR: - thread webserver client: 0.020463%
[Aug 13 00:33:05:401944] pilight-daemon: ERROR: - thread webserver broadcast: 0.008067%
[Aug 13 00:33:05:402366] pilight-daemon: ERROR: - thread firmware upgrader: 0.006091%
[Aug 13 00:33:05:402773] pilight-daemon: ERROR: ----- Thread Profiling -----
[Aug 13 00:33:05:403132] pilight-daemon: ERROR: cpu usage way too high 99.630146
[Aug 13 00:33:05:403537] pilight-daemon: ERROR: checking again in 10 seconds
[Aug 13 00:33:16:405403] pilight-daemon: ERROR: ----- Thread Profiling -----
[Aug 13 00:33:16:405948] pilight-daemon: ERROR: - thread lirc: 0.011198%
[Aug 13 00:33:16:406397] pilight-daemon: ERROR: - thread datetime: 0.096157%
[Aug 13 00:33:16:407014] pilight-daemon: ERROR: - thread openweathermap: 0.000000%
[Aug 13 00:33:16:407452] pilight-daemon: ERROR: - thread dht22: 0.029901%
[Aug 13 00:33:16:408031] pilight-daemon: ERROR: - thread rpi_temp: 0.006706%
[Aug 13 00:33:16:408470] pilight-daemon: ERROR: - thread socket: 0.000000%
[Aug 13 00:33:16:408867] pilight-daemon: ERROR: - thread ssdp: 0.000000%
[Aug 13 00:33:16:409246] pilight-daemon: ERROR: - thread sender: 0.000000%
[Aug 13 00:33:16:409746] pilight-daemon: ERROR: - thread broadcaster: 0.162170%
[Aug 13 00:33:16:410177] pilight-daemon: ERROR: - thread updater: 0.066009%
[Aug 13 00:33:16:410588] pilight-daemon: ERROR: - thread 433gpio: 0.864769%
[Aug 13 00:33:16:410995] pilight-daemon: ERROR: - thread receive parser: 0.000000%
[Aug 13 00:33:16:416148] pilight-daemon: ERROR: - thread webserver worker #0: 0.007544%
[Aug 13 00:33:16:416629] pilight-daemon: ERROR: - thread webserver client: 0.000000%
[Aug 13 00:33:16:417006] pilight-daemon: ERROR: - thread webserver broadcast: 98.022847%
[Aug 13 00:33:16:417424] pilight-daemon: ERROR: - thread firmware upgrader: 0.006246%
[Aug 13 00:33:16:417818] pilight-daemon: ERROR: ----- Thread Profiling -----
[Aug 13 00:33:16:418170] pilight-daemon: ERROR: cpu usage still way too high 99.367499%, exiting
(Observation: Why does the watchdog start the procedure in the first place? According to the first thread profiling, none of the threads uses even 1% CPU, yet the watchdog logs a CPU usage of 99.630146. In the second thread profiling, the webserver broadcast thread is indeed at 98% CPU, but not in the first one. Can someone please explain this?)

Back to the VPN problem:
As my VPN automatically restarts and reconnects itself, I would like pilight to keep running and 'sit out' the maxed-out CPU. This would not matter anyway, as I only run pilight on that RPi, and the VPN is scheduled to disconnect at night, so nobody would notice. The RPi does not even have a closed case, so overheating is unlikely.
Since the introduction of this feature, pilight has terminated every time the VPN was reset. My question is: is there a way to disable the watchdog feature?

Another bug(?) I have found:
The watchdog sometimes writes this to the log file:
Code:
[Aug 14 08:07:53:774151] pilight-daemon: ERROR: ----- Thread Profiling -----
[Aug 14 08:07:53:774748] pilight-daemon: ERROR: - thread lirc: 0.000000%
[Aug 14 08:07:53:775210] pilight-daemon: ERROR: - thread datetime: 0.000000%
[Aug 14 08:07:53:775597] pilight-daemon: ERROR: - thread openweathermap: 0.000000%
[Aug 14 08:07:53:776002] pilight-daemon: ERROR: - thread dht22: 0.000000%
[Aug 14 08:07:53:776437] pilight-daemon: ERROR: - thread rpi_temp: 0.000000%
[Aug 14 08:07:53:776858] pilight-daemon: ERROR: - thread socket: 0.000000%
[Aug 14 08:07:53:777289] pilight-daemon: ERROR: - thread ssdp: 0.000000%
[Aug 14 08:07:53:782032] pilight-daemon: ERROR: - thread sender: 0.000000%
[Aug 14 08:07:53:782489] pilight-daemon: ERROR: - thread broadcaster: 0.000000%
[Aug 14 08:07:53:782911] pilight-daemon: ERROR: - thread updater: 0.000000%
[Aug 14 08:07:53:783335] pilight-daemon: ERROR: - thread 433gpio: 0.000000%
[Aug 14 08:07:53:783754] pilight-daemon: ERROR: - thread receive parser: 0.000000%
[Aug 14 08:07:53:784177] pilight-daemon: ERROR: - thread webserver worker #0: 0.000000%
[Aug 14 08:07:53:784591] pilight-daemon: ERROR: - thread webserver client: 0.000000%
[Aug 14 08:07:53:785018] pilight-daemon: ERROR: - thread webserver broadcast: 0.000000%
[Aug 14 08:07:53:785443] pilight-daemon: ERROR: - thread firmware upgrader: 0.000000%
[Aug 14 08:07:53:785846] pilight-daemon: ERROR: ----- Thread Profiling -----
[Aug 14 08:07:53:786207] pilight-daemon: ERROR: cpu usage too high 63.017406
[Aug 14 08:07:53:786736] pilight-daemon: ERROR: checking again in 10 seconds

According to the thread profiling, none of the threads uses any CPU at all, which is weird enough, but somehow the watchdog arrives at a total CPU load of 63%. It also says it will check again in 10 seconds, but nothing further is ever logged. This makes me wonder whether the watchdog monitors only pilight's CPU load (as the description says) or the CPU load of all processes. The latter could explain logs like this one, where the watchdog complains even though pilight is using no CPU.
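To make the distinction concrete, here is a small sketch of measuring only the current process rather than the whole system. It is Python for illustration and not how pilight does it; the function names are hypothetical. `time.process_time()` counts CPU time used by the calling process only, so load from other processes does not show up in the result.

```python
import time


def cpu_percent(cpu_start, cpu_end, wall_start, wall_end):
    """CPU time consumed as a percentage of elapsed wall-clock time."""
    elapsed = wall_end - wall_start
    if elapsed <= 0:
        return 0.0
    return 100.0 * (cpu_end - cpu_start) / elapsed


def sample_own_process(interval=0.1):
    """Measure the current process only, not the whole system."""
    cpu_start = time.process_time()   # CPU seconds used by this process
    wall_start = time.monotonic()     # wall-clock reference
    time.sleep(interval)              # sleeping uses almost no CPU time
    return cpu_percent(cpu_start, time.process_time(),
                       wall_start, time.monotonic())
```

A system-wide figure would have to come from a different source (on Linux, for example, the aggregate counters in /proc/stat), which is why the two numbers can disagree.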

-pilino1234
 
#2
I can confirm this behaviour; I have made some suggestions in the other CPU usage thread. We should try not to keep too many threads open on the same subject, so I propose to close this one and keep the oldest one open.
 
#3
The CPU usage thread was mainly about the high CPU usage itself. This one is about how, and whether, we need this new feature. What are your suggestions for a proper implementation of it?
 
#4
I think the feature in itself is a good idea, but there seem to be some issues with the implementation ;)
Also, an option to disable it if you really don't want it might be good. Or an option for OpenVPN support, so that when the VPN disconnects, pilight waits until it reconnects instead of killing the process. High CPU usage does not matter on my Pi as it is not used for anything else, and while the tunnel is down I can't access it anyway.

Another option might be to make the time until the second check configurable, say 5 minutes. If CPU usage is still very high after that long, then it is maybe time to think about a kill.

PS: I would like a solution to this soon, as I need the remote pilight-daemon to run very stably and not kill itself every day.
 
#5
I would opt more for a switch to disable the automatic killing of pilight. The CPU reports are pretty handy. But I still believe that pilight should never use more than about 10% with filter. So, if these errors occur, we need to look at their causes.
 
#6
Agreed, I also think that the CPU reports showing which thread uses how much CPU are very useful. However, it would be helpful if the logged values were correct as well. For example, the following section from my log just doesn't quite make sense...
Code:
[Aug 14 08:07:53:774151] pilight-daemon: ERROR: ----- Thread Profiling -----
[Aug 14 08:07:53:774748] pilight-daemon: ERROR: - thread lirc: 0.000000%
[Aug 14 08:07:53:775210] pilight-daemon: ERROR: - thread datetime: 0.000000%
[Aug 14 08:07:53:775597] pilight-daemon: ERROR: - thread openweathermap: 0.000000%
[Aug 14 08:07:53:776002] pilight-daemon: ERROR: - thread dht22: 0.000000%
[Aug 14 08:07:53:776437] pilight-daemon: ERROR: - thread rpi_temp: 0.000000%
[Aug 14 08:07:53:776858] pilight-daemon: ERROR: - thread socket: 0.000000%
[Aug 14 08:07:53:777289] pilight-daemon: ERROR: - thread ssdp: 0.000000%
[Aug 14 08:07:53:782032] pilight-daemon: ERROR: - thread sender: 0.000000%
[Aug 14 08:07:53:782489] pilight-daemon: ERROR: - thread broadcaster: 0.000000%
[Aug 14 08:07:53:782911] pilight-daemon: ERROR: - thread updater: 0.000000%
[Aug 14 08:07:53:783335] pilight-daemon: ERROR: - thread 433gpio: 0.000000%
[Aug 14 08:07:53:783754] pilight-daemon: ERROR: - thread receive parser: 0.000000%
[Aug 14 08:07:53:784177] pilight-daemon: ERROR: - thread webserver worker #0: 0.000000%
[Aug 14 08:07:53:784591] pilight-daemon: ERROR: - thread webserver client: 0.000000%
[Aug 14 08:07:53:785018] pilight-daemon: ERROR: - thread webserver broadcast: 0.000000%
[Aug 14 08:07:53:785443] pilight-daemon: ERROR: - thread firmware upgrader: 0.000000%
[Aug 14 08:07:53:785846] pilight-daemon: ERROR: ----- Thread Profiling -----
[Aug 14 08:07:53:786207] pilight-daemon: ERROR: cpu usage too high 63.017406
[Aug 14 08:07:53:786736] pilight-daemon: ERROR: checking again in 10 seconds

According to the watchdog, pilight uses 63% of the CPU, while according to the log, none of the threads uses any CPU at all. Either the per-thread CPU usage logging is not working, or the watchdog measures the total CPU load of all processes.
The log also says that it will check again in 10 seconds, but nothing further is ever logged. Either it is a bug and it does not measure again, or it just doesn't log the result.

In general, I am for a further, and more advanced, implementation of the watchdog. My suggestions are:
  • Add some way of disabling the watchdog
  • Fix the logging bugs I have described above
  • Add an option to set a custom interval until the next check
While the feature is not quite complete, I would like a way of disabling it as soon as possible, as it is quite annoying to have pilight exit every time the VPN disconnects. It has happened four times today alone. At the same time, I would like the pilight-daemon to run as stably as possible.
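As a rough illustration of what the first and third suggestions could look like as settings, here is a Python sketch that parses three options from a JSON string. The key names `watchdog`, `watchdog-kill` and `watchdog-interval` are invented for this example and are not taken from pilight; the defaults mirror the behaviour described in this thread.

```python
import json
from dataclasses import dataclass


@dataclass
class WatchdogConfig:
    enabled: bool = True        # run the CPU/RAM monitor at all
    kill_enabled: bool = True   # allow the monitor to stop the daemon
    recheck_seconds: int = 10   # delay between first and second check


def load_watchdog_config(text):
    """Parse hypothetical watchdog keys from a JSON settings string."""
    raw = json.loads(text)
    cfg = WatchdogConfig(
        enabled=bool(raw.get("watchdog", 1)),
        kill_enabled=bool(raw.get("watchdog-kill", 1)),
        recheck_seconds=int(raw.get("watchdog-interval", 10)),
    )
    if cfg.recheck_seconds <= 0:
        raise ValueError("watchdog-interval must be positive")
    # Without the monitor there is nothing to trigger a kill.
    if not cfg.enabled:
        cfg.kill_enabled = False
    return cfg
```

For example, `load_watchdog_config('{"watchdog-kill": 0, "watchdog-interval": 300}')` keeps the CPU reports, never stops the daemon, and waits five minutes before the second check.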
 
#7
Quote:According to the watchdog, pilight uses 63% of the CPU, while according to the log, none of the threads uses any CPU at all. Either the per-thread CPU usage logging is not working, or the watchdog measures the total CPU load of all processes.
The thread profiler doesn't run all the time. It takes too many resources, so it only starts profiling whenever the CPU usage is high. When the profiler starts, it doesn't know yet what the CPU usage of each thread is. It has to poll this over time. So in the second case, it had actually gathered some data. Maybe I can suppress the first profile altogether.
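The effect described here can be sketched with a delta-based profiler: a percentage can only be computed from the difference between two polls, so the very first poll has no baseline. This is a Python illustration under that assumption, not pilight's actual code; the class and method names are hypothetical.

```python
class ThreadProfiler:
    """Per-thread CPU share, computed from deltas between two polls."""

    def __init__(self):
        # (wall_time, {thread_name: cumulative_cpu_seconds}) or None
        self.previous = None

    def poll(self, wall_time, cpu_seconds):
        """Return {thread_name: percent} since the previous poll."""
        if self.previous is None:
            # No baseline yet: every thread reports 0% on the first poll.
            result = {name: 0.0 for name in cpu_seconds}
        else:
            last_wall, last_cpu = self.previous
            elapsed = wall_time - last_wall
            result = {}
            for name, cpu in cpu_seconds.items():
                # Threads unseen in the last poll have no delta yet.
                used = cpu - last_cpu.get(name, cpu)
                result[name] = 100.0 * used / elapsed if elapsed > 0 else 0.0
        self.previous = (wall_time, dict(cpu_seconds))
        return result
```

Polling twice, 10 seconds apart, with a thread whose cumulative CPU time went from 5.0 s to 14.8 s gives 0% on the first poll and roughly 98% on the second.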

Quote:The log also says that it will check again in 10 seconds, but nothing is ever logged. Either it is a bug that it does not measure again, or it just doesn't log again.
This means that the CPU usage is normal again.

Quote:Add some way of disabling the watchdog
Watchdog or the automatic kill switch?

Quote:Add an option to set a custom interval until the next check
I would opt for a better implementation of the interval so it will work for everyone.

Quote:I would like a way of disabling this feature while it is not quite complete as soon as possible, as it is quite annoying to have it crashing every time the VPN disconnects.
You can always manually disable the specific watchdog part of pilight in the code itself:
https://github.com/pilight/pilight/blob/...2123-L2179
 
#8
Quote:The thread profiler doesn't run all the time. It takes too many resources, so it only starts profiling whenever the CPU usage is high. When the profiler starts, it doesn't know yet what the CPU usage of each thread is. It has to poll this over time. So in the second case, it had actually gathered some data. Maybe I can suppress the first profile altogether.
Yes, I think that would be good. Logging values which are not even true is kind of unnecessary. Alternatively, use three profiles (i.e. add one in between): the first, messy one is not logged and only serves to 'initialize' the profiler; the second has correct values and takes over the role of the current first profile; and the new third one decides whether (and how) pilight should kill itself.
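The three-profile idea could look roughly like this. It is a Python sketch with hypothetical names; `take_profile` stands for whatever returns the total CPU percentage, the waits between samples are left out, and the 70% and 90% limits are the ones quoted at the start of the thread.

```python
def three_stage_check(take_profile, limit=70.0, hard_limit=90.0):
    """Return (action, logged_readings) for a three-sample check.

    take_profile is a hypothetical callable returning total CPU percent.
    """
    take_profile()             # 1: warm-up, initialises the profiler, not logged
    reported = take_profile()  # 2: first profile with real data
    if reported <= limit:
        return "ok", []
    logged = [reported]
    deciding = take_profile()  # 3: decides whether and how to stop
    logged.append(deciding)
    if deciding > hard_limit:
        return "terminate", logged
    if deciding > limit:
        return "stop", logged
    return "ok", logged
```

The unreliable first reading is thrown away, so only values backed by real data end up in the log.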

Quote:This means that the CPU usage is normal again.
OK, it would be great if the log stated this :) , because how are you meant to know that silence means the usage is normal again unless you are told?

Quote:Watchdog or the automatic kill switch?
Not quite sure, actually. By kill switch you mean what is executed if the CPU/RAM usage stays too high, right? Maybe have a setting in settings.json to disable either nothing, just the kill switch, or both the watchdog and the kill switch. Disabling the watchdog but not the kill switch sounds like it wouldn't work :P

Quote:I would opt for a better implementation of the interval so it will work for everyone.
Sounds good.

Quote:You can always manually disable the specific watchdog part of pilight in the code itself:
https://github.com/pilight/pilight/blob/...2123-L2179
So, what exactly do I edit? Do I set the values on https://github.com/pilight/pilight/blob/...on.c#L2146 and https://github.com/pilight/pilight/blob/...on.c#L2171 to 0 instead of 1 to disable killing?

Also, I have found that if I do not keep the tab with the web GUI open all the time, and only open it when I really need it, pilight does not kill itself or even start using large amounts of CPU. Simple but effective :)
 
#9
pilight stopped several times because of too much CPU usage :-(
Could you please build in a config statement like -> watchdog 0/1?
With fhem, pilight and PHP sending measurements to a MySQL database, the load is allowed to be high :-) but I need a running pilight to switch my actuators :-)

Regards,
Ingo
 
#10
Much in line with your other post. That setting already exists. Check the wiki.
 