Chasing a Bug in daemon.c
For months I have been trying to catch a segfault that magically disappeared whenever I ran pilight-daemon under gdb.
After incorporating today's latest changes from the development branch and integrating #PR317, gdb started to deliver results.

I should note up front that I am using my own branch, test_dev. It carries a significant number of unmerged PRs and protocols that I need at home (the most notable changes are oregon_21, philipshue, somfy, and a modified daemon.c module).

Two Pis are operating in an ad-hoc network: the master is a Pi 2 connected via Ethernet and equipped with a 433MHz Rx board (no Tx); the slave is a Pi 3 connected via WiFi and equipped with a 433MHz Tx board (no Rx).

The segfault occurs in daemon.c at the following if statement (L336 in the standard daemon.c, L359 in my case).
I have not made any changes to this part of daemon.c (the JSON configuration):
} else {
                    /* Update the config */
                    if(devices_update(bcqueue->protoname, bcqueue->jmessage, bcqueue->origin, &jret) == 0) {
                        char *tmp = json_stringify(jret, NULL);
                        struct clients_t *tmp_clients = clients;
                        unsigned short match1 = 0, match2 = 0;

L335/L368:                        while(tmp_clients) {
L336/L369:                            if(tmp_clients->config == 1) {
bcqueue->protoname points to the philipshue protocol.
bcqueue->origin has the value MASTER (as described above, the master receives 433MHz frames with its Rx module and uses the Pi 2's Ethernet adapter, while the client transmits 433MHz frames with its Tx module and uses the Pi 3's integrated WLAN adapter).

The facts are:
Reading the variable tmp_clients at L338/368 yields a true (non-NULL) result, so the loop is not terminated.
match1 and match2 both have a value of 1.

The while loop cannot proceed as planned because dereferencing tmp_clients->config accesses an unallocated memory location. That leaves two possibilities:
- the loop should already have terminated, or
- someone freed the memory structure before the above code was executed.
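
To illustrate the second possibility: if one thread walks the client list while another thread removes and frees a node, the walker can end up holding a dangling pointer. Below is a minimal sketch of guarding both operations with the same mutex. It is not taken from pilight: struct client, clients_lock, client_add, client_remove and count_config_clients are hypothetical names, and I am only assuming that the real list is shared between threads in a similar way.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down client node (not pilight's clients_t). */
struct client {
    int id;
    int config;           /* 1 = wants config updates */
    struct client *next;
};

static struct client *clients = NULL;
/* Assumption: one lock guards every read and write of the list. */
static pthread_mutex_t clients_lock = PTHREAD_MUTEX_INITIALIZER;

/* Push a new node onto the head of the list. */
static void client_add(int id, int config) {
    struct client *node = malloc(sizeof(*node));
    if(node == NULL) {
        exit(EXIT_FAILURE);
    }
    node->id = id;
    node->config = config;
    pthread_mutex_lock(&clients_lock);
    node->next = clients;
    clients = node;
    pthread_mutex_unlock(&clients_lock);
}

/* Unlink and free the node with the given id; returns 1 if found. */
static int client_remove(int id) {
    int removed = 0;
    pthread_mutex_lock(&clients_lock);
    struct client **link = &clients;
    while(*link != NULL) {
        if((*link)->id == id) {
            struct client *dead = *link;
            *link = dead->next;
            free(dead);
            removed = 1;
            break;
        }
        link = &(*link)->next;
    }
    pthread_mutex_unlock(&clients_lock);
    return removed;
}

/* Walk the list under the lock, so no node can be freed mid-traversal. */
static int count_config_clients(void) {
    int n = 0;
    pthread_mutex_lock(&clients_lock);
    for(struct client *c = clients; c != NULL; c = c->next) {
        if(c->config == 1) {
            n++;
        }
    }
    pthread_mutex_unlock(&clients_lock);
    return n;
}
```

Without the lock in count_config_clients, a concurrent client_remove could free the node that the loop variable currently points to, which would match a fault on reading the config member.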

Some hints on what the code and variables are supposed to control would be appreciated.
My difficulty is that the problem rarely occurs while running pilight-daemon under gdb in debug mode, yet it happens quite frequently when running it as a service. (I am using screen to keep the gdb session alive in the background.)

1. Could a problem occur due to the use of two different network links in an ad-hoc setup (e.g. a packet gets lost on the WiFi link, or does not travel fast enough and gets overtaken by a newer packet)?
2. Could there be a race condition between tasks?
3. What could cause that memory location (tmp_clients->config) to no longer be allocated?

The reason I am asking is the following typical stacktrace in pilight.err (which suggests a hidden issue in the broadcast mechanism on multi-core machines):
[Nov 25 16:47:19:407242] pilight-daemon: -- STACKTRACE (5 FRAMES) --
[Nov 25 16:47:19:409800] pilight-daemon: __default_sa_restorer_v2       ip = 0x76c62180, sp = 0x70cfeac8
[Nov 25 16:47:19:411992] pilight-daemon: broadcast                      ip =    0x20fd8, sp = 0x70cfedb8
[Nov 25 16:47:19:414090] pilight-daemon: start_thread                   ip = 0x76e46e90, sp = 0x70cfee60
[Nov 25 16:47:19:416033] pilight-daemon: clone                          ip = 0x76d04598, sp = 0x70cfefa0
[Dec 04 18:41:05:310215] pilight-daemon: -- STACKTRACE (5 FRAMES) --
[Dec 04 18:41:05:312879] pilight-daemon: __default_sa_restorer_v2       ip = 0x76c45180, sp = 0x70bfeac8
[Dec 04 18:41:05:314692] pilight-daemon: broadcast                      ip =    0x20fd8, sp = 0x70bfedb8
[Dec 04 18:41:05:316703] pilight-daemon: start_thread                   ip = 0x76e29e90, sp = 0x70bfee60
[Dec 04 18:41:05:318594] pilight-daemon: clone                          ip = 0x76ce7598, sp = 0x70bfefa0
[Dec 09 01:58:16:274618] pilight-daemon: -- STACKTRACE (5 FRAMES) --
[Dec 09 01:58:16:277223] pilight-daemon: __default_sa_restorer_v2       ip = 0x76ce4180, sp = 0x70bfeac8
[Dec 09 01:58:16:282746] pilight-daemon: broadcast                      ip =    0x20fd8, sp = 0x70bfedb8
[Dec 09 01:58:16:284926] pilight-daemon: start_thread                   ip = 0x76ec8e90, sp = 0x70bfee60
[Dec 09 01:58:16:286989] pilight-daemon: clone                          ip = 0x76d86598, sp = 0x70bfefa0
[ Dec 11 15:35:07:35695] pilight-daemon: -- STACKTRACE (5 FRAMES) --
[ Dec 11 15:35:07:38613] pilight-daemon: __default_sa_restorer_v2       ip = 0x76d26180, sp = 0x70afeac8
[ Dec 11 15:35:07:41071] pilight-daemon: broadcast                      ip =    0x21018, sp = 0x70afedb8
[ Dec 11 15:35:07:43497] pilight-daemon: start_thread                   ip = 0x76f0ae90, sp = 0x70afee60
[ Dec 11 15:35:07:45906] pilight-daemon: clone                          ip = 0x76dc8598, sp = 0x70afefa0
The struct clients represents the clients connected to the daemon: a webserver, ad-hoc nodes, API clients, etc.

The different fields inside the clients struct represent information about that client:
- uuid: the uuid of the client
- id: id of the client to track the different linked list items
- receiver: does this client want to receive incoming codes (pilight-receive)
- config: does this client want to receive config updates
- core: does this client want to receive core messages (firmware version)
- stats: does this client want to receive stats (cpu / ram usage)
- forward: does this client forward messages (ad-hoc)
- media: type of client (web / mobile / ...)
- cpu: cpu usage of the client
- ram: ram usage of the client

I never had this issue, so it seems related to the ad-hoc daemon or to a memory overrun elsewhere.
Originally I also suspected a memory overrun condition; however, that does not explain why read access to the structure element config fails.

I will keep gdb running in the background and will post more details once they are available.

Can you give me a hint on how to locate the counterpart code?
