[SOLVED] passive service check returns "Service check did not exit properly" #467
I have searched high and low for a solution to this. It will most likely turn out to be one more minor error I made while making an update or the way I copied something... Recently, I created a new passive service which I am still developing. An external program generates a spool file, and it looks like naemon is processing it, judging by the debug.log (which I have enabled to investigate this problem). Peculiar that there are event cancellations just prior to the relevant section--I am not sure if these are related to the service in question, or maybe associated with a prior check. I have set my debug to -1, which is supposed to output every type of naemon trace. For this error, I trimmed out just the section I think is relevant, obfuscating the host and service names. I can repeat this and get more or less the same looking logs:
I did find some solutions online for icinga, nagios, and a few others. But all of those pertained to active checks, whereas in my case it is a passive check that is causing this. I have also spent hours trying to generate various configurations of the spool file to see if the problem is stemming from how I create the spool files. However, 3 other passive services are using the identical approach and are not getting this result. I have even searched for "funny" characters in the files (e.g., UTF8 and non-printables); nothing unusual found. Here is a typical example (not related to the above):
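For anyone repeating the "funny characters" check mentioned above, here is a minimal Python sketch of one way to do it. It only flags bytes outside printable ASCII (plus tab) on each line; the function name and the sample data are made up for illustration and are not taken from the poster's setup.

```python
def find_suspicious_bytes(data: bytes):
    """Return (line_number, column, byte_value) for every byte that is
    neither printable ASCII nor a tab. Lines are split on LF only, so a
    stray carriage return is reported as well."""
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            if byte == 0x09 or 0x20 <= byte <= 0x7E:
                continue  # tab or printable ASCII: fine
            hits.append((line_no, col, byte))
    return hits


# Hypothetical spool-like content with a CR and a UTF-8 no-break space.
sample = b"host_name=web01\nexited_ok=1\r\noutput=OK \xc2\xa0fine\n"

for line_no, col, byte in find_suspicious_bytes(sample):
    print(f"line {line_no}, column {col}: 0x{byte:02x}")
```

To scan a real file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to the function. A clean result rules out encoding debris, which matches what the poster found here.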
Again, I expect this is probably the result of some silly finger check or maybe a mis-reading of the docs. Environment:
Replies: 4 comments
The only place where this could happen is line 586 in
Heheheh... well, like I said, this is probably something really dumb. (And it did not disappoint this time either!)
So, https://www.naemon.io/documentation/developer/spoolfolder.html shows an example of how to build a spool file. The breakdown of each field and value (thanks Sven!) is below that, and it clearly states that exited_ok must be 1 or naemon will toss the result (which is precisely what is happening). However...
Looking closely at the example spool file again, exited_ok is set to 0, not 1! And this would be the genesis of this discussion thread.
Now, the reason the other services work is because they are passed the value of exited_ok from an external program, whereas this service is hardcoding it (I mean, why not, if it always has to be 0... err, I mean, 1).
BTW, thank you @datamuc for your feedback--it actually did help me to run this down because once I started combing through the source and seeing all the references to the exited_ok field numerous times, I suddenly remembered a similar field in the spool files. After making this weensy change to my code, all is well and my new service is functional (and while not complete, also not relevant here).
It might be very helpful to future victims of this problem to update the one byte/character of that document. Other than that, it is time to sing one more round of "Rubber Ducky." Ok, Sven, on four... Hit it!
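A quick way to catch this class of mistake before Naemon sees the file is to parse the `key=value` lines and check `exited_ok` explicitly. The following is only a sketch: the function names, the host and service names, and the assumption of one result per file are all illustrative, and the field list is limited to fields that appear in this thread.

```python
def parse_check_result(text: str) -> dict:
    """Parse key=value lines, skipping blank lines and '#' comments.
    Assumes one check result per file; a repeated key keeps the last value."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")  # split on the first '=' only
        if sep:
            fields[key] = value
    return fields


def exited_ok_problem(fields: dict):
    """Return a description of the problem, or None if exited_ok is 1."""
    value = fields.get("exited_ok")
    if value is None:
        return "exited_ok is missing"
    if value != "1":
        return f"exited_ok={value}, expected 1"
    return None


# Hypothetical spool content reproducing the mistake described above.
sample = """\
### Passive Check Result File ###
file_time=1700000000

host_name=example-host
service_description=example-service
check_type=1
early_timeout=0
exited_ok=0
return_code=0
output=OK | load=0.1
"""

problem = exited_ok_problem(parse_check_result(sample))
if problem:
    print("spool file would be rejected:", problem)
```

Running a check like this in the program that generates the spool file would have flagged the hardcoded `0` right away.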
I found some 10-year-old PHP code that I wrote to pass passive check results to Nagios / Naemon back in the day. Today I'm using Naemon's Query Handler or the Statusengine Broker for this, but I'm leaving this here as a reference.
    switch($params['TYPE']){
        case 'SERVICE':
            fwrite($file, "### Passive Check Result File ###\n");
            fwrite($file, "file_time=".time()."\n\n");
            fwrite($file, "### openITCOCKPIT-Injection ###\n");
            fwrite($file, "# Time: ".$params['TIMET']."\n");
            fwrite($file, "host_name=".$params['HOSTNAME']."\n");
            fwrite($file, "service_description=".$params['SERVICEDESC']."\n");
            fwrite($file, "check_type=1\n"); // 1 = passive check
            fwrite($file, "early_timeout=0\n");
            fwrite($file, "exited_ok=1\n"); // must be 1, otherwise the result is discarded
            fwrite($file, "start_time=".$params['TIMET'].".0\n");
            fwrite($file, "finish_time=".$params['TIMET'].".0\n");
            fwrite($file, "return_code=".$params['SERVICESTATEID']."\n");
            fwrite($file, "output=".$params['SERVICEOUTPUT']." | ".$params['SERVICEPERFDATA']."\n\n");
            break;
        case 'HOST':
            // Same layout as a service result, but without service_description
            fwrite($file, "### Passive Check Result File ###\n");
            fwrite($file, "file_time=".time()."\n\n");
            fwrite($file, "### openITCOCKPIT-Injection ###\n");
            fwrite($file, "# Time: ".$params['TIMET']."\n");
            fwrite($file, "host_name=".$params['HOSTNAME']."\n");
            fwrite($file, "check_type=1\n");
            fwrite($file, "early_timeout=0\n");
            fwrite($file, "exited_ok=1\n");
            fwrite($file, "start_time=".$params['TIMET'].".0\n");
            fwrite($file, "finish_time=".$params['TIMET'].".0\n");
            fwrite($file, "return_code=".$params['HOSTSTATEID']."\n");
            fwrite($file, "output=".$params['HOSTOUTPUT']."\n\n");
            break;
        default:
            Logger::clog('Unknown Data for Nagios:',3);
            Logger::dump($params);
            return;
    }
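For readers not working in PHP, the SERVICE branch above could be sketched in Python roughly as follows. This only builds the file content with the same fields and order as the PHP snippet; the function name and arguments are made up for illustration, and how the file must be named and handed over to the spool directory is not covered here (see the Naemon spool folder documentation linked earlier in the thread).

```python
import time


def build_service_result(host_name, service_description, return_code,
                         output, perfdata="", timestamp=None):
    """Build the text of a passive service check result.
    Mirrors the fields of the PHP SERVICE branch; exited_ok is always 1."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    # Append performance data after a pipe only when there is some.
    text = f"{output} | {perfdata}" if perfdata else output
    lines = [
        "### Passive Check Result File ###",
        f"file_time={ts}",
        "",
        f"host_name={host_name}",
        f"service_description={service_description}",
        "check_type=1",       # 1 = passive check
        "early_timeout=0",
        "exited_ok=1",        # anything else and the result is discarded
        f"start_time={ts}.0",
        f"finish_time={ts}.0",
        f"return_code={return_code}",
        f"output={text}",
        "",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical host and service names, fixed timestamp for reproducibility.
content = build_service_result(
    "example-host", "example-service", 0, "OK", "load=0.1",
    timestamp=1700000000,
)
print(content)
```

Keeping `exited_ok=1` as a literal inside the builder, rather than as a parameter, avoids the exact mistake this thread is about.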