- The Integrity Checker is a reaction, after the user reported the problem. The latter can take weeks(!!!). You cannot solve an issue before the user even detects it if you have no tools to... detect it.
- A multi-$10k software not using foreign keys and other (not so) advanced RDBMS features? Come on...
Now, we have 2 important tasks:
- Run a SELECT on the Oracle server from a script, and return as a standard Nagios check script.
- Run this command from a Nagios service and make it accept arguments.
- We have a tnsnames.ora, so we only need the service name. We can assemble this based on the application name and the tier code.
- If we hardcode a serviced user for this, we'll only need it's password (as it may vary depending on the tier).
- It is easier to host functions on the server and just call them. So we need a parameter for the currently desired function. Why function? We'll see it later.
#!/bin/sh export ORACLE_HOME=/usr/local/oracle/product/11.2.0/client OK=0 WARN=1 CRIT=2 UNKN=3 usage() { cat << EOF SCRIPT PROBLEM|Called as $0 $@ usage: $0 options This script run the db_check() on the specified Oracle service. OPTIONS: -a application (eg. jira) -t tier (eg. d1, t1) -p nagiossvc password (default: welcome) -f function name (default: DB_CHECK) These will be combined to be <tier><application>, and apps.nagios_<application> inside. EOF } APP= TIER= PASSWD= FUNCTION= OBJECT= while getopts "ht:a:p:f:o:" OPTION do case $OPTION in h) usage exit $UNKN ;; t) TIER=$OPTARG ;; a) APP=$OPTARG ;; p) PASSWD=$OPTARG ;; f) FUNCTION=$OPTARG ;; o) OBJECT=$OPTARG ;; ?) usage exit $UNKN ;; esac done if [ -z $APP ] then usage exit $UNKN fi if [ -z $TIER ] then export TNS_ADMIN=/etc/tnsnames/prod else export TNS_ADMIN=/etc/tnsnames/dev fi if [ -z $FUNCTION ] then FUNCTION=DB_CHECK fi if [ -z $OBJECT ] then OBJECT=NAGIOS_$APP fi if [ -z $PASSWD ] then if [ -z $TIER ] then PASSWD=nagiossvc_passwd else PASSWD=welcome fi fi START=$(date +%s) RESULTSET="$(${ORACLE_HOME}/bin/sqlplus -S -R 3 -L nagiossvc/${PASSWD}@${TIER}${APP} <<OURQUERY set colsep , set pagesize 0 set linesize 10240 set trimspool on set longchunksize 2000000 long 2000000 pages 0 SELECT ${OBJECT}.${FUNCTION}() AS ERRORS FROM DUAL; OURQUERY )" END=$(date +%s) TIMESPAN=$((END-START)) if [[ $RESULTSET == *ORA-* ]] then echo "Script error!|${TIMESPAN}sec" echo "$RESULTSET" exit $CRIT elif [ -n "$RESULTSET" ] then echo "Issues were found.|${TIMESPAN}sec" echo "${RESULTSET}" exit $WARN else echo "OK|${TIMESPAN}sec" exit $OK fi exit $UNKNThat's it for the first part, now we need a package that hosts this function for us. Why a package? Our DBA-s are crazy for packages. So let's create a package!
CREATE OR REPLACE PACKAGE nagios_jira AUTHID DEFINER AS FUNCTION db_check RETURN CLOB; -- Application specific functions: FUNCTION check_workflow_entry_states RETURN CLOB; FUNCTION check_issue_summary_not_null RETURN CLOB; FUNCTION check_invalid_issuelink RETURN CLOB; FUNCTION check_deleted_is_watcher RETURN CLOB; FUNCTION check_fileattachment_nulled RETURN CLOB; END nagios_jira; / CREATE OR REPLACE PACKAGE BODY nagios_jira AS FUNCTION db_check RETURN CLOB IS v_results CLOB := ''; BEGIN v_results := v_results || check_workflow_entry_states; v_results := v_results || check_issue_summary_not_null; v_results := v_results || check_invalid_issuelink; v_results := v_results || check_deleted_is_watcher; v_results := v_results || check_fileattachment_nulled; -- repeat RETURN v_results; END db_check; FUNCTION check_workflow_entry_states RETURN CLOB IS v_errorcount NUMBER(6); BEGIN SELECT COUNT(*) INTO v_errorcount FROM JIRAUSER.JIRAISSUE INNER JOIN JIRAUSER.OS_WFENTRY ON JIRAISSUE.WORKFLOW_ID = OS_WFENTRY.ID WHERE OS_WFENTRY.STATE IS NULL OR OS_WFENTRY.STATE = 0; IF (v_errorcount > 0) THEN RETURN 'CHECK_WORKFLOW_ENTRY_STATES('||v_errorcount||')' || CHR(10); ELSE RETURN ''; END IF; END check_workflow_entry_states; FUNCTION check_issue_summary_not_null RETURN CLOB IS v_errorcount NUMBER(6); BEGIN SELECT COUNT(*) INTO v_errorcount FROM JIRAUSER.JIRAISSUE WHERE SUMMARY IS NULL; IF (v_errorcount > 0) THEN RETURN 'CHECK_ISSUE_SUMMARY_NOT_NULL('||v_errorcount||')' || CHR(10); ELSE RETURN ''; END IF; END check_issue_summary_not_null; FUNCTION check_invalid_issuelink RETURN CLOB IS v_errorcount NUMBER(6); BEGIN SELECT COUNT(*) INTO v_errorcount FROM JIRAUSER.ISSUELINK L, JIRAUSER.JIRAISSUE I1, JIRAUSER.JIRAISSUE I2 WHERE I1.ID(+) = L.SOURCE AND I2.ID(+) = L.DESTINATION AND (I1.ID IS NULL OR I2.ID IS NULL); IF (v_errorcount > 0) THEN RETURN 'CHECK_INVALID_ISSUELINK('||v_errorcount||')' || CHR(10); ELSE RETURN ''; END IF; END check_invalid_issuelink; FUNCTION check_deleted_is_watcher RETURN CLOB IS v_errorcount NUMBER(6); BEGIN SELECT COUNT(*) INTO v_errorcount FROM (SELECT DISTINCT LOWER(SOURCE_NAME) FROM JIRAUSER.USERASSOCIATION MINUS SELECT DISTINCT LOWER_USER_NAME FROM JIRAUSER.CWD_USER); IF (v_errorcount > 0) THEN RETURN 'CHECK_DELETED_USERS_IN_WATCHERS('||v_errorcount||')' || CHR(10); ELSE RETURN ''; END IF; END check_deleted_is_watcher; FUNCTION check_fileattachment_nulled RETURN CLOB IS v_errorcount NUMBER(6); BEGIN SELECT COUNT(*) INTO v_errorcount FROM JIRAUSER.FILEATTACHMENT WHERE FILENAME IS NULL; IF (v_errorcount > 0) THEN RETURN 'CHECK_FILEATTACHMENT_WITHOUT_FILENAME('||v_errorcount||')' || CHR(10); ELSE RETURN ''; END IF; END check_fileattachment_nulled; END nagios_jira; / GRANT EXECUTE ON nagios_jira to nagiossvc; CREATE SYNONYM nagiossvc.nagios_jira FOR APPS.nagios_jira;Ugh, those joins are ugly, but our architect is stuck in pre-9i times...
So if we get the return value from this function, it will contain only one cell for us. If that's not empty, then we found issues, and the details are listed.
Now, wire it into Nagios, have a Nagios command definition for this script:
# Run the check on the specified db # ARG1 - application # ARG2 - tier (defaults to prod) # ARG3 - password of nagiossvc - optional (defaults in script) # ARG4 - function name - optional (defaults to DB_CHECK) # ARG5 - plsql package object name - optional (defaults to ARG1) define command { command_name check_jira_integrity command_line /usr/local/whatever/bin/check_db.sh -a "$ARG1$" -t "$ARG2$" -p "$ARG3$" -f "$ARG4$" -o "$ARG5$" }Call this command from a Nagios service:
define service { service_description jira_prod_check_integrity host_name myjira check_command check_jira_integrity!jira check_interval 15 notification_interval 15 retry_interval 5 }(Check every 15 minutes. When problems detected, retry every 5 minutes.)
Now we are playing.