2015-04-27

Nagios: Run query as a service check

Sometimes it is a good idea to check things directly in the database. A few months (years?) ago I ran into an issue with JIRA, that the data integrity is absolutely not enforced in any way, and user-detected issues are repaired with ad-hoc features, like the dreaded Integrity Checker. I have two major issues with Atlassians standpoint with this:
  1. The Integrity Checker is a reaction, after the user reported the problem. The latter can take weeks(!!!). You cannot solve an issue before the user even detects it if you have no tools to... detect it.
  2. A multi-$10k software not using foreign keys and other (not so) advanced RDBMS features? Come on...
So we are already using Nagios, let's see if I can put together a command and a service that reports things to me.

Now, we have 2 important tasks:
  1. Run a SELECT on the Oracle server from a script, and return as a standard Nagios check script.
  2. Run this command from a Nagios service and make it accept arguments.
Let's see, what info does our script need:
  • We have a tnsnames.ora, so we only need the service name. We can assemble this based on the application name and the tier code.
  • If we hardcode a serviced user for this, we'll only need it's password (as it may vary depending on the tier).
  • It is easier to host functions on the server and just call them. So we need a parameter for the currently desired function. Why function? We'll see it later.
Now, our check script looks like this:

#!/bin/sh

export ORACLE_HOME=/usr/local/oracle/product/11.2.0/client

OK=0
WARN=1
CRIT=2
UNKN=3

usage()
{
cat << EOF
SCRIPT PROBLEM|Called as $0 $@
usage: $0 options

This script run the db_check() on the specified Oracle service.

OPTIONS:
   -a           application (eg. jira)
   -t           tier (eg. d1, t1)
   -p           nagiossvc password (default: welcome)
   -f           function name (default: DB_CHECK)

These will be combined to be <tier><application>, and apps.nagios_<application> inside.
EOF
}

APP=
TIER=
PASSWD=
FUNCTION=
OBJECT=

while getopts "ht:a:p:f:o:" OPTION
do
     case $OPTION in
         h)
             usage
             exit $UNKN
             ;;
         t)
             TIER=$OPTARG
             ;;
         a)
             APP=$OPTARG
             ;;
         p)
             PASSWD=$OPTARG
             ;;
         f)
             FUNCTION=$OPTARG
             ;;
         o)
            OBJECT=$OPTARG
            ;;
         ?)
             usage
             exit $UNKN
             ;;
     esac
done

if [ -z $APP ]
then
     usage
     exit $UNKN
fi

if [ -z $TIER ]
then
    export TNS_ADMIN=/etc/tnsnames/prod
else
    export TNS_ADMIN=/etc/tnsnames/dev
fi

if [ -z $FUNCTION ]
then
    FUNCTION=DB_CHECK
fi

if [ -z $OBJECT ]
then
    OBJECT=NAGIOS_$APP
fi

if [ -z $PASSWD ]
then
        if [ -z $TIER ]
        then
            PASSWD=nagiossvc_passwd
        else
            PASSWD=welcome
        fi
fi

START=$(date +%s)
RESULTSET="$(${ORACLE_HOME}/bin/sqlplus -S -R 3 -L nagiossvc/${PASSWD}@${TIER}${APP} <<OURQUERY
set colsep ,
set pagesize 0
set linesize 10240
set trimspool on
set longchunksize 2000000 long 2000000 pages 0
SELECT ${OBJECT}.${FUNCTION}() AS ERRORS FROM DUAL;
OURQUERY
)"
END=$(date +%s)
TIMESPAN=$((END-START))

if [[ $RESULTSET == *ORA-* ]]
then
    echo "Script error!|${TIMESPAN}sec"
    echo "$RESULTSET"
    exit $CRIT
elif [ -n "$RESULTSET" ]
then
    echo "Issues were found.|${TIMESPAN}sec"
    echo "${RESULTSET}"
    exit $WARN
else
    echo "OK|${TIMESPAN}sec"
    exit $OK
fi

exit $UNKN
That's it for the first part, now we need a package that hosts this function for us. Why a package? Our DBA-s are crazy for packages. So let's create a package!
CREATE OR REPLACE PACKAGE nagios_jira AUTHID DEFINER AS
   FUNCTION db_check RETURN CLOB;
   -- Application specific functions:
   FUNCTION check_workflow_entry_states RETURN CLOB;
   FUNCTION check_issue_summary_not_null RETURN CLOB;
   FUNCTION check_invalid_issuelink RETURN CLOB;
   FUNCTION check_deleted_is_watcher RETURN CLOB;
   FUNCTION check_fileattachment_nulled RETURN CLOB;
END nagios_jira;
/
CREATE OR REPLACE PACKAGE BODY nagios_jira AS
FUNCTION db_check RETURN CLOB IS
   v_results CLOB := '';
BEGIN
   v_results := v_results || check_workflow_entry_states;
   v_results := v_results || check_issue_summary_not_null;
   v_results := v_results || check_invalid_issuelink;
   v_results := v_results || check_deleted_is_watcher;
   v_results := v_results || check_fileattachment_nulled;
   -- repeat
   RETURN v_results;
END db_check;

FUNCTION check_workflow_entry_states RETURN CLOB IS
   v_errorcount NUMBER(6);
BEGIN
   SELECT COUNT(*) INTO v_errorcount FROM JIRAUSER.JIRAISSUE  
   INNER JOIN JIRAUSER.OS_WFENTRY ON JIRAISSUE.WORKFLOW_ID = OS_WFENTRY.ID
   WHERE OS_WFENTRY.STATE IS NULL OR OS_WFENTRY.STATE = 0;
   IF (v_errorcount > 0) THEN
      RETURN 'CHECK_WORKFLOW_ENTRY_STATES('||v_errorcount||')' || CHR(10);
   ELSE
      RETURN '';
   END IF;
END check_workflow_entry_states;

FUNCTION check_issue_summary_not_null RETURN CLOB IS
   v_errorcount NUMBER(6);
BEGIN
   SELECT COUNT(*) INTO v_errorcount FROM JIRAUSER.JIRAISSUE WHERE SUMMARY IS NULL;
   IF (v_errorcount > 0) THEN
      RETURN 'CHECK_ISSUE_SUMMARY_NOT_NULL('||v_errorcount||')' || CHR(10);
   ELSE
      RETURN '';
   END IF;
END check_issue_summary_not_null;

FUNCTION check_invalid_issuelink RETURN CLOB IS
   v_errorcount NUMBER(6);
BEGIN
   SELECT COUNT(*) INTO v_errorcount FROM JIRAUSER.ISSUELINK L, JIRAUSER.JIRAISSUE I1, JIRAUSER.JIRAISSUE I2 WHERE I1.ID(+) = L.SOURCE AND I2.ID(+) = L.DESTINATION AND (I1.ID IS NULL OR I2.ID IS NULL);
   IF (v_errorcount > 0) THEN
      RETURN 'CHECK_INVALID_ISSUELINK('||v_errorcount||')' || CHR(10);
   ELSE
      RETURN '';
   END IF;
END check_invalid_issuelink;

FUNCTION check_deleted_is_watcher RETURN CLOB IS
   v_errorcount NUMBER(6);
BEGIN
   SELECT COUNT(*) INTO v_errorcount FROM (SELECT DISTINCT LOWER(SOURCE_NAME) FROM JIRAUSER.USERASSOCIATION MINUS SELECT DISTINCT LOWER_USER_NAME FROM JIRAUSER.CWD_USER);
   IF (v_errorcount > 0) THEN
      RETURN 'CHECK_DELETED_USERS_IN_WATCHERS('||v_errorcount||')' || CHR(10);
   ELSE
      RETURN '';
   END IF;
END check_deleted_is_watcher;

FUNCTION check_fileattachment_nulled RETURN CLOB IS
   v_errorcount NUMBER(6);
BEGIN
   SELECT COUNT(*) INTO v_errorcount FROM JIRAUSER.FILEATTACHMENT WHERE FILENAME IS NULL;
   IF (v_errorcount > 0) THEN
      RETURN 'CHECK_FILEATTACHMENT_WITHOUT_FILENAME('||v_errorcount||')' || CHR(10);
   ELSE
      RETURN '';
   END IF;
END check_fileattachment_nulled;

END nagios_jira;
/
GRANT EXECUTE ON nagios_jira to nagiossvc;
CREATE SYNONYM nagiossvc.nagios_jira FOR APPS.nagios_jira;
Ugh, those joins are ugly, but our architect is stuck in pre-9i times...

So if we get the return value from this function, it will contain only one cell for us. If that's not empty, then we found issues, and the details are listed.

Now, wire it into Nagios, have a Nagios command definition for this script:
# Run the check on the specified db
# ARG1 - application
# ARG2 - tier (defaults to prod)
# ARG3 - password of nagiossvc - optional (defaults in script)
# ARG4 - function name - optional (defaults to DB_CHECK)
# ARG5 - plsql package object name - optional (defaults to ARG1)
define command {
    command_name check_jira_integrity
    command_line /usr/local/whatever/bin/check_db.sh -a "$ARG1$" -t "$ARG2$" -p "$ARG3$" -f "$ARG4$" -o "$ARG5$"
}
Call this command from a Nagios service:
define service {
        service_description jira_prod_check_integrity
        host_name myjira
        check_command check_jira_integrity!jira
        check_interval 15
        notification_interval 15
        retry_interval 5
}
(Check every 15 minutes. When problems detected, retry every 5 minutes.)

Now we are playing.

2015-04-23

JIRA: Additional indexes for performance

Some indices for purrformance, that piled up during the years:
-- Fix Kepler idiocies:
ALTER INDEX PK_KISSUESTATE RENAME TO ISSUE_STATE_IDX;
ALTER INDEX PK_KPLUGINSCFG RENAME TO PLUGIN_CFG_IDX;
ALTER INDEX PK_KSTATEVALUES RENAME TO STATE_VALUES_IDX;

-- Standard additions
CREATE INDEX IX_NOTIF_INST_MSGID ON NOTIFICATIONINSTANCE (MESSAGEID ASC);
CREATE INDEX IX_CHANGEGROUP_AUTHOR ON CHANGEGROUP (AUTHOR ASC); -- can be big
CREATE INDEX IX_CHANGEGROUP_AUTHOR_CREATED ON CHANGEGROUP (AUTHOR ASC, CREATED ASC); -- can be big
CREATE INDEX IX_CHANGEGROUP_AUTH_CREA_ISSUE ON CHANGEGROUP (AUTHOR ASC, CREATED DESC, ISSUEID ASC);
CREATE INDEX IX_FILEATTACHMENT_FILENAME ON FILEATTACHMENT (FILENAME ASC, '1'); -- See Note
CREATE INDEX IX_JIRAACTION_AUTHOR ON JIRAACTION (AUTHOR ASC);
CREATE INDEX IX_JIRAACTION_AUTHOR_CREATED ON JIRAACTION (AUTHOR ASC, CREATED ASC); 
CREATE INDEX IX_JIRAACTION_AUTH_CREAT_ISSUE ON JIRAACTION (AUTHOR ASC, CREATED DESC, ISSUEID ASC);
CREATE INDEX IX_JIRAACTION_TYPE_LEVEL ON JIRAACTION (ID ASC, ACTIONTYPE ASC, ACTIONLEVEL ASC); 
CREATE INDEX IX_JIRAISSUE_ISSUESTATUS ON JIRAISSUE (ISSUESTATUS ASC, '1');
CREATE INDEX IX_JIRAISSUE_PROJ_ASSIGNEE ON JIRAISSUE (PROJECT ASC, ASSIGNEE ASC);
CREATE INDEX IX_JIRAISSUE_PROJECT ON JIRAISSUE (PROJECT ASC, '1');
CREATE INDEX IX_JIRAISSUE_PROJECT_UPDATED ON JIRAISSUE (PROJECT ASC, UPDATED ASC);
CREATE INDEX issue_reporter ON JIRAISSUE (REPORTER ASC); -- lately introduced by Atlassian in JIRA 6.3
CREATE INDEX IX_JIRAISSUE_SUMMARY ON JIRAISSUE (SUMMARY ASC, '1'); -- See Note
CREATE INDEX IX_JIRAISSUE_UPDATED_PROJ_ID ON JIRAISSUE (UPDATED DESC, PROJECT ASC, ID ASC); -- for idiotic pagination queries
CREATE INDEX IX_JIRAISSUE_WF_PKEY_1 ON JIRAISSUE (WORKFLOW_ID ASC, PKEY ASC, '1');
CREATE INDEX IX_OS_WFENTRY_ID_NAME ON OS_WFENTRY (ID ASC, NAME ASC);

-- customfieldvalue is large. we need it to be created quickly, then turn off parallelism.
CREATE INDEX IX_CUSTOMFIELDVALUE_ISSUE ON CUSTOMFIELDVALUE (ISSUE ASC) PARALLEL 12; ALTER INDEX IX_CUSTOMFIELDVALUE_ISSUE NOPARALLEL; -- can be _really_ big depending on your CF usage

-- bitmap indexes:
CREATE BITMAP INDEX IX_OS_WFENTRY_STATE ON OS_WFENTRY (STATE ASC); -- degree of cardinality is typically below 1 ppm.

-- recreate these indexes:
DROP INDEX PRMSSN_SCHEME; CREATE BITMAP INDEX PRMSSN_SCHEME ON SCHEMEPERMISSIONS(SCHEME ASC);
DROP INDEX WF_ENTRYID; CREATE INDEX WF_ENTRYID ON OS_CURRENTSTEP (ENTRY_ID, STEP_ID);
DROP INDEX ISSUE_WORKFLOW; CREATE INDEX ISSUE_WORKFLOW ON JIRAISSUE (WORKFLOW_ID, ISSUESTATUS);

Note: On multi column indices containing a '1': The '1' here is actually a static char(1) column, always containing '1', and even if the first (the real) column's value is null, it will be stored in the index, and thus will be searchable.

JIRA: When were the watchers added to an issue?

SELECT
   SOURCE_NAME AS WATCHER,
   A.CREATED             ,
   TO_CHAR(A.CREATED, 'yyyy-mm-dd hh24:mi:ss') FANCY_CREATED
FROM
   USERASSOCIATION A
INNER JOIN JIRAISSUE I
ON
   I.ID = A.SINK_NODE_ID
INNER JOIN PROJECT P
ON
   P.ID = I.PROJECT
WHERE
   P.PKEY     = 'TICKT' AND
   I.ISSUENUM = '1114';