
2015-05-12

Shell: Get threaddump directly from the java process

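While reading the JDK source, I found a pretty easy way to get information out of another Java process without starting a JVM tool like jstack or jmap :-) Those tools talk to the target JVM through the attach mechanism, which on Linux is a UNIX domain socket, and nothing stops us from talking to that socket directly with netcat.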

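#!/bin/bash
# Usage: threaddump.sh [pid]   (defaults to the oldest process named "java")
PID=${1:-$(pgrep -o java)}
if [ -z "$PID" ]; then
 echo "No java process found." >&2
 exit 3
fi
SCKT=/tmp/.java_pid$PID     # UNIX socket of the JVM's attach listener
SGNL=/tmp/.attach_pid$PID   # trigger file the JVM looks for when it gets SIGQUIT
# Protocol version "1", the command, then exactly three arguments, each NUL-terminated
CMD='1\0threaddump\0\0\0\0'

if [ ! -r "$SCKT" ]; then
 touch "$SGNL" || exit 2
 kill -s QUIT "$PID"   # with the trigger file present, the JVM starts its attach listener
 sleep 5
 rm -f "$SGNL"
 if [ ! -r "$SCKT" ]; then
  echo "Cannot read $SCKT ... either you are not the correct user for this, or the java process does not 'see' our attach request." >&2
  exit 1
 fi
 echo "Done" >&2
fi

# The reply starts with a result code line (0 means success), followed by the dump
echo -ne "$CMD" | nc -U "$SCKT"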

Possible options and variations I know about:
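  • 1\0threaddump\0-l\0\0\0 (a lowercase L, like jstack -l: also prints lock information)
  • 1\0inspectheap\0\0\0\0 (class histogram, like jmap -histo)
  • 1\0inspectheap\0-live\0\0\0 (live objects only, like jmap -histo:live; this forces a full GC)
For the other commands, see attachListener.cpp in the HotSpot sources (JDK7, JDK8).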
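If you want to try other commands without hand-typing the NUL bytes, the request can be generated. Below is a minimal sketch of a hypothetical helper, build_attach_request (not part of any JDK tool), assuming the layout visible in the strings above: protocol version 1, the command name, then exactly three arguments, every field terminated by a NUL byte, with unused arguments left empty.

```shell
#!/bin/bash
# build_attach_request CMD [ARG1 [ARG2 [ARG3]]]
# Hypothetical helper: prints a HotSpot attach request, i.e. "1", the command
# and three arguments, each followed by a NUL byte (missing arguments are empty).
build_attach_request() {
    local cmd=$1 a1=${2-} a2=${3-} a3=${4-}
    # printf reuses its format for every argument: "<arg>\0" five times
    printf '%s\0' 1 "$cmd" "$a1" "$a2" "$a3"
}
```

For example, build_attach_request threaddump -l | nc -U "$SCKT" should send the same bytes as the first variation in the list. Because printf repeats the format for each argument, no manual padding with \0 is needed.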

    2015-04-27

    Nagios: Run query as a service check

    Sometimes it is a good idea to check things directly in the database. A few months (years?) ago I ran into an issue with JIRA: data integrity is not enforced in any way, and problems detected by users are repaired with ad-hoc tools, like the dreaded Integrity Checker. I have two major issues with Atlassian's stance on this:
    1. The Integrity Checker is reactive: you run it after a user has reported the problem, and that can take weeks(!!!). You cannot fix an issue before the user even notices it if you have no tools to... detect it.
    2. A multi-$10k piece of software not using foreign keys and other (not so) advanced RDBMS features? Come on...
    We are already using Nagios, so let's see if I can put together a command and a service that report these problems to me.

    We have two important tasks:
    1. Run a SELECT on the Oracle server from a script, and report the result the way a standard Nagios check plugin does.
    2. Run this script from a Nagios service and make it accept arguments.
    Let's see what information our script needs:
    • We have a tnsnames.ora, so we only need the service name. We can assemble it from the application name and the tier code.
    • If we hardcode a service user for this, we only need its password (as it may vary by tier).
    • It is easier to host the checks as functions on the database server and just call them, so we need a parameter for the function we currently want. Why a function? We'll see later.
    Now, our check script looks like this:

    #!/bin/bash
    
    export ORACLE_HOME=/usr/local/oracle/product/11.2.0/client
    
    OK=0
    WARN=1
    CRIT=2
    UNKN=3
    
    usage()
    {
    cat << EOF
    SCRIPT PROBLEM|Called as $0 $@
    usage: $0 options
    
    This script runs <object>.<function>() on the specified Oracle service.
    
    OPTIONS:
       -a           application (eg. jira)
       -t           tier (eg. d1, t1; omit for production)
       -p           nagiossvc password (default depends on the tier)
       -f           function name (default: DB_CHECK)
       -o           package (object) name (default: NAGIOS_<application>)
    
    The service name will be <tier><application>, and the query calls <object>.<function>().
    EOF
    }
    
    APP=
    TIER=
    PASSWD=
    FUNCTION=
    OBJECT=
    
    while getopts "ht:a:p:f:o:" OPTION
    do
         case $OPTION in
             h)
                 usage
                 exit $UNKN
                 ;;
             t)
                 TIER=$OPTARG
                 ;;
             a)
                 APP=$OPTARG
                 ;;
             p)
                 PASSWD=$OPTARG
                 ;;
             f)
                 FUNCTION=$OPTARG
                 ;;
             o)
                OBJECT=$OPTARG
                ;;
             ?)
                 usage
                 exit $UNKN
                 ;;
         esac
    done
    
    if [ -z "$APP" ]
    then
         usage
         exit $UNKN
    fi
    
    # No tier means production; any tier code means a dev/test TNS setup
    if [ -z "$TIER" ]
    then
        export TNS_ADMIN=/etc/tnsnames/prod
    else
        export TNS_ADMIN=/etc/tnsnames/dev
    fi
    
    if [ -z "$FUNCTION" ]
    then
        FUNCTION=DB_CHECK
    fi
    
    if [ -z "$OBJECT" ]
    then
        OBJECT=NAGIOS_$APP
    fi
    
    # Default password: one for production, another for the lower tiers
    if [ -z "$PASSWD" ]
    then
            if [ -z "$TIER" ]
            then
                PASSWD=nagiossvc_passwd
            else
                PASSWD=welcome
            fi
    fi
    
    START=$(date +%s)
    RESULTSET="$(${ORACLE_HOME}/bin/sqlplus -S -R 3 -L nagiossvc/${PASSWD}@${TIER}${APP} <<OURQUERY
    set colsep ,
    set pagesize 0
    set linesize 10240
    set trimspool on
    set longchunksize 2000000 long 2000000 pages 0
    SELECT ${OBJECT}.${FUNCTION}() AS ERRORS FROM DUAL;
    OURQUERY
    )"
    END=$(date +%s)
    TIMESPAN=$((END-START))
    
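    # ORA-/SP2- messages mean the query itself failed (login, missing package, ...)
    if [[ $RESULTSET == *ORA-* || $RESULTSET == *SP2-* ]]
    then
        echo "Script error!|time=${TIMESPAN}s"
        echo "$RESULTSET"
        exit $CRIT
    elif [ -n "$RESULTSET" ]
    then
        # db_check returned text: one line per failed check
        echo "Issues were found.|time=${TIMESPAN}s"
        echo "${RESULTSET}"
        exit $WARN
    else
        echo "OK|time=${TIMESPAN}s"
        exit $OK
    fi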
    
    exit $UNKN
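    The script relies on the Nagios plugin conventions: the exit code carries the state (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), the first output line is the status text, and anything after a | on that line is treated as performance data, which Nagios expects as label=value[unit]. A minimal sketch of those rules as a reusable function; the name nagios_exit is hypothetical and not part of any Nagios package:

```shell
#!/bin/bash
# nagios_exit STATE MESSAGE [SECONDS]
# Hypothetical helper: prints one status line, "MESSAGE|time=<SECONDS>s" when a
# duration is given, then exits with the plugin code matching STATE.
nagios_exit() {
    local code
    case $1 in
        OK)       code=0 ;;
        WARNING)  code=1 ;;
        CRITICAL) code=2 ;;
        *)        code=3 ;;   # anything else is UNKNOWN
    esac
    if [ -n "${3:-}" ]; then
        printf '%s|time=%ss\n' "$2" "$3"
    else
        printf '%s\n' "$2"
    fi
    exit "$code"
}
```

    Since it calls exit, use it as the last command of the check script (or inside a subshell), e.g. nagios_exit WARNING "Issues were found." "$TIMESPAN".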
    
    That's it for the first part. Now we need a package that hosts the function for us. Why a package? Our DBAs are crazy about packages. So let's create one!
    CREATE OR REPLACE PACKAGE nagios_jira AUTHID DEFINER AS
       FUNCTION db_check RETURN CLOB;
       -- Application specific functions:
       FUNCTION check_workflow_entry_states RETURN CLOB;
       FUNCTION check_issue_summary_not_null RETURN CLOB;
       FUNCTION check_invalid_issuelink RETURN CLOB;
       FUNCTION check_deleted_is_watcher RETURN CLOB;
       FUNCTION check_fileattachment_nulled RETURN CLOB;
    END nagios_jira;
    /
    CREATE OR REPLACE PACKAGE BODY nagios_jira AS
    FUNCTION db_check RETURN CLOB IS
       v_results CLOB := '';
    BEGIN
       v_results := v_results || check_workflow_entry_states;
       v_results := v_results || check_issue_summary_not_null;
       v_results := v_results || check_invalid_issuelink;
       v_results := v_results || check_deleted_is_watcher;
       v_results := v_results || check_fileattachment_nulled;
       -- repeat
       RETURN v_results;
    END db_check;
    
    FUNCTION check_workflow_entry_states RETURN CLOB IS
       v_errorcount NUMBER(6);
    BEGIN
       SELECT COUNT(*) INTO v_errorcount FROM JIRAUSER.JIRAISSUE  
       INNER JOIN JIRAUSER.OS_WFENTRY ON JIRAISSUE.WORKFLOW_ID = OS_WFENTRY.ID
       WHERE OS_WFENTRY.STATE IS NULL OR OS_WFENTRY.STATE = 0;
       IF (v_errorcount > 0) THEN
          RETURN 'CHECK_WORKFLOW_ENTRY_STATES('||v_errorcount||')' || CHR(10);
       ELSE
          RETURN '';
       END IF;
    END check_workflow_entry_states;
    
    FUNCTION check_issue_summary_not_null RETURN CLOB IS
       v_errorcount NUMBER(6);
    BEGIN
       SELECT COUNT(*) INTO v_errorcount FROM JIRAUSER.JIRAISSUE WHERE SUMMARY IS NULL;
       IF (v_errorcount > 0) THEN
          RETURN 'CHECK_ISSUE_SUMMARY_NOT_NULL('||v_errorcount||')' || CHR(10);
       ELSE
          RETURN '';
       END IF;
    END check_issue_summary_not_null;
    
    FUNCTION check_invalid_issuelink RETURN CLOB IS
       v_errorcount NUMBER(6);
    BEGIN
       SELECT COUNT(*) INTO v_errorcount FROM JIRAUSER.ISSUELINK L, JIRAUSER.JIRAISSUE I1, JIRAUSER.JIRAISSUE I2 WHERE I1.ID(+) = L.SOURCE AND I2.ID(+) = L.DESTINATION AND (I1.ID IS NULL OR I2.ID IS NULL);
       IF (v_errorcount > 0) THEN
          RETURN 'CHECK_INVALID_ISSUELINK('||v_errorcount||')' || CHR(10);
       ELSE
          RETURN '';
       END IF;
    END check_invalid_issuelink;
    
    FUNCTION check_deleted_is_watcher RETURN CLOB IS
       v_errorcount NUMBER(6);
    BEGIN
       SELECT COUNT(*) INTO v_errorcount FROM (SELECT DISTINCT LOWER(SOURCE_NAME) FROM JIRAUSER.USERASSOCIATION MINUS SELECT DISTINCT LOWER_USER_NAME FROM JIRAUSER.CWD_USER);
       IF (v_errorcount > 0) THEN
          RETURN 'CHECK_DELETED_USERS_IN_WATCHERS('||v_errorcount||')' || CHR(10);
       ELSE
          RETURN '';
       END IF;
    END check_deleted_is_watcher;
    
    FUNCTION check_fileattachment_nulled RETURN CLOB IS
       v_errorcount NUMBER(6);
    BEGIN
       SELECT COUNT(*) INTO v_errorcount FROM JIRAUSER.FILEATTACHMENT WHERE FILENAME IS NULL;
       IF (v_errorcount > 0) THEN
          RETURN 'CHECK_FILEATTACHMENT_WITHOUT_FILENAME('||v_errorcount||')' || CHR(10);
       ELSE
          RETURN '';
       END IF;
    END check_fileattachment_nulled;
    
    END nagios_jira;
    /
    GRANT EXECUTE ON nagios_jira to nagiossvc;
    CREATE SYNONYM nagiossvc.nagios_jira FOR APPS.nagios_jira;
    
    Ugh, those (+) outer joins are ugly, but our architect is stuck in pre-9i times...

    So the query returns a single cell: the return value of db_check. If it is not empty, we found issues, and it lists one line per failed check with the number of offending rows.
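    If you want a more informative status line than "Issues were found.", the CHECK_NAME(count) lines that db_check returns are easy to summarize in the shell. A minimal sketch, assuming each failed check produces exactly one line ending in a parenthesized count, as in the package body above; summarize_checks is a hypothetical name:

```shell
#!/bin/bash
# summarize_checks RESULTSET
# Hypothetical helper: counts the "CHECK_NAME(count)" lines returned by
# db_check and adds up their counts.
summarize_checks() {
    printf '%s\n' "$1" | awk '
        match($0, /[(][0-9]+[)]/) {
            checks++
            rows += substr($0, RSTART + 1, RLENGTH - 2)   # digits between the parentheses
        }
        END { printf "%d failed checks, %d bad rows\n", checks, rows }
    '
}
```

    In the check script this could feed the WARNING line, e.g. echo "$(summarize_checks "$RESULTSET")|time=${TIMESPAN}s".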

    Now, wire it into Nagios. First, a command definition for the script:
    # Run the check on the specified db
    # ARG1 - application
    # ARG2 - tier - optional (empty means prod)
    # ARG3 - password of nagiossvc - optional (defaults in script)
    # ARG4 - function name - optional (defaults to DB_CHECK)
    # ARG5 - PL/SQL package (object) name - optional (defaults to NAGIOS_<ARG1>)
    define command {
        command_name check_jira_integrity
        command_line /usr/local/whatever/bin/check_db.sh -a "$ARG1$" -t "$ARG2$" -p "$ARG3$" -f "$ARG4$" -o "$ARG5$"
    }
    
    Call this command from a Nagios service:
    define service {
            service_description jira_prod_check_integrity
            host_name myjira
            check_command check_jira_integrity!jira
            check_interval 15
            notification_interval 15
            retry_interval 5
    }
    
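    (Check every 15 minutes. When a problem is detected, retry every 5 minutes. Both values assume the default interval_length of 60 seconds, and the retries only happen if max_check_attempts, usually inherited from a service template, is greater than 1.)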

    Now we are playing.

    2013-12-18

    Sending emails to addresses and subjects in a CSV with a fixed message in a textfile

    I wrote the following script to notify approvers that an audit had initiated the deletion of some users they had not approved (again). The subject contains the usernames, and the message is a template with some PC blah-blah.

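    #!/bin/bash
    
    if [ $# -lt 2 ]; then
     echo "Missing arguments! I need a 'mailaddr;subject' .csv and a txt containing the fixed message."
     echo "csvmailer.sh addresses.csv message.txt"
     exit 1
    fi
    
    set -f   # no globbing: a '*' or '?' in a subject must not be expanded by TOKENS=($p)
    
    # read -r keeps backslashes; the [ -n ] test also handles a last line without a newline
    while read -r p || [ -n "$p" ]; do
     IFS=';'
     TOKENS=($p)
     EMAIL=${TOKENS[0]}
     SUBJ=${TOKENS[1]}
     IFS=$' \t\n'   # restore the default separators
     mail -s "$SUBJ" "$EMAIL" < "$2"
    done < "$1"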
    

    Each line of the CSV looks like "jane.doe@company.com;This is a message in the subject about deleting the account of 'johndoe@company.com'\n". Do NOT put a semicolon into the subject column (everything after it is dropped) :-D

    Setting IFS (the Internal Field Separator) to a semicolon instead of the default whitespace turns the word splitting in the next line, TOKENS=($p), into a dead simple tokenizer.

    $1 is the CSV file, read into $p line-by-line.

    $2 is the fixed message, fed to mail on its standard input.
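    An alternative sketch of the same split that never touches the global IFS: a prefix assignment like IFS=';' only applies to that one read, and read gives the last variable the rest of the line, so a semicolon inside the subject survives instead of truncating it. The helper name parse_record is hypothetical and only prints what it parsed:

```shell
#!/bin/bash
# parse_record LINE
# Hypothetical helper: splits one "mailaddr;subject" CSV line.
# IFS=';' applies to this read only; -r keeps backslashes literal;
# the last variable (subj) receives the remainder of the line.
parse_record() {
    local email subj
    IFS=';' read -r email subj <<< "$1"
    printf 'to=%s\nsubject=%s\n' "$email" "$subj"
}
```

    In the mailer loop this becomes while IFS=';' read -r EMAIL SUBJ; do mail -s "$SUBJ" "$EMAIL" < "$2"; done < "$1", with no array and no IFS to restore.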

    2013-12-05

    Linux: Send files in e-mail from console

    So I wanted to send some files, but my mailx package did not support the famous -a (attachment) option, so I assembled the MIME message myself and handed it to sendmail:

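    #!/bin/bash
    # Usage: <script> recipient@example.com file1 [file2 ...]
    
    # Print one MIME part carrying the file given in $1, base64-encoded
    function create_attachment_block()
    {
            local name
            name=$(basename "$1")
            printf -- '--%s\n' "$BOUNDARY"
            printf 'Content-Transfer-Encoding: base64\n'
            printf 'Content-Type: %s; name="%s"\n' "$(file -bi "$1")" "$name"
            printf 'Content-Disposition: attachment; filename="%s"\n\n' "$name"
            base64 "$1"     # wraps at 76 columns, unlike -w 0 (one huge line)
            printf '\n'
    }
    
    if [ $# -lt 2 ]; then
            echo "No files specified... usage: $0 recipient file [file ...]" >&2
            exit 1
    fi
    
    TO=$1
    shift   # the remaining arguments are the files
    BOUNDARY="==combine-autogun==_$(date +%Y%m%d%H%M%S)_$$_=="
    
    # Write the message straight into sendmail: collecting the parts in a variable
    # via $(...) would strip the trailing newlines that must precede each boundary line
    {
            printf 'To: %s\n' "$TO"
            printf 'Subject: Please see files attached\n'
            printf 'MIME-Version: 1.0\n'
            printf 'User-Agent: %s\n' "$0"
            printf 'Content-Type: multipart/mixed; boundary="%s"\n\n' "$BOUNDARY"
            for a in "$@"
            do
                    # only non-empty, readable regular files
                    if [ -s "$a" ] && [ -f "$a" ] && [ -r "$a" ]; then
                            create_attachment_block "$a"
                    fi
            done
            printf -- '--%s--\n' "$BOUNDARY"
    } | /usr/sbin/sendmail -oi -t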
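    One thing worth checking before mailing big files: RFC 5322 limits a line to 998 characters, and RFC 2045 limits base64-encoded lines to 76, so a single-line encoding such as base64 -w 0 may be rejected or mangled by a mail server along the way. A minimal sketch of a check; longest_line is a hypothetical helper, and the test below assumes GNU coreutils base64:

```shell
#!/bin/bash
# longest_line [FILE...]
# Hypothetical helper: prints the length of the longest line (newline excluded)
# of the given files, or of standard input when no file is given.
longest_line() {
    awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }' "$@"
}
```

    For example, send the generated message to a file instead of /usr/sbin/sendmail and run longest_line on that file; anything above 998 is asking for trouble.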
    

    2013-11-18

    Logging IO activity of a process

    Okay, your 3rd party app sucks. It sucks big time. It generates heavy disk traffic at seemingly random times, and you have run out of ideas. The users are revolting, the website is lagging, your boss is raging.

    It is time to check what the hell the app is actually doing.

    The following script uses strace to catch the IO-related syscalls (open, close, read, write) made by all threads of the given process, and dumps them as semicolon-separated CSV. Later you can aggregate by second, minute, file system, or subsystem (like Lucene), and create charts, graphs, and pivot tables.

    #!/bin/sh
    
    if [ "x$1" = "x-h" ]; then
     echo "Usage: ./iotrace.sh <pid>"
     exit 0
    fi
    
    if [ "$(id -u)" != "0" ]; then
       echo "This script must be run as root " 1>&2
       exit 1
    fi
    
    if [ $# -gt 0 ]; then
    PID=$1
    else
    echo "Hey, I need a parameter!"
    exit 1
    fi
    
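    # List the threads (LWPs) of the process and attach strace to each of them with -p <tid>.
    # gawk is required: the awk program uses 3-argument match() and strftime().
    ps -L -o lwp= -p "$PID" | awk '{print "-p " $1}' | xargs strace -q -f -v -ttt -T -s 0 -e trace=open,close,read,write 2>&1 | gawk -v pid="$PID" '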
    function output(a, f, r, t)
    {
     # a - action
     # f - file descriptor
     # r - result
     # t - time as unix epoch
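     if (f in fd)
      file = fd[f];
     else
     {
      # descriptor opened before tracing started: ask /proc
      file = "";
      cmd = "readlink /proc/" pid "/fd/" f;
      cmd | getline file;
      close(cmd);   # otherwise every lookup leaks an open pipe
      fd[f] = file;
     }
     # sockets, pipes, /dev and /proc entries only when a byte count came back
     if (file !~ /^(socket|pipe|\/dev|\/proc)/ || r ~ /^[0-9]+$/)
      print a, file, r, strftime("%Y-%m-%d %H:%M:%S", int(t));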
    }
    
    BEGIN { OFS=";"; print "op;path;bytes;time";}
    {
     if($6 ~ /resumed>/)
     {
      # the unfinished line stored the path (open) or the fd (close/read/write) in pending[]
      if ($5 ~ /open/){fd[$(NF-1)] = pending[$2];}
      else if ($5 ~ /close/){delete fd[pending[$2]];}
      else if ($5 ~ /write/){output("write", pending[$2], $(NF-1), $3);}
      else if ($5 ~ /read/) {output("read", pending[$2], $(NF-1), $3);}
      
      delete pending[$2];
     }
     else if ($4 ~ /^open\(/)
     {
      match($4, /\"(.+)\"/, a);
      f = a[1];
      if ($(NF-1) == "<unfinished")
      {
       pending[$2] = f;
      } else {
       fd[$(NF-1)] = f;
      }
     }
     else if ($4 ~ /^close\(/)
     {
      match($4, /([0-9]+)/, a);
      f = a[1];
      if ($(NF-1) == "<unfinished")
      {
       pending[$2] = f;
      } else {
       delete fd[f];
      }
     }
     else if ($4 ~ /^write\(/)
     {
      match($4, /([0-9]+)/, a);
      f = a[1];
      if ($(NF-1) == "<unfinished")
      {
       pending[$2] = f;
      } else {
       output("write", f, $(NF-1), $3);
      }
     }
     else if ($4 ~ /^read\(/)
     {
      match($4, /([0-9]+)/, a);
      f = a[1];
      if ($(NF-1) == "<unfinished")
      {
       pending[$2] = f;
      } else {
       output("read", f, $(NF-1), $3);
      }
     }
    }'
    

    What does it do?
    1. Takes a process ID as its argument.
    2. Lists all threads (LWPs) of this process.
    3. Feeds them into xargs so that strace attaches to every one of them, and makes strace print only the four syscalls we are interested in (open, close, read, write); these are the ones used by normal Java IO methods.
    4. Builds a dictionary of file descriptors and filenames, and prints the filenames with the actual number of bytes processed.
    5. Also takes care of syscalls that strace prints in two parts (an "<unfinished ...>" line and a later "<... resumed>" line) when another thread interrupts them.
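    To see why the awk program looks at $2, $3, $4, $5, $6 and $(NF-1), here is a small sketch that applies the same field positions to single lines. The sample lines in the test are made up, but follow the usual strace -f -ttt -T layout: "[pid <tid>]", an epoch timestamp, the call, "= <result>", and the <duration> added by -T as the last field. describe_strace_line is a hypothetical name:

```shell
#!/bin/bash
# describe_strace_line LINE
# Hypothetical helper: labels the fields iotrace.sh relies on.
# $2 = "<tid>]", $3 = timestamp, $4 = "call(fd,", $(NF-1) = return value.
describe_strace_line() {
    printf '%s\n' "$1" | awk '{
        tid = substr($2, 1, length($2) - 1)         # drop the trailing "]"
        if ($(NF-1) == "<unfinished") {             # call interrupted by another thread
            name = $4; sub(/[(].*/, "", name)
            print "thread=" tid " call=" name " unfinished"
        } else if ($6 ~ /resumed>/) {               # "<... name resumed>" continuation
            print "thread=" tid " call=" $5 " resumed result=" $(NF-1)
        } else {
            name = $4; sub(/[(].*/, "", name)
            print "thread=" tid " call=" name " result=" $(NF-1)
        }
    }'
}
```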

    An average Java webapp with Lucene can produce 3.5M rows in an hour. Note that a file like that cannot be opened in Excel, which stops at 1,048,576 rows ;-)
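    Since Excel is out, a first aggregation can be done with awk itself. A minimal sketch that sums the byte counts per minute and operation, assuming the column layout the script prints (op;path;bytes;"YYYY-mm-dd HH:MM:SS") and one header line per file; bytes_per_minute is a hypothetical name, and rows whose bytes column is not a plain number are skipped:

```shell
#!/bin/bash
# bytes_per_minute [CSV...]
# Hypothetical helper: sums column 3 (bytes) of the iotrace output per
# minute (first 16 characters of the time column) and per operation.
bytes_per_minute() {
    awk -F';' '
        FNR == 1 { next }                          # skip the header of each file
        $3 ~ /^[0-9]+$/ {
            key = substr($4, 1, 16) ";" $1         # "YYYY-mm-dd HH:MM;op"
            sum[key] += $3
        }
        END { for (k in sum) print k ";" sum[k] }
    ' "$@" | sort
}
```

    Changing the substr length to 19 gives per-second buckets, and adding $2 to the key splits the totals per file.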