2013-11-18

Get per-process IO activity

So you want a live average of how hard your processes hit your disks (in fact, any mount, including NFS).

Well, procfs exports per-process IO counters in /proc/<pid>/io, maintained by the read*/write* syscalls.

You can find the source here.
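
As a rough sketch of how those counters can be turned into a live rate (this is not the source referred to above; io_field and io_rate are hypothetical helper names), the snippet below samples rchar and wchar twice and prints the average bytes per second in between. It assumes a Linux procfs and that you may read the target's /proc/<pid>/io (same user or root).

```shell
#!/bin/sh
# Sketch only: io_field and io_rate are illustrative helpers, not an existing tool.

# io_field FILE KEY: print the value of the "KEY: value" line in FILE
# (FILE is expected to look like /proc/<pid>/io).
io_field() {
    awk -v key="$2:" '$1 == key { print $2 }' "$1"
}

# io_rate PID [INTERVAL]: sample rchar/wchar twice, INTERVAL seconds apart
# (default 1), and print the average bytes per second in between.
io_rate() {
    pid=$1
    interval=${2:-1}
    r1=$(io_field "/proc/$pid/io" rchar)
    w1=$(io_field "/proc/$pid/io" wchar)
    sleep "$interval"
    r2=$(io_field "/proc/$pid/io" rchar)
    w2=$(io_field "/proc/$pid/io" wchar)
    echo "read_Bps=$(( (r2 - r1) / interval )) write_Bps=$(( (w2 - w1) / interval ))"
}
```

Call it as io_rate 1234 5 for a five-second average of process 1234. rchar and wchar count every byte passed through read/write style syscalls, cache hits included; read_bytes and write_bytes in the same file count what actually reached the storage layer.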

Logging IO activity of a process

Okay, your 3rd party app sucks. It sucks big time. It generates heavy disk traffic at seemingly random times, and you just can't think of anything anymore. The users are revolting, the website is lagging, your boss is raging.

It is time to check what the hell the app is actually doing.

The following script uses strace to catch the IO-related syscalls (open, close, read, write) made by the given process and dumps them as semicolon-separated, CSV-style rows. Later, you can aggregate by second, minute, file system, or subsystem (Lucene, etc.) and create charts, graphs, and pivots.

#!/bin/sh

if [ "x$1" = "x-h" ]; then
 echo "Usage: ./iotrace.sh <pid>"
 exit 0
fi

if [ "$(id -u)" != "0" ]; then
   echo "This script must be run as root " 1>&2
   exit 1
fi

if [ $# -gt 0 ]; then
   PID=$1
else
   echo "Hey, I need a parameter!"
   exit 1
fi

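# Needs GNU awk (gawk): the program below uses strftime() and three-argument match().
# List only the threads (LWPs) of $PID instead of grepping the whole process table.
ps -L -o lwp= -p "$PID" | awk '{print "-p " $1}' | xargs strace -q -f -v -ttt -T -s 0 -e trace=open,close,read,write 2>&1 | awk -v pid="$PID" '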
function output(a, f, r, t)
{
 # a - action
 # f - file descriptor
 # r - result
 # t - time as unix epoch
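 if (f in fd)
  file = fd[f];
 else
 {
  # resolve the descriptor through procfs and cache the result
  cmd = "readlink /proc/" pid "/fd/" f;
  file = "";  # do not reuse the previous path if readlink fails
  cmd | getline file;
  close(cmd);
  fd[f] = file;
 }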
 # skip sockets, pipes and pseudo files; keep only calls with a numeric result
 if (file !~ /^(socket|pipe|\/dev|\/proc)/ && r ~ /^[0-9]+$/)
  print a, file, r, strftime("%Y-%m-%d %H:%M:%S", int(t)); # t is the strace -ttt timestamp
}

BEGIN { OFS=";"; print "op;path;bytes;time";}
{
 if($6 ~ /resumed>/)
 {
  if ($5 ~ /open/){fd[$(NF-1)] = pending[$2];}
  # on a "resumed" line the descriptor is not in $4; it was saved in pending[] by the unfinished call
  else if ($5 ~ /close/){delete fd[pending[$2]];}
  else if ($5 ~ /write/){output("write", pending[$2], $(NF-1), $3);}
  else if ($5 ~ /read/) {output("read", pending[$2], $(NF-1), $3);}
  
  delete pending[$2];
 }
 else if ($4 ~ /^open\(/)
 {
  match($4, /\"(.+)\"/, a);
  f = a[1];
  if ($(NF-1) == "<unfinished")
  {
   pending[$2] = f;
  } else {
   fd[$(NF-1)] = f;
  }
 }
 else if ($4 ~ /^close\(/)
 {
  match($4, /([0-9]+)/, a);
  f = a[1];
  if ($(NF-1) == "<unfinished")
  {
   pending[$2] = f;
  } else {
   delete fd[f];
  }
 }
 else if ($4 ~ /^write\(/)
 {
  match($4, /([0-9]+)/, a);
  f = a[1];
  if ($(NF-1) == "<unfinished")
  {
   pending[$2] = f;
  } else {
   output("write", f, $(NF-1), $3);
  }
 }
 else if ($4 ~ /^read\(/)
 {
  match($4, /([0-9]+)/, a);
  f = a[1];
  if ($(NF-1) == "<unfinished")
  {
   pending[$2] = f;
  } else {
   output("read", f, $(NF-1), $3);
  }
 }
}'

What does it do?
  1. Takes a process ID as input
  2. Lists all the threads (LWPs) of this process
  3. Feeds these into xargs to make strace attach to all of them, and tells strace to print only the four syscalls we are interested in (open, close, read, write); these are the ones used by normal Java IO methods
  4. Builds a dictionary of file descriptors and filenames, and pretty-prints the filenames with the actual number of processed bytes
  5. Also takes care of the interrupted syscall printouts (the "unfinished" and "resumed" lines).

An average Java webapp with Lucene can produce 3.5M rows in an hour. Note that it cannot be opened in Excel ;-)
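
With that many rows, a first aggregation pass is easier in awk than in a spreadsheet. The following is a minimal sketch, assuming the semicolon-separated layout printed by the script (op, path, bytes, timestamp as "YYYY-MM-DD HH:MM:SS"); aggregate_io is a hypothetical helper name. It sums the bytes per minute and per operation.

```shell
#!/bin/sh
# Sketch only: aggregate_io is an illustrative helper name.

# aggregate_io FILE: sum the bytes column per minute and operation.
# Expects rows like "read;/some/path;4096;2013-11-18 10:00:01" after a header row.
aggregate_io() {
    awk -F';' '
        NR == 1 { next }                          # skip the header row
        $3 !~ /^[0-9]+$/ { next }                 # skip rows without a numeric byte count
        { sum[substr($4, 1, 16) ";" $1] += $3 }   # key: "YYYY-MM-DD HH:MM;op"
        END { for (k in sum) print k ";" sum[k] }
    ' "$1" | sort
}
```

Run it as aggregate_io trace.csv, where trace.csv is whatever file you redirected the script's output into. To group by path instead, change the key to $2; to group by second, take 19 characters of the timestamp instead of 16.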

Cisco PCF encrypted group password

Cisco invented the PCF format to store VPN configuration for its clients.
It contains the group password (the one that gets you through to the individual authentication), in encrypted form.

If you need that password, for example to use it on your smartphone, to set it up in your SOHO router, or for other completely legal reasons, here is a cute website that decrypts it for you:

https://www.unix-ag.uni-kl.de/~massar/bin/cisco-decode

2013-11-11

warning: passing argument <n> of '<some function>' with different width due to prototype

You receive warnings like
warning: passing argument 1 of ‘ptrace’ with different width due to prototype
but you are not doing any magic there?

You are using -fshort-enums, and the referenced function takes an enum argument at that position (enum __ptrace_request in this example). With -fshort-enums, an enum is only as wide as its values require, which can be narrower than an int, while without a prototype the argument would be passed as a normal int. The compiler sees the two widths differ and warns.

Solution: Remove -fshort-enums from your compiler flags.

2013-11-07

Makefile automatic variables

$@
The file name of the target of the rule. If the target is an archive member, then `$@' is the name of the archive file. In a pattern rule that has multiple targets (see section Introduction to Pattern Rules), `$@' is the name of whichever target caused the rule's commands to be run.
$%
The target member name, when the target is an archive member. For example, if the target is `foo.a(bar.o)' then `$%' is `bar.o' and `$@' is `foo.a'. `$%' is empty when the target is not an archive member.
$<
The name of the first dependency. If the target got its commands from an implicit rule, this will be the first dependency added by the implicit rule.
$?
The names of all the dependencies that are newer than the target, with spaces between them. For dependencies which are archive members, only the member named is used.
$^
The names of all the dependencies, with spaces between them. For dependencies which are archive members, only the member named is used. A target has only one dependency on each other file it depends on, no matter how many times each file is listed as a dependency. So if you list a dependency more than once for a target, the value of $^ contains just one copy of the name.
$+
This is like `$^', but dependencies listed more than once are duplicated in the order they were listed in the makefile. This is primarily useful for use in linking commands where it is meaningful to repeat library file names in a particular order.
$*
The stem with which an implicit rule matches. If the target is `dir/a.foo.b' and the target pattern is `a.%.b' then the stem is `dir/foo'. The stem is useful for constructing names of related files. In a static pattern rule, the stem is part of the file name that matched the `%' in the target pattern. In an explicit rule, there is no stem; so `$*' cannot be determined in that way. Instead, if the target name ends with a recognized suffix, `$*' is set to the target name minus the suffix. For example, if the target name is `foo.c', then `$*' is set to `foo', since `.c' is a suffix. GNU make does this bizarre thing only for compatibility with other implementations of make. You should generally avoid using `$*' except in implicit rules or static pattern rules. If the target name in an explicit rule does not end with a recognized suffix, `$*' is set to the empty string for that rule.