Showing posts with label linux. Show all posts

2022-05-04

Java Memory Usage Optimization

So there is this not-so-well-known, but very real, memory usage optimization that changes how glibc allocates thread-specific memory.

There is this guy who wrote the best roundup I have found on the net so far: Major Bug in glibc is Killing Applications With a Memory Limit. I strongly suggest reading it.

For now, let me just quote the important part:

Long story short, this is due to a bug in malloc(). Well, it’s not a bug it’s a feature.

malloc() preallocates large chunks of memory, per thread. This is meant as a performance optimization, to reduce memory contention in highly threaded applications.

In 32 bits runtime, it can preallocate up to 2 * 64 MB * cores.

In 64 bits runtime, it can preallocate up to 8 * 64 MB * cores.

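So the math is like: max arenas * Arena Size, where max arenas defaults to 8 * _NPROCESSORS_ONLN on 64-bit (2 * _NPROCESSORS_ONLN on 32-bit). If $MALLOC_ARENA_MAX is set, it replaces that default as an absolute limit; it is not multiplied by the core count.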

Bonus content: the core count here is what getconf _NPROCESSORS_ONLN reports (nproc is close, but it honors the CPU affinity mask, and nproc --all reports the configured CPUs instead). If you are using a container orchestrator like Kubernetes, this equation will use the node's core count, not the CPU shares/quota allowed to the pod by cgroups.

Where do those numbers come from? Check here: https://www.gnu.org/software/libc/manual/html_node/Memory-Allocation-Tunables.html

Arena Size is usually 64MB. Why is this a problem?

The first malloc in each thread triggers a 128MB mmap which typically is the initialization of thread-local storage.

-- https://bugs.openjdk.java.net/browse/JDK-8193521

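For every thread that calls malloc, a new arena may be created, until the limit from the equation above is reached. A JVM runs plenty of native threads (GC, JIT compiler, etc.) even if your code never creates one, so it reaches that limit easily. Huge memory waste.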

Once the limit is reached, no new arenas are created; threads share the existing ones instead, including the "main" arena, i.e. the native program heap, which is not capped at the arena size.

(The main arena grows via brk()/sbrk().)

So the most useful solution is to set the environment variable MALLOC_ARENA_MAX to a small value, like 4.

2016-08-02

C: Linux sysinfo loads interpretation

There's this handy syscall in Linux, sysinfo (kernel source), which returns some system-wide statistics: uptime, memory and swap usage, the number of processes, and load averages.

What immediately stands out is the loads array. Its elements are unsigned long values, so how do we turn them into our beloved fractional numbers?

These three values are scaled up by 65536, or more precisely by 1 << SI_LOAD_SHIFT (SI_LOAD_SHIFT is 16). So we just divide them:

#include <stdio.h>
#include <sys/sysinfo.h>

// ...

struct sysinfo info;
if (sysinfo(&info) == 0) // 0 on success, -1 on error
 printf("sysinfo: load1 = %2.2f\n", (float)info.loads[0] / (float)(1 << SI_LOAD_SHIFT));

2016-08-01

C: How to Detect LD_PRELOAD

As I'm writing a lib to be preloaded under something heavy, I tried to detect if LD_PRELOAD is set or not. For fun.

The easiest way to check is... get the environment variable! It is so straightforward that I feel the urge to write it down :-)

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
 char *env_val = getenv("LD_PRELOAD");
 if (env_val != NULL && env_val[0] != '\0') // getenv returns NULL if unset; it may also be set but empty.
  printf("LD_PRELOAD active: \"%s\"\n", env_val);
 return 0;
}

Note that this only checks whether the variable is set, not whether the library is actually loaded...

2016-07-27

C: Sorted Walk Through Directory Contents

#include <stdio.h>
#include <stdlib.h>
#include <dirent.h>

void walk_dir(const char *path)
{
 struct dirent **namelist;
 int i, n;

 n = scandir(path, &namelist, NULL/*filter*/, alphasort);
 if (n < 0)
 {
  perror("scandir");
 } else {
  for (i = 0; i < n; i++) {
   printf("Entry: %s\n", namelist[i]->d_name);
   free(namelist[i]);
  }
  free(namelist);
 }
}

int main(void)
{
 walk_dir("/tmp");
 return EXIT_SUCCESS;
}

C: Simple Walk Through Directory Contents

You can easily walk through the entries in a directory. Note that the order is unspecified.

#include <stdio.h>
#include <stdlib.h>
#include <dirent.h>

void walk_dir(const char *path)
{
 struct dirent *entry;
 DIR *dir = opendir(path);

 if (dir == NULL) {
  perror("opendir");
  return;
 }
 while ((entry = readdir(dir)) != NULL) {
  printf("Entry: %s\n", entry->d_name);
 }
 closedir(dir);
}

int main(void)
{
 walk_dir("/tmp");
 return EXIT_SUCCESS;
}

2016-05-09

c: How not to code #1

(gdb) bt
#0  0x00007f8ac115bebe in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x00007f8ac10f22be in _L_lock_9876 () from /lib64/libc.so.6
#2  0x00007f8ac10f05c1 in free () from /lib64/libc.so.6
#3  0x0000000000402649 in handle_sig (signo=<optimized out>, info=<optimized out>, context=<optimized out>) at lol.c:158
#4  <signal handler called>
#5  0x00007f8ac10edf03 in _int_malloc () from /lib64/libc.so.6
#6  0x00007f8ac10f06b7 in malloc () from /lib64/libc.so.6
#7  0x00000000004015ba in do_the_boogie (fd=3, gp=<optimized out>) at lol.c:715
#8  0x00000000004023cc in main (argc=5, argv=<optimized out>) at lol.c:810

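LOL, it is deadlocked. The signal (frame #4) arrived while the main code was inside malloc (frames #5-#6), and the signal handler calls free in frame #3. Of course, the arena lock is still held by _int_malloc in frame #5, and both calls use the same memory arena (the main, brk-based one), so free waits forever for a lock its own thread holds, hence we are screwed. malloc and free are not async-signal-safe, so they must not be called from a signal handler.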

2016-04-22

Note to self #3

Task:
  • Measure network throughput of <your favorite dipshit 3rd party app>
  • Report traffic over granularity periods
  • C99, _GNU_SOURCE
Actions taken:
  • Using libpcap in main thread to capture packets (also filtering, etc)
  • Piping info into another thread (simple pthread usage) that does the math
Problem encountered:
  1. When emitting all the info I read from the pipe to stdout, the average capture rate is 2700..3000 packets/s
  2. When printing only the per-GP statistics, the capture rate drops to 3-11 packets/s
Extensive head scratching intensifies.

[a few days passed]

It turned out that if I print all the info, of course it captures more TCP traffic: I'm on an SSH connection, so I'm generating that traffic myself!!!111one

/me idiot

Don't do Java, kids. It's bad for your brains.

2016-03-09

linux: fun: Detect debugger and interrupt conditionally

I wanted to place breakpoints into a binary written in C, so I don't have to type all the breakpoints into gdb, and generally just to learn new tricks :-) It has no real-life benefits... but it's fun!

For starters, I found out that I can send a SIGTRAP in multiple ways. The easiest is to just raise it:
#include <signal.h>
raise(SIGTRAP); // or SIGINT if you also like to live dangerously
The funnier way is to emit it directly in asm:
__asm__("int $3");
// or
__asm__("int3");
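To be honest, the latter is a software interrupt (x86 only), not a signal, but it has the same effect: on Linux, the kernel turns the resulting breakpoint exception into a SIGTRAP for the process. The IA-32 book ("Intel® 64 and IA-32 Architectures Software Developer’s Manual, Chapter 6.4.4") says: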
The INT 3 instruction explicitly calls the breakpoint exception (#BP) handler.
So I'm gonna stay with "does the same", as it is a deliberately tricky one.

But if you're not running under a debugger, nobody catches that SIGTRAP, and its default action terminates your program (with a message like "Trace/BPT trap"; BPT --> breakpoint, see?). So we need to emit it conditionally, which leads us to the next problem: at any given moment, are we being debugged/traced or not?

Time to look up some anti-debug techniques.

The first idea that came to mind was to simply have the program trace itself [terms and conditions apply] with PTRACE_TRACEME. The trick is that this call fails if we are already being debugged/traced. Then we can just flip a flag and guard our signals with a condition.
#include <stdio.h>
#include <stdlib.h>
#include <signal.h> // raise, SIG*
#include <errno.h> // errno
#include <string.h> // strerror

#include <sys/ptrace.h> // ptrace
#include <sys/types.h>

int has_dbg()
{
    long rc = ptrace(PTRACE_TRACEME, 0, 0, 0);
    if (rc < 0) {
        printf("traceme resulted in %ld, errno: %d, %s\n", rc, errno, strerror(errno));
        return 1;
    } else {
        return 0;
    }
}
Oopsie, there is a problem: the tracee cannot undo TRACEME, and while it is in effect, you cannot attach a debugger from the outside. While this anti-debug technique is awesome for keeping debuggers out, we don't want this side effect. We would have to release it, but we can't do that from here. So we have to do the check some other way :-)

I poked around with fork-and-traceme, but it's cumbersome. Then I found an elegant solution on Stack Overflow (where else): just check whether TracerPid in /proc/self/status is zero or not. Genius.

Caveat: this relies on Linux procfs, but ps also reads /proc/* for status info, so it must be pretty standard.

So our check looks like the following:
#include <stdio.h>
#include <stdlib.h> //atoi
#include <signal.h> // raise, SIG*
#include <string.h> // strstr, strerror
#include <fcntl.h> // open
#include <unistd.h> // read, close

int has_dbg()
{
    char buf[2048], *tracer_pid;
    int debugger_present = 0; // We say "no" if not found or error, as we are cowards.
    static const char TracerPid[] = "TracerPid:";
    ssize_t num_read;

    int status_fd = open("/proc/self/status", O_RDONLY);

    if (status_fd == -1)
        return 0;

    num_read = read(status_fd, buf, sizeof(buf) - 1); // leave room for the terminating NUL
    close(status_fd);

    if (num_read > 0)
    {
        buf[num_read] = 0;
        tracer_pid = strstr(buf, TracerPid); // Look for "TracerPid"
        if (tracer_pid)
            debugger_present = !!atoi(tracer_pid + sizeof(TracerPid) - 1); // parse an int, and bool-ify it.
    }

    return debugger_present;
}

int main(int argc, char **argv)
{
    if(has_dbg()) {
        printf("can has\n");
    } else {
        printf("no no\n");
    }

    return 0;
}
And the best part is, we can call this function at any time we want.

Now, we can do things like:
#define HALP do{if(has_dbg()){__asm__("int $3");}}while(0)

// ...

printf("foo\n");
if (3.14 >= 3.14)
{
    printf("bar\n");
    HALP;
    printf("baz\n");
}
printf("quux\n");
Isn't it fancy?

Note:

If you love to cut down on the characters you have to type, I suggest the raise version in the long run:
  • raise variant uses 20 + n*14 chars. (headers, you know)
  • asm variant uses n*17 chars.
Okay, just kidding.

2016-02-19

bash: homemade timeout replacement

So I was young and reckless, and didn't know there was a command called timeout in coreutils. This is how I managed to do it:
#!/bin/sh
{
    ./time_consuming_binary -a param -a notherparam --pleaseblock
} &
CHILDPID=$!
# Kill it after 30 sec
sleep 30
kill -9 $CHILDPID > /dev/null 2>&1
How do you check whether it had to be killed? Measure the wall time of the execution :) I used it in a Nagios service check: the thing it watched either returned in under 5 seconds or blocked indefinitely (thanks, NFS), hence the 30 seconds.

2015-05-12

Shell: Get threaddump directly from the java process

Inspecting the Java source, I found a pretty easy way to skip starting another JVM (jstack, jmap) when extracting info from a running Java process :-)

#!/bin/bash
# bash, because of echo -ne below; not every netcat supports nc -U
PID=$(pgrep -o java) # -o: pick the oldest one if several java processes run
SCKT=/tmp/.java_pid$PID
SGNL=/tmp/.attach_pid$PID
CMD='1\0threaddump\0\0\0\0'

if [ ! -r "$SCKT" ]; then
 touch "$SGNL" || exit 2
 kill -s QUIT "$PID"
 sleep 5
 rm -f "$SGNL"
 if [ ! -r "$SCKT" ]; then
  echo "Cannot read $SCKT ... either you are not the correct user for this, or the java process does not 'see' our attach request."
  exit 1
 fi
 echo Done
fi

echo -ne "$CMD" | nc -U "$SCKT"

Possible options and variations I know about:
  • 1\0threaddump\0-l\0\0\0 (lowercase L, the same as jstack -l)
  • 1\0inspectheap\0\0\0\0
  • 1\0inspectheap\0-live\0\0\0
For others, see attachListener.cpp in the HotSpot sources (JDK7, JDK8)

    2015-02-13

    Java To Trust GoDaddy Class2 G2 Certs

    Sadly, GoDaddy's "Class 2" and "Class 2 - G2" root cacerts are not included in the Java7 packages, so they are not trusted by default.

    So we'll have to add the CA certificates to the Java TrustStore.

    I use Apache Tomcat, so I need another step: configuring where to look for the truststore file. I add it to CATALINA_OPTS, but you should use setenv.sh or whatever your deployment process forces you to use. For some reason I haven't felt the urge to look up, Tomcat does NOT pick up the truststore by default. Maybe it's just some PEBKAC somewhere else in the setup.

    Adjust the cacerts file location to your needs.

    -Djavax.net.ssl.trustStore=/opt/apps/jira/jdk/jre/lib/security/cacerts -Djavax.net.ssl.trustStorePassword=changeit

    Firstly, fetch the certificates we want to trust:
    1. https://certs.godaddy.com/repository/gd_bundle.crt
    2. https://certs.godaddy.com/repository/gdroot-g2.crt
    3. https://certs.godaddy.com/repository/gd-class2-root.crt
    4. https://certs.godaddy.com/repository/gdig2.crt

    Some of them overlap (e.g. gdig2 is an intermediate cert connecting Class 2 and G2); if keytool complains that a certificate was already imported, ignore the error.

    Secondly, import the certs to trust them (do it on all four):
    keytool -import -trustcacerts -alias gd_bundle -file gd_bundle.crt -keystore /opt/apps/jira/jdk/jre/lib/security/cacerts
    Here, I use the same alias as the filename. Again, adjust the cacerts file location to your needs.
    The keytool binary will ask for the keystore password, which is changeit by default. If you change it, remember to change the javax.net.ssl.trustStorePassword parameter for Tomcat too.

    Lastly, restart Tomcat.

    2014-06-17

    kmem russian roulette

    sudo dd if=/dev/urandom of=/dev/kmem bs=1 count=1 seek=$RANDOM

    I already forgot where and when I first read about this... The theory is that every weekend you fire this at one of your servers, which should eventually crash it. If your HA/failover solution does not compensate within your SLA, you can fix it outside business hours.

    About this game on bashorg :-)

    2014-05-28

    gcc does not link an unused lib it used to link before

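    How to force gcc to link an otherwise unused library into your executable (some distributions, e.g. Ubuntu, pass --as-needed to the linker by default, which drops libraries that nothing references):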

    Add -Wl,--no-as-needed before your lib, and -Wl,--as-needed after.

    Example: LDFLAGS="-L/opt/informix/lib/esql/ -lm -Wl,--no-as-needed -lifgls -Wl,--as-needed -liodbc"
    This example forces linking libifgls.so, while libiodbc.so (and libm) will be left out if not used.

    2014-03-11

    C: Get uid by loginname

    #include <sys/types.h>
    #include <pwd.h>
    uid_t getuid_by_loginname(const char *name)
    {
     struct passwd *pwd;
     if(name) {
      pwd = getpwnam(name); /* don't free, see getpwnam() for details */
      if(pwd)
       return pwd->pw_uid;
     }
     
     return (uid_t)-1;
    }
    

    2013-12-18

    Sending emails to addresses and subjects in a CSV with a fixed message in a textfile

    The following script was written to notify approvers that an audit initiated the deletion of some users they had not approved (again). The subject contains the usernames, and the message is a template with some PC blah-blah.

    #!/bin/bash
    
    if [ $# -lt 2 ]; then
     echo "No arguments passed! I need a 'mailaddr;subject' .csv and a txt containing the fixed message."
     echo "csvmailer.sh addresses.csv message.txt"
     exit 1
    fi
    
    while read -r p; do
     IFS=';'
     TOKENS=($p)
     EMAIL=${TOKENS[0]}
     SUBJ=${TOKENS[1]}
     IFS=$' \t\n' # back to the default
     mail -s "$SUBJ" "$EMAIL" < "$2"
    done < "$1"
    

    The CSV is like "jane.doe@company.com;This is a message in the subject about deleting the account of 'johndoe@company.com'\n". Do NOT put a semicolon into the subject column :-D

    Setting IFS changes the Internal Field Separator to a semicolon (';') instead of the default whitespace, so the array assignment on the next line does all the tokenizing.

    $1 is the CSV file, read into $p line-by-line.

    $2 is the fixed message piped into mail.

    2013-12-15

    Cisco VPN vs. Linux, KDE

    So you have a Cisco proprietary PCF file, and you want to use it in your KDE, but the Network Manager's VPN Import says it cannot do it?
    Here is what to do:
    # apt-get install network-manager-vpnc
    (This should also install vpnc if you don't have it already.) Now, try importing the PCF file again.

    Many thanks to: http://thomas.zumbrunn.name/blog/2013/04/04/479/

    2013-11-18

    Get per process IO activity

    So you want to see a live average of how your processes spin your disks (in fact, any mounts, including NFS).

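    Well, procfs exports per-process I/O counters in /proc/<pid>/io (rchar and wchar are maintained by the read*/write* syscalls; read_bytes and write_bytes count actual storage I/O).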

    You can find the source here.

    Logging IO activity of a process

    Okay, your 3rd party app sucks. It sucks big time. Generates heavy disk traffic at seemingly random times, and you just can't think of anything, anymore. The users are revolting, the website is lagging, your boss is raging.

    It is time to check what the hell the app is actually doing.

    The following script is using strace to catch all IO-related syscalls done by the given process, and dump them in a CSV manner. Later, you can aggregate by seconds, minutes, file systems, or subsystems (like Lucene, etc), create charts, graphs, and pivots.

    #!/bin/sh
    
    if [ "x$1" = "x-h" ]; then
     echo "Usage: ./iotrace.sh <pid>"
     exit 0
    fi
    
    if [ "$(id -u)" != "0" ]; then
       echo "This script must be run as root " 1>&2
       exit 1
    fi
    
    if [ $# -gt 0 ]; then
    PID=$1
    else
    echo "Hey, I need a parameter!"
    exit 1
    fi
    
    ps -L -o lwp= -p "$PID" | awk '{print "-p " $1}' | xargs strace -q -f -v -ttt -T -s 0 -e trace=open,close,read,write 2>&1 | awk -v pid=$PID '
    function output(a, f, r, t)
    {
     # a - action
     # f - file descriptor
     # r - result
     # t - time as unix epoch
     if (f in fd)
      file = fd[f];
     else
     {
      ("readlink /proc/" pid "/fd/" f) | getline file;
      fd[f] = file;
     }
     if (file !~ /^(socket|pipe|\/dev|\/proc)/ || r ~ /\d+/)
      print a, file, r, strftime("%Y-%m-%d %H:%M:%S"); #substr(t, 0, index(t, ".")-1));
    }
    
    BEGIN { OFS=";"; print "op;path;bytes;epoch";}
    {
     if($6 ~ /resumed>/)
     {
      if ($5 ~ /open/){fd[$(NF-1)] = pending[$2];}
      else if ($5 ~ /close/){delete fd[pending[$2]];}
      else if ($5 ~ /write/){match($4, /([0-9]+)/, a);output("write", pending[$2], $(NF-1), $3);}
      else if ($5 ~ /read/) {match($4, /([0-9]+)/, a);output("read", pending[$2], $(NF-1), $3);}
      
      delete pending[$2];
     }
     else if ($4 ~ /^open\(/)
     {
      match($4, /\"(.+)\"/, a);
      f = a[1];
      if ($(NF-1) == "<unfinished")
      {
       pending[$2] = f;
      } else {
       fd[$(NF-1)] = f;
      }
     }
     else if ($4 ~ /^close\(/)
     {
      match($4, /([0-9]+)/, a);
      f = a[1];
      if ($(NF-1) == "<unfinished")
      {
       pending[$2] = f;
      } else {
       delete fd[f];
      }
     }
     else if ($4 ~ /^write\(/)
     {
      match($4, /([0-9]+)/, a);
      f = a[1];
      if ($(NF-1) == "<unfinished")
      {
       pending[$2] = f;
      } else {
       output("write", f, $(NF-1), $3);
      }
     }
     else if ($4 ~ /^read\(/)
     {
      match($4, /([0-9]+)/, a);
      f = a[1];
      if ($(NF-1) == "<unfinished")
      {
       pending[$2] = f;
      } else {
       output("read", f, $(NF-1), $3);
      }
     }
    }'
    

    What does it do?
    1. Takes the process ID you pass in
    2. Lists all threads (LWPs) of this process
    3. Feeds them into xargs to make strace attach to all of them, and makes strace print only the four syscalls we are interested in (open, close, read, write), which are used by the normal Java IO methods
    4. Builds a dictionary of file descriptors and filenames, and pretty-prints the filenames with the actual number of processed bytes
    5. Also takes care of the interrupted (unfinished/resumed) syscall printouts.

    An average Java webapp with Lucene can produce 3.5M rows in an hour. Note that this cannot be opened in Excel ;-)

    2013-10-29

    MCC DAQ miniLAB 1008 (USB) under Linux

    Most of the MCC DAQ's USB and PCI devices have 3rd party Linux kernel drivers. For details, see: http://www.mccdaq.com/daq-software/Linux-Support.aspx
    1. Fetch all stuff from here: ftp://lx10.tx.ncsu.edu/pub/Linux/drivers/USB/
    2. Read the README_mcc_usb file CAREFULLY. You have two options:
      1. hiddev
        1. + Easy, works unprivileged.
        2. - May not be present in linux kernel v3.0+, and is not working in 2.4 at all.
        3. - SLOW: max 200Hz (which is not enough for a big setup)
        4. - No documentation or examples exist.
      2. libhid/libusb
        1. + Uses libusb, works under all linux kernels
        2. + No theoretical ceiling for max throughput.
        3. + libusb is portable through Linux, Windows, Mac, etc
        4. - Needs to be run as root, or the executable has to be setuid.
        5. - Needs USB programming knowledge.
        6. - You have to compile libhid manually.
    3. I assure you, option 2 (libhid+libusb) will be better.
    4. Step 0: Deploy dependencies
      1. sudo apt-get install libusb-1.0-0 libusb-1.0-0-dev
    5. Step 1: Deploy libhid
      1. Fetch libhid from http://libhid.alioth.debian.org/
      2. tar -xzf libhid*.tar.gz
      3. cd libhid*
      4. ./configure --prefix=/usr --sysconfdir=/etc --disable-werror
      5. make
      6. sudo make install
    6. Step 2: Deploy the MCC-specific libhid-based usb library
      1. wget ftp://lx10.tx.ncsu.edu/pub/Linux/drivers/USB/60-mcc.rules
      2. sudo cp 60-mcc.rules /etc/udev/rules.d/
      3. sudo udevadm control --reload-rules
      4. tar xzf MCCLIBHID*.tgz
      5. cd libhid (this is another directory!)
      6. Edit the Makefile
        1. Add " -lm" to the end of the line (after "-o $@") in the libmcchid.so target.
        2. Replace all occurrences of "/usr/local/" with "/usr/" (there will be 33)
      7. make
      8. sudo make install
      9. run your device's test as root, like "sudo ./test-minilab1008"
        1. Good: "miniLAB 1008 Device is found!" and a menu-driven test interface
        2. Bad: 
          1. "hid_force_open failed" - you're not root
          2. Any other error message: well, hack :-)

    2013-10-13

    BCM43xx vs. (k)ubuntu 10.04+

    1. sudo bash
    2. apt-get purge bcmwl-kernel-source
    3. apt-get purge bcm-kernel-source
    4. apt-get install linux-firmware-nonfree
    5. reboot
    6. Make sure to turn ON the Wi-Fi button/latch/whatever
    7. Enjoy