AIX / Linux agent

The OS agent is intended for users who need additional metrics that are available only at the operating system level.
Screenshots: CPU, CPU Queue, Memory, LAN, SAN, SAN IOPS, SAN Latency (response time)

OS agent metrics and features

  • OS CPU utilization of user/sys/IO wait/idle in %
  • CPU queue: load average, blocked processes / raw / direct IO
  • Memory utilization of used/FS cache/free memory in MB
  • Paging rate in MB/sec
  • Paging space utilization in %
  • SAN (FC & vSCSI) throughput per adapter
    • data in MB/sec
    • IO/sec
    • response time (latency)
    • error
  • LAN (ethernet) throughput per adapter
    • data in MB/sec
    • packet count
    • error
  • Total IO throughput (Linux)
    • IOPS
    • Data in MB/sec
    • response time (latency)
  • Filesystem capacity utilization
  • AIX SEA (Shared Ethernet Adapter) throughput per adapter in MB/sec (IBM Power only)
  • SAN multipath monitoring
  • JOB TOP: visual tracking of CPU and memory usage of running processes over time

Operating systems

  • AIX 5.1+
  • Linux on Power
  • Linux x86

Implementation

It is implemented as a simple client/server application.
The XorMon NG daemon listens on port 8162 on the host where the XorMon NG server is running.
Each LPAR has a simple Perl-based agent installed. The agent is started every minute from crontab and saves memory and paging statistics into a temporary file.
The agent contacts the server every 5 - 20 minutes and sends all locally stored data for that period.

Agent prerequisites

  • Perl interpreter. Perl is included in the base installation of most Unix/Linux systems.
  • The agent can run under any user account; it does not need any special privileges in the OS.
  • Open TCP communication from each LPAR to the XorMon NG server on port 8162.
  • Connections are initiated only from the monitored AIX / Linux system.
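
The prerequisites above can be verified before installing the agent. The following is a minimal sketch, not part of the agent itself: the function names have_cmd and can_reach are hypothetical, and xormon.example.com is a placeholder host. The connection test relies on Perl's core IO::Socket::INET module, since Perl is required anyway.

```shell
#!/bin/sh
# Pre-flight sketch for the agent prerequisites.
# Function names are hypothetical, they are not part of the agent.

# have_cmd NAME: succeeds if NAME is found in PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# can_reach HOST [PORT]: succeeds if a TCP connection can be opened
# within 5 seconds. PORT defaults to 8162.
can_reach() {
    perl -MIO::Socket::INET -e '
        my $s = IO::Socket::INET->new(PeerAddr => $ARGV[0], PeerPort => $ARGV[1],
                                      Proto => "tcp", Timeout => 5);
        exit($s ? 0 : 1);
    ' "$1" "${2:-8162}"
}

# Example (replace the placeholder host before running):
#   have_cmd perl || echo "Perl interpreter not found"
#   can_reach xormon.example.com 8162 || echo "port 8162 is not reachable"
```

Run it on the monitored system, because connections are opened from the agent side.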

Usage

perl lpar2rrd-agent.pl [-s <step in seconds>] [-d] [-c] [-n <NMON_DIR>] [-b <path to HvmSh API>] [-i <HVM IP address>] <XorMon NG server hostname/IP>[:<PORT>]

 -d  forces sending out data immediately to check communication channel (DEBUG purposes)
 -c  agent collects & sends only internal HMC data
 -n  agent sends only NMON data from NMON directory <NMON_DIR>
 -b  path to Hitachi HvmSh API
 -i  IP address of HVM (Hitachi Virtualization Manager)
 -t  <max send time in seconds>
 -s  <step in seconds>; do not set below 60, and update the crontab line accordingly, e.g. -s 300 means */5 in the crontab minutes field
 -m  use sudo for multipath (only root can run it): "sudo multipath -ll"; put this into sudoers: lpar2rrd  ALL = (root) NOPASSWD: /usr/sbin/multipath -ll

 options -c and -n are mutually exclusive
 options -b and -i are both required for Hitachi agent
 no option - agent collects & sends standard OS agent data
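
The option rules above can be checked before a command line is put into crontab. This is only a sketch of such a check: check_agent_opts is a hypothetical helper, not something shipped with the agent, and it assumes the option letters and arguments are exactly those listed above.

```shell
#!/bin/sh
# check_agent_opts OPTIONS... SERVER...
# Hypothetical helper: returns 0 if the option combination follows
# the rules listed in the usage text, 1 otherwise.
check_agent_opts() {
    OPTIND=1
    has_c=0; has_n=0; has_b=0; has_i=0
    while getopts "dcmn:b:i:t:s:" opt; do
        case "$opt" in
            c) has_c=1 ;;
            n) has_n=1 ;;
            b) has_b=1 ;;
            i) has_i=1 ;;
            s) if ! [ "$OPTARG" -ge 60 ] 2>/dev/null; then
                   echo "-s must be a number of seconds, 60 or more" >&2
                   return 1
               fi ;;
            d|m|t) ;;
            *) return 1 ;;
        esac
    done
    shift $((OPTIND - 1))
    if [ "$has_c" -eq 1 ] && [ "$has_n" -eq 1 ]; then
        echo "-c and -n are mutually exclusive" >&2; return 1
    fi
    if [ "$has_b" -ne "$has_i" ]; then
        echo "-b and -i must be given together" >&2; return 1
    fi
    if [ $# -lt 1 ]; then
        echo "at least one server is required" >&2; return 1
    fi
    return 0
}

# Example:
#   check_agent_opts -b /path/to/HvmSh -i 10.0.0.1 xormon.example.com && echo OK
```
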
Crontab entry for scheduling; preferably use a non-admin account
* * * * * /usr/bin/perl /opt/lpar2rrd-agent/lpar2rrd-agent.pl <XorMon NG server hostname/IP> > /var/tmp/lpar2rrd-agent.out 2>&1
The agent collects data and sends it to the XorMon NG server every 5 - 20 minutes
If you use a non-standard XorMon NG port, add it after the server name, separated by the ':' delimiter
* * * * * /usr/bin/perl /opt/lpar2rrd-agent/lpar2rrd-agent.pl <XorMon NG server hostname/IP>:<PORT> > /var/tmp/lpar2rrd-agent.out 2>&1
To send data to several XorMon NG server instances (the number is not restricted), list them all
* * * * * /usr/bin/perl /opt/lpar2rrd-agent/lpar2rrd-agent.pl <XorMon NG server 1 hostname/IP> <XorMon NG server 2 hostname/IP> > /var/tmp/lpar2rrd-agent.out 2>&1
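
The relation between the -s step and the crontab minutes field (for example -s 300 and */5) can be sketched as a small helper that prints a matching crontab line. The function names cron_minute_field and agent_cron_line are hypothetical. The helper also assumes the step divides an hour evenly, which is a restriction of this sketch (so that */N fires at regular intervals), not a documented rule of the agent.

```shell
#!/bin/sh
# cron_minute_field STEP: print the crontab minutes field for a step
# given in seconds (60 -> "*", 300 -> "*/5").
# Assumption of this sketch: STEP is a multiple of 60 that divides 3600.
cron_minute_field() {
    step=$1
    case "$step" in
        ''|*[!0-9]*) echo "step must be a number" >&2; return 1 ;;
    esac
    if [ "$step" -lt 60 ] || [ "$step" -ge 3600 ] ||
       [ $((step % 60)) -ne 0 ] || [ $((3600 % step)) -ne 0 ]; then
        echo "unsupported step: $step" >&2
        return 1
    fi
    minutes=$((step / 60))
    if [ "$minutes" -eq 1 ]; then
        echo "*"
    else
        echo "*/$minutes"
    fi
}

# agent_cron_line STEP SERVER...: print a crontab line whose schedule
# matches the -s value. Paths are the ones used in this document.
agent_cron_line() {
    step=$1
    shift
    field=$(cron_minute_field "$step") || return 1
    opts=""
    if [ "$step" -ne 60 ]; then
        opts=" -s $step"
    fi
    echo "$field * * * * /usr/bin/perl /opt/lpar2rrd-agent/lpar2rrd-agent.pl$opts $* > /var/tmp/lpar2rrd-agent.out 2>&1"
}

# Example:
#   agent_cron_line 300 xormon1.example.com xormon2.example.com
```
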

Enhanced setting

  • By default, the agent tries to send data to the XorMon NG server at random intervals of 5 - 20 minutes.
    You can specify a maximum time limit for sending data; the minimum is 5 minutes.
    * * * * * /usr/bin/perl /opt/lpar2rrd-agent/lpar2rrd-agent.pl -t <max send time in seconds> <XorMon NG server hostname/IP>
    
  • How to avoid SAN checks via fcstat (they may cause problems, although this should not happen in v4.50+)
    * * * * * FCSTAT=/bin/true /usr/bin/perl /opt/lpar2rrd-agent/lpar2rrd-agent.pl <XorMon NG server hostname/IP> > /var/tmp/lpar2rrd-agent.out 2>&1
    
  • By default, only interfaces that have an IP address assigned are reported. This can be overridden with the XorMon NG_LAN_INT environment variable, which selects the interfaces to report; regular expressions are supported only on Linux. Be careful not to stack interfaces from different virtualization levels in one graph, as the same traffic could be counted more than once and inflate the presented values.
    * * * * * XorMon NG_LAN_INT="eth.*0$,bond.*,rhevm,9.*" /usr/bin/perl /opt/lpar2rrd-agent/lpar2rrd-agent.pl <XorMon NG server hostname/IP>  > /var/tmp/lpar2rrd-agent.out 2>&1
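
Before putting an interface pattern list into crontab, it can help to preview which interface names it would select. The sketch below is only an approximation: iface_selected is a hypothetical helper, and it assumes each comma-separated item is an independent, unanchored extended regular expression matched against the interface name. The agent's actual matching rules are not described here and may differ.

```shell
#!/bin/sh
# iface_selected NAME PATTERNS: succeeds if NAME matches any item of the
# comma-separated PATTERNS list.
# Assumption: every item is treated as an extended regular expression.
iface_selected() {
    name=$1
    old_ifs=$IFS
    IFS=,
    set -f                      # keep "bond.*" etc. from being globbed
    for pat in $2; do
        if printf '%s\n' "$name" | grep -Eq -- "$pat"; then
            IFS=$old_ifs
            set +f
            return 0
        fi
    done
    IFS=$old_ifs
    set +f
    return 1
}

# Example: preview a few interface names against a pattern list
#   for i in eth0 eth1 bond0 lo; do
#       iface_selected "$i" 'eth.*0$,bond.*,rhevm' && echo "$i selected"
#   done
```
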
    

Debug

  • option -d forces sending out data immediately to check communication channel
      /usr/bin/perl /opt/lpar2rrd-agent/lpar2rrd-agent.pl -d <XorMon NG server hostname/IP>
    
  • error log: /var/tmp/lpar2rrd-agent-*.err
  • output log, last run: /var/tmp/lpar2rrd-agent-*.out
  • collected data waiting for sending: /var/tmp/lpar2rrd-agent--.txt
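
A quick way to see whether the agent has reported any problems is to list the error logs that are not empty. The following is a minimal sketch; agent_error_logs is a hypothetical helper name, and the default directory is the /var/tmp location given above.

```shell
#!/bin/sh
# agent_error_logs [DIR]: print the names of non-empty agent error logs
# (lpar2rrd-agent-*.err) found in DIR, which defaults to /var/tmp.
agent_error_logs() {
    dir=${1:-/var/tmp}
    for f in "$dir"/lpar2rrd-agent-*.err; do
        # -s is true only for existing files with a size greater than zero,
        # so an unmatched glob pattern is skipped as well
        [ -s "$f" ] && echo "$f"
    done
    return 0
}

# Example: show the last lines of every non-empty error log
#   agent_error_logs | while read -r f; do
#       echo "== $f"; tail -n 20 "$f"
#   done
```
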