Checkpoint: Monitoring HA Failover – WIP

This is an attempt to find a good way of monitoring and logging what is going on in the HA module. It’s a work in progress, so please feel free to contribute.

SmartCenter

The first script below is triggered by a custom alert and writes to log files in the /var/tmp/clusterxl_alert directory on the SmartCenter. Using the cron job, a daily email can be sent with a summary of the day’s alerts. This was posted to CPUG by yheffen – https://www.cpug.org/forums/clustering-security-gateway-ha-clusterxl/9992-ha-failover-log-files.html. Originally written for the Korn shell, it works equally well in bash.

#!/bin/bash

DIR="/var/tmp/clusterxl_alert"
DAILY_LOG="$DIR/alert_daily.log"
LOG="$DIR/alert.log"

mklog () {
        if [ ! -f "$1" ]; then
                touch "$1"
                chmod 644 "$1"
        fi
}

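# create the log directory and both log files on first use
mkdir -p "$DIR"
mklog "$LOG"
mklog "$DAILY_LOG"

# each alert arrives as one line on stdin
while IFS= read -r ALERT; do
        echo "$ALERT" >> "$DAILY_LOG"
        echo "$ALERT" >> "$LOG"
done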

Register the path to the script as one of the “UserDefined scripts” in the “Policy > Global Properties > Log and Alert > Alert Commands” window. Then, in the “ClusterXL” window of the cluster object’s properties, select this User Defined Alert in the “Tracking” section.
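Before wiring the script into the Alert Commands window, it can be worth checking by hand that lines arriving on stdin really land in both log files. The sketch below is a stand-in for the script above, not the script itself: log_alerts and the test directory are names I made up for illustration, and the real alert text sent by the management server will look different.

```shell
#!/bin/bash
# Hypothetical stand-in for the alert script: append every line read
# from stdin to alert.log and alert_daily.log under the directory in $1.
log_alerts() {
    local dir="$1" alert
    mkdir -p "$dir"
    while IFS= read -r alert; do
        printf '%s\n' "$alert" >> "$dir/alert_daily.log"
        printf '%s\n' "$alert" >> "$dir/alert.log"
    done
}

# Example (made-up alert text and directory):
#   printf 'test alert\n' | log_alerts /var/tmp/clusterxl_alert_test
```

If both files contain the test line afterwards, the cron job has something to mail.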

Cron job code:

0 5 * * * [ -f /var/tmp/clusterxl_alert/alert_daily.log ] && mailx -s "ClusterXL Alerts" me@example.com < /var/tmp/clusterxl_alert/alert_daily.log && rm /var/tmp/clusterxl_alert/alert_daily.log

Security Gateway

This next script, which is very quick and dirty, monitors the interfaces using “cphaprob -a if”. It polls every 2 seconds, writes the result to a file (ha_polled.txt) and compares it against a reference file (ha_ref.txt) which is created when the script starts. If a difference is found, it is logged to the ha_alert.log file. There are better ways to do this but as I said, it’s quick and dirty 🙂

#!/bin/bash

# variables
DIR="/var/tmp"
REFERENCE="$DIR/ha_ref.txt"
POLLED="$DIR/ha_polled.txt"
LOG="$DIR/ha_alert.log"

# functions

# write the command output straight to the file so line breaks survive
mkref () {
	cphaprob -a if > "$REFERENCE"
}

mkpoll () {
	cphaprob -a if > "$POLLED"
}

# main process

# make reference file
mkref

echo "Entering polling loop, use ctrl-c or"
echo "\"kill \$(pgrep ${0##*/})\" from a different terminal to exit"
echo
# Poll every 2 seconds and compare until ctrl-c. 
# If status changes log and then make new reference data
while true; do
	mkpoll
	DIFF=$(diff "$REFERENCE" "$POLLED")
	if [ -n "$DIFF" ]; then
		echo "Change logged to $LOG"
		echo "" >> "$LOG"
		echo "$DIFF" >> "$LOG"
		mkref
	fi
	# wait before the next poll, whether or not anything changed
	sleep 2
done

Running this as admin in expert mode with an ampersand puts the process in the background. Adding nohup makes sure it keeps running even if the terminal is disconnected:

[Expert@gw]# nohup ./ha_monitor.sh &

One issue here is that if an interface is down, “cphaprob -a if” shows the number of seconds it has been down for:

[Expert@gw]# cphaprob -a if

Required interfaces: 4
Required secured interfaces: 2

eth0 UP sync(secured), multicast
eth1 Inbound: DOWN (4.7 secs)  Outbound: DOWN (5 secs) sync(secured), multicast
eth2 UP non sync(non secured), multicast
eth3 UP non sync(non secured), multicast

The script will therefore see a discrepancy on every poll as the seconds counter increases, and will create a log entry every 2 seconds until the interface comes back up. Like I said, quick, dirty and a work in progress 🙂
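One possible way around the counter problem is to strip the “(N secs)” part before comparing, so that only the UP/DOWN state is left. This is only a sketch: strip_down_timers is a name I made up, and the pattern assumes the counter always looks like it does in the sample output above.

```shell
#!/bin/bash
# Read "cphaprob -a if" output on stdin and drop the "(N secs)" counters,
# so that two polls of the same DOWN interface compare as equal.
# Assumes the counter format shown in the sample output, e.g. "(4.7 secs)".
strip_down_timers() {
    sed -E 's/ *\([0-9.]+ secs\)//g'
}

# Possible use inside mkref and mkpoll:
#   cphaprob -a if | strip_down_timers > "$POLLED"
```

With the counters removed, a log entry should only be written when an interface actually changes state.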

 

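EDIT:

New script now. It compares the output of “cphaprob stat” rather than “cphaprob -a if”, and writes the full “cphaprob stat”, “cphaprob list” and “cphaprob -a if” output to the log whenever the state changes: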

#!/bin/bash

# variables
HOSTNAME=$(hostname)
DIR="/var/tmp"
LOG="$DIR/${HOSTNAME}_hamon.log"

# functions

mkref () {
	echo "Making new reference  .." >> $LOG
	REFERENCE="`cphaprob stat`" 
	echo "Done" >> $LOG
	echo "" >> $LOG
}

mkpoll () {
	POLLED="`cphaprob stat`"
}

getAndLogVals () {
	CPHAPROBSTAT=`cphaprob stat`
	CPHAPROBLIST=`cphaprob list | grep -v "Time since" | grep -v "Registration number" | grep -v "Timeout: none"`
	CPHAPROBAIF=`cphaprob -a if`
	echo "" >> $LOG
	echo "cphaprob stat:" >> $LOG
	echo "--------------" >> $LOG
	echo "$CPHAPROBSTAT" >> $LOG
	echo "" >> $LOG
	echo "cphaprob list:" >> $LOG
	echo "--------------" >> $LOG
	echo "$CPHAPROBLIST" >> $LOG
	echo "" >> $LOG
	echo "cphaprob -a if:" >> $LOG
	echo "---------------" >> $LOG
	echo "$CPHAPROBAIF" >> $LOG
	echo "" >> $LOG
}

# main []

if [ -f "$LOG" ]; then
	echo "Removing old log file .."
	rm "$LOG"
fi

echo "Starting logging at "`date` >> $LOG
echo "" >> $LOG

# Record original vals to the log 
getAndLogVals

# get reference vals
mkref

echo "Monitoring Failover status, use ctrl-c or \"kill \$(pgrep ${0##*/})\" from a different terminal to exit"

# Poll continuously and compare until ctrl-c. If status changes, log and get new reference data
while true; do
	mkpoll
	if [ "$POLLED" != "$REFERENCE" ]; then
		DIFF="$REFERENCE / $POLLED"
		echo "" >> $LOG
		echo "=============================================================================" >> $LOG
		echo "" >> $LOG
		echo `date` >> $LOG
		echo "" >> $LOG
		echo "HA Status Change detected, logged to $LOG"
		echo "$DIFF" >> $LOG
		echo "" >> $LOG
		getAndLogVals
		mkref
	fi
done
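The loop above runs cphaprob back to back with no pause in between. If that turns out to be too heavy on the gateway, the same idea can be written with a pause between polls. The sketch below is generic and not tied to cphaprob: poll_until_change is a made-up helper that takes an interval and any command, and returns as soon as the command’s output differs from the first run.

```shell
#!/bin/bash
# poll_until_change INTERVAL CMD [ARGS...]
# Run CMD once to get a reference, then re-run it every INTERVAL seconds.
# When the output differs from the reference, print the new output and return.
poll_until_change() {
    local interval="$1"
    shift
    local reference polled
    reference="$("$@")"
    while true; do
        sleep "$interval"
        polled="$("$@")"
        if [ "$polled" != "$reference" ]; then
            printf '%s\n' "$polled"
            return 0
        fi
    done
}

# Possible use on a gateway (not run here):
#   NEW=$(poll_until_change 1 cphaprob stat)
```

The caller can then log the returned output and call the helper again to wait for the next change.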

Linux: Broken sudoers file in Ubuntu

I’ve done this twice now; sometimes lessons need repeating. If you are going to edit /etc/sudoers in Ubuntu, set a root password first or you risk locking yourself out.

If you edit the sudoers file and the syntax is incorrect then the system can no longer read the sudoers file. Now you can’t fix the file because

sudo vi /etc/sudoers

returns an error.

You need to reboot, holding Shift to bring up the GRUB menu, and choose recovery mode. Then drop to a root shell, remount the filesystem read/write and give yourself permission to edit the sudoers file:

mount -n -o remount,rw /
chmod u+w /etc/sudoers

Now:

vi /etc/sudoers

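and fix that mistake. Afterwards, set the file mode back to 0440 (chmod 0440 /etc/sudoers), which is the mode sudo expects for this file.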

Really though .. if you had set a secure root password you could have avoided the pain with

su -
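Better still is not breaking the file in the first place. visudo checks the syntax before it saves, and “visudo -c -f FILE” checks a file without editing it. The sketch below builds on that: edit a copy, and only install it if the check passes. install_sudoers and the VISUDO variable are names I made up (VISUDO is just there so the checker can be swapped out for testing), so treat this as an idea rather than a finished tool.

```shell
#!/bin/bash
# install_sudoers CANDIDATE [TARGET]
# Install CANDIDATE as TARGET (default /etc/sudoers) with mode 0440,
# but only if it passes a syntax check with "visudo -c -f".
# VISUDO is a hypothetical override for the checker command.
install_sudoers() {
    local candidate="$1" target="${2:-/etc/sudoers}"
    if "${VISUDO:-visudo}" -c -f "$candidate" > /dev/null; then
        install -m 0440 "$candidate" "$target"
    else
        echo "syntax check failed, $target left untouched" >&2
        return 1
    fi
}

# Possible use as root (not run here):
#   cp /etc/sudoers /root/sudoers.new
#   vi /root/sudoers.new
#   install_sudoers /root/sudoers.new
```

For everyday edits, plain visudo does the same check for you.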

VMWare: Enable SSH Daemon on ESXi 3.5

Taken from the vm-help.com website:

ESXi 3.5 does ship with the ability to run SSH, but this is disabled by default (and is not supported). If you just need to access the console of ESXi, then you only need to perform steps 1 – 3.

1) At the console of the ESXi host, press ALT-F1 to access the console window.
2) Enter unsupported in the console and then press Enter. You will not see the text you type in.
3) If you typed in unsupported correctly, you will see the Tech Support Mode warning and a password prompt. Enter the password for the root login.
4) You should then see the prompt of ~ #. Edit the file inetd.conf (enter the command vi /etc/inetd.conf).
5) Find the line that begins with #ssh and remove the #. Then save the file. If you’re new to using vi, then move the cursor down to the #ssh line and then press the Insert key. Move the cursor over one space and then hit backspace to delete the #. Then press ESC and type in :wq to save the file and exit vi. If you make a mistake, you can press the ESC key and then type in :q! to quit vi without saving the file.
6) Once you’ve closed the vi editor, run the command /sbin/services.sh restart to restart the management services. You’ll now be able to connect to the ESXi host with an SSH client.
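If you would rather not edit the file in vi, step 5 can in principle be done with sed. This is a sketch only: enable_ssh_line is a made-up name, it assumes sed is available in the Tech Support Mode shell (I have not confirmed that for ESXi 3.5), and it uncomments every line that starts with #ssh, not just the first.

```shell
#!/bin/sh
# enable_ssh_line FILE
# Remove the leading "#" from every line in FILE that starts with "#ssh".
# Writes to a temporary copy first, then copies the result back over FILE.
enable_ssh_line() {
    conf_file="$1"
    tmp_file="$1.new"
    sed 's/^#ssh/ssh/' "$conf_file" > "$tmp_file" &&
        cat "$tmp_file" > "$conf_file" &&
        rm -f "$tmp_file"
}

# Possible use on the host (not run here):
#   enable_ssh_line /etc/inetd.conf
```

Keeping a backup copy of inetd.conf before trying this is a good idea.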

Tip – with some applications like WinSCP, the default encryption cipher used is AES. If you change that to Blowfish you will likely see significantly faster transfers.

Update for ESXi 3.5 Update 2 – With Update 2 the services.sh command no longer restarts the inetd process which enables SSH access. You can either restart your host or run ps | grep inetd to determine the process ID for the inetd process. The output of the command will be something like 1299 1299 busybox      inetd, and the process ID is 1299. Then run kill -HUP <process_id> (kill -HUP 1299 in this example) and you’ll then be able to access the host via SSH.
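The lookup and the kill can be combined so the process ID does not have to be copied by hand. The sketch below assumes the ps output has the process ID in the first column and the command name last, as in the example line above, and that awk is available in the Tech Support Mode shell (an assumption on my part). inetd_pid is a made-up helper name.

```shell
#!/bin/sh
# inetd_pid: read "ps" output on stdin and print the PID (first column)
# of the first process whose last field is exactly "inetd".
inetd_pid() {
    awk '$NF == "inetd" { print $1; exit }'
}

# Possible use on the host (not run here):
#   kill -HUP "$(ps | inetd_pid)"
```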

You can also download an oem.tgz file which will enable SSH (and FTP). Copy the file to a datastore with the VI client and then to bootbank with the command cp /vmfs/volumes/<datastore>/oem.tgz /bootbank/oem.tgz and then reboot.
