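For programs that do not daemonize themselves, a process management tool can supervise them. Until now I have used supervise from daemontools, currently for twemproxy, memcached,
ttserver, ktserver, redis and similar processes. One drawback is that there are no associated logs. Another is that multiple instances on one machine (several memcached processes, say) have to be started one by one, which
is awkward for automated installation and configuration. supervisord, which is written in Python, manages processes in a more sensible way. I learned about it long ago but never had time to look into it. I have
some free time today, so let's try it.
1. First, installation. Since it is a Python package, I install it with easy_install, though installing from source also works. After installing setuptools with yum, run: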
easy_install supervisor
echo_supervisord_conf > /etc/supervisord.conf
mkdir /etc/supervisord.conf.d
An error I ran into:
# easy_install supervisor
/usr/bin/easy_install: line 3: __requires__: command not found
/usr/bin/easy_install: line 4: import: command not found
/usr/bin/easy_install: line 5: from: command not found
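The fix is in the first line of /usr/bin/easy_install: it was a bare #! with no Python path, so the shell tried to run the file itself. Change it to #!/usr/bin/python, retry, and carry on.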
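A small guard for this kind of breakage, as a sketch only: the helper below (fix_shebang is a made-up name, not part of setuptools) rewrites the first line only when it is a bare #!, and leaves a valid shebang alone. It assumes GNU sed for the -i option and that /usr/bin/python is the interpreter you want.

```shell
#!/bin/bash
# fix_shebang FILE INTERPRETER  (hypothetical helper for illustration)
# Rewrites line 1 of FILE to "#!INTERPRETER" only if line 1 is a bare "#!".
fix_shebang() {
    local file=$1 interpreter=$2 first
    first=$(head -n 1 "$file")
    if [ "$first" = '#!' ]; then
        # replace line 1 only; the rest of the script is untouched
        sed -i "1s|.*|#!${interpreter}|" "$file"
        echo "fixed: $file"
    else
        echo "unchanged: $file ($first)"
    fi
}

# Example (run as root):
#   fix_shebang /usr/bin/easy_install /usr/bin/python
```

Calling it a second time is harmless, because the first line no longer matches the bare #! test.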
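2. Next, usage. There are two commands to remember: supervisord, which starts and supervises the processes, and supervisorctl, which manages them. I edited
the default configuration file to manage one memcached and one redis, reusing the run scripts from my old supervise setup. This is /etc/supervisord.conf: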
[unix_http_server]
file=/tmp/supervisor.sock ; (the path to the socket file)
[supervisord]
logfile=/tmp/supervisord.log ; (main log file;default $CWD/supervisord.log)
logfile_maxbytes=50MB ; (max main logfile bytes b4 rotation;default 50MB)
logfile_backups=10 ; (num of main logfile rotation backups;default 10)
loglevel=info ; (log level;default info; others: debug,warn,trace)
pidfile=/tmp/supervisord.pid ; (supervisord pidfile;default supervisord.pid)
nodaemon=false ; (start in foreground if true;default false)
minfds=1024 ; (min. avail startup file descriptors;default 1024)
minprocs=200 ; (min. avail process descriptors;default 200)
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
[supervisorctl]
serverurl=unix:///tmp/supervisor.sock ; use a unix:// URL for a unix socket
[program:memcached]
command=/home/bin/memcached/run
user=nobody ; setuid to this UNIX account to run the program
[program:redis]
command=/home/bin/redis/run
[group:memcached]
programs=memcached
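Each run file simply contains the command that starts the service; memcached is set to run as the user nobody. (Letting supervisord switch users avoids the "tty required" problem that comes from calling su inside the start script.)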
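For the multi-instance case mentioned at the start, supervisord can run several copies of one program from a single section via numprocs. The sketch below is an assumption-laden illustration, not the setup used in this post: the helper names, the file name memcached_pool.conf and the 112xx port range are invented, and memcached is started directly instead of through a run script. It also wires up the /etc/supervisord.conf.d directory created during installation through an [include] section.

```shell
#!/bin/bash
# Hypothetical helpers; file names and the 112xx port range are assumptions.

# enable_include CONF DIR: append an [include] section unless one exists.
enable_include() {
    local conf=$1 dir=$2
    grep -q '^\[include\]' "$conf" ||
        printf '\n[include]\nfiles = %s/*.conf\n' "$dir" >> "$conf"
}

# write_memcached_pool DIR COUNT: one [program] section, COUNT instances.
write_memcached_pool() {
    local dir=$1 count=$2
    mkdir -p "$dir"
    cat > "$dir/memcached_pool.conf" <<EOF
[program:memcached_pool]
; process_num runs from 0 to numprocs-1, so the ports are 11200, 11201, ...
command=/home/memcached/bin/memcached -p 112%(process_num)02d
process_name=%(program_name)s_%(process_num)02d
numprocs=${count}
user=nobody
EOF
}

# Example:
#   enable_include /etc/supervisord.conf /etc/supervisord.conf.d
#   write_memcached_pool /etc/supervisord.conf.d 3
#   supervisorctl update
```

When numprocs is greater than 1, process_name has to contain %(process_num)s in some form so that every instance gets a distinct name.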
Start supervisord:
# supervisord -c /etc/supervisord.conf
Check the process relationships with pstree:
# pstree
init-+-_plutorun-+-_plutoload
     |           `-_plutorun---pluto-+-_pluto_adns
     |-sshd---sshd---sshd---bash---su---bash---pstree
     |-supervisord-+-memcached---5*[{memcached}]
     |             `-redis-server---2*[{redis-server}]
     |-syslogd
     `-udevd
Both memcached and redis have supervisord as their parent process.
Normally, if redis or memcached dies, supervisord restarts it, so that case is not a concern. But what happens to memcached
and redis if supervisord itself dies? I killed the supervisord process by hand and ran pstree again:
init-+-_plutorun-+-_plutoload
     |           `-_plutorun---pluto-+-_pluto_adns
     |-sshd---sshd---sshd---bash---su---bash---pstree
     |-memcached---5*[{memcached}]
     |-redis-server---2*[{redis-server}]
     |-syslogd
     `-udevd
The processes are still there, now with init as their parent. What happens if supervisord is then started again by hand? First, note the current PIDs:
# ps -ef |grep -E 'redis|memcached'
root     22256     1  0 16:35 ?        00:00:00 /home/redis/bin/redis-server /home/redis/redis.conf
nobody   22257     1  0 16:35 ?        00:00:00 /home/memcached/bin/memcached
Start it again:
# supervisord -c /etc/supervisord.conf
The PIDs of the two programs have not changed:
# ps -ef |grep -E 'redis|memcached'
root     22256     1  0 16:35 ?        00:00:00 /home/redis/bin/redis-server /home/redis/redis.conf
nobody   22257     1  0 16:35 ?        00:00:00 /home/memcached/bin/memcached
pstree now shows supervisord standing alone, with no children:
init-+-_plutorun-+-_plutoload
     |           `-_plutorun---pluto-+-_pluto_adns
     |-sshd---sshd---sshd---bash---su---bash---pstree
     |-supervisord
     |-memcached---5*[{memcached}]
     |-redis-server---2*[{redis-server}]
     |-syslogd
     `-udevd
The log reports startup conflicts, because the two running processes are no longer under supervisord's control.
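One way to notice this situation before starting supervisord again is to look for leftover processes whose parent is init. A minimal sketch, assuming Linux with /proc mounted and an init that runs as PID 1; list_orphans is a made-up helper name:

```shell
#!/bin/bash
# list_orphans NAME: print the PIDs of processes called NAME whose parent is PID 1.
list_orphans() {
    local want=$1 status pid
    for status in /proc/[0-9]*/status; do
        pid=${status#/proc/}
        pid=${pid%/status}
        # a process may exit while we scan, so read errors are ignored
        awk -v want="$want" -v pid="$pid" '
            $1 == "Name:" { name = $2 }
            $1 == "PPid:" { ppid = $2 }
            END { if (name == want && ppid == 1) print pid }
        ' "$status" 2>/dev/null
    done
}

# Example: warn about old children before starting supervisord.
#   for name in memcached redis-server; do
#       [ -n "$(list_orphans "$name")" ] && echo "orphaned $name still running"
#   done
```

Stopping the reported PIDs first lets the new supervisord start its own copies and manage them.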
Next, let's look at supervisorctl. It offers two interfaces, a console and a web UI. For security reasons I am leaving the web UI aside for now.
Here is the interactive console:
# supervisorctl -i
memcached RUNNING pid 25830, uptime 0:00:09
redis FATAL Exited too quickly (process log may have details)
supervisor> help
default commands (type help <topic>):
=====================================
add clear fg open quit remove restart start stop update
avail exit maintail pid reload reread shutdown status tail version
supervisor>
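The status is shown as soon as you connect. What each command does:
add       activate a newly configured process/group (after reread)
avail     list all configured processes, including those that failed to start
clear     clear a process's log files; clear all clears the logs of every process
exit      exit the supervisorctl shell
fg        attach to a process in the foreground (Ctrl-C to detach)
maintail  tail the main supervisord log, here /tmp/supervisord.log
open      connect to another supervisord, e.g. open unix:///tmp/supervisor.sock; similar to hopping over ssh, except that quit exits supervisorctl entirely
pid       PID of the supervisord process
quit      same as exit
reload    restart the supervisord process
remove    remove a process/group from the active configuration
reread    re-read the configuration files without applying the changes
restart   restart a process/group, or all, e.g. restart memcached
shutdown  shut down the supervisord process
start     start a process/group, or all
status    show the status of all processes
stop      stop a process/group, or all
tail      watch a process's log output
update    apply configuration changes, adding and removing processes as needed, e.g. after adding a new program
version   version of the running supervisord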
I won't paste the output of every command; here is one example:
supervisor> avail
memcached                        in use    auto      999:999
redis                            in use    auto      999:999
supervisor>
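The same commands also work non-interactively, which is handy after editing the configuration. The wrapper below is only a sketch: apply_config_changes and the SUPERVISORCTL and CONF variables are names chosen for illustration, while reread, update and status are the subcommands listed above.

```shell
#!/bin/bash
SUPERVISORCTL=${SUPERVISORCTL:-supervisorctl}
CONF=${CONF:-/etc/supervisord.conf}

# apply_config_changes: hypothetical wrapper around three real subcommands.
apply_config_changes() {
    # reread parses the files and reports what changed, without acting on it
    "$SUPERVISORCTL" -c "$CONF" reread || return 1
    # update starts, stops or restarts the groups whose configuration changed
    "$SUPERVISORCTL" -c "$CONF" update || return 1
    # status shows the result
    "$SUPERVISORCTL" -c "$CONF" status
}

# Example:
#   apply_config_changes
```

Programs whose configuration did not change are left running, which is the main reason to prefer this over reload.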
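3. Summary:
1. As for tuning the parameters in the [supervisord] section: it is not in production use yet, and a quick read-through showed no problems.
2. It provides an API, which can be used to expose a monitoring interface.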
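The API in question is XML-RPC, served over the same socket that supervisorctl uses. As a rough sketch of a monitoring probe, assuming a curl new enough to support --unix-socket and no username/password on [unix_http_server]: rpc_payload and rpc_call are invented helper names, while supervisor.getState and supervisor.getAllProcessInfo are methods of supervisord's RPC interface.

```shell
#!/bin/bash
SOCKET=${SOCKET:-/tmp/supervisor.sock}   # file= from [unix_http_server]

# rpc_payload METHOD: XML-RPC request body for a call without parameters.
rpc_payload() {
    printf '<?xml version="1.0"?><methodCall><methodName>%s</methodName><params></params></methodCall>' "$1"
}

# rpc_call METHOD: POST the request to supervisord and print the raw XML reply.
rpc_call() {
    rpc_payload "$1" |
        curl --silent --fail --unix-socket "$SOCKET" \
             --header 'Content-Type: text/xml' \
             --data-binary @- http://localhost/RPC2
}

# Example, for a monitoring check:
#   rpc_call supervisor.getState            # state of supervisord itself
#   rpc_call supervisor.getAllProcessInfo   # one struct per managed process
```

The reply is plain XML, so a real check would still need to parse it; an XML-RPC client library is usually more comfortable for anything beyond a liveness probe.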
The init script below can be used to run supervisord as a service on Red Hat style systems:
#!/bin/bash
#
# supervisord This script turns supervisord on
#
# Author: Mike McGrath <mmcgrath@redhat.com> (based off yumupdatesd)
# Jason Koppe <jkoppe@indeed.com> adjusted to read sysconfig,
# use supervisord tools to start/stop, conditionally wait
# for child processes to shutdown, and startup later
# Mikhail Mingalev <mingalevme@gmail.com> Merged
# redhat-init-jkoppe and redhat-sysconfig-jkoppe, and
# made the script "simple customizable".
#
# chkconfig: 345 83 04
#
# description: supervisor is a process control utility. It has a web based
# xmlrpc interface as well as a few other nifty features.
# Script was originally written by Jason Koppe <jkoppe@indeed.com>.
#
# source function library
. /etc/rc.d/init.d/functions
# export every variable assigned from here on
set -a
PREFIX=/usr
SUPERVISORD=$PREFIX/bin/supervisord
SUPERVISORCTL=$PREFIX/bin/supervisorctl
# PIDFILE must match pidfile= in the [supervisord] section of the config file;
# the sample config above uses /tmp/supervisord.pid
PIDFILE=/var/run/supervisord.pid
LOCKFILE=/var/lock/subsys/supervisord
OPTIONS="-c /etc/supervisord.conf"
# unset this variable if you don't care to wait for child processes to shutdown before removing the $LOCKFILE-lock
WAIT_FOR_SUBPROCESSES=yes
# remove this if you manage number of open files in some other fashion
ulimit -n 96000
RETVAL=0

running_pid()
{
    # Check if a given process pid's cmdline matches a given name
    pid=$1
    name=$2
    [ -z "$pid" ] && return 1
    [ ! -d /proc/$pid ] && return 1
    (tr "\000" "\n" < /proc/$pid/cmdline | grep -q "$name") || return 1
    return 0
}

running()
{
    # Check if the process is running looking at /proc
    # (works for all users)
    # No pidfile, probably no daemon present
    [ ! -f "$PIDFILE" ] && return 1
    # Obtain the pid and check it against the binary name
    pid=`cat $PIDFILE`
    running_pid $pid $SUPERVISORD || return 1
    return 0
}

start() {
    echo "Starting supervisord: "
    # check the live process, not just the pidfile, so that a stale
    # pidfile left behind by a crash does not block startup
    if running ; then
        echo "ALREADY STARTED"
        return 1
    fi
    # start supervisord with options from sysconfig (stuff like -c)
    $SUPERVISORD $OPTIONS
    # show initial startup status
    $SUPERVISORCTL $OPTIONS status
    # only create the subsyslock if we created the PIDFILE
    [ -e $PIDFILE ] && touch $LOCKFILE
}

stop() {
    echo -n "Stopping supervisord: "
    $SUPERVISORCTL $OPTIONS shutdown
    if [ -n "$WAIT_FOR_SUBPROCESSES" ]; then
        echo "Waiting roughly 60 seconds for $PIDFILE to be removed after child processes exit"
        total_sleep=0
        for delay in 2 2 2 2 4 4 4 4 8 8 8 8 last; do
            if [ ! -e $PIDFILE ] ; then
                echo "Supervisord exited as expected in under $total_sleep seconds"
                break
            else
                # "last" is a sentinel, so compare it as a string, not a number
                if [ "$delay" = "last" ] ; then
                    echo "Supervisord still working on shutting down. We've waited roughly 60 seconds, we'll let it do its thing from here"
                    return 1
                else
                    sleep $delay
                    total_sleep=$(( $total_sleep + $delay ))
                fi
            fi
        done
    fi
    # always remove the subsys. We might have waited a while, but just remove it at this point.
    rm -f $LOCKFILE
}

restart() {
    stop
    start
}

case "$1" in
    start)
        start
        RETVAL=$?
        ;;
    stop)
        stop
        RETVAL=$?
        ;;
    restart|force-reload)
        restart
        RETVAL=$?
        ;;
    reload)
        $SUPERVISORCTL $OPTIONS reload
        RETVAL=$?
        ;;
    condrestart)
        [ -f $LOCKFILE ] && restart
        RETVAL=$?
        ;;
    status)
        # pass $OPTIONS so status reads the same config as the other actions
        $SUPERVISORCTL $OPTIONS status
        if running ; then
            RETVAL=0
        else
            RETVAL=1
        fi
        ;;
    *)
        echo $"Usage: $0 {start|stop|status|restart|reload|force-reload|condrestart}"
        exit 1
esac
exit $RETVAL
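Because the script's PIDFILE has to agree with the pidfile= setting in the config, one option is to read the value from the config instead of hard-coding it. A sketch under assumptions: config_pidfile is a hypothetical helper, and it only handles a plain key=value line with an optional ; comment, as in the sample config.

```shell
#!/bin/bash
# config_pidfile CONF: print the pidfile value from the [supervisord] section.
config_pidfile() {
    awk -F= '
        # track whether the current line belongs to [supervisord]
        /^\[/ { in_section = ($0 ~ /^\[supervisord\]/) }
        in_section && $1 ~ /^[ \t]*pidfile[ \t]*$/ {
            value = $2
            sub(/[ \t]*;.*/, "", value)          # drop the inline comment
            gsub(/^[ \t]+|[ \t]+$/, "", value)   # trim surrounding blanks
            print value
            exit
        }
    ' "$1"
}

# Example, inside the init script:
#   PIDFILE=$(config_pidfile /etc/supervisord.conf)
```

If the config has no pidfile line the function prints nothing, so the caller should fall back to a default in that case.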
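4. Errors:
Error: .ini file does not include supervisord section
Solution: the configuration file being read has no [supervisord] section. Regenerate the whole file (for example with echo_supervisord_conf, as in step 1) and reapply the changes.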
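A quick check for this error before starting supervisord, as a sketch: check_supervisord_section is a made-up helper that only looks for the section header at the start of a line.

```shell
#!/bin/bash
# check_supervisord_section CONF: succeed only if CONF has a [supervisord] header.
check_supervisord_section() {
    if grep -q '^\[supervisord\]' "$1"; then
        echo "ok: $1 has a [supervisord] section"
    else
        echo "missing [supervisord] section in $1" >&2
        return 1
    fi
}

# Example: write a fresh sample config next to the broken one.
#   check_supervisord_section /etc/supervisord.conf ||
#       echo_supervisord_conf > /etc/supervisord.conf.new
```

Writing the sample to a separate file keeps any existing [program:x] sections around so they can be copied over by hand.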