Commit d14dc814 authored by Kirill Smelkov

gitlab: watcher should take care of sidekiq killed by SIGTERM

The watcher should also handle signals such as SIGTERM, which sidekiq traps
and then exits successfully (with exit code 0).
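
A minimal sketch of why exit code 0 has to be treated as a restart condition. It assumes bash and a `sleep` that accepts fractional seconds; the two child processes are stand-ins invented for illustration, not sidekiq itself. A process that traps SIGTERM and exits cleanly is reported with status 0, while SIGKILL cannot be trapped and is reported as 128 + 9.

```shell
#!/bin/bash
# Child 1 traps SIGTERM and exits cleanly, as sidekiq is described to do.
# (Hypothetical stand-in process, not sidekiq.)
bash -c 'trap "exit 0" TERM; while :; do sleep 0.1; done' &
pid=$!
sleep 1                      # give the child time to install its trap
kill -TERM "$pid"
trapped_status=0
wait "$pid" || trapped_status=$?

# Child 2 cannot trap SIGKILL, so the shell reports 128 + 9 = 137.
sleep 60 &
pid=$!
kill -KILL "$pid"
killed_status=0
wait "$pid" || killed_status=$?

echo "trapped SIGTERM -> $trapped_status, SIGKILL -> $killed_status"
```

A watcher that only restarts on status 137 would therefore miss the first case.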

To achieve this, we rework watcher-sigkill into a generic watcher. It is
given a set of restart exit codes, which may include signal names, and
restarts the child process only when it terminates with one of those
codes.

Example usage:

	watcher 0,SIGKILL prog ...
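
The mapping from a list such as `0,SIGKILL` to exit statuses can be sketched as follows. This is an illustrative variant, not the code from this commit: the helper names `code_to_status` and `should_restart` are hypothetical, and it relies on the bash builtin `kill -l <name>` printing the signal number, whereas the patch greps the output of `kill -l`.

```shell
#!/bin/bash
# code_to_status <code> -> numeric exit status (hypothetical helper)
# Numeric codes pass through unchanged; signal names map to 128 + signal
# number, the status the shell reports for a child killed by that signal.
code_to_status() {
    local code=$1 signo
    case $code in
    '')
        return 1
        ;;
    *[!0-9]*)
        # bash's builtin `kill -l NAME` prints the signal number
        signo=$(kill -l "${code#SIG}" 2>/dev/null) || return 1
        echo $((128 + signo))
        ;;
    *)
        echo "$code"
        ;;
    esac
}

# should_restart <restart-codes> <status> (hypothetical helper)
# Succeeds if <status> matches one of the comma-separated codes.
should_restart() {
    local codes=$1 status=$2 code want
    local IFS=,
    for code in $codes; do
        want=$(code_to_status "$code") || return 2
        [ "$want" = "$status" ] && return 0
    done
    return 1
}
```

With these helpers, `should_restart 0,SIGKILL 137` succeeds and `should_restart 0,SIGKILL 1` fails.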

Based on patch by @iv.
Discussion: https://lab.nexedi.com/lab.nexedi.com/lab.nexedi.com/issues/25#note_22085
parent e7e37398
@@ -632,11 +632,11 @@ log = ${sidekiq-dir:log}
 recipe = slapos.cookbook:wrapper
 wrapper-path = ${directory:service}/sidekiq
 command-line =
-# NOTE Sidekiq memory killer just makes sidekiq processes to be SIGKILL
-# terminated and relies on managing service to restart it. In slapos we don't
-# have mechanism to set autorestart=true, nor bang/watchdog currently work with
-# slapproxy, so we do the monitoring ourselves.
-{{ watcher_sigkill }}
+# NOTE Sidekiq memory killer makes sidekiq processes to exit, or if exit request
+# not handled in time, to be SIGKILL terminated, and relies on managing service
+# to restart it. In slapos we don't have mechanism to set autorestart=true, nor
+# bang/watchdog currently work with slapproxy, so we do the monitoring ourselves.
+{{ watcher }} 0,SIGKILL
 ${gitlab-sidekiq:wrapper-path}
 # XXX -q runner ? (present in gitlab-ce/Procfile but not in omnibus)
......
@@ -55,7 +55,7 @@ context =
 raw redis_binprefix ${redis28:location}/bin
 raw ruby_location ${bundler-4gitlab:ruby-location}
 raw tar_location ${tar:location}
-raw watcher_sigkill ${watcher-sigkill:rendered}
+raw watcher ${watcher:rendered}
 raw xnice_repository_location ${xnice-repository:location}
 # config files
......
@@ -53,7 +53,7 @@ parts =
 bash
 curl
-watcher-sigkill
+watcher
 gitlab-export
 gzip
 dcron-output
@@ -256,7 +256,7 @@ eggs =
 recipe = slapos.recipe.template
 url = ${:_profile_base_location_}/instance.cfg.in
 output = ${buildout:directory}/instance.cfg
-md5sum = b99a99b161c0b292845002fc3fee50cd
+md5sum = 2329ddc4934e900785aa669adc214c23
 # macro: download a shell script and put it rendered into <software>/bin/
 [binsh]
@@ -267,9 +267,9 @@ mode = 0755
 context =
 section bash bash
-[watcher-sigkill]
+[watcher]
 <= binsh
-md5sum = 2986dcb006dc9e8508ff81f646656131
+md5sum = 90690e1351637f20ff2df57a6c3e85b4
 [gitlab-export]
 <= binsh
@@ -319,7 +319,7 @@ md5sum = 176939a6428a7aca4767a36421b0af2b
 [instance-gitlab.cfg.in]
 <= download-file
-md5sum = 89914e4a225f6cdebfa196d46359f6f2
+md5sum = b05fad928ffbb689b4415837525c62d1
 [instance-gitlab-export.cfg.in]
 <= download-file
......
 #!{{ bash.location }}/bin/bash
-# run program under SIGKILL watchdog
-# watcher-sigkill <prog> [<progargs> ...]
+# run program under watchdog
+# watcher <restart-codes> <prog> [<progargs> ...]
 #
-# if the program terminates with SIGKILL - it is restarted after grace period.
+# <restart-codes> = code1,code2,...
+#
+# if the program terminates with status in <restart-codes> - it is restarted after grace period.
 # if the program terminates otherwise - whole process terminates.
+#
+# code can be numeric or symbolic - referring to a signal name. example:
+#
+# watcher 0,SIGKILL <prog> ...
-if [ "$#" -lt 1 ]; then
-    echo "Usage: watcher-sigkill <prog> [<progargs> ...]" 1>&2
+die() {
+    echo "$@" 1>&2
     exit 1
+}
+if [ "$#" -lt 2 ]; then
+    die "Usage: watcher <restart-codes> <prog> [<progargs> ...]"
 fi
+restart_codes="$1"; shift
 prog="$@"
+# signumber <signame> -> #sig
+signumber() {
+    signame=$1
+    # "11) SIGSEGV "
+    sigentry=`kill -l |grep -o "[0-9]\+) $signame\(\s\|$\)"` ||
+        die "E: $signame is not a signal"
+    echo "$sigentry" | grep -o "[0-9]\+"
+}
+# restart codes as set
+declare -A restarts
+for code in `echo "$restart_codes" |sed 's/,/ /g'`; do
+    case $code in
+    *[!0-9]*)
+        # non-number - treat it as signal name
+        signo=`signumber $code` || exit 1
+        code=$((128 + $signo)) # exit code of process terminated by signal #signo
+        ;;
+    *)
+        # already number
+        ;;
+    esac
+    restarts[$code]=y
+done
 progpid=""
-killexit="137" # = 128 + 9 (exit code of process terminated by SIGKILL)
 # make sure to terminate children, when we exit.
 # needed for e.g. when `slapos node stop ...` kills us.
@@ -32,8 +68,8 @@ while true; do
     status=$?
     echo "-> $status"
-    # if program terminated not by SIGKILL - exit
-    if [ "$status" != "$killexit" ] ; then
+    # if program terminated not with expected status - exit
+    if [ "${restarts[$status]}" != y ] ; then
         echo "exit $status"
         exit "$status"
     fi
......
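
The restart decision in the last hunk can be exercised in isolation with a sketch like the one below. It assumes bash 4 or later for associative arrays. The names `run_watched`, `max_runs` and `flaky` are hypothetical and exist only for this illustration; the real watcher loops without a run limit and waits for a grace period between restarts, both of which are left out here.

```shell
#!/bin/bash
# Restart set equivalent to "0,SIGKILL": status 0 and 128 + 9.
declare -A restarts=([0]=y [137]=y)

# run_watched <max-runs> <prog> [<args> ...]   (hypothetical helper)
# Reruns <prog> while its exit status is in the restart set, and returns
# the first status outside the set, or the last status once <max-runs>
# is reached.
run_watched() {
    local max_runs=$1; shift
    local runs=0 status
    while true; do
        status=0
        "$@" || status=$?
        runs=$((runs + 1))
        if [ "${restarts[$status]}" != y ] || [ "$runs" -ge "$max_runs" ]; then
            return "$status"
        fi
    done
}

# Stand-in program: exits 0 twice, then fails with status 3.
attempts=0
flaky() {
    attempts=$((attempts + 1))
    [ "$attempts" -lt 3 ] && return 0
    return 3
}

final=0
run_watched 10 flaky || final=$?
echo "exit $final after $attempts runs"
```

Here the first two runs end with status 0 and are restarted; the third ends with status 3, which is not in the set, so the loop stops and reports it.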